DETAILED ACTION

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
As to claims 1, 11, and 18:
Step 2A, Prong One
	The claims recite in part:
pruning a layer of a neural network using a threshold, the neural network comprising multiple layers; and
repeatedly pruning of the layer of the neural network using a different threshold for each iteration of repeated pruning until a pruning error of the layer equals a pruning error allowance for the layer
Under the broadest reasonable interpretation, these limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
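For illustration only, the scope of the recited steps of claims 1, 11, and 18 can be seen in a minimal Python sketch. It is not the applicant's disclosed implementation, and its names are hypothetical. It assumes that candidate thresholds are drawn from the layer's own sorted weight magnitudes, that "equals" is read as "first reaches or exceeds" (exact equality is rarely attainable with discrete weights), and that the pruning error is the Euclidean distance between the original and pruned weights divided by the number of weights pruned.

```python
import math


def prune(weights, threshold):
    """Zero every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def pruning_error(initial, pruned):
    """Assumed reading: Euclidean distance between the original and pruned
    weights, divided by the number of weights that were pruned."""
    n_pruned = sum(1 for a, b in zip(initial, pruned) if a != 0.0 and b == 0.0)
    if n_pruned == 0:
        return 0.0
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(initial, pruned)))
    return distance / n_pruned


def prune_until_allowance(weights, allowance):
    """Re-prune the layer with a different (larger) threshold on each iteration
    until the pruning error first reaches the allowance.

    Returns (threshold, pruned_weights, error), or None if never reached.
    """
    magnitudes = sorted({abs(w) for w in weights})
    # Each candidate threshold removes one more distinct magnitude than the last.
    for threshold in magnitudes[1:]:
        pruned = prune(weights, threshold)
        error = pruning_error(weights, pruned)
        if error >= allowance:
            return threshold, pruned, error
    return None  # allowance never reached
```

For example, `prune_until_allowance([0.1, -0.2, 0.3, 0.4, -1.0, 2.0], 0.12)` stops at a threshold of 0.4, the first candidate whose pruning error (about 0.125) reaches the allowance.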

Accordingly, at Step 2A, Prong One, the claims recite an abstract idea.

Step 2A, Prong Two
The claim further recites “the pruning error of an iteration being based on an amount of error resulting from the iteration and a number of weights pruned during the iteration.”  This element is recited at a high level of generality and amounts to no more than adding the words “apply it” to the judicial exception.  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (see MPEP 2106.05(f)).  This limitation also amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).  The claim further recites a generic computer, which is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).

Accordingly, at Step 2A, Prong Two, the additional elements, individually or in combination, do not integrate the judicial exception into a practical application.

Step 2B
The limitation “the pruning error of an iteration being based on an amount of error resulting from the iteration and a number of weights pruned during the iteration” is recited at a high level of generality and amounts to no more than adding the words “apply it” to the judicial exception.  This limitation also amounts to insignificant extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).  The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II): “presenting offers and gathering statistics”; “determining an estimated outcome and setting a price”).  The computer is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)).  The recitation of a neural network amounts to generally linking the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).
Accordingly, at Step 2B the additional elements individually or in combination do not amount to significantly more than the judicial exception.

As to claims 2, 12, and 19, under the broadest reasonable interpretation, the limitations “the pruning error is further determined as a quantity of a length of a vector of pruned weights of the layer for a current iteration minus a vector of weights of the layer for a previous iteration divided by a quantity of a total length of a vector of initial weights of the layer minus a length of the vector of pruned weights of the layer for the current iteration” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.
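Under one possible reading of the quoted language (an assumption, since “length” is not defined in the limitation), the numerator is the Euclidean length of the difference between the current and previous weight vectors, and the denominator is the number of initial weights minus the number of weights still non-zero, i.e. pruning error = ||w_t - w_(t-1)|| / (n_0 - n_t). The arithmetic can be sketched as follows; the function name is hypothetical.

```python
import math


def claim2_pruning_error(initial, previous, current):
    """Assumed reading of the quoted formula:
    numerator   = Euclidean length of (current pruned weights - previous weights)
    denominator = total number of initial weights - number still non-zero
    """
    numerator = math.sqrt(sum((c - p) ** 2 for c, p in zip(current, previous)))
    remaining = sum(1 for c in current if c != 0.0)
    denominator = len(initial) - remaining
    if denominator == 0:
        raise ValueError("no weights have been pruned")
    return numerator / denominator
```

For instance, with initial weights [0.5, -0.1, 0.2, 0.05], previous weights [0.5, -0.1, 0.2, 0.0], and current weights [0.5, 0.0, 0.2, 0.0], the numerator is 0.1 and the denominator is 4 - 2 = 2, giving 0.05. Under this reading the result is an amount of error per pruned weight.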

As to claims 3 and 20, under the broadest reasonable interpretation, the limitations “for each layer of the neural network: selecting the pruning error allowance for the layer; selecting the threshold for the layer; and until a length of a vector of remaining non-zero weights to the total length of the vector of initial weights for the layer is within a range of a predetermined percentage for the layer, repeatedly: pruning the layer based on the threshold selected for the layer, determining the pruning error for the layer, and changing the threshold for the layer based the pruning error being less than the pruning error allowance for the layer until the pruning error is within a predetermined range of the pruning error allowance for the layer; and determining a percentage of the length of the vector of remaining non-zero weights to the total length of the vector of initial weights for the layer; and changing the pruning error allowance for the layer until the percentage of the length of the vector of remaining non-zero weights to the total length of the vector of initial weights for the layer is within the range of the predetermined percentage for the layer” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
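The nested iteration quoted above can be sketched in Python for a single layer. This is a hypothetical illustration under stated assumptions, not the applicant's algorithm. The inner loop raises the threshold through the layer's sorted weight magnitudes while the pruning error stays within the allowance. The outer loop bisects on the allowance until the fraction of non-zero weights falls within `target_remaining` plus or minus `band`. The pruning error is assumed to be the Euclidean distance between the original and pruned weights divided by the number of weights pruned.

```python
import math


def prune(weights, threshold):
    """Zero every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def pruning_error(initial, pruned):
    """Assumed reading: distance to the original weights per pruned weight."""
    n_pruned = sum(1 for a, b in zip(initial, pruned) if a != 0.0 and b == 0.0)
    if n_pruned == 0:
        return 0.0
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(initial, pruned))) / n_pruned


def prune_within_allowance(weights, allowance):
    """Inner loop: raise the threshold while the pruning error stays within
    the allowance; return the last acceptable (threshold, pruned)."""
    best = (0.0, list(weights))  # threshold 0.0 prunes nothing
    for threshold in sorted({abs(w) for w in weights})[1:]:
        pruned = prune(weights, threshold)
        if pruning_error(weights, pruned) > allowance:
            break
        best = (threshold, pruned)
    return best


def prune_layer(weights, target_remaining, band=0.05, max_iter=50):
    """Outer loop: bisect on the allowance until the fraction of non-zero
    weights is within target_remaining +/- band."""
    lo, hi = 0.0, max(abs(w) for w in weights)  # allowance = max prunes all candidates
    for _ in range(max_iter):
        allowance = (lo + hi) / 2
        threshold, pruned = prune_within_allowance(weights, allowance)
        remaining = sum(1 for w in pruned if w != 0.0) / len(weights)
        if remaining > target_remaining + band:
            lo = allowance  # too little pruning: loosen the allowance
        elif remaining < target_remaining - band:
            hi = allowance  # too much pruning: tighten the allowance
        else:
            return allowance, threshold, pruned
    return None
```

With the weights 1.0 through 10.0 and a target of 50 percent remaining, the sketch settles on an allowance of 1.5625 and a threshold of 6.0, which zeroes the five smallest weights. Bisection is only one way to "change" the allowance; it is chosen here because it terminates predictably.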



As to claim 4, under the broadest reasonable interpretation, the limitations “different types of layers of the neural network have different ranges for a percentage of the length of a vector of remaining non-zero weights to the total length of the vector of initial weights for the layer” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claims 5 and 12, under the broadest reasonable interpretation, the limitations “repeatedly pruning the layer using a different pruning error allowance for at least one iteration until a percentage of pruned weights to initial weights for the layer is within a range of a predetermined percentage for the layer” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claim 6, under the broadest reasonable interpretation, the limitations “pruning the layer comprises setting a weight to zero based on a magnitude of the weight being less than the threshold” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claim 7, under the broadest reasonable interpretation, the limitations “pruning the layer comprises setting a weight to zero based on a magnitude of the weight being less than the threshold scaled by a predetermined scale factor” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claim 8, under the broadest reasonable interpretation, the limitations “pruning the layer comprises setting a weight to zero based on a magnitude of the weight being less than the threshold scaled by a predetermined scale factor” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.

As to claim 9, under the broadest reasonable interpretation, the limitations “predetermined scale factor comprises a standard deviation of weights of the layer” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.
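The weight-zeroing rule recited in claims 6, 7, and 9 amounts to a single comparison per weight. In the hypothetical sketch below, `scale=1.0` gives the unscaled comparison of claim 6. The standard deviation is assumed to be the population form (`statistics.pstdev`), since the quoted limitation does not say which form is meant.

```python
import statistics


def prune_scaled(weights, threshold, scale=1.0):
    """Zero each weight whose magnitude is below threshold * scale.
    scale=1.0 reproduces the unscaled comparison."""
    cutoff = threshold * scale
    return [0.0 if abs(w) < cutoff else w for w in weights]


def std_scale(weights):
    """Scale factor taken as the (population) standard deviation of the
    layer's weights; the sample form would be statistics.stdev."""
    return statistics.pstdev(weights)
```

For the layer [-2.0, -1.0, 0.0, 1.0, 2.0], the population standard deviation is the square root of 2, so a threshold of 1.0 scaled by it zeroes every weight with magnitude below about 1.414, leaving [-2.0, 0.0, 0.0, 0.0, 2.0].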

As to claim 10, under the broadest reasonable interpretation, the limitations “pruning the layer comprises setting a weight to zero based on a magnitude of the weight being less than the threshold scaled by a predetermined scale factor” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.

As to claim 13, under the broadest reasonable interpretation, the limitations “the threshold used for each iteration of pruning a first-type layer of the neural network is based on a predetermined percentage of non-zero weights to initial weights for the layer” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.

As to claim 14, under the broadest reasonable interpretation, the limitations “wherein the threshold used for each iteration of pruning a second-type layer of the neural network comprises a fixed value threshold, the second-type layer being different from the first-type layer” are process steps that cover Mathematical Concepts.  If a claim, under its broadest reasonable interpretation, covers a mathematical concept, then it falls within the “Mathematical Concepts” grouping of abstract ideas.
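The two kinds of threshold recited in claims 13 and 14 can be contrasted in a short sketch. The layer-type labels `"first"` and `"second"`, the default `keep_fraction` of 0.3, and the default fixed threshold of 0.05 are hypothetical values chosen only for illustration. The percentage-based threshold is assumed to be the smallest magnitude among the largest `keep_fraction` of the layer's weights.

```python
def threshold_for_percentage(weights, keep_fraction):
    """Smallest magnitude among the largest keep_fraction of the weights, so
    that zeroing |w| < threshold keeps roughly that fraction non-zero."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    magnitudes = sorted((abs(w) for w in weights), reverse=True)
    n_keep = max(1, round(keep_fraction * len(magnitudes)))
    return magnitudes[n_keep - 1]


def layer_threshold(layer_type, weights, keep_fraction=0.3, fixed_threshold=0.05):
    """First-type layers: percentage-based threshold; second-type: fixed value."""
    if layer_type == "first":
        return threshold_for_percentage(weights, keep_fraction)
    if layer_type == "second":
        return fixed_threshold
    raise ValueError(f"unknown layer type: {layer_type!r}")
```

Ties at the threshold magnitude may keep slightly more than the requested fraction, because only weights strictly below the threshold are zeroed.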

As to claim 15, under the broadest reasonable interpretation, the limitations “retraining the neural network using weights remaining after pruning first-type layers of the neural network and weights remaining after pruning second-type layers of the neural network” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.

As to claim 16, under the broadest reasonable interpretation, the limitations “fixing weights of first type layers of the neural network; and repeatedly pruning second type layers of the neural network, and retraining the neural network using weights remaining after pruning second type layers” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.




As to claim 17, under the broadest reasonable interpretation, the limitations “adjusting a dropout rate for the retraining in response to a pruning rate of the pruning” are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the human mind or with the aid of pencil and paper.  If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 5-7, 10, 11, and 18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu et al (US 2013/0138598).
As to claim 1, Yu et al teaches a method, comprising: 
pruning (paragraph [0022]... connection pruning) a layer of a neural network (paragraph [0003]... Deep Neural Network (DNN)) using a threshold (paragraph [0004]...interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold), the neural network comprising multiple layers (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network);
repeatedly (paragraph [0024]... foregoing process is then repeated a number of times) pruning of the layer of the neural network using a different threshold for each iteration (paragraph [0030]...identify each interconnection associated with each layer of the fully-connected and initially trained DNN whose interconnection weight value does not exceed a first weight threshold (process action 200) ; paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208)) of repeated pruning until a pruning error (paragraph [0032]...for example 50%)  of the layer equals a pruning error allowance for the layer (paragraph [0032]...for example range between 20% and 80%) (paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208). In one implementation, the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used ; paragraph [0033]...the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold), the pruning error of an iteration being based on an amount of error resulting from the iteration and a number of weights pruned during the iteration (paragraph [0033]... Process actions 206 through 210 are then repeated a number of times to produce the trained DNN. 
To this end, it is determined if process actions 206 through 210 have been repeated a desired number of times (process action 212). If not, then actions 206 through 210 are repeated. This continues until it is determined the process has been repeated the desired number of times. In one implementation, the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold. In another implementation, process actions 206 through 210 are repeated a prescribed number of times (e.g., between 5 and 50 which is task dependent)).
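The procedure quoted from Yu et al paragraphs [0030] to [0033] can be paraphrased as the following sketch, offered only as a reading aid and not as Yu's code. `refine_pass` is a hypothetical stand-in for one sweep of error back-propagation over the training data, and the comparisons are assumed to use weight magnitudes. Removed interconnections are kept at zero during refinement, consistent with paragraph [0022] (“Other interconnections are removed from the DNN”). The defaults 0.02 and 0.5 mirror the example values quoted from paragraph [0032].

```python
def sparse_refinement(weights, refine_pass, first_threshold,
                      min_weight=0.02, fraction=0.5,
                      training_threshold=1e-4, max_repeats=50):
    """Paraphrase of process actions 200-212 for one flattened weight list."""
    # Actions 200-202: zero interconnections whose magnitude does not exceed
    # the first weight threshold.
    w = [0.0 if abs(x) <= first_threshold else x for x in weights]
    survivors = [abs(x) for x in w if x != 0.0]
    if not survivors:
        return w
    # Action 204: smallest remaining non-zero magnitude (theta).
    theta = min(survivors)
    # Second threshold: lesser of a minimum weight value and a percentage of theta.
    second_threshold = min(min_weight, fraction * theta)
    for _ in range(max_repeats):
        # Action 206: one refinement pass; removed interconnections stay at zero.
        refined = [0.0 if old == 0.0 else new for old, new in zip(w, refine_pass(w))]
        # Actions 208-210: zero weights not exceeding the second threshold.
        refined = [0.0 if abs(x) <= second_threshold else x for x in refined]
        change = max(abs(a - b) for a, b in zip(refined, w))
        w = refined
        # Action 212: stop once weights vary by no more than the training threshold.
        if change <= training_threshold:
            break
    return w
```

With an identity `refine_pass`, the weights [0.5, -0.01, 0.3, 0.04] and a first threshold of 0.05 settle after one pass at [0.5, 0.0, 0.3, 0.0].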

As to claim 5, Yu et al teaches the method, further comprising repeatedly (paragraph [0033]...repeated) pruning (paragraph [0022]... connection pruning) the layer (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) using a different pruning error allowance (paragraph [0025]... In another implementation, the prescribed maximum number of interconnections ranges between 10% and 40% of all interconnections) for at least one iteration until a percentage of pruned weights to initial weights for the layer is within a range of a predetermined percentage for the layer (paragraph [0027]...the desired number of times action 108 is repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold).

As to claim 6, Yu et al teaches the method, wherein pruning (paragraph [0022]... connection pruning) the layer (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) comprises setting a weight to zero based on a magnitude of the weight being less than the threshold (paragraph [0028]...the sparseness constraint in the continued training involves rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning). Note that only weights smaller than the minimum weight threshold are rounded down to zero--instead of those smaller than θ. This is because the weights may shrink and be suddenly removed, and it is desirable to keep the effect of this removal to minimum without sacrificing the degree of sparseness).


As to claim 7, Yu et al teaches the method, wherein pruning (paragraph [0022]... connection pruning) the layer (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) comprises setting a weight to zero (paragraph [0028]... rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning)) based on a magnitude of the weight being less than the threshold scaled by a predetermined scale factor (paragraph [0028]...θ).

As to claim 10, Yu et al teaches a method, further comprising iteratively (paragraph [0029]...next training iteration) pruning and retraining (paragraph [0029]...future training iteration) the neural network using the threshold (paragraph [0004]...interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold)  after the pruning error (paragraph [0032]...for example 50%)   of a currently pruned layer equals the pruning error allowance (paragraph [0032]...for example range between 20% and 80%) (paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208). In one implementation, the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used ; paragraph [0033]...the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold). 




Claim 11 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 18 has similar limitations as claim 1. Therefore, the claim is rejected for the same reasons as above. 







As to claim 2, Yu et al teaches a method, further comprising for each of the layers of the neural network:
initializing (paragraph [0003]...DNN is trained by initially training a fully interconnected DNN ; paragraph [0030]... initially trained DNN) the pruning error allowance (paragraph [0032]...for example range between 20% and 80%); 
initializing the threshold (paragraph [0030]...identify each interconnection associated with each layer of the fully-connected and initially trained DNN whose interconnection weight value does not exceed a first weight threshold (process action 200)); 
and repeating (paragraph [0033]...repeated), until a percentage of pruned weights for the layer is within a range for the layer (paragraph [0032]... the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used) ; (paragraph [0033]... the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold):
repeating (paragraph [0033]...repeated), until the pruning error (paragraph [0032]...for example 50%) reaches the pruning error allowance (paragraph [0032]...for example range between 20% and 80%): pruning the layer (paragraph [0022]... connection pruning) of the neural network using the threshold (paragraph [0033]... training threshold); 
calculating the pruning error (paragraph [0016]...error signal equation) for the pruned layer; 
comparing the pruning error with the pruning error allowance (paragraph [0032]... those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208)); and 
changing the threshold in response to the comparison (paragraph [0028]... continued training involves rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning)); 
calculating the percentage of pruned weights of the pruned layer; and changing the pruning error allowance in response to the percentage of pruned weights (paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208). In one implementation, the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used).

As to claim 3, Yu et al teaches a method, further comprising: 
repeating (paragraph [0033]...repeated), until a percentage of pruned weights for the layer is within a range for the layer (paragraph [0027]...the desired number of times action 108 is repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold), the repeating of the pruning of the layer using a different pruning error allowance (paragraph [0025]... In another implementation, the prescribed maximum number of interconnections ranges between 10% and 40% of all interconnections).

As to claim 4, Yu et al teaches a method, wherein different types of layers of the neural network have different ranges for the percentage of pruned weights (paragraph [0022]...this leads to a simple yet effective procedure for training a "sparse" DNN. Generally, a fully connected DNN is trained by sweeping through the full training set a number of times. Then, for the most part, only the interconnections whose weight magnitudes are in top q are considered in further training. Other interconnections are removed from the DNN. It is noted that the training is continued after pruning the interconnections because the log conditional probability value D is reduced due to connection pruning, especially when the degree of sparseness is high (i.e., q is small). However, the continued DNN training tends to converge much faster than the original).

As to claim 5, Yu et al teaches a method, wherein pruning (paragraph [0022]... connection pruning) the layer of the neural network (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) comprises setting a weight to zero if a magnitude of the weight is less than the threshold (paragraph [0028]...the sparseness constraint in the continued training involves rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning). Note that only weights smaller than the minimum weight threshold are rounded down to zero--instead of those smaller than θ. This is because the weights may shrink and be suddenly removed, and it is desirable to keep the effect of this removal to minimum without sacrificing the degree of sparseness).

As to claim 6, Yu et al teaches a method, wherein pruning (paragraph [0022]... connection pruning) the layer of the neural network (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) comprises setting a weight to zero (paragraph [0028]... rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning)) if a magnitude of the weight is less than the threshold scaled by a scale factor (paragraph [0028]...θ).

As to claim 10, Yu et al teaches a method, further comprising performing the repeating (paragraph [0033]...repeated) of the pruning (paragraph [0022]... connection pruning) of the layer of the neural network (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) with each layer (paragraph [0027]...interconnection weights associated with the each hidden layer) of the neural network.

As to claim 11, Yu et al teaches a method, further comprising iteratively (paragraph [0029]...next training iteration) pruning and retraining (paragraph [0029]...future training iteration) the neural network using the threshold (paragraph [0004]...interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold) after the pruning error (paragraph [0032]...for example 50%)  of the pruned layer reached the pruning error allowance (paragraph [0032]...for example range between 20% and 80%) (paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208). In one implementation, the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used ; paragraph [0033]...the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold). 

As to claim 12, Yu et al teaches a method, comprising: 
repeating (paragraph [0033]...repeated):
pruning (paragraph [0022]... connection pruning) a plurality of layers of a neural network (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network)  using automatically (paragraph [0034]...To speed up the calculation, in one implementation, the indexes and actual weights are stored in adjacent groups so that they can be retrieved efficiently with good locality. A slightly different but almost equally efficient data structure implementation, pairs of indexes and weights are grouped. With the proposed data structure, each column can be multiplied with the input vector in parallel. To further speed up the calculation, parallelization can also be exploited within each column) determined thresholds (paragraph [0004]...interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold); and
retraining (paragraph [0028]...continued training ; paragraph [0029]...future training iteration) the neural network using only weights remaining after pruning (paragraph [0031]...the value of each of these identified interconnections is then set to zero (process action 202), and the interconnection weight value of the remaining non-zero valued interconnections having the smallest value is identified (process action 204). Each data entry is input one by one into the input layer until all the data entries have been input once to produce a current refined DNN (process action 206)).
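The two repeated steps recited for claim 12 in this paragraph (pruning with automatically determined thresholds, then retraining using only the surviving weights) can be sketched for a single flattened layer. Everything here is a hypothetical illustration: the threshold rule (keep the largest `keep_fraction` of magnitudes), the plain gradient-descent retraining, and the `grad_fn` callback supplying gradients are assumptions, not features drawn from the claim or from Yu et al.

```python
def threshold_keeping(weights, keep_fraction):
    """Automatically determined threshold (assumed rule): the smallest
    magnitude among the largest keep_fraction of the weights."""
    magnitudes = sorted((abs(w) for w in weights), reverse=True)
    n_keep = max(1, round(keep_fraction * len(magnitudes)))
    return magnitudes[n_keep - 1]


def retrain_remaining(weights, grad_fn, lr=0.1, steps=100):
    """Gradient descent that updates only the weights left non-zero by pruning."""
    mask = [w != 0.0 for w in weights]
    w = list(weights)
    for _ in range(steps):
        grads = grad_fn(w)
        # Pruned positions are forced back to zero on every step.
        w = [wi - lr * g if keep else 0.0 for wi, g, keep in zip(w, grads, mask)]
    return w


def prune_and_retrain(weights, grad_fn, keep_fractions):
    """Repeat: prune with an automatically determined threshold, then retrain."""
    w = list(weights)
    for keep in keep_fractions:
        threshold = threshold_keeping(w, keep)
        w = [0.0 if abs(x) < threshold else x for x in w]
        w = retrain_remaining(w, grad_fn)
    return w
```

Masking inside every update, rather than once after training, is what keeps pruned weights at exactly zero even when their gradients are non-zero.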

As to claim 13, Yu et al teaches a method, wherein the pruning (paragraph [0022]... connection pruning) of the layers of the neural network (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network)  comprises pruning layers of the neural network having a first type using automatically determined thresholds (paragraph [0030]...identify each interconnection associated with each layer of the fully-connected and initially trained DNN whose interconnection weight value does not exceed a first weight threshold (process action 200)).

As to claim 14, Yu et al teaches a method, further comprising:
fixing weights (paragraph [0030]...setting interconnection weight values via the error back-propagation procedure) of layers of the neural network having a second type (paragraph [0032]...second weight threshold are identified (process action 208))  different from the first type (paragraph [0030]... first weight threshold);
repeating (paragraph [0033]...repeated):
the pruning (paragraph [0022]... connection pruning) of the layers having the first type (paragraph [0030]... first weight threshold); and
the retraining (paragraph [0028]...continued training ; paragraph [0029]...future training iteration) the neural network using only weights remaining after pruning (paragraph [0023]...produce the interimly trained DNN or a refined DNN); 
fixing weights of layers of the neural network having the first type; and repeating:
pruning the layers of the neural network having the second type; and
the retraining (paragraph [0028]...continued training ; paragraph [0029]...future training iteration) the neural network using only weights remaining after pruning (paragraph [0033]...the value of each of the identified interconnections whose interconnection weight value does not exceed the second weight threshold is then set to zero (process action 210). Process actions 206 through 210 are then repeated a number of times to produce the trained DNN).
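For illustration, the two-phase schedule recited in claim 14 (hold one layer type fixed while pruning and retraining the other, then swap) can be sketched as follows. This is a minimal sketch under stated assumptions: the layer types are plain labels (for example "conv" and "fc", which the claim does not name), a fixed number of rounds stands in for whatever stopping condition is actually used, and prune_fn and retrain_fn are hypothetical callbacks supplied by the caller.

```python
from typing import Callable, Dict, Set

# Hypothetical callbacks (assumptions, not from Yu et al or the application):
#   prune_fn(layer_name)  prunes one layer in place;
#   retrain_fn(trainable) retrains the network, updating only the layers named in
#   `trainable` (all other layers are held fixed) and only their unpruned weights.
PruneFn = Callable[[str], None]
RetrainFn = Callable[[Set[str]], None]


def alternate_prune_retrain(
    layer_types: Dict[str, str],
    first_type: str,
    second_type: str,
    prune_fn: PruneFn,
    retrain_fn: RetrainFn,
    rounds: int,
) -> None:
    """Prune/retrain layers of the first type while the second type is fixed, then swap."""
    for active_type in (first_type, second_type):
        # Layers of the other type are excluded from `trainable`, i.e. their weights are fixed.
        trainable = {name for name, t in layer_types.items() if t == active_type}
        for _ in range(rounds):
            for name in sorted(trainable):
                prune_fn(name)
            retrain_fn(trainable)
```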

As to claim 15, Yu et al teaches the method, further comprising fixing weights (paragraph [0030]...setting interconnection weight values via the error back-propagation procedure) of layers of the neural network having a second type (paragraph [0032]...second weight threshold are identified (process action 208)) different from the first type (paragraph [0030]... first weight threshold).

As to claim 16, Yu et al teaches the method, further comprising
fixing weights (paragraph [0030]...setting interconnection weight values via the error back-propagation procedure) of layers of the neural network having the first type (paragraph [0030]... first weight threshold); and repeating (paragraph [0033]...repeated):
pruning (paragraph [0022]... connection pruning) the layers of the neural network having the second type (paragraph [0032]...second weight threshold are identified (process action 208))  ; and
the retraining (paragraph [0028]...continued training ; paragraph [0029]...future training iteration) the neural network using only weights remaining after pruning (paragraph [0033]...the value of each of the identified interconnections whose interconnection weight value does not exceed the second weight threshold is then set to zero (process action 210). Process actions 206 through 210 are then repeated a number of times to produce the trained DNN).

As to claim 17, Yu et al teaches the method, further comprising generating the automatically determined thresholds (paragraph [0034]...To speed up the calculation, in one implementation, the indexes and actual weights are stored in adjacent groups so that they can be retrieved efficiently with good locality. A slightly different but almost equally efficient data structure implementation, pairs of indexes and weights are grouped. With the proposed data structure, each column can be multiplied with the input vector in parallel. To further speed up the calculation, parallelization can also be exploited within each column ; paragraph [0030]... first weight threshold ; paragraph [0032]...second weight threshold are identified (process action 208)) before retraining (paragraph [0028]...continued training ; paragraph [0029]...future training iteration) the neural network.

Claim 11 has limitations similar to those of claim 1. Therefore, the claim is rejected for the same reasons as above. 

Claim 18 has limitations similar to those of claim 1. Therefore, the claim is rejected for the same reasons as above. 


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (US 2013/0138598).
 	As to claim 2, Yu et al discloses the claimed invention except for “pruning error is further determined as a quantity of a length of a vector of pruned weights of the layer for a current iteration minus a vector of weights of the layer for a previous iteration divided by a quantity of a total length of a vector of initial weights of the layer minus a length of the vector of pruned weights of the layer for the current iteration.”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to determine the pruning error as a quantity of a length of a vector of pruned weights of the layer for a current iteration minus a vector of weights of the layer for a previous iteration divided by a quantity of a total length of a vector of initial weights of the layer minus a length of the vector of pruned weights of the layer for the current iteration, since it has been held that discovering an optimum value of a result-effective variable involves only routine skill in the art. In re Boesch, 617 F.2d 272, 205 USPQ 215 (CCPA 1980).
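The claim 2 expression is easier to evaluate with its terms written out. The following sketch reflects one reading only, under two assumptions not stated in the claim: the "length" of the vector difference is its Euclidean norm, and the "length" of the vector of pruned weights is its count of nonzero (surviving) weights, so that the denominator equals the number of weights pruned. The function name pruning_error and the zero-return convention when nothing is pruned are hypothetical.

```python
import math
from typing import Sequence


def pruning_error(current: Sequence[float], previous: Sequence[float], initial: Sequence[float]) -> float:
    """One reading of claim 2:

        e_t = ||w_t - w_{t-1}||_2 / (len(w_0) - nnz(w_t))

    current  = w_t,     pruned weights of the layer for the current iteration
    previous = w_{t-1}, weights of the layer for the previous iteration
    initial  = w_0,     initial weights of the layer
    """
    diff_norm = math.sqrt(sum((c - p) ** 2 for c, p in zip(current, previous)))
    remaining = sum(1 for c in current if c != 0.0)  # read as "length" of the pruned vector
    pruned_count = len(initial) - remaining
    if pruned_count == 0:
        return 0.0  # assumption: no weights pruned, so no pruning error is attributed
    return diff_norm / pruned_count


w0 = [0.5, -0.03, 0.04, -0.6]
wt = [0.5, 0.0, 0.0, -0.6]
e = pruning_error(wt, w0, w0)  # sqrt(0.03**2 + 0.04**2) / 2, approximately 0.025
```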

Claim 12 has limitations similar to those of claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 19 has limitations similar to those of claim 2. Therefore, the claim is rejected for the same reasons as above. 

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (US 2013/0138598) in view of Lee (US 2003/0069652).
As to claim 8, Yu et al teaches a method, wherein pruning (paragraph [0022]... connection pruning) the layer (paragraph [0023]...input layer, output layer, plurality of hidden layers of a neural network) comprises setting a weight to zero (paragraph [0028]... rounding interconnection weight values with magnitude below a prescribed minimum weight threshold to zero (e.g., min[0.02, θ/2] where θ is the minimal weight magnitude that survived the pruning)) based on a magnitude of the weight being less than the threshold scaled by a predetermined scale factor (paragraph [0028]...θ).
Yu et al fails to explicitly show/teach that the scale factor is the standard deviation of the weights of the layer.
However, Lee teaches a scale factor that is the standard deviation of the weights of the layer (Lee paragraph [0063]...determine the distance-to-threshold (d_i^n) values for each of the training sample i associated with this node 102. The weighted mean (μ_d^n) and standard deviation (σ_d^n) for the distance values are derived from the training sample distance values 104 and stored in the node for the classification of new samples. The weighting factors are the weights associated with each training sample. Weights can be associated with samples on a variety of basis such as with the confidence of representation or accuracy of data acquisition, significance to a class determination, or other emphasis criteria. Equal weights can be applied if no additional information is available. Weights can also be automatically determined by a process such as described for tree focusing in section V. In one embodiment of the invention, a simple method accumulates the weighted distance value using the following rule).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, for Yu et al’s scale factor to be the standard deviation of the weights of the layer, as in Lee, for the purpose of accounting for the confidence of representation or accuracy of data acquisition.
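To make the difference between the two thresholds concrete, the following sketch contrasts the rounding threshold quoted from Yu et al paragraph [0028], min(0.02, θ/2), with a claim 8 style cutoff in which the threshold is scaled by the standard deviation of the layer's weights. It assumes the population standard deviation (statistics.pstdev) of the layer's current weights; whether population or sample standard deviation is intended is an assumption, and both function names are hypothetical.

```python
import statistics
from typing import List


def yu_rounding_threshold(theta: float, floor: float = 0.02) -> float:
    """Threshold as quoted from Yu et al [0028]: min(0.02, theta / 2), where theta is
    the minimal weight magnitude that survived pruning."""
    return min(floor, theta / 2)


def prune_with_std_scale(weights: List[float], threshold: float) -> List[float]:
    """Claim 8 style pruning: set w to zero when |w| < threshold * std(layer weights).

    Uses the population standard deviation (an assumption) of the layer's weights.
    """
    cutoff = threshold * statistics.pstdev(weights)
    return [0.0 if abs(w) < cutoff else w for w in weights]


layer = [1.0, -1.0, 0.1, -0.1]
# pstdev(layer) is about 0.71, so a threshold of 0.5 gives a cutoff of about 0.36.
result = prune_with_std_scale(layer, 0.5)  # [1.0, -1.0, 0.0, 0.0]
```

The design point is that a standard-deviation scale makes the same nominal threshold adapt to each layer's weight distribution, whereas Yu et al's quoted rule depends on a fixed floor and the smallest surviving magnitude.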

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (US 2013/0138598) in view of Deangelis (US 5,787,408).
As to claim 9, Yu et al teaches a method using determined thresholds (paragraph [0004]...interconnections associated with each layer of the initially trained DNN whose current weight value exceeds a minimum weight threshold).
Yu et al fails to explicitly show/teach, to generate the different threshold: increasing the threshold if the pruning error is less than the pruning error allowance; and decreasing the threshold if the pruning error is greater than the pruning error allowance.
However, Deangelis teaches, to generate the different threshold: increasing the threshold if the pruning error is less than the pruning error allowance; and decreasing the threshold if the pruning error is greater than the pruning error allowance (Deangelis column 5, lines 1 – 20... Training controller 32 tracks and analyzes the network error, E_N, from generator 30 to determine whether to generate a trained ANN 18b as output or to signal tuner 34 to adjust weights and biases. Training controller 32 compares E_N generated by generator 30 with a threshold error, E_T. If E_N is less than E_T, the input ANN 18a has been trained to the desired level of accuracy, and controller 32 passes the input ANN to pruning processor 20 as a fully trained ANN 18b. If E_N is greater than E_T, input ANN 18a has not been fully trained to the desired level of accuracy and controller 32 then must determine if a timeout has occurred. If a timeout has occurred, controller 32 passes the partially trained ANN 18b along with the network error E_N for the partially trained ANN to processor 20. If a timeout has not occurred, controller 32 directs network tuner 34 to adjust the weights and biases of the ANN. Preferably, controller 32 determines whether a timeout has occurred by monitoring the change in E_N calculated by generator 30 over time and indicating that a timeout has occurred if the improvement (reduction) in E_N over a fixed number of epochs is less than a threshold reduction. ; and column 5, lines 45 – 55... If the classification performance is better than a random guess (better than 50% for the male/female voice classification system), controller 40 adjusts the error threshold E_T to equal the network error E_N of the partially trained ANN 18b and passes the updated error threshold E_T to training controller 32.).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Yu et al to generate the different threshold by increasing the threshold if the pruning error is less than the pruning error allowance and decreasing the threshold if the pruning error is greater than the pruning error allowance, as in Deangelis, for the purpose of determining node utilization and functionality.
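As an illustration of the claim 9 limitation in combination with the repeated pruning of claim 1, the following is a minimal sketch of a threshold search that raises the threshold while the pruning error is below the allowance and lowers it while the error is above. It rests on several assumptions that are not from the references or the application: the pruning error is read as the summed magnitude of the weight changes divided by the number of weights pruned; "equals" is approximated by a tolerance; and the step is halved whenever the search reverses direction. All function names are hypothetical.

```python
from typing import List, Tuple


def prune(weights: List[float], threshold: float) -> List[float]:
    """Set a weight to zero when its magnitude is less than the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def pruning_error(original: List[float], pruned: List[float]) -> float:
    """Summed |weight change| of the pruned weights divided by how many were pruned."""
    removed = [abs(o - p) for o, p in zip(original, pruned) if p == 0.0 and o != 0.0]
    return sum(removed) / len(removed) if removed else 0.0


def search_threshold(
    weights: List[float],
    allowance: float,
    threshold: float,
    step: float,
    tol: float = 1e-9,
    max_iter: int = 100,
) -> Tuple[float, float]:
    """Re-prune with a different threshold each iteration until the error matches the allowance."""
    direction = 0
    err = pruning_error(weights, prune(weights, threshold))
    for _ in range(max_iter):
        if abs(err - allowance) <= tol:
            break  # pruning error "equals" the allowance within tolerance
        # Increase the threshold if the error is below the allowance, else decrease it.
        new_direction = 1 if err < allowance else -1
        if direction != 0 and new_direction != direction:
            step /= 2.0  # overshot the target: take smaller steps from now on
        direction = new_direction
        threshold = max(0.0, threshold + direction * step)
        err = pruning_error(weights, prune(weights, threshold))
    return threshold, err


layer = [0.01, -0.02, 0.03, 0.5]
thr, err = search_threshold(layer, allowance=0.015, threshold=0.0, step=0.01)
```

With these assumptions the pruning error can only take a few discrete values for a given layer, so the loop is bounded by max_iter in case the allowance cannot be met exactly.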



Allowable Subject Matter
Claims 3, 4, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the rejection under 35 U.S.C. 101 is overcome.


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (US 2013/0138598) in view of Stoffel et al (US 2004/0024773).
As to claim 8, Yu et al teaches a pruning error (paragraph [0032]...for example 50%) of the pruned layer reaching a pruning error allowance (paragraph [0032]...for example range between 20% and 80%) (paragraph [0032]...those interconnections associated with each hidden layer of the last produced refined DNN whose interconnection weight value does not exceed a second weight threshold are identified (process action 208). In one implementation, the second weight threshold is the lesser of a prescribed minimum weight value (e.g., 0.02) or a prescribed percentage of the previously-identified smallest non-zero interconnection weight value (which percentage for example can range between 20% and 80%). In tested embodiments, 50 percent of the identified smallest non-zero interconnection weight value was used ; paragraph [0033]...the desired number of times actions 206 through 210 are repeated is established by determining when the interconnection weights associated with the each hidden layer do not vary between iterations by more than a prescribed training threshold). 
Yu et al fails to explicitly show/teach calculating the pruning error by dividing a magnitude of weight errors by a number of weights pruned.
However, Stoffel et al teaches calculating the pruning error (paragraph [0106]...cost-complexity pruning) by dividing a magnitude of weight errors by a number of weights pruned (paragraph [0106]... A technique, called minimal cost complexity pruning and developed by Breiman [BFO84] considers the predicted error rate as the weighted sum of tree complexity and its error on the training cases, with the separate cases used primarily to determine an appropriate weighting. The C4.5 algorithm uses another technique, called pessimistic pruning, that use only the training set from which the tree was built. The predicted error rate in a leaf is estimated as the upper confidence limit for the probability of error (E/N, E-number of errors, N-number of covered training cases) multiplied by N. For our project, the lack of a priori knowledge about the "right size" of the tree, as demanded by the first strategy, makes the approach used by the C4.5 algorithm the better choice for our project).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Yu et al to calculate the pruning error by dividing a magnitude of weight errors by a number of weights pruned, as in Stoffel et al, for the purpose of selecting the right-sized training tree.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al (US 2013/0138598) in view of Huddleston et al (US 2004/0215430).
As to claim 18, Yu et al teaches pruning (paragraph [0022]... connection pruning) a layer of a neural network.
Yu et al fails to explicitly show/teach adjusting a dropout rate for the retraining in response to a pruning rate of the pruning.
However, Huddleston et al teaches adjusting a dropout rate (paragraph [0129]...adjusted error rate) for the retraining in response to a pruning rate of the pruning (paragraph [0129]...CART uses an adjusted error rate function [AE(t)=E(t)+a*LeafCount(t)] to generate a pool of candidate subtrees. The first candidate is selected as follows. The adjusted error rate is calculated for the possible subtrees containing the root node, as a parameter is gradually increased. When the adjusted error rate of a subtree becomes greater than that for the root node, then that subtree is pruned. The second candidate is chosen by repeating this process starting with the first candidate subtree. The process continues until only the root node remains. A validation set of data, which was not used in the training data, is used to select among the pool of pruned candidate subtrees. The subtree with the lowest overall error rate on the validation set is declared the winner. Sometimes a cost function (e.g., some weight multiplied by the probability of misclassification) is applied along with the error rate to evaluate the best subtree. A third test data set, which is exclusive of the training and validation set, may be used to gauge the prediction or classification capabilities of the final subtree).
Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to modify Yu et al to adjust a dropout rate for the retraining in response to a pruning rate of the pruning, as in Huddleston et al, for the purpose of improving the accuracy of a prediction. 
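For context on what the dropout limitation could look like in practice, one published heuristic (Han et al., 2015, "Learning both Weights and Connections for Efficient Neural Networks") scales the original dropout rate by the square root of the fraction of connections that remain after pruning, D_r = D_o · sqrt(C_r / C_o). The sketch below restates that heuristic in terms of a pruning rate. It is shown only as an illustration of the technique; it is not taken from Yu et al, Huddleston et al, or the application, and the function name is hypothetical.

```python
import math


def adjusted_dropout(original_rate: float, pruning_rate: float) -> float:
    """Scale the dropout rate for retraining by sqrt(fraction of connections kept).

    pruning_rate is the fraction of the layer's connections removed by pruning,
    so 1 - pruning_rate is the remaining-to-original connection ratio.
    """
    if not 0.0 <= pruning_rate <= 1.0:
        raise ValueError("pruning_rate must be in [0, 1]")
    return original_rate * math.sqrt(1.0 - pruning_rate)


# Pruning 75% of a layer's connections halves a dropout rate of 0.5 to 0.25.
rate = adjusted_dropout(0.5, 0.75)
```

The rationale given for this heuristic is that a sparser layer already has less capacity to overfit, so a lower dropout rate is used when retraining it.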


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075.  The examiner can normally be reached on Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/BRANDON S COLE/Primary Examiner, Art Unit 2122