Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 26, 27, 33-36, and 42-45 are rejected under 35 U.S.C. 103 as being unpatentable over Molchanov et al (“Pruning Convolutional Neural…”) in view of Anwar et al (Structure Pruning of Deep Convolutional Neural Networks).

As per claim 26, Molchanov et al (“Pruning Convolutional Neural…”) teaches a computer device for evaluating a convolutional neural network (CNN), the computer device comprising circuitry to (para. 0017 of applicant's specification defines the evaluation of a CNN as a corresponding process to determine the pruning of the CNN; Molchanov et al (“Pruning Convolutional Neural…”) discloses such a process in the form of a method for pruning a CNN that employs a 'new criterion based on Taylor expansion that approximates the cost function induced by pruning network parameters', abstract, lines 1-6; see also Figure 1 and section 2; the method is executed on a computing device comprising circuitry such as a central processing unit (CPU) or a graphics processing unit (GPU), p. 10, Table 2 and section 3.6):
receive, with a plurality of channels of a mask layer, input data from a first layer of the CNN, wherein a configuration of the mask layer is based on a plurality of values which each correspond to a different respective channel of the plurality of channels;

perform a gradient descent evaluation based on each of a loss L of the CNN, and a fraction w of the plurality of channels, wherein the loss L and the fraction w each correspond to the mask configuration (see Figure 1, the step 'Evaluate importance of neurons'; the subsection 'Taylor expansion' of section 2.2 on pages 4 and 5, in particular paragraphs 4 and 5 starting with 'Finally, by substituting ...'; and section 2, paragraph 3, where the loss is referred to as the 'cost function' and is denoted by 'C');

determine updated parameters of the CNN based on the gradient descent evaluation; and based on the updated parameters, signal that a channel is to be pruned from the CNN (see Figure 1, the step 'Remove the least important neuron'; also the last paragraph on page 2 in conjunction with footnote 1, regarding the updated parameters; with respect to pruning, see 'Continue pruning?', 'yes', and 'no' in Figure 1 in conjunction with (i) the first paragraph of section 2, (ii) the last paragraph on page 2, (iii) the first paragraph on page 3, and (iv) the last three lines of the first paragraph of section 3; in the device, the channel that is to be pruned pertains to a corresponding pruning action in the next pruning iteration).
	Molchanov et al (“Pruning Convolutional Neural…”) teaches the removal/zeroing of the least important values (see p. 2, second half) but does not explicitly teach calculating probability values to maintain the channel. Anwar et al (Structure Pruning of Deep Convolutional Neural Networks) teaches the use of probability/random measures in starting a sparsity calculation to perform the pruning process (section 3.1, third paragraph). Therefore, it would have been obvious to one of ordinary skill in the art of neural network pruning to modify the pruning technique of Molchanov et al (“Pruning Convolutional Neural…”) with the probability/random measures in performing the sparsity calculation to determine further pruning steps, as taught by Anwar et al (Structure Pruning of Deep Convolutional Neural Networks), because it would advantageously further reduce computational cost compared to removal of the lowest valued parameters (Anwar, section 3.1, third paragraph; also sections 2 and 3).
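For illustration only, the first-order Taylor criterion cited above from Molchanov et al can be sketched as follows. This is a minimal sketch and not the reference's own code: the function names `taylor_importance` and `least_important_channel` and the toy numbers are hypothetical, and the activations and cost-function gradients are assumed to be supplied as one flat list of values per channel.

```python
def taylor_importance(activations, gradients):
    """Score each channel as |mean(activation * gradient)|.

    `activations` and `gradients` are lists of channels; each channel is a
    flat list of values (assumed layout, chosen for the sketch only).
    """
    scores = []
    for acts, grads in zip(activations, gradients):
        products = [a * g for a, g in zip(acts, grads)]
        scores.append(abs(sum(products) / len(products)))
    return scores


def least_important_channel(scores):
    """Return the index of the channel with the smallest score."""
    return min(range(len(scores)), key=lambda k: scores[k])


# Toy data: three channels, two values each (hypothetical numbers).
activations = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
gradients = [[0.5, 0.5], [0.0, 0.25], [1.0, 1.0]]

scores = taylor_importance(activations, gradients)
prune_index = least_important_channel(scores)
```

In this toy case the second channel has the smallest score and would be the one removed in the 'Remove the least important neuron' step of Figure 1.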
   
As per claim 27, the combination of Molchanov et al (“Pruning Convolutional Neural…”) in view of Anwar et al (Structure Pruning of Deep Convolutional Neural Networks) teaches the computer device of claim 26, further comprising circuitry to perform multiple iterations with a plurality of mask layers each coupled between a respective two layers of the CNN (Molchanov et al (“Pruning Convolutional Neural…”), as the feedback loop in Fig. 1, wherein the feedback loop continues to remove the least important neuron; the pruning occurs layer-to-layer, see p. 3, first paragraph);
wherein the multiple iterations each comprise: for each mask layer of a plurality of mask layers, the mask layer to: receive, with a plurality of channels of the mask layer, respective input data from the CNN (Molchanov et al (“Pruning Convolutional Neural…”), as the convolutional layer applies the convolution (the “C” in CNN), producing an output z_l, p. 3, first paragraph);
and based on a current mask configuration of the mask layer, communicate an at least partially masked version of the respective input data from the mask layer to the CNN (Molchanov et al (“Pruning Convolutional Neural…”), as the partially masked version is the pruning on a subset of the kernel size, p. 3, first paragraph; see the variables with the kernel parameterization),
wherein the current mask configuration is based on a plurality of values which each indicate a respective probability that a corresponding channel of the mask layer is to be maintained (Anwar et al (Structure Pruning of Deep Convolutional Neural Networks), as calculating the strided sparsity within each kernel, section 3.1, third paragraph);
and evaluation logic to perform a gradient descent evaluation based on each of a respective loss of the CNN and a respective amount of the processing resource, wherein the respective loss and the respective amount of the processing resource correspond to a combination of the respective current mask configurations of the plurality of mask layers (Molchanov et al (“Pruning Convolutional Neural…”); see Figure 1, the step 'Evaluate importance of neurons'; the subsection 'Taylor expansion' of section 2.2 on pages 4 and 5, in particular paragraphs 4 and 5 starting with 'Finally, by substituting ...'; and section 2, paragraph 3, where the loss is referred to as the 'cost function' and is denoted by 'C').
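For illustration only, the claimed mask-layer behavior (per-channel values indicating a probability that a channel is maintained, and an at least partially masked output) can be sketched as follows. This is a sketch under stated assumptions and is not taken from either reference or from applicant's specification: the function name `apply_mask`, the fixed 0.5 threshold used in place of random sampling, and the toy numbers are all hypothetical.

```python
def apply_mask(channels, keep_probs, threshold=0.5):
    """Zero out channels whose keep probability falls below `threshold`.

    Thresholding is a deterministic stand-in (an assumption of this sketch)
    for sampling a 0/1 mask from the per-channel keep probabilities.
    Returns the masked channels and the fraction of channels kept.
    """
    mask = [1.0 if p >= threshold else 0.0 for p in keep_probs]
    masked = [[m * v for v in ch] for ch, m in zip(channels, mask)]
    fraction_kept = sum(mask) / len(mask)
    return masked, fraction_kept


# Toy data: four channels of input from a preceding layer (hypothetical).
channels = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
keep_probs = [0.9, 0.2, 0.7, 0.1]

masked, w = apply_mask(channels, keep_probs)
```

Here two of the four channels pass through unchanged and two are zeroed, so the fraction of channels kept is one half.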

As per claim 33, the combination of Molchanov et al (“Pruning Convolutional Neural…”) in view of Anwar et al (Structure Pruning of Deep Convolutional Neural Networks) teaches the computer device of claim 26, wherein circuitry to perform the gradient descent evaluation comprises circuitry to determine an adjusted loss value based on a product of the amount F and a Lagrange multiplier λ_F, the computer device further comprising circuitry to update the Lagrange multiplier λ_F based on the updated parameters of the CNN (Molchanov et al (“Pruning Convolutional Neural…”), p. 2, section 2, the cost function “C(*)”; further, looking at the FLOPs pruning on p. 5, section 2.4, the equation is in the form of “1-”, wherein λ controls the amount of regularization, mapping to the well-known Lagrange relationship structure, wherein λ is the multiplier).
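For illustration only, an adjusted loss formed from the product of an amount F and a Lagrange multiplier λ_F can be sketched as follows. This is a sketch under stated assumptions: the function names, the resource `budget`, the `step_size`, and the projected dual-ascent update rule are hypothetical choices made for the example; the claim language recites only that the multiplier is updated based on the updated parameters, and neither reference is asserted to use this particular rule.

```python
def adjusted_loss(loss, amount_f, lagrange_multiplier):
    """Adjusted loss: task loss plus multiplier times resource amount F."""
    return loss + lagrange_multiplier * amount_f


def update_multiplier(lagrange_multiplier, amount_f, budget, step_size):
    """Assumed update rule: raise the multiplier when F exceeds the budget,
    lower it otherwise, and never let it go negative."""
    return max(0.0, lagrange_multiplier + step_size * (amount_f - budget))


# Toy numbers (hypothetical): loss 2.0, resource amount F = 4.0, multiplier 0.5.
total = adjusted_loss(2.0, 4.0, 0.5)
new_multiplier = update_multiplier(0.5, 4.0, budget=3.0, step_size=0.25)
```

Because F exceeds the assumed budget in this toy case, the multiplier grows, which would weight the resource term more heavily in the next gradient descent evaluation.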

As per claim 34, the combination of Molchanov et al (“Pruning Convolutional Neural…”) in view of Anwar et al (Structure Pruning of Deep Convolutional Neural Networks) teaches the computer device of claim 26, wherein the first layer and the second layer are each a respective one of a convolutional layer or a fully connected layer (Molchanov et al (“Pruning Convolutional Neural…”), as convolutional layer, p. 3, equation 2, and the explanation above equation 2: convolutional layers 1, 2, …, L).

Claims 35, 36, 42, and 43 are computer-readable storage medium claims whose steps are performed by the device of claims 26, 27, 33, and 34 above. As such, claims 35, 36, 42, and 43 are similar in scope and content to claims 26, 27, 33, and 34 and are rejected under similar rationale as presented against claims 26, 27, 33, and 34 above.

Claims 44-45 are method claims whose steps are performed by the device of claims 26, 27, 33, and 34 above. As such, claims 44-45 are similar in scope and content to claims 26, 27, 33, and 34 and are rejected under similar rationale as presented against claims 26, 27, 33, and 34 above.

Allowable Subject Matter

Claims 28-32 and 37-41 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The claim limitations directed toward "determine a second fraction of a second plurality of channels of a second mask layer coupled between a respective two layers of the CNN, the second fraction corresponding to another mask configuration of the second mask layer, wherein circuitry to perform the gradient descent evaluation based on the fraction w includes circuitry to determine, based on the fraction w and the second fraction, an amount F of a processing resource of the CNN" are not explicitly taught by the prior art of record. In summary, Molchanov et al teaches operations on a subset of the kernels/matrix, as shown above; Anwar et al teaches the use of sparsity calculations to determine which submatrix elements are pruned; and Yihui He et al (Channel Pruning…) teaches the use of Lagrange structures via lasso (p. 2, subsection 3.1) to prune networks including branch networks and residual branches (pp. 3-4).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Yihui He et al (Channel Pruning…) teaches the use of Lagrange structures via lasso (p. 2, subsection 3.1) to prune networks including branch networks and residual branches (pp. 3-4).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
09/07/2022