Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
DETAILED ACTION
Remarks
This Office action is responsive to Applicants’ amendment filed on February 11, 2021, in which claims 1-2, 4-7, and 9-11 are amended.  Claims 1-11 are currently pending.

Response to Arguments
The rejections to claims 1-11 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
The rejections to claims 2 and 7 under 35 U.S.C. § 112(b) are hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.
Applicant’s arguments with respect to the rejection of claims 1, 6 and 11 under 35 U.S.C. 103(a), based on the amendment, have been considered and are persuasive. The arguments are moot in view of a new ground of rejection set forth below in view of Lin et al. (“Network in Network”) and Anwar et al. (“Structured Pruning of Deep Convolutional Neural Networks”).
Applicant’s arguments with respect to the rejection of claims 5 and 10 under 35 U.S.C. 103(a), based on the amendment, have been considered and are persuasive. The arguments are moot in view of a new ground of rejection set forth below in further view of Gorinevski et al. (DE 10208682 A1).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (“Network in Network”) and Anwar et al. (“Structured Pruning of Deep Convolutional Neural Networks”).
Regarding claim 1,
Lin teaches a method in a convolutional neural network (CNN) with more than three layers, the method comprising: operating the CNN during an inference phase by utilizing a learning kernel activation module (LKAM) which is inserted between a first and a second convolutional layer in the CNN, wherein said LKAM is a small CNN with one or two layers and wherein the second CNN has as inputs features that are output from the first convolutional layer and has as output a vector of numbers (Lin shows, in figure 2 below, small 2-layer neural networks placed in between layers of a larger neural network.)
[Image: Lin, figure 2]

While the diagram shows a fully connected network as the micro networks, this is an implementation choice and not a key feature.  Since Lin covers CNNs, it would have been obvious at the time of the claimed invention that the micro networks could be implemented as a CNN. (In paragraph 2 of the introduction on page 1, Lin recites in part “In NIN, the GLM is replaced with a “micro network” structure which is a general nonlinear function approximator. In this work, we choose multilayer perceptron [3] as the instantiation of the micro network, which is a universal function approximator and a neural network trainable by back-propagation.”  Also note that the outputs of the micro networks are a vector of numbers.)
Lin does not teach indicating whether the convolutional kernels in the second convolutional layer are on or off and switching off during the inference phase at least one convolutional kernel of the CNN.
Anwar teaches indicating whether the convolutional kernels in the second convolutional layer are on or off and switching off during the inference phase at least one convolutional kernel of the CNN based on the output vector of the second CNN.  (Anwar in the abstract, recites in part “For the same network, we can see that kernel level pruning performs better. We can achieve 70% sparsity with kernel level pruning. This is attributed to the fact that kernel pruning is finer and hence it achieves higher ratios. Further kernel pruning may ultimately prune a feature map if all the incoming kernels are pruned. However at inference time, we need to define the kernel connectivity pattern which can simply be done with a binary flag” Where the ‘binary flags’ indicate if the kernels are on or off, since the kernels are being pruned, which involves switching them off (or removing them, if in software).)
Lin teaches based on the output vector of the second CNN (where the output of the two layer micro networks shown in figure 2 above have vectors of outputs that could be used as the ‘binary flags’ from Anwar.)
It would have been obvious to anyone of ordinary skill in the art at the time of the claimed invention to combine the teachings of Lin with the teachings of Anwar with the motivation of using kernel pruning in order to enable sparsity (Anwar in section 5.1.1, recites in part “We can achieve 70% sparsity with kernel level pruning.”).
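For illustration only, the combination discussed above (a switch vector used as per-kernel binary flags at inference time) can be sketched as follows. This is a minimal sketch and is not taken from Lin, Anwar, or the claims: the function names, the 0.5 threshold, and the use of a plain dot product as a stand-in for a convolution are all assumptions made for the example.

```python
from typing import List, Sequence


def flags_from_switch_vector(sw: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert a real-valued switch vector into binary on/off flags.

    The threshold value is a hypothetical choice for this sketch.
    """
    return [1 if value > threshold else 0 for value in sw]


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    """Dot product, used here as a simplified stand-in for one convolution kernel."""
    return sum(w * x for w, x in zip(weights, features))


def gated_layer(
    kernel_weights: Sequence[Sequence[float]],
    flags: Sequence[int],
    features: Sequence[float],
) -> List[float]:
    """Evaluate only the kernels whose flag is 1; kernels flagged 0 are skipped."""
    if len(kernel_weights) != len(flags):
        raise ValueError("one flag is required per kernel")
    outputs = []
    for weights, flag in zip(kernel_weights, flags):
        # A switched-off kernel contributes 0.0 and its weights are never used.
        outputs.append(dot(weights, features) if flag else 0.0)
    return outputs
```

In this sketch, a switch vector of [0.9, 0.1, 0.7] yields the flags [1, 0, 1], so the second kernel is not computed during the forward pass.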
Regarding claim 6,
The limitations of claim 6 are substantially identical to the limitations of claim 1, as such all of the rejections of claim 1 also apply to claim 6.
Regarding claim 11,
The limitations of claim 11 are substantially identical to the limitations of claim 1, as such all of the rejections of claim 1 also apply to claim 11.



Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over the Lin/Anwar combination from claim 1 and in further view of Alpert (US 2016/0132767 A1)
Regarding claim 2, 
The Lin/Anwar combination has taught the method of claim 1, (refer to the discussion of claim 1 above).  The Lin/Anwar combination does not teach wherein level of engagement is determined through optimization of a cost function that utilizes a regularization term proportional to number of kernels that are engaged in each forward propagation step.
Alpert, in the same field of artificial neural networks, teaches wherein a level of engagement is determined through optimization of a cost function that utilizes a regularization term proportional to a number of kernels that are engaged in each forward propagation step.  (Alpert, in claim 15, recites in part “modeling power consumption of a neurosynaptic network as wire length, the neurosynaptic network comprising a plurality of neurosynaptic cores; determining an arrangement of the neurosynaptic cores by minimizing the wire length.” And also paragraph 42 with equation 1:

[Image: Alpert, equation 1]

Where the wire length is proportional to the number of neurosynaptic cores (chips) and hence the number of kernels inside. Thus it is modeling power in a cost function based on wire length (between kernels), which is dependent (in part) on the number of nodes/kernels, and thus it also teaches that the level of engagement is determined through the optimization of a cost function that utilizes a regularization term proportional to the number of kernels that are engaged in each forward propagation step.)
It would have been obvious to a person skilled in the art before the effective filing date of the claimed invention to substitute the number of nodes or kernels as a simpler proxy for wire length (particularly in software where there isn’t a wire length value to use) in order to lower the power consumption of the neural network.  It would have been obvious to anyone of ordinary skill in the art at the time of the claimed invention to combine the teachings of the Lin/Anwar combination with the teachings of Alpert with the motivation of saving power (Alpert in paragraph 0014 recites in part “Power consumption and heat dissipation are major barriers to exascale computing. Arrays of extremely low power neurosynaptic processing units, called neurosynaptic cores, provide an architecture to solve exascale big data problems.”).
Regarding claim 7,
The limitations of claim 7 are substantially identical to the limitations of claim 2, as such all of the rejections of claim 2 also apply to claim 7.
Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over the Lin/Anwar/Alpert combination from claim 2 and in further view of Nawi et al. (“THE EFFECT OF GAIN VARIATION IN IMPROVING LEARNING SPEED OF BACK PROPAGATION NEURAL NETWORK ALGORITHM ON CLASSIFICATION PROBLEMS”) and Annema et al. (“FEED-FORWARD NEURAL NETWORKS”).
Regarding claim 3, the method of claim 2 has been satisfied by the Lin/Anwar/Alpert combination (see the discussion of claim 2 above), but the combination has not taught wherein the cost function is Laug= Gi /2m*∑I |swi|, where swi are the elements of SW vector which is the output of LKAM, Gg is a gain factor and m is the length of the SW vector.   (Alpert has already shown in the discussion of claim 2 that the absolute value of the wire length (a proxy for the number of kernels, since wire length is proportional to the number of kernels) is summed with a sub term *∑I |swi| (the Hp term is summed as part of the second part of equation 1 as shown in claim 2).)
But Alpert does not cover the Gi/2m term.
Nawi, in the same field of artificial neural networks, teaches the Gi term (Nawi, in the abstract, recites in part “In this paper, the influence of the adaptive gain on the learning ability of a neural network is analysed. Multilayer feed forward neural networks have been assessed. Physical interpretation of the relationship between the gain value and the learning rate and weight values is given. Instead of a constant ‘gain’ value, we proposed an algorithm to change the gain value adaptively for each node.”  Where the claim defines Gi as a gain term that is adapted for each node (since the Gi has ‘i’ as a subscript and hence can be a different value for each node).)
It would have been obvious to a person skilled in the art before the effective filing date of the claimed invention to simplify and modify Alpert’s equation 1 and combine it with teachings of Nawi in order to improve the speed of learning during back propagation (Nawi, in the abstract, recites in part “Instead of a constant ‘gain’ value, we proposed an algorithm to change the gain value adaptively for each node. The efficiency of the proposed method is verified by means of simulation on three classification problems. The results show that the proposed method significantly improves the learning speed of the general back-propagation algorithm.”).
Annema, in the same field of artificial neural networks, teaches the /2m term. (Annema discloses equation 10.8 (on page 173), which is “$\mathrm{MSEE} = \frac{1}{P_{eff}} \sum_{p_{eff}=1}^{P_{eff}} (D_{p_{eff}} - Y_{p_{eff}})^2$”. It is clear that the $\frac{1}{P_{eff}}$ term is the same as the m variable used in claims 3 and 8, as the Peff term also represents the length of the vector. The number 2 is a constant and is thus not inventive material.)
It would have been obvious to a person skilled in the art before the effective filing date of the claimed invention to simplify and modify Alpert’s equation 1, which has been combined with the teachings of Nawi, and to further add the teachings of Annema to arrive at the equation of claims 3 and 8 in order to more accurately model and lower the power consumption of the neural network without being stuck at a local minimum of power reduction (Annema at the top of page 173 recites in part “A first estimation on getting stuck in a minimum can be obtained by using the mean squared error over only training examples that result in non-zero weight adaptation. The examples that result in weight adaptation are denoted as effective examples, and hence the signal is denoted as the Mean Squared Effective Error signal (MSEE). In formula:” where the formula is the equation 10.8 cited above).
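For illustration only, the two expressions discussed for claims 3 and 8 can be evaluated numerically as sketched below. This is a minimal sketch under stated assumptions: the cost function is read as (G / 2m) multiplied by the sum of |sw_i|, with a single scalar gain G, and the MSEE function assumes the caller passes only the effective examples. The function and parameter names are hypothetical and do not come from the claims, Alpert, Nawi, or Annema.

```python
from typing import Sequence


def augmented_loss(sw: Sequence[float], gain: float) -> float:
    """Assumed reading of the claimed term: (gain / (2 * m)) * sum(|sw_i|),
    where m is the length of the switch vector sw."""
    m = len(sw)
    if m == 0:
        raise ValueError("switch vector must not be empty")
    return gain / (2 * m) * sum(abs(value) for value in sw)


def mean_squared_effective_error(
    desired: Sequence[float], actual: Sequence[float]
) -> float:
    """Mean squared error over the supplied examples.

    Assumption: the inputs contain only the 'effective' examples
    (those that result in weight adaptation), so P_eff = len(desired).
    """
    p_eff = len(desired)
    if p_eff == 0 or p_eff != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((d - y) ** 2 for d, y in zip(desired, actual)) / p_eff
```

Under this reading, when the elements of the switch vector are 0 or 1, the sum of |sw_i| equals the number of engaged kernels, so the term grows in proportion to that number.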
Regarding claim 8,
The limitations of claim 8 are substantially identical to the limitations of claim 3, as such all of the rejections of claim 3 also apply to claim 8.

Claims 4 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over the Lin/Anwar combination from claim 1, and in further view of Butts et al. (“A Static Power Model for Architects”).
Regarding claim 4,
The Lin/Anwar combination has taught the method of claim 1 (see the discussion of claim 1 above), but the Lin/Anwar combination does not explicitly teach where the convolutional kernels that are off are electrically switched-off when said deep neural networks are implemented in VLSI.
Butts, in the field of silicon hardware power savings, teaches that the convolutional kernels that are off are electrically switched-off when said deep neural networks are implemented in VLSI.  (Butts, in figure 8, shows how to turn off a generic electrical component in VLSI when it is not in use:

[Image: Butts, figure 8]

This teaches how to implement convolutional kernels that are electrically switched-off when said deep neural networks are implemented in VLSI.)
It would have been obvious to a person skilled in the art before the effective filing date of the claimed invention to modify the system of the Lin/Anwar combination with this technique taught by Butts to electrically turn off the kernels when implemented in VLSI in order to reduce power consumption.
Regarding claim 9,
The limitations of claim 9 are substantially identical to the limitations of claim 4, as such all of the rejections of claim 4 also apply to claim 9.

Claims 5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over the  Lin/Anwar combination from claim 1 and in further view of Gorinevski et al (DE 10208682 A1).
Regarding claim 5,
The Lin/Anwar combination has taught the method of claim 1, (as shown above in the discussion of claim 1) but they do not teach wherein an external parameter is used to control an amount of computations of the CNN during the inference phase. 
 Gorinevski, in the same field of artificial neural networks, discloses an external parameter is used to control an amount of computations of the CNN during the inference phase. (Gorinevski, in paragraph 0060, recites in part “A forecast module 507 for suitable scaling and - possibly - previous evaluation of the predetermined analyst opinions 504 and for the actual forecast of the expected returns 511 on the basis of the economic data 501, whereby this module is also characterized by some of the system parameters 509 that come from the external parameters 518 in its function is controlled, the z. For example, describe the nature and connection strengths of a neural network for forecasting returns from economic data 501 or the information coefficients in a forecasting method taking into account analyst recommendations 504;” Where ‘the nature and connection strengths of a neural network’ is interpreted as describing the number of connections in the neural network which correlates to the ‘amount of computations of the CNN during the inference phase.’  Also the inference phase is inherent as the system of Gorinevski is used for forecasting which inherently requires the use of the inference phase.)
It would have been obvious to anyone of ordinary skill in the art at the time of the claimed invention to combine the teachings of the combination of Lin/Anwar with the teachings of Gorinevski with the motivation of having the greater levels of optimization allowed by the use of parameters (“In its further advantageous configurations, the invention relates to a method and a system for parameter optimization in said method for investor support, a procedure and a system for determining the customer behavior from existing histories,[…]”).
Regarding claim 10,
The limitations of claim 10 are substantially identical to the limitations of claim 5, as such all of the rejections of claim 5 also apply to claim 10.

Conclusion	
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL EDWARD SHIPLEY whose telephone number is (408) 918-7530.  The examiner can normally be reached on Monday-Thursday and alternate Fridays 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on 571-270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/P.E.S./Examiner, Art Unit 2124                                                                                                                                                                                                        




/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124