DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The following claim(s) is/are pending in this office action: 1-20
The following claim(s) is/are amended: 1-20
The following claim(s) is/are new: None
The following claim(s) is/are cancelled: None
Claim(s) rejected: 1-20. The rejection is FINAL

Priority
Acknowledgment is made of applicant's claim for foreign priority based on application IN201641044431, filed in the Republic of India on 12/27/2016. It is noted, however, that applicant has not filed a certified copy of the IN201641044431 application as required by 37 CFR 1.55.

Previous Objections Withdrawn
Objections to the specification are withdrawn based on the amendments.

Previous Rejections Withdrawn
Rejections of claims 17-20 under 35 U.S.C. 101 are withdrawn based on the amendments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wen et al. (“Learning structured sparsity in deep neural networks”; hereinafter “Wen”) in view of Tom’ et al. (“Reduced memory region based deep convolutional neural network detection”; hereinafter “Tom’”) further in view of Liu et al. (“Sparse convolutional neural networks”; hereinafter “Liu”).

Regarding claim 1, Wen teaches a method for generating a sparsified convolutional neural network (CNN) (Section 1 para 2; Section 2 para 1: sparsification of a convolutional network is done using weight sparsity on the convolutional layers), the method comprising: training the CNN to generate coefficient values of filters of convolution layers (Section 1 last para, Section 3.1 first para; the Structured Sparsity Learning (SSL) method learns a compressed structure of a deep CNN during training. The method adjusts multiple structures of filters, channels, and filter shapes within each layer and the structure of depth beyond the layers, which also includes the weights of the convolutional layers)
and performing sparsified fine tuning on the convolution layers to generate the sparsified CNN (Section 4.1 para 3 ResNet; after Structured Sparsity Learning converges, layers with all zero weights are removed and the network is finally fine-tuned with a base learning rate. Section 4.3 para 1 AlexNet; in SSL, AlexNet is first trained with structure regularization; when it converges, zero groups are removed to obtain a DNN with the new structure; finally, the network is fine-tuned without SSL to regain the accuracy.), wherein the sparsified fine tuning causes selected nonzero coefficient values of the filters to be set to zero (Section 4.1 para 2 ConvNet; when SSL is applied, half of the conv1 filters in ConvNet 2 can be zeroed out without accuracy drop).
Wen does not explicitly teach the sparsified fine tuning comprising: determining a sparsity target for a subset of filters of a convolution layer; and determining a maximum sparsity threshold for the subset.
Tom’ teaches the sparsified fine tuning comprising: determining a sparsity target for a subset of filters of a convolution layer (Sparsity target is defined as "the desired ratio of the number of zero coefficients to the total number of coefficients in the layer. Any suitable value may be used for the sparsity target" see spec para 0058. The specification states that any suitable value can be used for sparsity target so the value is not limited. Section IV para 1; sparsity target is achieved by changing the proportion of non-zero weights using different threshold values.)
and determining a maximum sparsity threshold for the subset (Section IV Scalar quantization, Pruning).
Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to combine the sparsified CNN of Wen with the sparsity tuning and threshold of Tom’ to reduce the complexity of CNN while maintaining the accuracy levels (Tom’ Section IV para 1, pruning, page 1 abstract).
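For illustration only, the sparsity measure quoted above from spec para 0058 (the number of zero coefficients divided by the total number of coefficients) and pruning by a magnitude threshold can be sketched as follows. The function names `sparsity` and `prune_below`, the strict "less than" comparison, and the sample coefficient values are hypothetical; they are not taken from Wen, Tom’, or the specification.

```python
def sparsity(coeffs):
    """Ratio of zero coefficients to the total number of coefficients."""
    if not coeffs:
        raise ValueError("coeffs must be non-empty")
    return sum(1 for c in coeffs if c == 0) / len(coeffs)


def prune_below(coeffs, threshold):
    """Set every coefficient whose magnitude is below the threshold to zero.

    ASSUMPTION: a strict comparison is used; the references may differ.
    """
    return [0.0 if abs(c) < threshold else c for c in coeffs]


# Hypothetical filter coefficients for one subset of a convolution layer.
layer = [0.9, -0.05, 0.3, 0.0, -0.6, 0.02, 0.15, -0.01]
pruned = prune_below(layer, 0.1)

print(sparsity(layer))   # sparsity before pruning
print(sparsity(pruned))  # sparsity after pruning with threshold 0.1
```

In this sketch, raising the threshold value raises the proportion of zero coefficients, which is the relationship relied upon in the citation to Section IV para 1.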
Neither Wen nor Tom’ explicitly teaches determining a current sparsity threshold for the subset based on the sparsity target and the maximum sparsity threshold; using the current sparsity threshold to identify nonzero coefficient values of the subset to be set to zero; after the using the current sparsity threshold, changing the sparsity threshold to a changed sparsity threshold based on the maximum sparsity threshold and a comparison of a sparsity of the subset to the sparsity target; and repeating the using the current sparsity threshold, using the changed sparsity threshold as the current sparsity threshold.
Liu, however, teaches determining a current sparsity threshold for the subset based on the sparsity target and the maximum sparsity threshold (Page 813: “In our cascade model, we set the thresholds of first stage so that the precision for each class equals 0.05. Approximately 80% of candidate windows are pruned for each image…” Abstract: “Maximum sparsity is obtained by exploiting both inter-channel and intra-channel redundancy, with a fine-tuning step that minimize the recognition loss caused by maximizing sparsity.” Page 807 para 1: “We are able to zero out more than 90% of the convolutional kernel parameters of the network in [14] with relatively small number of bases while keeping the drop of accuracy to less than 1%.” The maximum sparsity threshold and the sparsity target are set such that the model accuracy drop is less than 1%. See also Figure 4 for sparsity % at different levels of accuracy.)
using the current sparsity threshold to identify nonzero coefficient values of the subset to be set to zero (Section 5 para 3: “A lower threshold retains high recall so that the overall accuracy is not affected while a higher threshold removes more candidates to achieve higher efficiency. In our case, we found that a threshold with a corresponding precision equals 0.05 is a balanced tradeoff.” Current threshold is selected corresponding to precision 0.05, which will result in removing non-zero candidates accordingly.)
after the using the current sparsity threshold, changing the sparsity threshold to a changed sparsity threshold based on the maximum sparsity threshold (Page 807 para 1: “In the fine tuning…accuracy to less than 1%.” During tuning, different sparsity values or thresholds may be used to minimize the training error. The maximum threshold is the one that gives less than a 1% accuracy drop. At this level, the maximum sparsity achieved is above 90%.) and a comparison of a sparsity of the subset to the sparsity target (Figure 4 shows the comparison of sparsity % with the sparsity target. The sparsity target is achieved when it gives less than a one percent accuracy drop.)
and repeating the using the current sparsity threshold, using the changed sparsity threshold as the current sparsity threshold (Section 5 para 3: “A lower threshold retains…a balanced tradeoff.” Current threshold is set which gives precision 0.05. This threshold can be changed if different precision is desired. In that case the new or changed threshold will become the current threshold).
(Liu, Page 807 para 2, Abstract).
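For illustration only, the loop recited in the limitations of claim 1 (set a current threshold, zero out coefficients below it, compare the resulting sparsity to the target, change the threshold, and repeat) can be sketched as follows. The claim language quoted above does not give a formula for the initial threshold or for the change, so the fixed step of `max_threshold / steps` and the starting value are assumptions of this sketch. The names `sparsify_subset` and `steps` are hypothetical, and the sketch is not presented as the method of Liu, Wen, or Tom’.

```python
def sparsity(coeffs):
    """Ratio of zero coefficients to the total number of coefficients."""
    return sum(1 for c in coeffs if c == 0) / len(coeffs)


def sparsify_subset(coeffs, sparsity_target, max_threshold, steps=4):
    """Raise a magnitude threshold step by step until the target is met
    or the maximum threshold is reached.

    ASSUMPTIONS: the initial threshold and the step size are both
    max_threshold / steps; in this sketch the sparsity target only
    decides whether the threshold is raised again.
    """
    step = max_threshold / steps
    threshold = step                       # assumed initial threshold
    coeffs = list(coeffs)
    while True:
        # Use the current threshold to pick nonzero coefficients to zero.
        coeffs = [0.0 if abs(c) < threshold else c for c in coeffs]
        # Compare the achieved sparsity to the target.
        if sparsity(coeffs) >= sparsity_target or threshold >= max_threshold:
            return coeffs, threshold
        # The changed threshold becomes the current threshold, capped
        # at the maximum sparsity threshold.
        threshold = min(threshold + step, max_threshold)


# Hypothetical coefficients; target 0.6, maximum threshold 0.4.
subset = [0.9, -0.05, 0.3, 0.0, -0.6, 0.02, 0.15, -0.01]
result, final_threshold = sparsify_subset(subset, 0.6, 0.4)
print(sparsity(result), final_threshold)
```

The cap at `max_threshold` guarantees that the loop terminates even when the target cannot be reached, in which case the subset is returned at whatever sparsity the maximum threshold produces.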

Regarding claim 2, Wen, Tom’, and Liu teach the method of claim 1.
Liu also teaches wherein the current sparsity is initially determined as an initial sparsity threshold (Section 6.8 Para 2: “In our cascade model, we set the thresholds of first stage so that the precision for each class equals 0.05. Approximately 80% of candidate windows are pruned for each image.” Initial threshold is set to give a specific precision)
and wherein the changing the sparsity threshold includes increasing the current sparsity threshold by an amount less than a difference between the initial sparsity threshold and the maximum sparsity threshold (Figure 4 shows sparsity threshold is varied to find maximum sparsity that gives less than 1% accuracy drop. The figure shows first iteration sparsity is about 65% and second iteration it is about 80%. Maximum sparsity is above 90%. It clearly shows sparsity increase from first iteration to second iteration is less than the difference between first sparsity and the maximum sparsity. The sparsity increase from first to second sparsity iteration is about 15% whereas the difference from first sparsity iteration to maximum sparsity is about 30%).
Same motivation to combine the teachings of Wen, Tom’ and Liu as claim 1.
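For illustration only, the comparison drawn from Figure 4 in the rejection of claim 2 can be checked arithmetically. The values below are the approximate percentages stated above. Because the maximum sparsity is stated only as "above 90%", the sketch uses 90 as a lower bound, so the computed gap of 25 points is a minimum (the rejection estimates the gap at about 30). The variable names are hypothetical.

```python
first_iteration = 65       # percent sparsity, first iteration (approximate)
second_iteration = 80      # percent sparsity, second iteration (approximate)
maximum_lower_bound = 90   # maximum sparsity is stated as "above 90%"

# Increase from the first to the second iteration.
increase = second_iteration - first_iteration

# Smallest possible gap between the first iteration and the maximum.
gap_lower_bound = maximum_lower_bound - first_iteration

print(increase, gap_lower_bound, increase < gap_lower_bound)
```

Since the increase is smaller than even the lower bound on the gap, the inequality holds for any maximum above 90%.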
Regarding claim 3, Wen, Tom’ and Liu teach the method of claim 1.
(Section 3.2 para 2; includes all filters in the l-th layer.).

Regarding claim 4, Wen, Tom’ and Liu teach the method of claim 1. 
Wen further teaches wherein the subset comprises all filters corresponding to an output feature map of the convolution layer (Page 5 section MLP: the method is enforced on all the input (or output) connections of each neuron).

Regarding claim 5, Wen, Tom’ and Liu teach the method of claim 1. 
Tom’ further teaches determining a maximum sparsity threshold further comprises: determining a maximum of absolute values of coefficients in the subset (Page 3 section Scalar quantization, Pruning: maximum achievable compression rate is used to find absolute compressed weights)
and setting the maximum sparsity threshold to a fraction of the maximum (Page 3 section Pruning; the threshold is set as the p-th percentile of the weight distribution, where p = 100 * (1 - 1/f) and f is the compression factor).
Same motivation to combine the teachings of Wen, Tom’ and Liu as claim 1.
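For illustration only, the percentile rule p = 100 * (1 - 1/f) cited for claim 5 and the claimed "fraction of the maximum" can be sketched side by side. The function names, the nearest-rank percentile convention, taking the percentile over absolute values, and pruning coefficients at or below the threshold are all assumptions of this sketch and are not taken from Tom’ or the specification.

```python
import math


def percentile_for_compression(f):
    """Return p = 100 * (1 - 1/f) for a compression factor f >= 1."""
    if f < 1:
        raise ValueError("compression factor must be at least 1")
    return 100.0 * (1.0 - 1.0 / f)


def nearest_rank_percentile(values, p):
    """p-th percentile by the nearest-rank convention (assumed here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def fraction_of_max_threshold(coeffs, fraction):
    """Threshold set to a fraction of the largest absolute coefficient."""
    return fraction * max(abs(c) for c in coeffs)


# Hypothetical coefficients for one subset of filters.
weights = [0.9, -0.05, 0.3, 0.0, -0.6, 0.02, 0.15, -0.01]
magnitudes = [abs(w) for w in weights]

p = percentile_for_compression(4)                   # f = 4
threshold = nearest_rank_percentile(magnitudes, p)
kept = [w for w in weights if abs(w) > threshold]   # assumed pruning rule

print(p, threshold, len(kept), len(weights))
print(fraction_of_max_threshold(weights, 0.5))
```

Under these assumptions, a compression factor of f keeps roughly 1/f of the coefficients, while the fraction-of-maximum rule depends only on the single largest magnitude in the subset.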

Regarding claim 6, Wen, Tom’ and Liu teach the method of claim 1. 
Liu further teaches wherein the using the current sparsity threshold and the changing the sparsity threshold are repeated until the sparsity is equal to or greater than the sparsity target (Page 813 last para: “In our cascade model, we set the thresholds of first stage so that the precision for each class equals 0.05.” Surprisingly, high sparsity…less than 1%. The highest level of sparsity that gives less than a 1% accuracy drop is a maximum sparsity target. Figure 4 shows multiple iterations are done until maximum sparsity is achieved under less than 1% forecast drop).
Same motivation to combine the teachings of Wen, Tom’ and Liu as in claim 1.

Regarding claim 7, Wen, Tom’ and Liu teach the method of claim 1.
Tom’ further teaches wherein the sparsity target is for all filters of all convolution layers of the CNN (Section IV: the distribution of individual weights and quantization is done for each layer).
Same motivation to combine the teachings of Wen, Tom’ and Liu as in claim 1.

Regarding claim 8, Wen, Tom’ and Liu teach the method of claim 1. 
Liu further teaches wherein the sparsity target is increased for a first subset of filters of a first convolution layer in response to second subset of filters of a second convolution layer not meeting the sparsity target (“As initial decompositions, the above method can only obtain limited sparsity… The underlying reason of why fine-tuning can increase sparsity is that the original network is trained without any sparsity constraint.” The paragraph suggests that the sparsity target is increased gradually using fine tuning. Tables 1 and 2 show the sparsity % of different convolution layers. The sparsity % target of the first convolution layer can be increased if it helps the second layer meet its sparsity target).
Same motivation to combine the teachings of Wen, Tom’ and Liu as in claim 1.
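For illustration only, the limitation of claim 8 (raising the sparsity target of a first subset when a second subset does not meet its target) can be sketched as follows. The claim language quoted above does not say how much the target is raised. Transferring the second layer's missing zero coefficients to the first layer, in proportion to layer size, is an assumption of this sketch, and the name `raise_first_target` is hypothetical. It is not presented as the method of Liu.

```python
def raise_first_target(first_target, first_size,
                       second_target, second_achieved, second_size):
    """Return a new sparsity target for the first layer.

    ASSUMPTION: the zeros the second layer failed to reach are added
    to the first layer's target, scaled by the layer sizes.
    """
    shortfall = max(0.0, second_target - second_achieved)
    missing_zeros = shortfall * second_size
    return min(1.0, first_target + missing_zeros / first_size)


# Hypothetical layers: the second layer reaches 0.4 against a 0.5 target.
new_target = raise_first_target(
    first_target=0.5, first_size=1000,
    second_target=0.5, second_achieved=0.4, second_size=500,
)
print(new_target)
```

When the second layer meets or exceeds its target, the shortfall is zero and the first layer's target is returned unchanged.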

Regarding claim 9, Wen teaches a computer system comprising: a processor; and a memory storing software instructions that, when executed by the processor, cause the processor to (Section 4 Para 1; Page 2 Para 1, Section 4.3 Para 2).
All other limitations are substantially similar to claim 1 and are rejected in the same manner, the same art and reasoning applying.

Regarding claim 17, Wen teaches a non-transitory computer readable medium storing software instructions that, when executed by one or more processors (Section 4 Para 1; Page 2 Para 1, Section 4.3 Para 2).

Regarding claims 10-16, they are substantially similar to claims 2-8 respectively, and are rejected in the same manner, the same art and reasoning applying.

Regarding claims 18, 19, and 20, they are substantially similar to claims 2, 5 and 8 respectively, and are rejected in the same manner, the same art and reasoning applying.

Remarks
Claims 1-20 have been amended by the applicant. The new amendments have been addressed in the 103 rejections, and relevant citations have been provided. Examiner adds a new reference (Liu et al.) to address the new amendments. All claims remain rejected.
Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
An inquiry concerning this communication or earlier communication from the examiner should be directed to QAMAR IQBAL, whose telephone number is 571-272-2563. The examiner can normally be reached M-F, 10am-6pm (EST).
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/Q.I/ 
Examiner 
Art unit 2123
02/10/2021

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123