DETAILED ACTION
Claims 1-24 have been examined.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 9, and 17 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 28 of copending Application No. 16/305626 (‘626 application).  Although the claims at issue are not identical, they are not patentably distinct from each other.  As shown by the table below, although claim 28 of the ‘626 application does not use language identical to that of claim 1 of the instant application, the features of claim 1 of the instant application (pruning a specified set of parameters from each layer, retraining the second neural network model, and retraining based on a target sparsity rate) are present in claim 28 of the ‘626 application. Claims 9 and 17 of the instant application recite limitations similar to those of claim 1 of the instant application.  Claim 9 of the instant application is a method claim and claim 17 of the instant application is a medium claim, while claim 28 of the ‘626 application is an apparatus claim.  These differences are obvious because a method or medium claim of the instant application could be implemented on an apparatus as recited in the ‘626 application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claim 1 of instant application: 1. An apparatus comprising: a layer-wise pruning module to prune a specified set of parameters from each layer of a reference neural network model to generate a second neural network model having a higher sparsity rate than the reference neural network model;
Claim 28 of ‘626 application: 26. An apparatus comprising: . . . ; a pruner communicatively coupled to the importance metric generator, the pruner to set a subset of the plurality of parameters to zero based on the importance measurement to obtain a pruned neural network, wherein one or more parameters in the subset is to be less than one or more of the plurality of parameters that are not in the subset; 28. The apparatus of claim 26, wherein the trained neural network is to include a plurality of layers, the importance measurement is to be conducted on a per-layer basis and the subset of the plurality of parameters is to be set to zero on a per-layer basis.

Claim 1 of instant application: a retraining module to retrain the second neural network model in accordance with a set of training data to generate a retrained second neural network model; and
Claim 28 of ‘626 application: an accuracy enhancer communicatively coupled to the pruner, the accuracy enhancer to re-train the pruned neural network;

Claim 1 of instant application: the retraining module to output the retrained second neural network model as a final neural network model if a specified target sparsity rate has been reached and to provide the retrained second neural network model to the layer-wise pruning model for additional pruning if the specified target sparsity rate has not been reached.
Claim 28 of ‘626 application: and an iteration manager, wherein the importance metric generator is to iteratively conduct the importance measurement, the pruner is to iteratively set the subset of the plurality of parameters to zero and the accuracy enhancer is to iteratively re-train the pruned neural network until the iteration manager detects that the pruned neural network satisfies a sparsity condition.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-12, 16-20, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al., “Learning both weights and connections for efficient neural networks” (hereinafter Han), in view of Tang et al., “A pruning based method to learn both weights and connections for LSTM” (hereinafter Tang).

As per claim 1, Han teaches an apparatus comprising: 
a layer-wise pruning module to prune a specified set of parameters from each layer of a reference neural network model to generate a second neural network model having a higher sparsity rate than the reference neural network model (i.e., the second step is to prune the low weight connections, all connections with weights below a threshold are removed from the network - converting a dense network into a sparse network, see at least page 3, Figure 2, section 3, pages 4-6, section 4); 
a retraining module to retrain the second neural network model in accordance with a set of training data to generate a retrained second neural network model (i.e., final step retrains the network to learn the final weights for the remaining sparse connections, see at least page 3, Figure 2, section 3, pages 4-6, section 4); and 
the retraining module to output the retrained second neural network model as a final neural network model if a target sparsity rate has been reached and to provide the retrained second neural network model to the layer-wise pruning model for additional pruning if the target sparsity rate has not been reached (i.e., pruning followed by a retraining is one iteration, after many such iterations the minimum number connections could be found, see at least page 3, Figure 2, section 3, page 4, section 3.4).
Han does not explicitly teach a specified target sparsity rate.
Tang teaches outputting the retrained second neural network model as a final neural network model if a specified target sparsity rate has been reached, or providing the retrained second neural network model for additional pruning if the specified target sparsity rate has not been reached (i.e., we plan to prune 90% of the weights, we can prune and retrain repeatedly until we reach the model with only 10% of weights remain, see at least page 2, section 2.1, page 3, section 2.3, page 5, section 3.4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han such that the target sparsity rate is specified as similarly taught by Tang because setting a pruning rate allows different pruning rates to be investigated to find a pruning rate that does not result in a significant decrease in performance (see at least page 3, section 2.3, page 5, section 3.4 of Tang).
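For illustration only, the combination discussed above (iterative pruning and retraining as in Han, stopped at a specified target as in Tang) can be sketched as follows. This is a minimal sketch and is not code from Han, Tang, or the instant application. The function names, the per-step pruning fraction, the magnitude-based selection, and the placeholder retrain step are all assumptions introduced here.

```python
import numpy as np

def sparsity(layers):
    """Fraction of weights that are exactly zero across all layers."""
    total = sum(w.size for w in layers)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in layers)
    return zeros / total

def prune_layerwise(layers, fraction):
    """Zero the smallest-magnitude `fraction` of surviving weights in EACH layer."""
    pruned = []
    for w in layers:
        w = w.copy()
        alive = np.flatnonzero(w)                  # flat indices of nonzero weights
        k = int(np.ceil(fraction * alive.size))    # number of weights to remove
        if k > 0:
            order = np.argsort(np.abs(w.flat[alive]))
            w.flat[alive[order[:k]]] = 0.0
        pruned.append(w)
    return pruned

def retrain(layers):
    """Hypothetical placeholder: a real system would retrain the surviving
    weights on the training data while keeping pruned weights at zero."""
    return layers

def prune_until_target(layers, target_sparsity, fraction_per_step=0.2, max_iters=100):
    """Prune, retrain, then either output the model or loop back for more pruning."""
    model = layers
    for _ in range(max_iters):
        model = retrain(prune_layerwise(model, fraction_per_step))
        if sparsity(model) >= target_sparsity:     # specified target reached
            return model                           # output as the final model
    return model                                   # safety stop (assumption)

# Example: prune a toy two-layer model until 90% of its weights are zero.
rng = np.random.default_rng(0)
toy_layers = [rng.normal(size=(4, 5)), rng.normal(size=10)]
final_model = prune_until_target(toy_layers, target_sparsity=0.9)
```

In this sketch the stopping test uses a specified number (here 0.9), which corresponds to the 90% pruning goal quoted from Tang. Removing the test and iterating until no further pruning is possible would correspond more closely to the passage quoted from Han.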

As per claim 2, Han teaches the layer-wise pruning module to perform additional pruning on the retrained second neural network model to generate a third neural network model (i.e., the second step is to prune the low weight connections, all connections with weights below a threshold are removed from the network - converting a dense network into a sparse network, pruning followed by a retraining is one iteration, after many such iterations the minimum number connections could be found, see at least pages 3-6, sections 3 and 4); and 
the retraining module to retrain the third neural network model in accordance with the set of training data to generate a retrained third neural network model (i.e., final step retrains the network to learn the final weights for the remaining sparse connections, pruning followed by a retraining is one iteration, after many such iterations the minimum number connections could be found, see at least pages 3-6, sections 3 and 4); 
the retraining module to output the retrained third neural network model as a final neural network model if the target sparsity rate has been reached or to provide the retrained third neural network model to the layer-wise pruning model for additional pruning if the target sparsity rate has not been reached (i.e., pruning followed by a retraining is one iteration, after many such iterations the minimum number connections could be found, see at least pages 3-6, sections 3 and 4).
Han does not explicitly teach the specified target sparsity rate.
Tang teaches a specified target sparsity rate (i.e., we plan to prune 90% of the weights, we can prune and retrain repeatedly until we reach the model with only 10% of weights remain, see at least page 2, section 2.1, page 3, section 2.3, page 5, section 3.4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han such that the target sparsity rate is specified as similarly taught by Tang because setting a pruning rate allows different pruning rates to be investigated to find a pruning rate that does not result in a significant decrease in performance (see at least page 3, section 2.3, page 5, section 3.4 of Tang).

As per claim 3, Han teaches the retraining module to continue to provide each subsequent retrained neural network model to the layer-wise pruning module for additional pruning until the target sparsity rate has been reached and to output the subsequent retrained neural network model as a final neural network model when the target sparsity rate has been reached (i.e., pruning followed by a retraining is one iteration, after many such iterations the minimum number connections could be found, see at least pages 3-6, sections 3 and 4).
Han does not explicitly teach the specified target sparsity rate.
Tang teaches a specified target sparsity rate (i.e., we plan to prune 90% of the weights, we can prune and retrain repeatedly until we reach the model with only 10% of weights remain, see at least page 2, section 2.1, page 3, section 2.3, page 5, section 3.4).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han such that the target sparsity rate is specified as similarly taught by Tang because setting a pruning rate allows different pruning rates to be investigated to find a pruning rate that does not result in a significant decrease in performance (see at least page 3, section 2.3, page 5, section 3.4 of Tang).

As per claim 4, Han teaches wherein the neural network models comprise deep neural network (DNN) models (see at least page 1, section 1).

As per claim 8, Han teaches wherein the reference neural network model is generated by pre-training an initial dense deep neural network (DNN) architecture configuration (i.e., initial training phase, converting a dense network, see at least page 2, paragraph 1, page 3, section 3).

As per claims 9-12 and 16, these are the method claims corresponding to apparatus claims 1-4 and 8.  Therefore, claims 9-12 and 16 are rejected for the same reasons as claims 1-4 and 8.

As per claims 17-20 and 24, these are the medium claims corresponding to apparatus claims 1-4 and 8. Therefore, claims 17-20 and 24 are rejected for the same reasons as claims 1-4 and 8.

Claims 5-7, 13-15, and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Han, in view of Tang, further in view of Sun et al. “Sparsifying Neural Network Connections for Face Recognition” (hereinafter Sun).

As per claim 5, Han does not explicitly teach wherein the pruning is done in accordance with a Joint Feed-forward and Backward Propagation Approximation (JFBPA).
Sun teaches pruning is done in accordance with a Joint Feed-forward and Backward Propagation Approximation (JFBPA) (i.e., each time before forward-propagation, weights in the given layer are first updated by dot-multiplying the dropping matrix, then the following forward- and back-propagation operations could be done in the same way as a normal while the model would behave as a sparsely-connected one, the dropped weights being updated after back-propagation would be clipped to zero again before next forward-propagation, see at least page 3, left column, last paragraph – right column, first paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han such that the pruning is done in accordance with a Joint Feed-forward and Backward Propagation Approximation (JFBPA) as similarly taught by Sun because feed forward and backward propagation are typically performed in neural networks and it is known in the art that pruning is performed on feed forward and backward propagation neural networks (see at least page 3, left column, last paragraph – right column, first paragraph of Sun).

As per claim 6, Han does not explicitly teach wherein pruning the specified set of parameters from each layer includes performing a dot product within matrix-by-matrix or matrix-by-vector multiplication.
Sun teaches that pruning the specified set of parameters from each layer includes performing a dot product within matrix-by-matrix or matrix-by-vector multiplication (i.e., we use a binary matrix (referred to as dropping matrix) of 0s and 1s with the same size as the weight matrix of a layer to specify the dropped or reserved weights of the given layer, each time before forward-propagation, weights in the given layer are first updated by dot-multiplying the dropping matrix, see at least page 3, left column, last paragraph).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Han such that pruning the specified set of parameters from each layer includes performing a dot product within matrix-by-matrix or matrix-by-vector multiplication as similarly taught by Sun because Han teaches that pruning is implemented by modifying Caffe to add a mask which disregards pruned parameters, and it would have been obvious to use a known technique in the same framework to achieve pruning, such as that disclosed by Sun, which prunes in Caffe by using a dropping matrix and dot-multiplying (see at least page 3, left column, last paragraph of Sun).
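For illustration only, the dropping-matrix technique quoted from Sun can be sketched as follows. This is a minimal sketch and is not code from Sun or Han. The function names and the plain gradient-descent update are assumptions, and "dot-multiplying" is interpreted here as element-wise multiplication of the weight matrix by the binary matrix, which is also an assumption.

```python
import numpy as np

def apply_dropping_matrix(weights, dropping):
    """Element-wise product of the weights and a same-shape 0/1 matrix:
    dropped weights become 0, reserved weights are unchanged."""
    return weights * dropping

def forward(x, weights, dropping):
    """Mask the layer's weights first, then do an ordinary matrix-by-vector product."""
    masked = apply_dropping_matrix(weights, dropping)
    return masked @ x

def sgd_step(weights, grad, dropping, lr=0.1):
    """Ordinary gradient update, after which dropped weights are set to zero again
    so the layer keeps behaving as a sparsely connected one."""
    updated = weights - lr * grad
    return apply_dropping_matrix(updated, dropping)

# Example with a 2x3 layer in which three of the six weights are dropped.
w = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
d = np.array([[1, 0, 1], [0, 1, 0]])
y = forward(np.ones(3), w, d)
```

The masking step is separate from the forward product in this sketch, so the forward and backward passes themselves are unchanged, consistent with the quoted statement that they "could be done in the same way."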

As per claim 7, Han teaches wherein the layers include convolution layers and fully connected layers (see at least pages 4-6, section 4).

As per claims 13-15, these are the method claims corresponding to apparatus claims 5-7.  Therefore, claims 13-15 are rejected for the same reasons as claims 5-7.
As per claims 21-23, these are the medium claims corresponding to apparatus claims 5-7. Therefore, claims 21-23 are rejected for the same reasons as claims 5-7.

Response to Arguments
Rejection of claims under §103: 
Applicant argued that Han does not describe “a layer-wise pruning module to prune a specified set of parameters from each layer of a reference neural network model to generate a second neural network model having a higher sparsity rate than the reference neural network model.”  Applicant argued that Han fails to describe that such connection-pruning is layer-wise and that a specified set of parameters is pruned from each layer. Rather, the weight-based pruning method disclosed in Han does not necessarily prune connections from each layer. For example, if every connection in a layer is above the threshold, then no pruning would be performed on that layer.
Applicant’s arguments have been fully considered, but Examiner respectfully disagrees. Han teaches that the pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer’s weights (page 4, section 4).  Han teaches that CONV and FC layers can be pruned with different sensitivity (page 6, paragraph 4), and Figure 6 shows how accuracy drops as parameters are pruned on a layer-by-layer basis.  As shown in Figure 6, accuracy loss does not occur when 0% of parameters are pruned for a layer, so there is no reason to set the threshold such that every connection in a layer is above the threshold and no pruning is performed on that layer. Further, Examiner notes that the claim recites “prune a specified set of parameters from each layer.” There is no requirement that this specified set of parameters must always contain at least one member, as a set is known in the art to also include an empty set.
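For illustration only, the per-layer threshold described in Han (a quality parameter multiplied by the standard deviation of a layer's weights) can be sketched as follows. This is a minimal sketch and is not code from Han. The function names are assumptions, as is the choice to compare absolute weight values against the threshold.

```python
import numpy as np

def layer_threshold(weights, quality):
    """Threshold for one layer: quality parameter times the std of that layer's weights."""
    return quality * float(np.std(weights))

def prune_layer(weights, quality):
    """Zero every weight whose magnitude is below the layer's threshold.
    Returns the pruned weights and the number of weights removed."""
    threshold = layer_threshold(weights, quality)
    keep = np.abs(weights) >= threshold
    return weights * keep, int(np.count_nonzero(~keep))

# Example: the same layer pruned with two different quality parameters.
layer = np.array([0.1, -0.2, 1.5, -2.0])
pruned, removed = prune_layer(layer, quality=1.0)          # removes the two small weights
unchanged, removed_none = prune_layer(layer, quality=0.0)  # threshold 0, nothing removed
```

Because the threshold is computed separately from each layer's own weights, the selection in this sketch is made on a per-layer basis. The second call shows a case in which the set of pruned parameters for a layer is empty.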
Applicant argued that Han does not describe “the retraining module to output the retrained second neural network model as a final neural network model if a specified target sparsity rate has been reached and to provide the retrained second neural network model to the layer-wise pruning model for additional pruning if the specified target sparsity rate has not been reached.” Applicant argued that in Han, repeat iterations of pruning and retraining are performed until the minimum number of connections is found, which presumably means until the number of connections stops decreasing or when there are no more prune-able connections. As such, there was no specified target number.
Applicant’s arguments have been fully considered, but Examiner respectfully disagrees that “a specified target sparsity rate” recited in the claims needs to be a specific target number. A minimum number of connections is reasonably interpreted as a target sparsity rate. Han is further modified with Tang to teach that a target sparsity rate can be specified.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Liu et al. (US 2016/0328643) is cited to teach iterating until a target percentage of weights in each filter are set to zero.

Applicant’s amendment necessitated the new ground(s) of rejection presented in this office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP §706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jue Louie whose telephone number is 571-270-1655.  The examiner can normally be reached on M-F 9:30 am - 5:00pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Jue Louie/
Primary Examiner
Art Unit 2121