DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: the neural network 100 … input layer 110, a first inner layer 120, a second inner layer 130, and an output layer 140 … one or more neurons 150 … synapses 160.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The specification filed on 06/28/2018 is accepted.

Information Disclosure Statement


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-2 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ji et al. (US Pub. 2018/0232640).
	As per claim 1, Ji teaches a method for pruning a neural network [abstract, “a method, comprising: pruning a layer of a neural network having multiple layers using a threshold”], the method comprising: 
initializing a plurality of threshold values respectively corresponding to a plurality of layers included in the neural network [paragraph 0005, “pruning a plurality of layers of a neural network using automatically determined thresholds”; paragraphs 0019-0020, “set a good threshold for each layer of neural networks to prune the networks as much as possible but in the meanwhile maintaining original performance … a threshold for pruning a layer of a neural network is initialized … the initial threshold may be the same for all layers … the threshold may be different for some or each of the layers”];
selecting one of the plurality of layers [paragraph 0006, “prune a layer of a neural network having multiple layers using a threshold”]; 
adjusting the threshold value of the selected layer [paragraph 0023, “If the pruning error has not reached the pruning error allowance, the threshold is changed”; paragraph 0006, “repeat the pruning of the layer of the neural network using a different threshold”]; and 
adjusting a plurality of weights respectively corresponding to a plurality of synapses included in the neural network [paragraph 0019, “the neural networks may be pruned so as to make many of parameters to be zero”; paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero”].
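For illustration only, the four steps mapped above (initialize per-layer thresholds, select a layer, adjust its threshold, zero the affected weights) can be sketched as follows. The sketch is not taken from Ji or from the application. The layer names, weight values, threshold values, and the `prune_layer` helper are hypothetical, and comparing the magnitude of each weight to the threshold is an assumption.

```python
# Illustrative sketch only; all names and values are hypothetical.

def prune_layer(weights, threshold):
    """Return a copy of the weights with every weight whose magnitude
    is below the threshold set to zero (assumed pruning rule)."""
    return [0.0 if abs(w) < threshold else w for w in weights]

# A toy network: each layer is a flat list of synapse weights.
layers = {"conv1": [0.8, -0.05, 0.3], "conv2": [0.02, -0.6, 0.1]}

# Initialize one threshold per layer (here the same initial value for all).
thresholds = {name: 0.1 for name in layers}

selected = "conv1"            # select one of the layers
thresholds[selected] = 0.4    # adjust the threshold of the selected layer

# Adjust the weights: those below the adjusted threshold become zero.
layers[selected] = prune_layer(layers[selected], thresholds[selected])
```

In this sketch only the selected layer is modified; the unselected layer keeps its initial threshold and its original weights.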

As per claim 2, Ji teaches the method of claim 1.
Ji further teaches
the neural network is a convolutional neural network, and wherein the plurality of layers includes an input layer and one or more inner layers [abstract, “a method, comprising: pruning a layer of a neural network having multiple layers using a threshold”; paragraph 0049, “convolution-type (CONV) layers (inner layers) are pruned using the automatically determined thresholds for those layers”; It can be understood that a convolutional neural network includes an input layer].

Claim Rejections - 35 USC § 103
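In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.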
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Kubo et al. (US Patent 10,515,312).
As per claim 3, Ji teaches the method of claim 1.
Ji teaches in paragraph 0051 “some of the non-zero weights … may have fallen below the pruning threshold for the associated layer. Accordingly, those weights may be set to zero”.
Ji does not teach
a value of a neuron in the selected layer that is less than the threshold value is set to be 0.
Kubo teaches  
a value of a neuron [activation probability] in the selected layer that is less than the threshold value is set to be 0 [claims 1, 10 and 12, “deactivate a first node of a first layer of the plurality of internal layers … to deactivate the first node is based at least partly on the activation probability satisfying a criterion, wherein the criterion comprises a value of the activation probability falling below a threshold … wherein deactivating at least the first node of the artificial neural network comprises multiplying a value of the first node by zero”]. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the process of setting a value of a neuron that is less than the threshold .

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Pang et al. (US Pub. 2015/0326695).
As per claim 4, Ji teaches the method of claim 1.
Ji further teaches 
determining a pruning ratio of the neural network “…” after the plurality of weights corresponding to the plurality of synapses have been adjusted [paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero (adjusting weights)”; paragraph 0026, “the percentage of non-zero weights is calculated and compared to the acceptable percentage or range of percentages … the percentage of pruned weights may be used and compared to a corresponding range or value”; paragraph 0017, “If the percentage of pruned weights is not within a range for the layer, in 112, the pruning error allowance is changed”; paragraph 0028, “The subsequent operations may be performed until the percentage of pruned weights is within the range”].
Ji does not explicitly teach
determining whether a pruning ratio of the neural network is less than a target value;
Pang teaches
determining whether a pruning ratio of the neural network is less than a target value [paragraph 0108, “if the terminal detects that compression rates of several consecutive compressed packets are less than a set threshold, the terminal stops performing compression”; Ji teaches that, after setting some of the weights to zero, the system compares the pruning ratio with a desired percentage or range of percentages to determine whether the subsequent operations need to be performed for a selected layer, and that, when the pruning ratio is within the desired range, the next layer of the neural network is processed. Ji, however, is silent as to determining whether the pruning ratio is less than a desired value or threshold. Pang supplies the missing element by teaching determining whether the compression rate is less than a threshold. Therefore, the combination of Ji and Pang teaches the claim limitation];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of determining whether a pruning ratio is less than a target value of Pang into the method of pruning and retraining of neural networks of Ji. Doing so would help reduce the network complexity and improve utilization of a network resource (Pang, abstract).

As per claim 5, Ji and Pang teach the method of claim 4.
Pang teaches in paragraph 0108, “detects that compression rates of several consecutive compressed packets are less than a set threshold, the terminal stops performing compression on the application layer packet”.
Ji further teaches 
when the pruning ratio is less than the target value, selecting a second layer among the plurality of layers [paragraph 0026, “the percentage of pruned weights may be used and compared to a corresponding range or value”; paragraph 0028, “The subsequent operations may be performed until the percentage of pruned weights is within the range of an acceptable amount … the next layer may be processed similarly. Accordingly, the process may repeat for each layer of the neural network”; Ji teaches that the pruning ratio is compared with a value or a range and that, when the pruning ratio is within the desired value, the pruning process is done for the selected layer and the next layer of the network is processed with similar steps. Ji does not explicitly teach, when the pruning ratio is less than a value, stopping processing of the current layer and selecting the next layer to process, while Pang teaches that, when the compression rate is less than a value, compression stops. Therefore, the combination of Ji and Pang reads on the claim limitation], adjusting the threshold value of the second layer [paragraph 0023, “If the pruning error has not reached the pruning error allowance, the threshold is changed”; paragraph 0006, “repeat the pruning of the layer of the neural network using a different threshold”], and adjusting the plurality of weights respectively corresponding to the plurality of synapses after the threshold value of the second layer has been adjusted [paragraph 0019, “the neural networks may be pruned so as to make many of parameters to be zero”; paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero”].
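For illustration only, the limitations of claims 4 and 5 as mapped above (determine the pruning ratio, compare it with a target value, and move to a further layer while the ratio remains below the target) can be sketched as follows. The sketch is not taken from Ji, Pang, or the application. The function names, the fixed threshold step, the layer order, and the definition of the pruning ratio as the fraction of zero-valued weights are all assumptions.

```python
# Illustrative sketch only; all names and values are hypothetical.

def pruning_ratio(layers):
    """Fraction of weights in the whole network that are zero (assumed definition)."""
    total = sum(len(ws) for ws in layers.values())
    zeros = sum(1 for ws in layers.values() for w in ws if w == 0.0)
    return zeros / total

def prune_until_target(layers, thresholds, target, step):
    """While the pruning ratio is below the target, select the next layer,
    raise its threshold by `step`, and zero the weights below that threshold."""
    for name in list(layers):
        if pruning_ratio(layers) >= target:
            break  # target reached: no further layer is selected
        thresholds[name] += step
        layers[name] = [0.0 if abs(w) < thresholds[name] else w
                        for w in layers[name]]
    return pruning_ratio(layers)

layers = {"a": [0.5, 0.05, 0.2, 0.9], "b": [0.15, 0.7, 0.01, 0.3]}
thresholds = {"a": 0.0, "b": 0.0}
final_ratio = prune_until_target(layers, thresholds, target=0.5, step=0.25)
```

With these toy values the ratio after the first layer is 0.25, which is below the target of 0.5, so the second layer is selected and pruned as well.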

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Li et al. (Pruning Filters for Efficient ConvNets).
As per claim 6, Ji teaches the method of claim 1.
Ji further teaches 
adjusting the plurality of threshold values respectively corresponding to the plurality of layers [paragraphs 0019-0020, “set a good threshold for each layer of neural networks to prune the networks as much as possible but in the meanwhile maintaining original performance”; paragraphs 0005-0006, “pruning a plurality of layers of a neural network using automatically determined thresholds … repeat the pruning of the layer of the neural network using a different threshold until a pruning error of the pruned layer reaches a pruning error allowance”];
determining reduction rates of accuracy respectively for the plurality of layers based on the adjusted plurality of threshold values [Fig. 6, paragraphs 0061-0062 disclose the results of pruning techniques and a list of layers of the neural network, “The results show that pruning as described herein can achieve an accuracy comparable to the unpruned network with fewer weights than the prefixed thresholds”; Fig. 6 shows that the pruned GoogLeNet has the smallest size of weight parameters and that the pruned neural network achieves over 89% top-5 accuracy with the smallest size of weight parameters (89.12% compared to the unpruned 89.15%), where different thresholds are used in the pruning process until a pruning error of the pruned layer reaches a pruning error allowance]; and
Ji teaches the process of pruning a neural network based in part on adjusting the thresholds, and it can be understood that pruning a neural network generates a compact neural network that requires less storage space and fewer calculations, thereby reducing the complexity level. However,
Ji does not teach
determining reduction rates of computational complexity;
selecting a layer among the plurality of layers according to the reduction rates of accuracy and the reduction rates of computational complexity corresponding to the plurality of layers.
Li teaches
determining reduction rates of computational complexity [page 4, 2nd paragraph, “Magnitude-based weight pruning may prune away whole filters when all the kernel weights of a filter are lower than a given threshold. However, it requires a careful tuning of the threshold and it is difficult to predict the exact number of filters that will eventually be pruned”; Fig 2, tables 1-2, page 7, “each of the convolutional layers with 512 feature maps can drop at least 60% of filters without affecting the accuracy … With 50% of the filters being pruned in layer 1 and from 8 to 13, we achieve 34% FLOP reduction for the same accuracy”; page 13, section 6.2, “FLOP is a commonly used measure to compare the computation complexities of CNNs”];
selecting a layer among the plurality of layers according to the reduction rates of accuracy and the reduction rates of computational complexity corresponding to the plurality of layers [Fig 2, table 2, page 7, the first layer is robust to pruning as compared to the next few layers … Even when 80% of the filters from the first layer are pruned, the number of remaining filters (12) is still larger than the number of raw input channels. However, when removing 80% filters from the second layer, the layer corresponds to a 64 to 12 mapping, which may lose significant information from previous layers, thereby hurting the accuracy. With 50% of the filters being pruned in layer 1 and from 8 to 13, we achieve 34% FLOP reduction for the same accuracy (selecting the layer(s) to prune based on the percent of FLOP and accuracy reduction)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of determining reduction rates of computational complexity and selecting a layer among the plurality of layers according to the reduction rates of accuracy and the reduction rates of computational complexity corresponding to the plurality of layers of Li into the method of pruning and retraining of neural networks of Ji. Doing so would help prune some single layers to improve the performance of the neural network.

Claims 8-9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Yao et al. (US Pub. 2019/0188567).
As per claim 8, Ji teaches the method of claim 1.
Ji does not teach
adjusting the plurality of weights of the plurality of synapses comprises:
calculating a loss function based on an output vector corresponding to output data, the output data being output from the neural network when input data is provided to the neural network; 
calculating a plurality of partial differential equations of the loss function corresponding to the plurality of weights of the plurality of synapses; and

Yao teaches
adjusting the plurality of weights of the plurality of synapses comprises:
calculating a loss function based on an output vector corresponding to output data, the output data being output from the neural network when input data is provided to the neural network [paragraphs 0065-0066, “the current DNN model may be applied to the mini-batch and the result from the output layer may be compared to the known ground truth output for the mini-batch to determine the network loss … backward propagation may be applied to the current DNN model and, based on known node outputs, the loss function gradient may be determined”]; 
calculating a plurality of partial differential equations of the loss function corresponding to the plurality of weights of the plurality of synapses [paragraph 0051, equation 4, “weights at the current iteration may be provided as updates to weights from the previous iterations such that updates include a product of a learning rate and a partial differential of the network loss for the weight”]; and
adjusting the plurality of weights according to a plurality of current weights of the plurality of synapses and the plurality of partial differential equations [paragraph 0069, equation 4, “the connection weight matrix for the current layer of the DNN model may be updated. The connection weight matrix for the current layer may be updated using any suitable technique or techniques … each connection weight may be provided based on a positive learning rate parameter and the gradient of the loss function … each connection weight may be provided as a product of the positive learning rate parameter and a partial derivative of the loss function”; paragraph 0081, “updating the previous matrix of connection weights to a current matrix of connection weights based on the previous matrix of connection weights and the loss function gradient”].


As per claim 9, Ji and Yao teach the method of claim 8.
Yao further teaches 
the loss function includes a variable corresponding to a distance between the output vector and a truth vector, the truth vector being given as a truth corresponding to the input data [paragraph 0065, “the result from the output layer may be compared to the known ground truth output for the mini-batch to determine the network loss”; it can be seen that the loss is based on the difference (distance) between the output and the ground truth; paragraph 0037, “correction or reweighting (weights updating) … is provided based on the difference (e.g., network loss) between the result from the current iteration of the DNN model and the ground truth result based on the known training set”].
Claim 9 is rejected using the same rationale as claim 8.

As per claim 11, Ji and Yao teach the method of claim 8.
Yao further teaches 
decreasing a weight of a synapse when the partial differential equation corresponding to the weight is positive [paragraphs 0050-0051, “updating weights 115 (e.g., W_k) … an update operation for W_k may be provided as shown in Equation (4): (equation image media_image1.png) where β indicates a positive learning weight”; it can be seen that when the partial differential equation corresponding to the weight is positive, the second term of Equation (4) becomes negative, thus W_k is decreased]; and 
increasing the weight of the synapse when the partial differential equation corresponding to the weight is negative [paragraphs 0050-0051, “updating weights 115 (e.g., W_k) … an update operation for W_k may be provided as shown in Equation (4): (equation image media_image1.png) where β indicates a positive learning weight”; it can be seen that when the partial differential equation corresponding to the weight is negative, the second term of Equation (4) becomes positive, thus W_k is increased].
Claim 11 is rejected using the same rationale as claim 8.
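For illustration only, the sign relationship relied upon for claim 11 can be sketched as follows. Equation (4) of Yao is not reproduced in this action, so the sketch assumes the standard gradient-descent form, in which the new weight equals the current weight minus a positive learning rate times the partial derivative of the loss. The function name `update_weight` and the numeric values are hypothetical.

```python
# Illustrative sketch only; assumes a standard gradient-descent update,
# not necessarily the exact form of Yao's Equation (4).

def update_weight(w, grad, beta=0.1):
    """Return w - beta * grad, where beta is a positive learning rate
    and grad is the partial derivative of the loss with respect to w."""
    return w - beta * grad

w = 1.0
w_after_positive_grad = update_weight(w, grad=2.0)   # second term is negative
w_after_negative_grad = update_weight(w, grad=-2.0)  # second term is positive
```

Under this assumed form, a positive partial derivative lowers the weight and a negative partial derivative raises it, which is the behavior recited in claim 11.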

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Yao et al. and further in view of Qiao et al. (US Pub. 2016/0140437).
As per claim 10, Ji and Yao teach the method of claim 8.
Ji and Yao do not teach
each of the plurality of partial differential equations comprises a partial differential equation of a synapse connecting a start neuron with an end neuron, the start neuron being included in a lower level layer than the end neuron, the partial differential equation of the synapse connecting the start neuron with the end neuron being based on a value of the start neuron and a partial differential equation of a synapse that connects the end neuron with a neuron in a higher level layer than the end neuron.
Qiao teaches
each of the plurality of partial differential equations comprises a partial differential equation of a synapse connecting a start neuron with an end neuron, the start neuron being included in a lower level layer than the end neuron [claim 1, “(equation image media_image2.png), where wmk1(t) is connecting weight between mth node in the input layer and kth node in the hidden layer at time t”], the partial differential equation of the synapse connecting the start neuron with the end neuron being based on a value of the start neuron [claim 1, “(equation image media_image2.png), where (equation image media_image3.png), (equation image media_image4.png), (equation image media_image5.png), and u1(t) is the value of input variable 1, u2(t) is the value of input variable 2, etc.”; thus, the partial differential equation (equation image media_image2.png) comprises the value of the input node] and a partial differential equation of a synapse that connects the end neuron with a neuron in a higher level layer than the end neuron [claim 1, “(equation image media_image2.png), where (equation image media_image3.png), (equation image media_image4.png), and wk3(t) is connecting weight between kth node in the hidden layer and the node in the output layer at time t”; therefore, the partial differential equation (equation image media_image2.png) comprises a partial differential equation of a synapse that connects the end neuron (hidden node) with a neuron in a higher level layer (output node)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have included a partial differential equation of a synapse connecting a start neuron with an end neuron, the partial differential equation of the synapse connecting the start neuron with the end neuron being based on a value of the start neuron and a partial differential equation of a synapse that connects the end neuron with a neuron in a higher level layer than the end neuron of Qiao into the method of pruning and retraining of neural networks of Ji. Doing so would help determining the .
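For illustration only, the dependency recited in claim 10 matches the textbook chain rule of backpropagation, which can be written as below. This is the generic form and is not a reproduction of the equations in claim 1 of Qiao, which are not reproduced in this action. All symbols are assumptions introduced for the illustration: L is the loss, a_i is the value of the start neuron i, z_j and a_j = f(z_j) are the pre-activation and value of the end neuron j, w_{ij} is the weight of the synapse from i to j, and k ranges over neurons in the next higher level layer.

```latex
\frac{\partial L}{\partial w_{ij}} = a_i \,\delta_j ,
\qquad
\delta_j = f'(z_j) \sum_{k} w_{jk}\,\delta_k ,
\qquad
\frac{\partial L}{\partial w_{jk}} = a_j \,\delta_k .
```

In this generic form the partial derivative for the synapse from i to j depends on the start-neuron value a_i and, through the terms δ_k, on the same quantities that determine the partial derivatives for the synapses from j to the higher level layer.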

Claims 12-14 are rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Brothers et al. (US Pub. 2016/0358070).
As per claim 12, Ji teaches a device for pruning a neural network [paragraph 0006, “a system, comprising: a memory; and a processor coupled to the memory and configured to: prune a layer of a neural network”], the device comprising: 
a computing circuit [Fig. 8, paragraph 0063, “A system 800 includes a processor”; claim 20, “the processor is further configured to … calculating the pruning error … calculating the percentage of pruned weights …”] configured to perform a convolution operation including an addition operation [paragraph 0016, “systems may operate according to other methods having different and/or additional operations and operations in different orders”; paragraph 0049, “convolution-type (CONV) layers (inner layers) are pruned using the automatically determined thresholds for those layers”];
an input signal generator configured to generate input data [paragraph 0063, “FIG. 8 is a system according to some embodiments. A system 800 includes a processor 802 … The processor 802 may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a microcontroller, a programmable logic device, discrete circuits, a combination of such devices”; claim 20, “the processor is further configured to, for each of the layers of the neural network: initialize the pruning error allowance; initialize the threshold”; since the processor may include circuits, controller … or a combination of the devices, and the processor initialized error allowance, initialize the threshold, etc. which are parts of the input 806, therefore, it can be seen that the processor comprises a device that generates the input (input signal generator)], and to input the input data to the computing circuit [paragraph 0065, “the system 800 may be configured to receive inputs 806 such as a neural network, initial thresholds, initial pruning error allowances, acceptable pruning percentage ranges, or the like”]; 
a threshold adjusting circuit [claim 20, “the processor is further configured to, for each of the layers of the neural network: changing the threshold in response to the comparison”; since the processor may include circuits, controller … or a combination of the devices, and the processor is configured to adjust the threshold, therefore, it can be seen that the processor comprises a threshold adjusting circuit] configured to adjust a threshold value of a selected layer among a plurality of layers in the neural network [claim 20, “the processor is further configured to, for each of the layers of the neural network: changing the threshold in response to the comparison”]; 
a weight adjusting circuit [claim 20, “the processor is further configured to, for each of the layers of the neural network: pruning the layer of the neural network using the threshold”; paragraph 0019, “the neural networks may be pruned so as to make many of parameters to be zero”; paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero”; since the processor may include circuits, controller … or a combination of the devices, and the processor is configured to prune the layer of the neural network using the threshold (adjusting the weights by setting some of the weights to zero), therefore, it can be seen that the processor comprises a weight adjusting circuit] configured to adjust a plurality of weights of a plurality of synapses, respectively, which are included in the neural network [claim 20, “the processor is further configured to, for each of the layers of the neural network: pruning the layer of the neural network using the threshold”; paragraph 0019, “the neural networks may be pruned so as to make many of parameters to be zero”; paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero”]; and
a controller [paragraph 0063, “A system 800 includes a processor 802 … The processor 802 may be … a microcontroller”] configured to prune the neural network by controlling the threshold adjusting circuit and the weight adjusting circuit [claim 20, “the processor is further configured to, for each of the layers of the neural network: pruning the layer of the neural network using the threshold … changing the threshold in response to the comparison”; since the processor may include a controller, and the processor is configured to prune the layer of the neural network using the threshold (adjusting the weights by setting some of the weights to zero) … changing the threshold in response to the comparison, and as explained above, the processor may include circuits, controller … or a combination of the devices which are configured to adjust a threshold value and adjust the weights, therefore, it can be seen that the processor can be a controller configured to control the threshold adjusting circuit and the weight adjusting circuit to perform the threshold and weight adjusting].
Ji does not teach 
a computing circuit configured to perform a convolution operation including an addition operation and a multiplication operation.
Brothers teaches
a computing circuit configured to perform a convolution operation including an addition operation and a multiplication operation [paragraph 0047, “in a first iteration, the neural network analyzer may analyze the first neural network to perform pruning … in a second iteration, the neural network analyzer may analyze the (now second) neural network to perform convolution kernel substitution and apply the substituted convolution kernels to selected portions, then scaling, and so forth”; paragraphs 0083-0086, “neural network 102 includes a convolution layer in which input feature maps A, B, C, and D are processed by convolution kernels K1, K2, K3, and K4 respectively. The neural network analyzer has determined that convolution kernels K1, K2, and K3 are similar and formed a group 920 … output feature map 925 may be represented by the expression: A*K1+B*K2+C*K3+D*K4 … the convolution layer is modified so that each of input feature maps A, B, and C, which belong to group 920, is multiplied by a scaling factor shown as SF1, SF2, and SF3, respectively. The scaled results are summed … rather than perform a separate convolution operation for each of input feature maps A, B, and C, the input feature maps are scaled and summed to generate composite feature map 935, that may then be convolved with base convolution kernel 930”; paragraph 0027, “perform operations as part of executing a neural network … where the operations include … calculations (e.g., multiply, add, and so forth)”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of performing a convolution operation including an addition operation and a multiplication operation of Brothers into the method of pruning and retraining of neural networks of Ji. Doing so would help tune the neural network by in part performing the convolution operations.

As per claim 13, Ji and Brothers teach the device of claim 12.
Brothers further teaches
a memory device configured to store initial weights of the plurality of synapses, the input data, or both [Fig. 1, paragraph 0025, “receive or accesses neural network 102 as an input … Neural networks 102, 106 can be received as an electronic signal … and stored in a file or in memory”; paragraph 0026, “"neural network," as used within this disclosure, means a programmatic description or definition of a neural network. The neural network programmatically defines parameters, connection weights, or other specifics of the architecture such as the number of neurons contained therein or the connectivity among the neurons”; paragraph 0027, “neural network 102 may be trained to a point where the weights of the neural network have converged or substantially converged”; It can be seen that the memory stored the neural network 102 which includes the connection weights].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have included a memory device configured to store initial weights of the plurality of synapses of Brothers into the method of pruning and retraining of neural networks of Ji. Doing so would help the system extract and modify the connection weights of the input neural network to generate a new compact neural network that requires less storage space and fewer calculations, thereby reducing the complexity level.

As per claim 14, Ji and Brothers teach the device of claim 12.
Ji further teaches
the controller [paragraph 0063, “A system 800 includes a processor 802 … The processor 802 may be … a microcontroller”] controls the threshold adjusting circuit to adjust a threshold value of the selected layer among the plurality of layers [paragraph 0063 discloses that the processor can be a combination of devices comprising a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a microcontroller, a programmable logic device, and discrete circuits, and claim 20 recites “the processor is further configured to, for each of the layers of the neural network: changing the threshold in response to the comparison”; it can be seen that the processor can be a microcontroller that comprises and controls a circuit to perform the threshold changing], and then the controller controls the weight adjusting circuit to adjust the plurality of weights of the plurality of synapses in the neural network [paragraph 0063 discloses that the processor can be a combination of devices comprising a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a microcontroller, a programmable logic device, and discrete circuits, and claim 20 recites “the processor is further configured to, for each of the layers of the neural network: pruning the layer of the neural network using the threshold”; where, paragraph 0019, “the neural networks may be pruned so as to make many of parameters to be zero”; paragraph 0021, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero”; it can be seen that the processor can be a microcontroller that comprises and controls a circuit (weight adjusting circuit in this case) to perform the weight adjusting].
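For illustration only, the sequence discussed above (adjust the threshold of a selected layer, then adjust the weights by setting some of them to zero) can be sketched in Python as follows. This is a simplified sketch and not Ji's implementation: the magnitude comparison, the fixed threshold step, the stopping test on the fraction of pruned weights, and all names and values (prune_layer, pruned_fraction, adjust_then_prune, the weight list) are assumptions made for the example.

```python
def prune_layer(weights, threshold):
    """Set weights whose magnitude is below the threshold to zero (assumed pruning rule)."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def pruned_fraction(weights):
    """Fraction of weights in the layer that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def adjust_then_prune(weights, target_fraction, threshold=0.0, step=0.05, max_iters=100):
    """Alternate threshold adjustment and weight pruning for one layer.

    Assumption: the loop stops once the pruned fraction reaches target_fraction.
    """
    pruned = prune_layer(weights, threshold)
    for _ in range(max_iters):
        if pruned_fraction(pruned) >= target_fraction:
            break
        threshold += step                          # threshold adjusting step
        pruned = prune_layer(weights, threshold)   # weight adjusting step
    return threshold, pruned


# Hypothetical weights of one selected layer.
layer_weights = [0.8, -0.03, 0.12, -0.5, 0.07, 0.01, -0.22, 0.4]
final_threshold, pruned_weights = adjust_then_prune(layer_weights, target_fraction=0.5)
print(final_threshold, pruned_weights)
```

The pruning is always recomputed from the original weights, so a raised threshold never depends on the result of an earlier pruning pass.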

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Brothers et al. and further in view of Kubo et al. (US Patent 10,515,312).
As per claim 15, Ji and Brothers teach the device of claim 12.
Ji further teaches
the controller [paragraph 0063, “A system 800 includes a processor 802 … The processor 802 may be … a microcontroller”] controls the threshold adjusting circuit, the computing circuit, or both [claim 20, “the processor is further configured to, for each of the layers of the neural network: pruning the layer of the neural network using the threshold … changing the threshold in response to the comparison”; since the processor may include a controller, and the processor is configured for pruning the layer of the neural network using the threshold (adjusting the weights by setting some of the weights to zero) … changing the threshold in response to the comparison, and since, as explained above, the processor may include circuits, a controller … or a combination of devices that are configured to adjust a threshold value and adjust the weights, it can be seen that the processor can be a controller configured to control the threshold adjusting circuit to perform the threshold adjusting];
Ji and Brothers do not teach 
assume a value of a neuron in the selected layer that is less than the threshold value of the selected layer is 0.
Kubo teaches  
a value of a neuron [activation probability] in the selected layer that is less than the threshold value of the selected layer is 0 [claims 1, 10 and 12, “deactivate a first node of a first layer of the plurality of internal layers … to deactivate the first node is based at least partly on the activation probability satisfying a criterion, wherein the criterion comprises a value of the activation probability falling below a threshold … wherein deactivating at least the first node of the artificial neural network comprises multiplying a value of the first node by zero”]. 
 Claim 15 is rejected using the same rationale as claim 3.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Brothers et al. in view of Kubo et al. and further in view of Pang et al. (US Pub. 2015/0326695).
As per claim 16, Ji, Brothers and Kubo teach the device of claim 15.
Ji further teaches 
the controller further controls the threshold adjusting circuit and the weight adjusting circuit to operate when a pruning ratio of the neural network is “…” [paragraphs 0026-0028 disclose the process of calculating and comparing the percentage of pruned weights (pruning ratio) to a corresponding range or value; when the pruning ratio is within a desired value, the pruning process is done for the present selected layer, and the next layer of the network will be processed with similar steps, wherein the similar steps are the steps described in paragraphs 0021-0024, “the layer of the neural network is pruned using the threshold. For example, the threshold is used to set some weights of the layer to be zero … If the pruning error has not reached the pruning error allowance, the threshold is changed … after the threshold is changed in 108, the process repeats by pruning the layer” (adjusting the weights by setting some of the weights to zero based on the adjusted threshold); it can be seen that when the pruning ratio is within a desired value or range, the system selects a next layer for pruning, thus repeating the similar processes of adjusting the threshold and adjusting the weights … (controls the threshold adjusting circuit and the weight adjusting circuit to operate)]. Ji teaches the percentage of pruned weights, which is the number of weights that are pruned compared to the total number of weights in the selected layer.
Kubo teaches 
the pruning ratio being equal to a ratio of a number of neurons in the neural network having a value of 0 to a total number of neurons in the neural network [Col. 5, lines 48-62 and Col. 6, lines 3-4, “The computing system 500 may determine activation probabilities for each of nodes 142-188 … For example, the hyper parameters may be selected such that the activation probabilities of about 33% of the nodes of the internal hidden layers, including nodes 142, 146, 162, 188, converge on 0.0 over the course of training. The activation probabilities for the remaining hidden layer nodes, including nodes 144, 148, 164, 166, 168, 182, 184, and 186, may then converge on 1.0 over the course of training … The computing system 500 may generate a compact NN 150 by removing nodes 142, 146, 162, 188 from the NN 100.”; it can be seen that the pruning ratio is 33%, which is the ratio of 4 removed nodes to the total of 12 nodes (142-188)].
Ji, Brothers and Kubo do not teach
controls the threshold adjusting circuit and the weight adjusting circuit to operate when a pruning ratio of the neural network is less than a target value (emphasis added);
Pang teaches
controls the threshold adjusting circuit and the weight adjusting circuit to operate when a pruning ratio of the neural network is less than a target value [paragraph 0108, “if the terminal detects that compression rates of several consecutive compressed packets are less than a set threshold, the terminal stops performing compression”; Ji teaches that when the pruning ratio is within a desired value or range, the system selects a next layer for pruning, thus repeating the similar processes of adjusting the threshold and adjusting the weights … (controls the threshold adjusting circuit and the weight adjusting circuit to operate), but Ji does not explicitly teach that, when the pruning ratio is less than a value, processing of the current layer stops and the next layer is selected for processing, while Pang teaches that when the compression rate is less than a value, compression stops; therefore, the combination of Ji and Pang reads on the claim limitation].
Claim 16 is rejected using the same rationale as claim 4.
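For illustration, the 33% pruning ratio read from Kubo above can be reproduced with a short Python sketch. The node numbers and their converged activation probabilities are those quoted from Kubo; the 0.5 deactivation threshold, the 0.5 target value, and the variable names are hypothetical and are used only to show the arithmetic and the "less than a target value" comparison.

```python
# Converged activation probabilities for the hidden-layer nodes quoted from Kubo.
activation_probability = {
    142: 0.0, 144: 1.0, 146: 0.0, 148: 1.0,
    162: 0.0, 164: 1.0, 166: 1.0, 168: 1.0,
    182: 1.0, 184: 1.0, 186: 1.0, 188: 0.0,
}

DEACTIVATION_THRESHOLD = 0.5  # hypothetical value, not from the reference
TARGET_PRUNING_RATIO = 0.5    # hypothetical target value, not from the reference

# A node whose activation probability falls below the threshold is deactivated,
# that is, its value is multiplied by zero.
node_mask = {
    node: 0 if probability < DEACTIVATION_THRESHOLD else 1
    for node, probability in activation_probability.items()
}

removed_nodes = sorted(node for node, keep in node_mask.items() if keep == 0)

# Pruning ratio: number of zero-valued nodes divided by the total number of nodes.
pruning_ratio = len(removed_nodes) / len(node_mask)

# Comparison of the pruning ratio against a target value.
keep_adjusting = pruning_ratio < TARGET_PRUNING_RATIO

print(removed_nodes, pruning_ratio, keep_adjusting)
```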

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Brothers et al. and further in view of Li et al. (Pruning Filters for Efficient ConvNets).
As per claim 17, Ji and Brothers teach the device of claim 12.
Ji teaches 
the threshold adjusting circuit selects a layer among the plurality of layers [paragraph 0063 discloses that the processor can be a combination of devices comprising a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit, a microcontroller, a programmable logic device, and discrete circuits, and paragraph 0006 recites “a processor coupled to the memory and configured to: prune a layer of a neural network having multiple layers using a threshold”; it can be seen that the processor comprises a circuit (threshold adjusting circuit in this case) to select and prune a layer of a neural network having multiple layers];
Ji and Brothers do not teach
the threshold adjusting circuit selects a layer among the plurality of layers according to a reduction rate of accuracy of the selected layer and a reduction rate of computational complexity of the selected layer (emphasis added).
Li teaches
[Fig 2, table 2, page 7, the first layer is robust to pruning as compared to the next few layers … Even when 80% of the filters from the first layer are pruned, the number of remaining filters (12) is still larger than the number of raw input channels. However, when removing 80% filters from the second layer, the layer corresponds to a 64 to 12 mapping, which may lose significant information from previous layers, thereby hurting the accuracy. With 50% of the filters being pruned in layer 1 and from 8 to 13, we achieve 34% FLOP reduction for the same accuracy (selecting the layer(s) to prune based on the percent of FLOP and accuracy reduction)].
Claim 17 is rejected using the same rationale as claim 6.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ji et al. in view of Brothers et al. and further in view of Yao et al. (US Pub. 2019/0188567).
As per claim 18, Ji and Brothers teach the device of claim 12.
Ji and Brothers do not explicitly teach
the weight adjusting circuit calculates a loss function and adjusts the plurality of weights of the plurality of synapses with a plurality of partial differential equations of the loss function relative to the plurality of synapses, respectively.
Yao teaches
calculates a loss function [paragraphs 0065-0066, “the current DNN model may be applied to the mini-batch and the result from the output layer may be compared to the known ground truth output for the mini-batch to determine the network loss … backward propagation may be applied to the current DNN model and, based on known node outputs, the loss function gradient may be determined”] and adjusts the plurality of weights of the plurality of synapses with a plurality of partial differential equations of the loss function relative to the plurality of synapses, respectively [paragraph 0069, equation 4, “the connection weight matrix for the current layer of the DNN model may be updated. The connection weight matrix for the current layer may be updated using any suitable technique or techniques … each connection weight may be provided based on a positive learning rate parameter and the gradient of the loss function … each connection weight may be provided as a product of the positive learning rate parameter and a partial derivative of the loss function”; paragraph 0081, “updating the previous matrix of connection weights to a current matrix of connection weights based on the previous matrix of connection weights and the loss function gradient”].
Claim 18 is rejected using the same rationale as claim 8.
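For illustration, a weight update of the kind described in the cited passages of Yao (each weight changed by the product of a positive learning rate and the partial derivative of the loss function with respect to that weight) can be sketched in Python as follows. This is a generic gradient descent sketch and not Yao's equation 4: the single linear neuron, the squared-error loss, and all names and values are assumptions chosen to keep the example small.

```python
def predict(weights, inputs):
    """Output of a single linear neuron (assumed model)."""
    return sum(w * x for w, x in zip(weights, inputs))


def loss(weights, inputs, target):
    """Squared-error loss: 0.5 * (prediction - target)^2 (assumed loss function)."""
    return 0.5 * (predict(weights, inputs) - target) ** 2


def loss_gradient(weights, inputs, target):
    """Partial derivative of the loss with respect to each weight."""
    error = predict(weights, inputs) - target
    return [error * x for x in inputs]


def update_weights(weights, inputs, target, learning_rate):
    """Subtract learning_rate * dL/dw from each weight."""
    gradient = loss_gradient(weights, inputs, target)
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


# Hypothetical connection weights, input values, and ground-truth output.
weights = [0.2, -0.4, 0.1]
inputs = [1.0, 2.0, -1.0]
target = 1.0

loss_before = loss(weights, inputs, target)
new_weights = update_weights(weights, inputs, target, learning_rate=0.1)
loss_after = loss(new_weights, inputs, target)

print(loss_before, loss_after, new_weights)
```

With these values a single update step lowers the loss, which is the expected behavior of a gradient step with a sufficiently small learning rate.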

Allowable Subject Matter
Claim 7 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: 
Claim 7 is allowable for disclosing 
the method of claim 6, wherein the selected layer has a highest enhancement ratio among the plurality of layers, the enhancement ratio of the selected layer being a reduction rate of computational complexity of the selected layer divided by a reduction rate of accuracy of the selected layer.
The closest references found:
Ji et al. (US Pub. 2018/0232640) discloses a method for pruning a layer of a neural network having multiple layers using a threshold. Ji, in Fig. 6 and paragraphs 0061-0062, further teaches that the threshold for the layer is adjusted and the reduction rates of accuracy are calculated.

However, the prior art fails to teach “the selected layer has a highest enhancement ratio among the plurality of layers, the enhancement ratio of the selected layer being a reduction rate of computational complexity of the layer divided by a reduction rate of accuracy of the selected layer”.
Therefore the combination of features is considered to be allowable.
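For illustration of the limitation quoted above, the enhancement ratio and the selection of the layer with the highest ratio can be sketched in Python as follows. The per-layer reduction rates, the layer names, and the handling of a zero accuracy reduction are hypothetical; they are not taken from the application or from any cited reference.

```python
def enhancement_ratio(complexity_reduction, accuracy_reduction):
    """Reduction rate of computational complexity divided by reduction rate of accuracy.

    Assumption: a layer whose pruning causes no accuracy reduction is treated
    as having an infinite ratio, since the claim language does not address it.
    """
    if accuracy_reduction == 0:
        return float("inf")
    return complexity_reduction / accuracy_reduction


# Hypothetical (complexity reduction rate, accuracy reduction rate) per layer.
layer_rates = {
    "layer_1": (0.30, 0.02),
    "layer_2": (0.20, 0.05),
    "layer_3": (0.10, 0.004),
}

ratios = {
    name: enhancement_ratio(complexity, accuracy)
    for name, (complexity, accuracy) in layer_rates.items()
}

# The selected layer is the one with the highest enhancement ratio.
selected_layer = max(ratios, key=ratios.get)

print(ratios, selected_layer)
```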

Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Sun et al. (US Pub. 2018/0046915) describes a method for compressing dense neural networks into sparse neural networks while maintaining or even improving the accuracy of the neural networks after compression.
Ji et al. (US Pub. 2019/0050735) describes a method for pruning parameters of a neural network to reduce the computational load of the neural network.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103.  The examiner can normally be reached on M-F, 8 AM-5 PM, (CT).

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/T. N./Examiner, Art Unit 2123                                                                                                                                                                                                        
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123