DETAILED ACTION
This action is in response to claims filed 23 November, 2020 for application 16455347 filed 27 June, 2019. Currently claims 1-7, 11-17, and 21-23 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04 January, 2021 has been entered.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
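A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.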

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 6, 7, 11, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yao et al. (US 20190197407) in view of Liu et al. (US 2017/0364799) further in view of Han et al. (EIE: Efficient Inference Engine on Compressed Deep Neural Network).

Regarding claim 1,
Yao teaches
A neural network processor (Abstract, “An apparatus and method are described for reducing the parameter density of a deep neural network (DNN).”, lines 1 – 2, and p. 1, “More particularly, the invention relates to an apparatus and method for reducing the parameter density of a deep neural network (DNN) (referred to herein as "DNN surgery").”, ¶ [0001], p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125])
comprising: 
a connection value generator circuit of a data modifier circuit (Fig. 15 and Fig. 16, and p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: the connection value generator is the ‘Layer-Wise Pruning’ module 1510 as well as the data modifier)
configured to: receive one or more groups of input data and one or more weight values (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501. In one embodiment, the pre-trained DNN model 1501 is initially generated by training a DNN architecture configuration (not shown) using training data 1502.”, ¶ [0126]: the training data 1502 is read as ‘one or more groups of input data’ and the pre-trained DNN model provides and teaches ‘one or more weight values’),
generate one or more connection values based on the one or more weight values (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501. In one embodiment, the pre-trained DNN model 1501 is initially generated by training a DNN architecture configuration (not shown) using training data 1502.”, ¶ [0126]: the connection values are the weights between the nodes of successive layers of the neural network, the pre-trained DNN model teaches ‘generate one or more connection values based on the one or more weight values’);
and a pruning circuit of the data modifier circuit configured to modify the one or more groups of input data and the one or more weight values based on the connection values (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501.”, ¶ [0126], and p. 12, “Mathematically, one embodiment of the invention is designed to prune the connections in an arbitrary originally-dense DNN model ( e.g., a CNN or an RNN model) by setting most of its parameters ( e.g., the weights and biases) to zero in a progressive layer-by-layer manner.”, ¶ [0129], and p. 12, “In one embodiment of the invention, backward propagation approximation is also considered, i.e., the relation between input residual and the output residual,… the output y and input residual Δx’ can be approximated with the given input x and output residual Δy.”, ¶ [0131-0132]: the layer-wise pruning module 1510 of the data modifier, the system taught in Fig. 15, performs the pruning by analyzing the current dense deep neural network, that requires the step of using an approximated input residual Δx teaches ‘configured to modify the one or more groups of input data’ and the pruning leads to modifying ‘the one or more weight values’ based on the connection values).

However, Yao discloses weights but does not explicitly disclose: generate one or more connection values based on the one or more weight values, wherein each of the connection values indicates whether one of the weight values satisfies a predetermined condition; and
wherein the connection values are respectively generated based on a distance between input nodes corresponding to the one or more groups of input data.

Liu teaches: generate one or more connection values based on the one or more weight values, wherein each of the connection values indicates whether one of the weight values satisfies a predetermined condition (“In one embodiment, the simplifying module 160 includes a comparator circuit.  After retrieving the weights w corresponding to apart or all of the neuron connections in the original neural network 100, the simplifying module 160 utilizes the comparator circuit to judge whether the absolute value |w| of each retrieved weight w is lower than a threshold T. If an absolute value |w| is lower than the threshold T, the simplifying module 160 abandons the neuron connection corresponding to this weight w. The simplifying module 160 can record its decisions (i.e. whether a neuron connection is abandoned or kept) in the memory 150.  For example, for each neuron connection, the circuit designer can set a storage unit in the memory 150 for storing a flag.  The default status of the flag is a first status (e.g. binary 1).  After determining to abandon a neuron connection, the simplifying module 160 changes the flag of this neuron connection from the first status to a second status (e.g. binary 0).” [0030], note: the connection values are binary 1 or 0 depending on the weight being above or below a threshold respectively.)
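For illustration only, the threshold comparison described in the cited passage of Liu can be sketched as follows. The function name `connection_flags` and the sample values are hypothetical and do not appear in Liu; the sketch merely assumes that a flag of 1 marks a kept connection and a flag of 0 marks an abandoned one, as the passage describes.

```python
def connection_flags(weights, threshold):
    """Return one flag per weight: 1 keeps the connection, 0 abandons it.

    Hypothetical helper: a connection is abandoned when |w| is lower
    than the threshold, mirroring the comparator described in Liu [0030].
    """
    return [0 if abs(w) < threshold else 1 for w in weights]


weights = [0.8, -0.05, 0.3, 0.01, -0.6]  # illustrative values only
flags = connection_flags(weights, threshold=0.1)
print(flags)  # [1, 0, 1, 0, 1]
```

Under this reading, each flag is a connection value that indicates whether the corresponding weight satisfies the predetermined condition of meeting the threshold.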

Yao and Liu are both in the same field of endeavor of pruning and reducing a neural network and are analogous. Yao teaches an exemplary pruning method. Liu teaches connection values representing weights that are above or below a threshold. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the pruning method of Yao with the connection values as taught by Liu to yield predictable results. One would have been motivated to combine as the connection values taught by Liu allow the neural network to have a reduced memory size without losing significant accuracy [Liu 0033].

Han teaches: wherein the connection values are respectively generated based on a distance between input nodes corresponding to the one or more groups of input data (“Figure 12 shows the number of padding zeros with different number PEs. Padding zero occur when the jump between two consecutive non-zero element in the sparse matrix is larger than 16, the largest number that 4 bits can encode. Padding zeros are considered non-zero and lead to wasted computation. Using more PEs reduces padding zeros, because the distance between non-zero elements get smaller due to matrix partitioning, and 4-bits encoding a max distance of 16 will more likely be enough.” P251 §VII.B ¶4, Fig 3 note: relative indexing is interpreted as a connection value. Distance is defined by the spec as a difference between array indices.).
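For illustration only, the relative indexing referred to in the cited passage of Han can be sketched as follows. The function names, the `(gap, value)` pair layout, and the limit of 15 skipped zeros per 4-bit field are assumptions of this sketch and are not taken from Han; the quoted passage states the limit as 16.

```python
def encode_relative(dense, bits=4):
    """Encode a dense vector as (gap, value) pairs.

    gap is the number of zeros skipped since the previously stored
    entry. The field width and its limit are assumptions of this sketch.
    """
    max_gap = (1 << bits) - 1  # largest gap one field can hold (assumed)
    encoded = []
    gap = 0
    for value in dense:
        if value == 0:
            if gap == max_gap:
                # The gap field would overflow: store an explicit padding zero.
                encoded.append((max_gap, 0))
                gap = 0
            else:
                gap += 1
        else:
            encoded.append((gap, value))
            gap = 0
    return encoded


def decode_relative(encoded, length):
    """Rebuild the dense vector; trailing zeros are restored from length."""
    dense = []
    for gap, value in encoded:
        dense.extend([0] * gap)
        dense.append(value)
    dense.extend([0] * (length - len(dense)))
    return dense


dense = [0, 0, 5] + [0] * 20 + [7]
print(encode_relative(dense))  # [(2, 5), (15, 0), (4, 7)]
```

In this sketch the middle pair is a padding zero: it is stored and processed like a non-zero entry even though its value is 0, which corresponds to the wasted computation that the passage attributes to padding zeros.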

Yao, Liu and Han are all in the same field of endeavor of neural networks and are analogous. Yao teaches an exemplary pruning method. Liu teaches connection values representing weights that are above or below a threshold. Han teaches a connection value based on a distance between nodes. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the pruning method and connection values of Yao and Liu with the distance based connection values as taught by Han. One would have been motivated to combine as the distance based connection values of Han allow the instructions to be compressed and take up less space. (P251 §VII.B ¶4).


Regarding claim 6, the rejection of claim 1 is incorporated and further:
Yao further teaches:
further comprising a computing circuit configured to respectively multiply the modified groups of input data with the modified weight values to generate one or more groups of output data (p. 12, Equation 2 and 3 in ¶ [0130], where Tables 1 and 2 on p. 13 are algorithms for performing the pruning of the deep neural network, and Table 2 is pseudocode for the layer-wise pruning that results in the modified weight matrix, $\hat{M}$, that is matrix multiplied with the modified input, $\hat{x}$, in Equation 3 (p. 12), to produce the output, y, in Equation 2 (p. 12)).
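For illustration only, the multiplication of a modified weight matrix with a modified input can be sketched as follows. This is not a reproduction of Equations 2 and 3 of Yao; the helper names, the use of 0/1 flags to produce the modified values, and the sample numbers are all assumptions of this sketch.

```python
def apply_flags(values, flags):
    """Zero out every value whose flag is 0 (hypothetical pruning step)."""
    return [v if f else 0.0 for v, f in zip(values, flags)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


# Illustrative values only.
weights = [[0.5, -0.0625, 0.25],
           [0.0625, 1.0, -0.5]]
weight_flags = [[1, 0, 1],
                [0, 1, 1]]
inputs = [1.0, 2.0, 4.0]
input_flags = [1, 0, 1]

m_hat = [apply_flags(row, f) for row, f in zip(weights, weight_flags)]
x_hat = apply_flags(inputs, input_flags)
y = matvec(m_hat, x_hat)
print(y)  # [1.5, -2.0]
```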

Regarding claim 7, the rejection of claim 1 is incorporated and further:
Yao further teaches:
wherein the data modifier circuit is connected to a weight cache and a data cache (p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the data modifier, i.e. the deep neural network pruning system shown in Fig. 15, that is implemented as circuitry on a semiconductor chip such as within CPU, GPU or ASIC, as mentioned previously, is connected with the cache that may be partitioned in different partitions, data cache and weight cache, etc.), 
and wherein the data modifier circuit is configured to store the modified weight values and the modified input data respectively in the weight cache and the data cache (p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the data modifier, i.e. the deep neural network pruning system shown in Fig. 15, that is implemented as circuitry on a semiconductor chip such as within CPU, GPU or ASIC, as mentioned previously, is connected with the cache that may be partitioned in different partitions, data cache and weight cache, etc., where the modified weight values and the modified input data, during the pruning and training of the deep neural network, would need to be stored from one epoch of training to another until the pruning is complete. The storing of the modified weight values and the modified input data would take place in the cache that may be respectively partitioned into a weight cache and a data cache, as mentioned above.).

Regarding claim 11,
The claim is directed to a method for modifying data for neural networks reciting the claim limitations of claim 1. The claim is rejected by the same reasoning as rejected claim 1.

Regarding claim 16, the rejection of claim 11 is incorporated and further:
The claim recites the claim limitations of claim 6 and is rejected by the same reasoning of rejected claim 6.

Regarding claim 17, the rejection of claim 11 is incorporated and further:
The claim recites the claim limitations of claim 7 and is rejected by the same reasoning of rejected claim 7.

Claims 2 - 5, 12 - 15, and 21 - 23 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Liu and Han as applied to claim 1 and claim 11 above, in view of Ng, A. (Ng, A., “Sparse autoencoder”: hereinafter Ng).

Regarding claim 2, the rejection of claim 1 is incorporated and further:
Yao teaches training the deep neural network to achieve a highly-sparse, i.e. pruned, deep neural network using a modification of the backpropagation algorithm, namely, Joint Feed-forward and Backward Propagation Approximation (JFBPA).
However, Yao does not explicitly disclose the backpropagation process as is recited in the claim as follows:
further comprising a computing circuit configured to:
multiply one or more output gradients with the modified input data to generate one or more weight differences, and 
subtract the one or more weight differences from the one or more modified weight values to generate one or more updated weight values.
Ng discloses sparsifying, i.e. pruning, a special version of a neural network, namely an autoencoder neural network, teaching the backpropagation algorithm applied in further details as reproduced below (p. 8, Sec. “2.2 Backpropagation algorithm”).

[Image media_image1.png: excerpt from Ng, p. 8, Sec. “2.2 Backpropagation algorithm”]
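For reference only, the standard backpropagation relations for a squared-error cost are sketched below in commonly used notation. The symbols and layer indexing are assumptions of this sketch and may not match the reproduced excerpt of Ng exactly: δ denotes the partial derivative of the cost with respect to z, a denotes a node activation, f' the derivative of the activation function, and α the learning rate.

```latex
\begin{align}
\delta_i^{(n_l)} &= -\bigl(y_i - a_i^{(n_l)}\bigr)\, f'\bigl(z_i^{(n_l)}\bigr)
  && \text{(output layer)} \\
\delta_i^{(l)} &= \Bigl(\sum_{j} W_{ji}^{(l)}\, \delta_j^{(l+1)}\Bigr)\, f'\bigl(z_i^{(l)}\bigr)
  && \text{(hidden layers)} \\
\frac{\partial J}{\partial W_{ij}^{(l)}} &= a_j^{(l)}\, \delta_i^{(l+1)}
  && \text{(weight gradient)} \\
W_{ij}^{(l)} &:= W_{ij}^{(l)} - \alpha\, \frac{\partial J}{\partial W_{ij}^{(l)}}
  && \text{(gradient descent update)}
\end{align}
```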

Ng teaches
output gradients (

[Image media_image2.png: equation from Ng, p. 8, Step 2]

where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2)),
multiply one or more output gradients with the modified input data to generate one or more weight differences (p. 8, Step 4,

[Image media_image3.png: equation from Ng, p. 8, Step 4]

a_j is the input data to the first layer or input to any node in the successive layers after the input layer and functions as well as the modified input data to the inner layers during training and is multiplied by δ, which was previously shown to be the one or more output gradients in the previous step to generate one or more weight differences, which is the partial derivative of the cost function J with respect to the weights of the network.)
subtract the one or more weight differences from the one or more modified weight values to generate one or more updated weight values (p. 7, Sec. “2.2 Backpropagation algorithm”

[Image media_image4.png: equation from Ng, p. 7, Sec. “2.2 Backpropagation algorithm”]

The one or more weight differences, i.e. the partial derivative with the respect to the weight parameters in the above equation, is subtracted from the one or more modified weight values, W in the right hand side of the above equation, to generate one or more updated weight values, W in the left hand side of the above equation.)
It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu, and Han. The motivation behind using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).
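For illustration only, the two operations mapped above (multiplying output gradients with input data to obtain weight differences, then subtracting those differences from the weights) can be sketched as follows. The function names and sample values are hypothetical; the `learning_rate` parameter is an assumption added to reflect the gradient descent update and is not recited in the claim limitation itself.

```python
def weight_differences(output_grads, inputs):
    """Outer product: diff[i][j] = output_grads[i] * inputs[j]."""
    return [[g * a for a in inputs] for g in output_grads]


def update_weights(weights, diffs, learning_rate=1.0):
    """Subtract the (optionally scaled) weight differences from the weights."""
    return [
        [w - learning_rate * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weights, diffs)
    ]


# Illustrative values only; a zero input stands in for a pruned input.
output_grads = [0.5, -1.0]
inputs = [2.0, 0.0, 4.0]
weights = [[1.0, 0.0, 1.0],
           [0.0, 2.0, -1.0]]

diffs = weight_differences(output_grads, inputs)
updated = update_weights(weights, diffs, learning_rate=0.5)
print(updated)  # [[0.5, 0.0, 0.0], [1.0, 2.0, 1.0]]
```

Note that a zeroed input yields a zero weight difference, so the corresponding weight is left unchanged by the update in this sketch.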

Regarding claim 3, the rejection of claim 2 is incorporated and further:
Yao further teaches 
wherein the data modifier circuit (Fig. 15) is connected to a memory (Fig. 2, block 218 “Embedded memory module”) and a direct memory access (DMA) module (p. 2, “Memory device 120 can be a dynamic random access memory (DRAM) device,…”, ¶ [0033]: the pruning of the deep neural network, as in Fig. 15, i.e. the data modifier, is implemented in the CPU or GPU, exclusively and is connected to a dynamic random access memory (DRAM) device, i.e. a direct memory access (DMA) module),
wherein the data modifier circuit is configured to store … and the modified input data in the memory (p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: with the implementations of the DNN surgery neural network processor in CPU, GPU or ASIC, the intermediary values of the modified 
wherein the computing unit is further configured to store … and the modified input data respectively in a weight cache and a data cache (p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the cache associated with the computing unit may be partitioned in different partitions, data cache and weight cache, etc., where the modified input data, during the pruning of the deep neural network, may be stored in the data cache.).
Yao does not explicitly disclose the one or more output gradients and storing them.
Ng teaches the one or more output gradients (

[Image media_image2.png: equation from Ng, p. 8, Step 2]

where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2)).

It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han and compute the output gradients, as in Ng, and have the data modifier configured to store the one or more output gradients, as in Yao. The motivation behind using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).

Regarding claim 4, the rejection of claim 2 is incorporated and further:
Yao teaches training the deep neural network to achieve a highly-sparse, i.e. pruned, deep neural network using a modification of the backpropagation algorithm, namely, Joint Feed-forward and Backward Propagation Approximation (JFBPA).
However, Yao does not explicitly disclose the backpropagation process as is recited in the claim as follows:
wherein the computing circuit is further configured to: 
multiply the one or more output gradients with the one or more modified weight values to generate one or more multiplication results, 
add the one or more multiplication results to generate an intermediate sum, 
multiply the intermediate sum with a learning rate to generate an intermediate multiplication result, 
and apply a derivative of an activation function to the intermediate multiplication result to generate one or more input gradients.
Ng teaches
multiply the one or more output gradients with the one or more modified weight values to generate one or more multiplication results (p. 8, Sec. “2.2 Backpropagation algorithm”, Step 3

[Image media_image5.png: equation from Ng, p. 8, Step 3]

where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient; here the δ, i.e. output gradient, from Step 2, is multiplied with the one or more modified weights of the deep neural network undergoing pruning),
add the one or more multiplication results to generate an intermediate sum (p. 8, Sec. “2.2 Backpropagation algorithm”, Step 3

[Image media_image5.png: equation from Ng, p. 8, Step 3]

: during an epoch of training the summation is an intermediate sum),
multiply the intermediate sum with a learning rate to generate an intermediate multiplication result (p. 8, Step 4,

[Image media_image3.png: equation from Ng, p. 8, Step 4]

[Image media_image4.png: equation from Ng, p. 7, Sec. “2.2 Backpropagation algorithm”]
: during a training epoch, the partial derivative of the cost function with respect to the weights of the neural network, which is a reflection of the intermediate sum from the equation above, merely a scaled version, is multiplied with the learning rate α in the update equation above),
and apply a derivative of an activation function to the intermediate multiplication result to generate one or more input gradients (p. 8, Sec. “2.2 Backpropagation algorithm”, Step 3

[Image media_image5.png: equation from Ng, p. 8, Step 3]

[Image media_image2.png: equation from Ng, p. 8, Step 2]
: the derivative of an activation function, i.e. f’(), in the above equation is applied, read as multiplied, to the intermediate multiplication result to generate one or more input/output gradients).

It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han. The motivation behind using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).
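For illustration only, the sequence of operations recited in claim 4 (multiply output gradients with weights, add the products, scale the sum, apply the derivative of an activation function) can be sketched as follows. The function names, the choice of a sigmoid activation, the `weights[j][i]` layout, and the `scale` parameter standing in for the recited learning rate are assumptions of this sketch; standard backpropagation as commonly presented does not scale this sum, which corresponds to `scale=1.0`.

```python
import math


def sigmoid_prime(z):
    """Derivative of the sigmoid activation (assumed activation function)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def input_gradients(weights, output_grads, pre_activations, scale=1.0):
    """Propagate output gradients back to the inputs of a layer.

    weights[j][i] is assumed to connect input node i to output node j.
    """
    grads = []
    for i, z in enumerate(pre_activations):
        # Multiply each output gradient with its weight, then add the products.
        products = [weights[j][i] * output_grads[j]
                    for j in range(len(output_grads))]
        intermediate_sum = sum(products)
        # Scale the sum, then apply the derivative of the activation function.
        grads.append(scale * intermediate_sum * sigmoid_prime(z))
    return grads


# Illustrative values only; the zero weight stands in for a pruned connection.
weights = [[1.0, 2.0],
           [0.0, -1.0]]
output_grads = [0.5, 2.0]
pre_activations = [0.0, 0.0]

print(input_gradients(weights, output_grads, pre_activations))  # [0.125, -0.25]
```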

Regarding claim 5, the rejection of claim 4 is incorporated and further:
Yao further teaches 
wherein the data modifier circuit (Fig. 15) is connected to a memory (Fig. 2, block 218 “Embedded memory module”) and a direct memory access (DMA) module (p. 2, “Memory device 120 can be a dynamic random access memory (DRAM) device,…”, ¶ [0033]: the pruning of the deep neural network, as in Fig. 15, i.e. the data modifier, is implemented in the CPU or GPU, exclusively and is connected to a dynamic random access memory (DRAM) device, i.e. a direct memory access (DMA) module),
wherein the data modifier circuit is configured to store the modified weight values … in the memory (p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: with the implementations of the DNN surgery neural network processor in CPU, GPU or ASIC, the intermediary values of the modified parameters and modified input data within the neural network are stored in memory teaching ‘to store the modified weight values … in the memory’), and 
wherein the computing circuit is further configured to: 
read the modified weight values … from the memory in response to an instruction received from a controller (Fig. 1, instruction set, 109, in processor cores, 107, and Fig. 6, Instruction Cache, 606, storing instructions to read from Cache, 104, and Data Cache, 612, through the operations directed by the Memory Controller Hub, 116, in Fig. 1),
and store … the weight values respectively in a weight cache and a data cache (p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the cache associated with the computing unit may be partitioned in different partitions, data cache and weight cache, etc., where the weight values, during the pruning of the deep neural network, may be stored in the data cache.).
Yao does not explicitly disclose the one or more output gradients, reading and storing them.
Ng teaches the one or more output gradients (

[Image media_image2.png: equation from Ng, p. 8, Step 2]

δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2)).
It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han and compute the output gradients, as in Ng, and have the data modifier configured to store the one or more output gradients in memory and have the computing unit configured to read from memory in response to an instruction received from a controller and store the one or more output gradients in a weight cache, as in Yao. The motivation behind using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).

Regarding claim 12, the rejection of claim 11 is incorporated and further:
The claim recites the claim limitations of claim 2 and is rejected by the same reasoning of rejected claim 2.

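Regarding claim 13, the rejection of claim 12 is incorporated and further:
The claim recites the claim limitations of claim 3 and is rejected by the same reasoning of rejected claim 3.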

Regarding claim 14, the rejection of claim 12 is incorporated and further:
The claim recites the claim limitations of claim 4 and is rejected by the same reasoning of rejected claim 4.

Regarding claim 15, the rejection of claim 14 is incorporated and further:
The claim recites the claim limitations of claim 5 and is rejected by the same reasoning of rejected claim 5.

Regarding claim 21,
Yao teaches
A neural network processor for training neural networks (Abstract, “An apparatus and method are described for reducing the parameter density of a deep neural network (DNN).”, lines 1 – 2, and p. 1, “More particularly, the invention relates to an apparatus and method for reducing the parameter density of a deep neural network (DNN) (referred to herein as "DNN surgery").”, ¶ [0001], p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125])

a connection value generator circuit of a data modifier circuit (Fig. 15 and Fig. 16, and p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: the connection value generator is the ‘Layer-Wise Pruning’ module 1510 as well as the data modifier)
configured to: receive one or more groups of input data and one or more weight values at one or more input nodes of a current layer (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501. In one embodiment, the pre-trained DNN model 1501 is initially generated by training a DNN architecture configuration (not shown) using training data 1502.”, ¶ [0126]: the training data 1502 is read as ‘one or more groups of input data’ and the pre-trained DNN model provides and teaches ‘one or more weight values’),
generate one or more connection values based on the one or more weight values (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501. In one embodiment, the pre-trained DNN model 1501 is initially generated by training a DNN architecture configuration (not shown) using training data 1502.”, ¶ [0126]: the connection values are the weights between the nodes of successive layers of the neural network, the pre-trained DNN model teaches ‘generate one or more connection values based on the one or more weight values’);
and a pruning circuit of the data modifier circuit configured to modify the one or more groups of input data and the one or more weight values based on the connection values (p. 11, “Briefly, in one embodiment, the layer-wise pruning module 1510 performs layer-wise pruning on a pre-trained, originally dense reference DNN model 1501.”, ¶ [0126], and p. 12, “Mathematically, one embodiment of the invention is designed to prune the connections in an arbitrary originally-dense DNN model ( e.g., a CNN or an RNN model) by setting most of its parameters ( e.g., the weights and biases) to zero in a progressive layer-by-layer manner.”, ¶ [0129], and p. 12, “In one embodiment of the invention, backward propagation approximation is also considered, i.e., the relation between input residual and the output residual,… the output y and input residual Δx’ can be approximated with the given input x and output residual Δy.”, ¶ [0131-0132]: the layer-wise pruning module 1510 of the data modifier, the system taught in Fig. 15, performs the pruning by analyzing the current dense deep neural network, that requires the step of using an approximated input residual Δx teaches ‘configured to modify the one or more groups of input data’ and the pruning leads to modifying ‘the one or more weight values’ based on the connection values).
and a computing circuit configured to:
update the one or more modified weight values of the current layer based on the input data … and respectively multiply the input data with the modified weight values to generate one or more groups of output data at the one or more output nodes of the current layer (p. 12, Equation 2 and 3 in ¶ [0130], where Tables 1 and 2 on p. 13 are algorithms for performing the pruning of the deep neural network, and Table 2 is pseudocode for the layer-wise pruning that results in the modified weight matrix, $\hat{M}$, that is matrix multiplied with the modified input, $\hat{x}$, in Equation 3 (p. 12), to produce the output, y, in Equation 2 (p. 12)).

Yao does not explicitly disclose 
wherein each of the connection values indicates whether one of the weight values satisfies a predetermined condition.

…and one or more output gradients received at one or more output nodes of the current layer from a next layer, 
calculate one or more input gradients of the current layer based on the modified weight values and the one or more output gradients,..

Liu teaches: generate one or more connection values based on the one or more weight values, wherein each of the connection values indicates whether one of the weight values satisfies a predetermined condition (“In one embodiment, the simplifying module 160 includes a comparator circuit.  After retrieving the weights w corresponding to a part or all of the neuron connections in the original neural network 100, the simplifying module 160 utilizes the comparator circuit to judge whether the absolute value |w| of each retrieved weight w is lower than a threshold T. If an absolute value |w| is lower than the threshold T, the simplifying module 160 abandons the neuron connection corresponding to this weight w. The simplifying module 160 can record its decisions (i.e. whether a neuron connection is abandoned or kept) in the memory 150.  For example, for each neuron connection, the circuit designer can set a storage unit in the memory 150 for storing a flag.  The default status of the flag is a first status (e.g. binary 1).  After determining to abandon a neuron connection, the simplifying module 160 changes the flag of this neuron connection from the first status to a second status (e.g. binary 0).” [0030], note: the connection values are binary 1 or 0 depending on whether the absolute value of the weight is above or below the threshold, respectively).
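
For illustration only, the thresholding described in the cited passage of Liu can be sketched as follows. This is a minimal sketch under stated assumptions, not Liu's circuit: the function names `connection_flags` and `apply_flags`, the sample weights, and the threshold value are hypothetical and do not appear in any reference of record.

```python
def connection_flags(weights, threshold):
    """Return one flag per weight: 1 if the connection is kept, 0 if abandoned.

    Following the cited passage, a connection is abandoned when |w| is
    lower than the threshold T; otherwise the flag keeps its default of 1.
    """
    return [0 if abs(w) < threshold else 1 for w in weights]


def apply_flags(weights, flags):
    """Zero out every weight whose connection flag is 0 (hypothetical helper)."""
    return [w if f == 1 else 0.0 for w, f in zip(weights, flags)]


# Hypothetical sample data, chosen only to exercise both outcomes.
weights = [0.8, -0.05, 0.02, -0.6]
flags = connection_flags(weights, threshold=0.1)
pruned = apply_flags(weights, flags)
```

In this sketch each flag plays the role of a connection value that indicates whether the corresponding weight satisfies the predetermined condition (here, an absolute value not lower than the threshold).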

Yao and Liu are both in the same field of endeavor of pruning and reducing a neural network and are analogous. Yao teaches an exemplary pruning method. Liu teaches connection values representing weights that are above or below a threshold. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the pruning method of Yao with the connection values as taught by Liu to yield predictable results. One would have been motivated to combine as the connection values taught by Liu allow the neural network to have a reduced memory size without losing significant accuracy [Liu 0033].

Han teaches: wherein the connection values are respectively generated based on a distance between input nodes corresponding to the one or more groups of input data (“Figure 12 shows the number of padding zeros with different number PEs. Padding zero occur when the jump between two consecutive non-zero element in the sparse matrix is larger than 16, the largest number that 4 bits can encode. Padding zeros are considered non-zero and lead to wasted computation. Using more PEs reduces padding zeros, because the distance between non-zero elements get smaller due to matrix partitioning, and 4-bits encoding a max distance of 16 will more likely be enough.” P251 §VII.B ¶4, Fig 3 note: relative indexing is interpreted as a connection value. Distance is defined by the spec as a difference between array indices.).
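
For illustration only, relative indexing with a fixed bit width and padding zeros, as discussed in the cited passage of Han, can be sketched as follows. This is a sketch under stated assumptions: it assumes each stored index counts the zeros preceding an entry and that a 4-bit field holds counts 0 through 15, so the index distance between consecutive stored entries is the count plus one. The function names `encode_relative` and `decode_relative` and the sample vector are hypothetical.

```python
def encode_relative(dense, bits=4):
    """Encode a sparse vector as (values, zero_runs).

    zero_runs[k] is the number of zeros skipped before values[k]. When a run
    would exceed what `bits` can hold, a padding zero is stored as a value.
    """
    max_run = (1 << bits) - 1  # 15 for the assumed 4-bit field
    values, zero_runs = [], []
    run = 0
    for x in dense:
        if x != 0:
            values.append(x)
            zero_runs.append(run)
            run = 0
        elif run == max_run:
            # The run no longer fits: emit a padding zero and restart the count.
            values.append(0)
            zero_runs.append(max_run)
            run = 0
        else:
            run += 1
    return values, zero_runs


def decode_relative(values, zero_runs, length):
    """Rebuild the dense vector; each entry sits run + 1 positions after the last."""
    dense = [0] * length
    pos = -1
    for v, r in zip(values, zero_runs):
        pos += r + 1
        dense[pos] = v
    return dense


# Hypothetical sample: two leading zeros, then an 18-zero gap that needs padding.
dense = [0, 0, 1, 2] + [0] * 18 + [3]
values, zero_runs = encode_relative(dense)
restored = decode_relative(values, zero_runs, len(dense))
```

The padding entry in `values` corresponds to the wasted computation noted in the quotation, since it is stored and processed like a non-zero element.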

Yao, Liu and Han are all in the same field of endeavor of neural networks and are analogous. Yao teaches an exemplary pruning method. Liu teaches connection values representing weights that are above or below a threshold. Han teaches a connection value based on a distance between nodes. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the pruning method and connection values of Yao and Liu with the distance based connection values as taught by Han. One would have been motivated to combine as the distance based connection values of Han allow the instructions to be compressed and take up less space. (P251 §VII.B ¶4).


Ng teaches the backpropagation algorithm with one or more output gradients received at one or more output nodes of the current layer from a next layer, calculate one or more input gradients of the current layer based on the modified weight values and the one or more output gradients,.. ([equation image: media_image2.png], where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2), the superscript above δ is the layer number in the deep neural network, see the backpropagation algorithm on p. 8).
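
For illustration only, the computation of input gradients of a current layer from the output gradients of the next layer can be sketched with the standard backpropagation recursion, in which each input gradient is the activation derivative times the weighted sum of the output gradients. This is a sketch under stated assumptions: the recursion shown is the textbook form and is assumed, not confirmed, to correspond to the equation image; the function name `input_gradients` and all sample values are hypothetical.

```python
def input_gradients(weights, output_grads, act_derivs):
    """Compute the gradients at the input nodes of the current layer.

    weights[j][i] is the (possibly pruned) weight from input node i to
    output node j; output_grads[j] is the gradient received at output node j
    from the next layer; act_derivs[i] is f'(z_i) at input node i.
    """
    n_out = len(weights)
    n_in = len(weights[0])
    return [
        act_derivs[i] * sum(weights[j][i] * output_grads[j] for j in range(n_out))
        for i in range(n_in)
    ]


# Hypothetical 2x2 layer in which two connections were pruned to zero.
pruned_weights = [[0.5, 0.0],
                  [0.0, -2.0]]
grads = input_gradients(pruned_weights, output_grads=[1.0, 0.5], act_derivs=[1.0, 0.5])
```

Because pruned weights are zero, the corresponding connections contribute nothing to the sum, which is how the modified weight values affect the input gradients in this sketch.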

It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han. The motivation for using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).

Regarding claim 22, the rejection of claim 21 is incorporated and further:
Yao further teaches 
wherein the data modifier circuit (Fig. 15) is connected to a memory (Fig. 2, block 218 “Embedded memory module”) and a direct memory access (DMA) module (p. 2, “Memory device 120 can be a dynamic random access memory (DRAM) device,…”, ¶ [0033]: the pruning of the deep neural network, as in Fig. 15, i.e. the data 
wherein the data modifier circuit is configured to store … and the modified input data in the memory (p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: with the implementations of the DNN surgery neural network processor in CPU, GPU or ASIC, the intermediary values of the modified parameters and modified input data within the neural network are stored in memory teaching ‘the data modifier is configured to store … and the modified input data in the memory’),
and
wherein the computing circuit is further configured to store … and the modified input data respectively in a weight cache and a data cache (p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the cache associated with the computing unit may be partitioned in different partitions, data cache and weight cache, etc., where the modified input data, during the pruning of the deep neural network, may be stored in the data cache.).
Yao does not explicitly disclose the one or more output gradients received at the one or more output nodes from the next layer and storing them.
Ng teaches the one or more output gradients received at the one or more output nodes from the next layer ([equation image: media_image2.png], where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2), and the superscript above δ is the layer number in the deep neural network, see the backpropagation algorithm on p. 8).

It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han, to compute the output gradients, as in Ng, and to have the data modifier configured to store the one or more output gradients and the computing unit configured to store the one or more output gradients in a weight cache, as in Yao. The motivation for using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).

Regarding claim 23, the rejection of claim 21 is incorporated and further:
Yao further teaches 
wherein the data modifier circuit (Fig. 15) is connected to a memory (Fig. 2, block 218 “Embedded memory module”) and a direct memory access (DMA) module (p. 2, “Memory device 120 can be a dynamic random access memory (DRAM) device,…”, ¶ [0033]: the pruning of the deep neural network, as in Fig. 15, i.e. the data modifier, is implemented in the CPU or GPU, exclusively and is connected to a dynamic random access memory (DRAM) device, i.e. a direct memory access (DMA) module),
wherein the data modifier circuit is configured to store the modified weight values … in the memory (p. 11, “One embodiment of a DNN Surgery architecture for reducing the density of a DNN is illustrated in FIG. 15. In one embodiment, the layer-wise pruning module 1510 and retraining module 1520 are implemented as circuitry on a semiconductor chip such as within CPU or GPU or an application-specific integrated circuit (ASIC).”, ¶ [0125]: with the implementations of the DNN surgery neural network processor in CPU, GPU or ASIC, the intermediary values of the modified parameters and modified input data within the neural network are stored in memory teaching ‘to store the modified weight values … in the memory’),
 and wherein the computing circuit is further configured to: 
read the modified weight values … from the memory in response to an instruction received from a controller (Fig. 1, instruction set, 109, in processor cores, 107, and Fig. 6, Instruction Cache, 606, storing instructions to read from Cache, 104, and Data Cache, 612, through the operations directed by the Memory Controller Hub, 116, in Fig. 1),
p. 7, “The cache can be configured as a data cache, an instruction cache, or a single cache that is partitioned to contain data and instructions in different partitions.”, ¶ [0083]: the cache associated with the computing unit may be partitioned in different partitions, data cache and weight cache, etc., where the weight values, during the pruning of the deep neural network, may be stored in the data cache.).
Yao does not explicitly disclose the one or more output gradients, reading and storing them.
Ng teaches the one or more output gradients ([equation image: media_image2.png], where δ equals the partial derivative with respect to z_i, with z_i being the output of the neuron. Therefore the partial derivative with respect to z is the output gradient (p. 8, Step 2), and the superscript above δ is the layer number in the deep neural network, see the backpropagation algorithm on p. 8).

It would have been obvious to one having ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the backpropagation algorithm derivation steps as taught in Ng into the pruning neural network processor and apparatus as taught by Yao, Liu and Han, to compute the output gradients, as in Ng, and to have the data modifier configured to store the one or more output gradients in memory and the computing unit configured to read from memory in response to an instruction received from a controller and store the one or more output gradients in a Yao. The motivation for using the backpropagation algorithm derivation as taught in Ng is that it offers an efficient way to compute the partial derivatives employed in the updates of the weight parameters using gradient descent (p. 7, Sec. “2.2 Backpropagation algorithm”, “We will now describe the backpropagation algorithm, which gives an efficient way to compute these partial derivatives.”, ¶ 3).


Response to Arguments
Applicant's arguments filed 23 November, 2020 have been fully considered but they are not persuasive. Applicant argues that the amendments are not taught by the current references. Han et al. (Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding) has been replaced by Han et al. (EIE: Efficient inference engine on compressed deep neural network). As the first reference has been entirely removed from the rejection, the second reference is referred to as “Han” throughout this rejection. Han discloses connection values generated based on a distance and input. Please see the above rejection for details.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC NILSSON whose telephone number is (571)272-5246.  The examiner can normally be reached on M-F: 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ERIC NILSSON/Primary Examiner, Art Unit 2122