DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed 10/15/2021 has been entered. Claims 1-11, 13-18 and 20-22 remain pending in the application. Applicant’s amendments to the claims have overcome the objection previously set forth in the Non-Final Office Action mailed 07/15/2021.

Response to Arguments
Applicant’s arguments, filed 10/15/2021, with respect to the rejections of claims 1 and 14 under 35 U.S.C. 103 have been fully considered and are persuasive in light of the amendments. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made over Chai et al. (US Pub. 2020/0134461) in view of Lowell et al. (US Pub. 2019/0188557) and further in view of Burger et al. (US Pub. 2019/0340492).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-11, 13-16, 18 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Chai et al. (US Pub. 2020/0134461) in view of Lowell et al. (US Pub. 2019/0188557) and further in view of Burger et al. (US Pub. 2019/0340492).
As per claim 1, Chai teaches a method for transforming a neural network that uses floating point values into a neural network that uses quantized values [abstract, “Techniques are disclosed for training a deep neural network (DNN) for reduced computational resource requirements”; paragraph 0048, “The techniques here may generalize and encompass prior work that uses two stages (learning of DNN parameters and compression) as above. Other approaches to address memory size (including compression, quantization and approximation of DNNs)”], the method comprising: 
receiving a definition of a floating-point neural network comprising a set of floating-point weight coefficients [paragraph 0008, “method of training a DNN for reduced computing resource requirements, the method comprising: storing a set of weights of the DNN … the set of weights includes weights of the layer and the set of bit precision values”; paragraph 0057, “Techniques here may use a range preserving linear transform that uniformly discretizes the range into fixed steps δ. Concretely, let W(l) denote the floating point representation of high-precision weights 114 for layer l and let W’(l) be the quantized version of the weights of layer l using b bits. A quantization using b bits is,
[equation image media_image1.png not reproduced]
”; here, α, β and δ correspond to weight coefficients], wherein intermediate activation values of the floating-point neural network are floating-point values [Fig. 3, paragraph 0090, “machine learning system 104 may use the set of high-precision weights as weights of inputs of neurons in DNN 106 to calculate a first output data set based on a first input data set … according to Equation (1) … The first output data set may be the output y of the output layer 108N of DNN 106”; since W in Equation (1) denotes the floating point representation of the high-precision weights, the output y comprises floating-point values];
training the floating-point neural network to be a quantized neural network comprising a set of ternary weight coefficients [paragraph 0061, “machine learning system 104 may select, during training of DNN 106, a quantization function that best preserves the encoded distribution of the high-precision weights 114”; paragraph 0067, “Quantization maps ranges of values into single values referred to as "bins."”; paragraph 0078, “consider two bins at { 0, 1, -1} (i.e., +-1)”], the training comprising: 
propagating a set of inputs through the neural network to generate a set of outputs using an approximate quantization function, that is a differentiable approximation to a step-wise quantization function [paragraph 0061, “machine learning system 104 may select, during training of DNN 106, a quantization function that best preserves the encoded distribution of the high-precision weights 114, even if the quantization is non-differentiable. For example, machine learning system 104 may use the following quantization function:

[equation image media_image2.png not reproduced]
”; paragraph 0034, “as part of performing the training process, machine learning system 104 may perform a feed-forward phase in which machine learning system 104 uses high-precision weights 114 in DNN 106 to determine output data 112 based on input data in input data set 110”]; and 
performing a backpropagation operation that (i) backpropagates through the network a loss function value calculated based on the generated set of outputs to determine a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient [paragraph 0034, “machine learning system 104 may perform a backpropagation method that calculates a gradient of a loss function. The loss function produces a cost value based on the output data. In accordance with a technique of this disclosure, machine learning system 104 may then update high-precision weights 114, low-precision weights 116, and bit precision values 118 based on the gradient of the loss function”; paragraph 0070, “layer-wise quantization is adopted to learn one w(l) and b(l) for each layer l of the CNN. However, the loss function l(W) is not continuous and differentiable over the range of parameter values”; paragraph 0078, “The loss function of Equation (20) encourages small and/or large updates to W and discourages moderate sized updates. For the purpose of exposition, consider two bins at {0, 1, -1} (i.e., +- 1) and consider some weight that is equal to zero. A standard gradient descent may update the weight in any direction”; here, the layer-wise quantization corresponds to the step-wise quantization, the quantization function corresponds to the approximate quantization function, the quantization with the high-precision distribution corresponds to quantization by the differentiable approximation, and the loss function differentiable over the range of parameter values corresponds to the loss function value relative to a change in the weight coefficient];
Chai does not teach
intermediate activation values of the quantized neural network are quantized to a particular range; 
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values;
for each of at least a subset of the weight coefficients, a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient (emphasis added);
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate (emphasis added).
Lowell teaches
performing a backpropagation operation that (i) backpropagates through the network a loss function value calculated based on the generated set of outputs to determine, for each of at least a subset of the weight coefficients, a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient [paragraph 0017, “calculate a distribution of link weights for each of a plurality of subsets of layers of the ANN; select a quantization function to the plurality of link weights for each of the plurality of subsets of layers of the ANN based on each distribution; and apply the respective quantization function to the link weights for each of the plurality of subsets of layers”; paragraph 0052, “ANN training method 900 which includes dynamic quantization of the link weights in ANN 300 on a per-layer-subset basis”; here, the link weights of the neural network on a per-layer-subset basis correspond to a subset of the weight coefficients];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include Lowell's process of performing the operation for each of at least a subset of the weight coefficients to calculate the rate of change in the weight. Doing so would help increase the effectiveness of the link weight quantization in ANN 300 as compared to determining a single quantization function for all link weights, with less complexity than performing quantization per-layer (Lowell, 0043).
Chai and Lowell do not explicitly teach
intermediate activation values of the quantized neural network are quantized to a particular range;
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values;
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate (emphasis added).
Burger teaches
intermediate activation values of the quantized neural network are quantized to a particular range [paragraph 0051, “a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization”; paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”; paragraph 0090, “A number of different quantization parameters can be selected. These include bit widths for node weights, for example 3, 4, or 5 bits for values; bit widths for input or activation values for a neural network, for example 3, 4, or 5 bits for representing values”; paragraph 0106, “The normal-precision neural network model can be quantize by, for example, converting tensors representing weights, activation values, biases, or other neural network values for tensors to a block floating point format. For example, values in a 16- or 32-bit floating point format tensor can be converted to a three-, four-, five-, six-bit”]; 
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values [paragraph 0043, “The quantization-enabled system 110 further includes a quantization emulator 140. The quantization emulator 140 provides functionality that can be used to convert data represented in full precision floating-point formats in the normal-precision neural network module 130 into quantized format values”; paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”];
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate [paragraph 0061, “Neural networks can be trained and retrained by adjusting constituent values of the activation function. For example, by adjusting weights w_i or bias values b_i for a node, the behavior of the neural network is adjusted by corresponding changes in the networks output tensor values. For example, a cost function C(w, b) can be used to find suitable weights and biases for the network”; here, adjusting the weights by corresponding changes in tensor values corresponds to adjusting the weight coefficient based on the determined rate].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include Burger's process of quantizing the intermediate activation values of the quantized neural network to a particular range and adjusting the weight coefficient based on the determined rate. Doing so would help improve performance and reduce computational resource usage by reducing bit widths used to represent values in one or more tensors of the quantized neural network (Burger, 0118).
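
For illustration only, the step-wise quantization discussed in the rejection of claim 1 can be sketched as follows. The sketch is not taken from Chai, Lowell or Burger. The quantization equation cited from Chai paragraph 0057 is not reproduced in the text of this action, so the step size used below (the weight range divided by 2^b - 1, which keeps both endpoints), the function names `quantize_uniform` and `ternarize`, and the `threshold` parameter are all assumptions.

```python
def quantize_uniform(weights, bits):
    """Snap each weight to one of 2**bits evenly spaced levels spanning [min, max].

    Hypothetical helper: the step size is an assumption, not Chai's formula.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant tensor has nothing to discretize.
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + step * round((w - lo) / step) for w in weights]


def ternarize(weights, threshold=0.5):
    """Step-wise map of each weight to -1, 0 or 1 (assumed symmetric threshold)."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


if __name__ == "__main__":
    print(quantize_uniform([-1.0, -0.2, 0.3, 1.0], bits=1))
    print(ternarize([-0.8, 0.05, 0.6]))
```

Both functions are piecewise constant in their input, which is why a gradient taken directly through them carries no useful training signal and a differentiable approximation is of interest.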

As per claim 2, Chai, Lowell and Burger teach the method of claim 1.
Chai further teaches
the approximate quantization function is defined in terms of a temperature hyperparameter that determines a degree to which the approximate quantization function differs from the step-wise quantization function [paragraph 0035, “hyperparameters 120 may include a hyperparameter (denoted λ2 in this disclosure) that controls a severity of a bit precision penalty term in the loss function. The bit precision values 118 (i.e., the number of bits used in low-precision weights 116 for each layer 108 of DNN 106) may be based on the value of the bit precision penalty term. Thus, different values of the hyperparameter may result in a loss function that penalize to different degrees cases where DNN 106 uses weights with higher bit precisions”; here, the hyperparameter corresponds to the temperature hyperparameter, and the different degrees of low precision and high precision correspond to the degree of difference between the approximate quantization and the step-wise quantization].
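
For illustration only, one way a temperature hyperparameter can control how far a differentiable function departs from a step-wise ternary quantizer is sketched below. This is a generic tanh-based construction chosen for the example. It is not the function recited in the claims and it does not appear in Chai, Lowell or Burger. The names `ternary_step`, `soft_ternary`, `soft_ternary_grad` and the `threshold` value are hypothetical.

```python
import math


def ternary_step(x, threshold=0.5):
    """Step-wise quantizer: outputs -1, 0 or 1."""
    if x > threshold:
        return 1.0
    if x < -threshold:
        return -1.0
    return 0.0


def soft_ternary(x, temperature, threshold=0.5):
    """Smooth stand-in for ternary_step; approaches it as temperature -> 0."""
    return 0.5 * (math.tanh((x - threshold) / temperature)
                  + math.tanh((x + threshold) / temperature))


def soft_ternary_grad(x, temperature, threshold=0.5):
    """Analytic derivative of soft_ternary with respect to x."""
    a = math.tanh((x - threshold) / temperature)
    b = math.tanh((x + threshold) / temperature)
    return 0.5 * ((1.0 - a * a) + (1.0 - b * b)) / temperature


if __name__ == "__main__":
    for t in (1.0, 0.2, 0.05):
        print(t, soft_ternary(0.6, t), soft_ternary_grad(0.6, t))
```

In this sketch a large temperature gives a smooth curve with a nonzero derivative everywhere, while a small temperature brings the output close to the step-wise value.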

As per claim 3, Chai, Lowell and Burger teach the method of claim 2.
Chai further teaches
the temperature hyperparameter defines noise associated with the approximate quantization function [paragraph 0126, “an approach shown to encourage robust learning with few probability distributions for the noise. The noise incorporated in to the gradients of BitNet is similar to a disjoint set of gaussians with equally spaced centers determined by W and b, whose number is determined by b and variance by the range of values in W”; here, the noise incorporated into the gradients of BitNet corresponds to the noise associated with the approximate quantization function].

As per claim 8, Chai, Lowell and Burger teach the method of claim 1.
Chai further teaches
the quantized neural network is for execution by a neural network inference circuit, the method further comprising generating a set of program instructions for executing the quantized neural network on the neural network inference circuit [paragraph 0077, “machine learning system 104 may be configured to determine a set of quantized values for the layer by rounding values produced by applying a quantization function (e.g., v1 + v2 log2 |w|) to weights in the second set of weights (i.e., high-precision weights 114) that are associated with the layer”; paragraph 0130, “This disclosure uses the term "system architecture" to mean a set of the processing hardware (and associated software stack) to train one or more DNNs and/or execute the one or more DNNs in inference mode”; Claim 11, “processing circuitry for executing a machine learning system configured to train the DNN, wherein training the DNN comprises optimizing the set of weights and the set of bit precision values”; here, a processor with memory includes a set of program instructions for execution on a neural network inference circuit, and the set of quantized values produced by the quantization function corresponds to the floating-point neural network produced using quantized values].

As per claim 9, Chai, Lowell and Burger teach the method of claim 8.
Burger further teaches
the quantized intermediate activation values are one of 8-bit values and 4-bit values [paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”; paragraph 0090, “A number of different quantization parameters can be selected. These include bit widths for node weights, for example 3, 4, or 5 bits for values; bit widths for input or activation values for a neural network, for example 3, 4, or 5 bits for representing values”; paragraph 0106, “The normal-precision neural network model can be quantize by, for example, converting tensors representing weights, activation values, biases, or other neural network values for tensors to a block floating point format. For example, values in a 16- or 32-bit floating point format tensor can be converted to a three-, four-, five-, six-bit”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include Burger's process of quantizing the intermediate activation values of the quantized neural network to one of 8-bit values and 4-bit values. Doing so would help improve performance and reduce computational resource usage by reducing bit widths used to represent values in one or more tensors of the quantized neural network (Burger, 0118).

As per claim 10, Chai, Lowell and Burger teach the method of claim 1.
Burger further teaches
[paragraph 0097, “a set of input tensors and corresponding output tensors generated by the trained neural network are stored”; wherein, paragraph 0106, “tensors representing weights, activation values, biases, or other neural network values”; paragraph 0076, “the quantized numbers are stored in the quantized precision format and subsequent operations are emulated using, for example, block floating point representations”; paragraph 0098, “The quantized precision neural network model 940 can be stored in a block floating-point format representation. For example, the quantized neural network model 940 can be generated by converting weights, activation values, biases, and other values of the normal precision neural network model W 910 to a block floating point format”; paragraph 0033, “the term block floating-point (BFP) means a number system in which a single exponent is shared across two or more values, each of which is represented by a sign and mantissa pair”].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include Burger's process of storing the floating-point activation values using a variable position of a binary point used to represent the floating-point activation value. Doing so would help compare an expected output for the neural network with the output that is produced using the quantized model, and save storage space (Burger, 0077).
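
For illustration only, the block floating-point definition quoted from Burger paragraph 0033 (a single exponent shared across two or more values, each represented by a sign and mantissa pair) can be sketched as below. The sketch is not Burger's implementation. The function names `to_bfp` and `from_bfp`, the `mantissa_bits` parameter, the choice of shared exponent from the largest magnitude, and the round-then-clamp policy are assumptions made for the example.

```python
import math


def to_bfp(values, mantissa_bits):
    """Encode values as (shared_exponent, signed integer mantissas)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**exponent with 0.5 <= m < 1.
    _, exponent = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - exponent)
    limit = 2 ** mantissa_bits - 1  # largest mantissa magnitude that fits
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return exponent, mantissas


def from_bfp(exponent, mantissas, mantissa_bits):
    """Decode mantissas that share one exponent back to floats."""
    scale = 2.0 ** (exponent - mantissa_bits)
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    exp, mants = to_bfp([0.5, -0.25, 0.125], mantissa_bits=4)
    print(exp, mants, from_bfp(exp, mants, 4))
```

Because every value in the block shares the exponent, only the small integer mantissas need to be stored per value, which is the storage saving that the cited passages are relied on for.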

As per claim 11, Chai, Lowell and Burger teach the method of claim 10.
Burger teaches
the intermediate activation values that are quantized [paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”];
Chai further teaches
[paragraph 0041, “bit-level precision of parameters has been explored under post processing for the purpose of compression of DNNs. The state-of-the-art requires either the bit-level precision to be fixed beforehand or separates into two steps the learning of DNN parameters and the compression of the DNN parameters”; here, the bit-level precision of parameters corresponds to a fixed binary point position].

As per claim 13, Chai, Lowell and Burger teach the method of claim 1.
Chai further teaches
the step-wise quantization function receives values in a range of values and, for each value in the range of received values, outputs a particular value in a set of values that can be represented in a particular number of bits of information used by a neural network inference circuit used to implement the quantized neural network [paragraph 0034, “machine learning system 104 may perform a feed-forward phase in which machine learning system 104 uses high-precision weights 114 in DNN 106 to determine output data 112 based on input data in input data set 110”; paragraph 0070, “layer-wise quantization is adopted to learn one W(l) and b(l) for each layer l of the CNN. However, the loss function l(W) is not continuous and differentiable over the range of parameter values”; paragraph 0077, “machine learning system 104 may be configured to determine a set of quantized values for the layer by rounding values produced by applying a quantization function (e.g., v1 + v2 log2|w|) to weights in the second set of weights (i.e., high-precision weights 114) that are associated with the layer”; here, the feed-forward phase in which the precision weights determine the output data corresponds to the set of output values produced by the neural network inference circuit].

As per claim 14, Chai teaches a non-transitory machine readable medium storing a program for execution by a set of processing units [paragraph 0171, “a computer-readable medium, such as a computer-readable storage medium, containing instructions. Instructions embedded or encoded in a computer readable storage medium may cause a programmable processor, or other processor, to perform”], the program for transforming a neural network that uses floating point values into a neural network that uses quantized values [abstract, “Techniques are disclosed for training a deep neural network (DNN) for reduced computational resource requirements”; paragraph 0048, “The techniques here may generalize and encompass prior work that uses two stages (learning of DNN parameters and compression) as above. Other approaches to address memory size (including compression, quantization and approximation of DNNs)”], the program comprising sets of instructions for:
receiving a definition of a floating-point neural network comprising a set of floating-point weight coefficients [paragraph 0008, “method of training a DNN for reduced computing resource requirements, the method comprising: storing a set of weights of the DNN … the set of weights includes weights of the layer and the set of bit precision values”; paragraph 0057, “Techniques here may use a range preserving linear transform that uniformly discretizes the range into fixed steps δ. Concretely, let W(l) denote the floating point representation of high-precision weights 114 for layer l and let W’(l) be the quantized version of the weights of layer l using b bits. A quantization using b bits is,
[equation image media_image1.png not reproduced]
”; here, α, β and δ correspond to weight coefficients], wherein intermediate activation values of the floating-point neural network are floating-point values [Fig. 3, paragraph 0090, “machine learning system 104 may use the set of high-precision weights as weights of inputs of neurons in DNN 106 to calculate a first output data set based on a first input data set … according to Equation (1) … The first output data set may be the output y of the output layer 108N of DNN 106”; since W in Equation (1) denotes the floating point representation of the high-precision weights, the output y comprises floating-point values];
training the floating-point neural network to be a quantized neural network comprising a set of ternary weight coefficients [paragraph 0061, “machine learning system 104 may select, during training of DNN 106, a quantization function that best preserves the encoded distribution of the high-precision weights 114”; paragraph 0067, “Quantization maps ranges of values into single values referred to as "bins."”; paragraph 0078, “consider two bins at { 0, 1, -1} (i.e., +-1)”], the training comprising: 
propagating a set of inputs through the neural network to generate a set of outputs using an approximate quantization function, that is a differentiable approximation to a step-wise quantization function [paragraph 0061, “machine learning system 104 may select, during training of DNN 106, a quantization function that best preserves the encoded distribution of the high-precision weights 114, even if the quantization is non-differentiable. For example, machine learning system 104 may use the following quantization function:

[equation image media_image2.png not reproduced]
”; paragraph 0034, “as part of performing the training process, machine learning system 104 may perform a feed-forward phase in which machine learning system 104 uses high-precision weights 114 in DNN 106 to determine output data 112 based on input data in input data set 110”]; and 
performing a backpropagation operation that (i) backpropagates through the network a loss function value calculated based on the generated set of outputs to determine a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient [paragraph 0034, “machine learning system 104 may perform a backpropagation method that calculates a gradient of a loss function. The loss function produces a cost value based on the output data. In accordance with a technique of this disclosure, machine learning system 104 may then update high-precision weights 114, low-precision weights 116, and bit precision values 118 based on the gradient of the loss function”; paragraph 0070, “layer-wise quantization is adopted to learn one w(l) and b(l) for each layer l of the CNN. However, the loss function l(W) is not continuous and differentiable over the range of parameter values”; paragraph 0078, “The loss function of Equation (20) encourages small and/or large updates to W and discourages moderate sized updates. For the purpose of exposition, consider two bins at {0, 1, -1} (i.e., +- 1) and consider some weight that is equal to zero. A standard gradient descent may update the weight in any direction”; here, the layer-wise quantization corresponds to the step-wise quantization, the quantization function corresponds to the approximate quantization function, the quantization with the high-precision distribution corresponds to quantization by the differentiable approximation, and the loss function differentiable over the range of parameter values corresponds to the loss function value relative to a change in the weight coefficient];
Chai does not teach
intermediate activation values of the quantized neural network are quantized to a particular range; 
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values;
performing a backpropagation operation that (i) backpropagates through the network a loss function value calculated based on the generated set of outputs to determine, for each of at least a subset of the weight coefficients, a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient (emphasis added);
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate (emphasis added).
Lowell teaches
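performing a backpropagation operation that (i) backpropagates through the network a loss function value calculated based on the generated set of outputs to determine, for each of at least a subset of the weight coefficients, a rate of change in the calculated loss function value relative to a rate of change in the weight coefficient [paragraph 0017, “calculate a distribution of link weights for each of a plurality of subsets of layers of the ANN; select a quantization function to the plurality of link weights for each of the plurality of subsets of layers of the ANN based on each distribution; and apply the respective quantization function to the link weights for each of the plurality of subsets of layers”; paragraph 0052, “ANN training method 900 which includes dynamic quantization of the link weights in ANN 300 on a per-layer-subset basis”; here, the link weights of the neural network on a per-layer-subset basis correspond to a subset of the weight coefficients];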
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include Lowell's process of performing the operation for each of at least a subset of the weight coefficients to calculate the rate of change in the weight. Doing so would help increase the effectiveness of the link weight quantization in ANN 300 as compared to determining a single quantization function for all link weights, with less complexity than performing quantization per-layer (Lowell, 0043).
Chai and Lowell do not explicitly teach
intermediate activation values of the quantized neural network are quantized to a particular range; 
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values;
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate (emphasis added).
Burger teaches
intermediate activation values of the quantized neural network are quantized to a particular range [paragraph 0051, “a portion of values representing the neural network can be received, including edge weights, activation values, or other suitable parameters for quantization”; paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”; paragraph 0090, “A number of different quantization parameters can be selected. These include bit widths for node weights, for example 3, 4, or 5 bits for values; bit widths for input or activation values for a neural network, for example 3, 4, or 5 bits for representing values”; paragraph 0106, “The normal-precision neural network model can be quantize by, for example, converting tensors representing weights, activation values, biases, or other neural network values for tensors to a block floating point format. For example, values in a 16- or 32-bit floating point format tensor can be converted to a three-, four-, five-, six-bit”]; 
the approximate quantization function is applied to intermediate activation values to approximate quantization of the intermediate activation values [paragraph 0043, “The quantization-enabled system 110 further includes a quantization emulator 140. The quantization emulator 140 provides functionality that can be used to convert data represented in full precision floating-point formats in the normal-precision neural network module 130 into quantized format values”; paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”];
performing a backpropagation operation that … ii) adjusts the weight coefficient based on the determined rate [paragraph 0061, “Neural networks can be trained and retrained by adjusting constituent values of the activation function. For example, by adjusting weights w_i or bias values b_i for a node, the behavior of the neural network is adjusted by corresponding changes in the networks output tensor values. For example, a cost function C(w, b) can be used to find suitable weights and biases for the network”; here, adjusting the weights by corresponding changes in tensor values corresponds to adjusting the weight coefficient based on the determined rate].


As per claim 15, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 14.
Chai further teaches
the approximate quantization function is defined in terms of a temperature hyperparameter that determines a degree to which the approximate quantization function differs from the step-wise quantization function [paragraph 0035, “hyperparameters 120 may include a hyperparameter (denoted λ2 in this disclosure) that controls a severity of a bit precision penalty term in the loss function. The bit precision values 118 (i.e., the number of bits used in low-precision weights 116 for each layer 108 of DNN 106) may be based on the value of the bit precision penalty term. Thus, different values of the hyperparameter may result in a loss function that penalize to different degrees cases where DNN 106 uses weights with higher bit precisions”; here, the hyperparameter corresponds to the temperature hyperparameter, and the different degrees of low precision and high precision correspond to the degree of difference between the approximate quantization and the step-wise quantization].

As per claim 16, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 15.
Chai further teaches
[paragraph 0126, “an approach shown to encourage robust learning with few probability distributions for the noise. The noise incorporated in to the gradients of BitNet is similar to a disjoint set of gaussians with equally spaced centers determined by W and b, whose number is determined by b and variance by the range of values in W”; here, the noise incorporated into the gradients of BitNet corresponds to the noise associated with the approximate quantization function].

As per claim 18, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 14.
Chai further teaches
the quantized neural network is for execution by a neural network inference circuit, the method further comprising generating a set of program instructions for executing the quantized neural network on the neural network inference circuit [paragraph 0077, “machine learning system 104 may be configured to determine a set of quantized values for the layer by rounding values produced by applying a quantization function (e.g., v1 + v2 log2 |w|) to weights in the second set of weights (i.e., high-precision weights 114) that are associated with the layer”; paragraph 0130, “This disclosure uses the term "system architecture" to mean a set of the processing hardware (and associated software stack) to train one or more DNNs and/or execute the one or more DNNs in inference mode”; Claim 11, “processing circuitry for executing a machine learning system configured to train the DNN, wherein training the DNN comprises optimizing the set of weights and the set of bit precision values”; here, the processor with memory includes a set of program instructions to execute on the neural network inference circuit, and the set of quantized values produced by the quantization function represents the floating-point neural network produced using quantized values].
As per claim 20, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 14.
Chai further teaches
the step-wise quantization function receives values in a range of values and, for each value in the range of received values, outputs a particular value in a set of values that can be represented in a particular number of bits of information used by a neural network inference circuit used to implement the quantized neural network [paragraph 0034, “machine learning system 104 may perform a feed-forward phase in which machine learning system 104 uses high-precision weights 114 in DNN 106 to determine output data 112 based on input data in input data set 110”; paragraph 0070, “layer-wise quantization is adopted to learn one W(l) and b(l) for each layer l of the CNN. However, the loss function l(W) is not continuous and differentiable over the range of parameter values”; paragraph 0077, “machine learning system 104 may be configured to determine a set of quantized values for the layer by rounding values produced by applying a quantization function (e.g., v1 + v2 log2|w|) to weights in the second set of weights (i.e., high-precision weights 114) that are associated with the layer”; here, determining the output data in the feed-forward phase using the precision weights represents the set of output values produced by the neural network inference circuit].
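For illustration only, the rounding step quoted from paragraph 0077 of Chai (rounding the values produced by v1 + v2 log2|w|) can be sketched as below. The function name log2_quantize, the sample weights, and the values chosen for v1 and v2 are assumptions made for this sketch; the cited passage does not specify them, and zero weights are rejected here because log2|w| is undefined at zero.

```python
import math

def log2_quantize(weights, v1, v2):
    """Round v1 + v2 * log2|w| to the nearest integer for each nonzero weight."""
    codes = []
    for w in weights:
        if w == 0:
            # Assumption of this sketch: zero weights are handled elsewhere.
            raise ValueError("log2|w| is undefined for a zero weight")
        codes.append(round(v1 + v2 * math.log2(abs(w))))
    return codes

# Hypothetical weights and coefficients; 0.3 and 0.25 land on the same code.
codes = log2_quantize([1.0, 0.5, 0.3, 0.25, -0.125], v1=0.0, v2=1.0)
distinct = sorted(set(codes))
```

With these inputs the codes are [0, -1, -2, -2, -3]: a range of received values maps onto a small set of outputs, which is the step-wise behavior the limitation describes.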

As per claim 21, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 14.
Burger further teaches
the floating-point intermediate activation values are stored using a variable position of a binary point used to represent the floating-point activation value [paragraph 0097, “a set of input tensors and corresponding output tensors generated by the trained neural network are stored”; wherein, paragraph 0106, “tensors representing weights, activation values, biases, or other neural network values”; paragraph 0076, “the quantized numbers are stored in the quantized precision format and subsequent operations are emulated using, for example, block floating point representations”; paragraph 0098, “The quantized precision neural network model 940 can be stored in a block floating-point format representation. For example, the quantized neural network model 940 can be generated by converting weights, activation values, biases, and other values of the normal precision neural network model W 910 to a block floating point format”; paragraph 0033, “the term block floating-point (BFP) means a number system in which a single exponent is shared across two or more values, each of which is represented by a sign and mantissa pair”].
the intermediate activation values that are quantized [paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”];
Chai further teaches
values that are quantized to a particular range use a fixed binary point position [paragraph 0041, “bit-level precision of parameters has been explored under post processing for the purpose of compression of DNNs. The state-of-the-art requires either the bit-level precision to be fixed beforehand or separates into two steps the learning of DNN parameters and the compression of the DNN parameters”; here, the bit-level precision of the parameters represents the fixed binary point position].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include the process of storing the floating-point activation values using a variable position of a binary point used to represent the floating-point activation value of Burger. Doing so would help compare an expected output for the neural network with the output that is produced using the quantized model, and save storage space (Burger, 0077).
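For illustration only, the block floating-point definition quoted from paragraph 0033 of Burger (a single exponent shared across two or more values, each represented by a sign and mantissa pair) can be sketched as below. The function names, the choice of exponent from the largest magnitude, and the mantissa width are assumptions of this sketch and are not asserted to be Burger's implementation.

```python
import math

def to_block_floating_point(values, mantissa_bits):
    """Encode values as (shared_exponent, signed integer mantissas)."""
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp returns (m, e) with largest == m * 2**e and 0.5 <= m < 1.
    _, exponent = math.frexp(largest)
    shared_exponent = exponent - mantissa_bits
    scale = 2.0 ** shared_exponent
    limit = 2 ** mantissa_bits - 1  # largest mantissa magnitude that fits
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exponent, mantissas

def from_block_floating_point(shared_exponent, mantissas):
    """Decode by applying the one shared exponent to every mantissa."""
    scale = 2.0 ** shared_exponent
    return [m * scale for m in mantissas]

# Hypothetical activation values with a 4-bit mantissa magnitude.
shared_exponent, mantissas = to_block_floating_point([0.5, -0.25, 0.125], 4)
decoded = from_block_floating_point(shared_exponent, mantissas)
```

Because the exponent is chosen per block, the binary point position varies with the data, in contrast with a fixed-point format in which the position is set in advance.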
As per claim 22, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 14.
Burger further teaches
the quantized intermediate activation values are one of 8-bit values and 4-bit values [paragraph 0076, “both model parameters as well as activation values in the neural network are quantized”; paragraph 0090, “A number of different quantization parameters can be selected. These include bit widths for node weights, for example 3, 4, or 5 bits for values; bit widths for input or activation values for a neural network, for example 3, 4, or 5 bits for representing values”; paragraph 0106, “The normal-precision neural network model can be quantize by, for example, converting tensors representing weights, activation values, biases, or other neural network values for tensors to a block floating point format. For example, values in a 16- or 32-bit floating point format tensor can be converted to a three-, four-, five-, six-bit”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include the process of quantizing the intermediate activation values of the quantized neural network to one of 8-bit values and 4-bit values of Burger. Doing so would help improve performance and reduce computational resource usage by reducing bit widths used to represent values in one or more tensors of the quantized neural network (Burger, 0118).

Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Chai et al. in view of Lowell et al. in view of Burger et al. and further in view of Darvish Rouhani et al. (US Pub. 2020/0202213). 
As per claim 4, Chai, Lowell and Burger teach the method of claim 3.
Chai further teaches
[paragraph 0031, “DNN 106 has a plurality of layers 108. Each of layers 108 may include a respective set of artificial neurons. Layers 108 include an input layer 108A, an output layer 108N, and one or more hidden layers (e.g., layers 108B through 108M). Layers 108 may include fully connected layers, convolutional layers, pooling layers, and/or other types of layers. In a fully connected layer, the output of each neuron of a previous layer forms an input of each neuron of the fully connected layer”];
Chai, Lowell and Burger do not teach
the noise associated with the approximate quantization function is used to generate a noise value associated with a particular computation node in the plurality of layers during training of the floating-point neural network to be a quantized neural network.  
Darvish Rouhani teaches
the noise associated with the approximate quantization function is used to generate a noise value associated with a particular computation node in the plurality of layers during training of the floating-point neural network to be a quantized neural network [abstract, “adjusting hyper-parameters of a neural network to compensate for noise, such as noise introduced via quantization of one or more parameters of the neural network … the adjustment can include scaling the hyper-parameter based on at least one metric representing noise present in the neural network”; paragraphs 0002-0003, “Methods for compensating for quantization noise during training of a neural network implemented with a quantization-enabled system … a method for training a neural network includes obtaining a tensor including values of one or more parameters of the neural network represented in a quantized-precision format and generating at least one metric (e.g., at least one noise-to-signal metric) representing quantization noise present in the tensor. The parameters can include edge weights and activation weights (associated with compute nodes) of the neural network … quantization of a value of a parameter (e.g., an activation weight or edge weight of a neural network) can introduce noise, as the value is represented with lower precision in the quantized format”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include using the noise associated with the quantization function to generate a noise value associated with a particular computation node in the plurality of layers during training, as taught by Darvish Rouhani. Doing so would help adjust hyper-parameters of a neural network to compensate for noise (Darvish Rouhani, abstract).

As per claim 17, Chai, Lowell and Burger teach the non-transitory machine readable medium of claim 16.
Chai further teaches
the floating-point neural network comprises a plurality of layers, each layer comprising at least one computation node that uses activation values of computation nodes from previous layers as input values [paragraph 0031, “DNN 106 has a plurality of layers 108. Each of layers 108 may include a respective set of artificial neurons. Layers 108 include an input layer 108A, an output layer 108N, and one or more hidden layers (e.g., layers 108B through 108M). Layers 108 may include fully connected layers, convolutional layers, pooling layers, and/or other types of layers. In a fully connected layer, the output of each neuron of a previous layer forms an input of each neuron of the fully connected layer”];
Chai, Lowell and Burger do not teach
the noise associated with the approximate quantization function is used to generate a noise value associated with a particular computation node in the plurality of layers during training of the floating-point neural network to be a quantized neural network.  
Darvish Rouhani teaches
the noise associated with the approximate quantization function is used to generate a noise value associated with a particular computation node in the plurality of layers during training of the floating-point neural network to be a quantized neural network [abstract, “adjusting hyper-parameters of a neural network to compensate for noise, such as noise introduced via quantization of one or more parameters of the neural network … the adjustment can include scaling the hyper-parameter based on at least one metric representing noise present in the neural network”; paragraphs 0002-0003, “Methods for compensating for quantization noise during training of a neural network implemented with a quantization-enabled system … a method for training a neural network includes obtaining a tensor including values of one or more parameters of the neural network represented in a quantized-precision format and generating at least one metric (e.g., at least one noise-to-signal metric) representing quantization noise present in the tensor. The parameters can include edge weights and activation weights (associated with compute nodes) of the neural network … quantization of a value of a parameter (e.g., an activation weight or edge weight of a neural network) can introduce noise, as the value is represented with lower precision in the quantized format”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include using the noise associated with the quantization function to generate a noise value associated with a particular computation node in the plurality of layers during training, as taught by Darvish Rouhani. Doing so would help adjust hyper-parameters of a neural network to compensate for noise (Darvish Rouhani, abstract).

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Chai et al. in view of Lowell et al. in view of Burger et al. in view of Darvish Rouhani et al. and further in view of Ganesan et al. (US Pub. 2007/0060186). 
As per claim 5, Chai, Lowell, Burger and Darvish Rouhani teach the method of claim 4.
Darvish Rouhani teaches in paragraph 0097 that noise can be introduced by pruning.
Chai, Lowell, Burger and Darvish Rouhani do not explicitly teach
the particular computation node in the neural network is removed from the neural network based on the generated noise value associated with the particular computation node.
Ganesan teaches
the particular computation node in the neural network is removed from the neural network based on the generated noise value associated with the particular computation node [paragraph 0048, “if noise element has exceeded second noise threshold … the one or more of the plurality of communication links affected must be removed”; since Chai (as modified) discloses the noise value associated with a node introduced via quantization (Darvish Rouhani, paragraphs 0002-0003), and Ganesan teaches removing the corresponding node/channel if the noise value exceeds a threshold, the combination of Chai (as modified) and Ganesan reads on the claim limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include the process of removing the channel/node when the noise level exceeds the threshold, as taught by Ganesan. Doing so would help reduce noise generated during the training to improve network performance.

As per claim 6, Chai, Lowell, Burger, Darvish Rouhani and Ganesan teach the method of claim 5.
Ganesan further teaches
the particular computation node is removed when the generated noise value is above a threshold value [paragraph 0048, “if noise element has exceeded second noise threshold … the one or more of the plurality of communication links affected must be removed”; since Chai (as modified) discloses the noise value associated with a node introduced via quantization (Darvish Rouhani, paragraphs 0002-0003), and Ganesan teaches removing the corresponding node/channel if the noise value exceeds a threshold, the combination of Chai (as modified) and Ganesan reads on the claim limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify the method of transforming a neural network of Chai to include the process of removing the channel/node when the noise level exceeds the threshold, as taught by Ganesan. Doing so would help reduce noise generated during the training to improve network performance.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Chai et al. in view of Lowell et al. in view of Burger et al. in view of Darvish Rouhani et al. in view of Ganesan et al. and further in view of Louizos et al. (US Pub. 2019/0354842).
As per claim 7, Chai, Lowell, Burger, Darvish Rouhani and Ganesan teach the method of claim 6.
Chai, Lowell, Burger, Darvish Rouhani and Ganesan do not teach
the noise associated with the approximate quantization function is a multiplicative noise.  
Louizos teaches 
the noise associated with the approximate quantization function is a multiplicative noise [abstract, “A method for quantizing a neural network includes modeling noise of parameters of the neural network”; paragraph 0067, “for floating point grids, multiplicative noise may be more desirable than additive noise”; since Chai (as modified) discloses the noise associated with a node introduced via quantization (Darvish Rouhani, paragraphs 0002-0003), and Louizos teaches that the noise is a multiplicative noise, the combination of Chai (as modified) and Louizos reads on the claim limitation].


Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
El-Yaniv et al. (US Pub. 2017/0286830) describes a neural network trained by a training set and having a plurality of neurons each associated with a quantized activation function adapted to output a quantized activation value.
Choi et al. (US Pub. 2019/0138882) describes a method for learning low-precision neural networks that combines weight quantization and activation quantization.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103. The examiner can normally be reached M-F, 8 AM-5 PM, (CT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, OMAR FERNANDEZ can be reached on 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T. N./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128