DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for each of a plurality of neural network layers and at each of a plurality of training time steps
Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision;
Obtaining a current quantization range for output tensors of the neural network layer;
Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;
Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)). The above limitations, in the context of this claim, encompass the following for each of a plurality of neural network layers and at each of a plurality of training time steps:
“processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision” (Corresponds to mathematical calculations in the form of matrix multiplication and addition. See Pg. 11, lines 7-12: “the layer execution engine 140 might multiply the quantized layer input 112 and the quantized weight tensor 132 and, optionally, add a bias vector to the product to generate an initial activation tensor, and then process the initial activation tensor using an activation function.”)
“obtaining a current quantization range for output tensors of the neural network layer” (Corresponds to a mental process. The specification states (Pg. 16, lines 11-15) that it is sufficient to use the minimum and maximum scalar values observed as the minimum and maximum values for the quantization range. This is making an observation of the minimum and maximum values, which is a mental process.)
“processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision” (Corresponds to mathematical calculations. Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes.)
determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range (Corresponds to mathematical formulas and calculations. FIG. 4 details the formulas and calculations necessary to determine the error and the associated update to the quantization range.)
	
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
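For illustration only, the quantization and error determination discussed for claim 1 above may be sketched as follows. This is a minimal sketch assuming uniform quantization and a mean squared error; the function names, the num_bits parameter, and the choice of error measure are hypothetical and are not taken from the application.

```python
def quantize(values, q_min, q_max, num_bits=8):
    """Clamp each value to [q_min, q_max] and snap it to one of 2**num_bits levels.

    Assumption: uniform (evenly spaced) quantization levels.
    """
    levels = 2 ** num_bits - 1
    step = (q_max - q_min) / levels
    quantized = []
    for v in values:
        clamped = min(max(v, q_min), q_max)       # values outside the range are clipped
        index = round((clamped - q_min) / step)   # nearest quantization level
        quantized.append(q_min + index * step)
    return quantized


def quantization_error(values, quantized):
    """Mean squared difference between the original and quantized values (assumed measure)."""
    return sum((v - q) ** 2 for v, q in zip(values, quantized)) / len(values)


output_tensor = [-1.0, 0.4, 1.6, 5.0]
quantized_tensor = quantize(output_tensor, q_min=0.0, q_max=3.0, num_bits=2)
error = quantization_error(output_tensor, quantized_tensor)
```

Each step reduces to clamping, rounding, subtraction, and averaging over the elements of the tensor.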

Regarding claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts (mathematical relations between the quantization range and the minimum and maximum scalar values) and mental processes. (Using the minimum and maximum scalar values in order to determine the quantization range is capable of being performed reasonably by the human mind with pen and paper.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.


Regarding claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts. (This is a mathematical relation stating that the number of elements of each of the maximum and minimum tensors is equal to the number of elements of the output tensors of the neural network layer.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.

Regarding claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts. (These are mathematical relations between the number of elements of the minimum and maximum tensors and the number of channels of the output tensors.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
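For illustration only, the range definitions discussed for claims 2 and 4 above may be sketched as follows. This is a minimal sketch assuming the range is taken from observed minimum and maximum values and that the output tensor is laid out as rows of per-channel values; the function names and the layout are hypothetical.

```python
def scalar_range(output):
    """Claim 2 style range: one minimum scalar and one maximum scalar for the whole tensor."""
    flat = [v for row in output for v in row]
    return min(flat), max(flat)


def per_channel_range(output):
    """Claim 4 style range: minimum and maximum tensors with one element per channel.

    Assumption: `output` is a list of rows, and each row holds one value per channel.
    """
    num_channels = len(output[0])
    min_tensor = [min(row[c] for row in output) for c in range(num_channels)]
    max_tensor = [max(row[c] for row in output) for c in range(num_channels)]
    return min_tensor, max_tensor


output_tensor = [[1.0, -2.0, 0.5],
                 [3.0, 0.0, -0.5]]
lo, hi = scalar_range(output_tensor)
min_tensor, max_tensor = per_channel_range(output_tensor)
```

In the per-channel case, the minimum and maximum tensors each have as many elements as the output tensor has channels (three in this example).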

Regarding claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein determining an error between the output tensor and the quantized output tensor comprises determining, for each element of the output tensor, an element error between the element of the output tensor and the corresponding element of the quantized output tensor.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). Determining the element error between each element of the output tensor and the corresponding element of the quantized output tensor corresponds to mathematical concepts as well as a mental process. (This limitation corresponds to the mathematical formulas and calculations used to determine the error as detailed in FIG. 4.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.

Regarding claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein determining an element error of an element of the output tensor comprises: determining whether the element is below the quantization range, above the quantization range, or in the quantization range;
if determining that the element is below the quantization range, determining the element error to be a difference between the element and a minimum value of the quantization range;
if determining that the element is above the quantization range, determining the element error to be a difference between the element and a maximum value of the quantization range;
if determining that the element is in the quantization range, determining the element error to be an average element error for elements evenly distributed in the quantization range.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes. Determining whether the element is above the quantization range is a mental process. (An observation is made by comparing the element to the quantization range and determining if it is greater than the maximum value in the range.) Determining whether the element is below the quantization range is a mental process. (An observation is made by comparing the element to the quantization range and determining if it is less than the minimum value.) Determining whether the element is within the quantization range is a mental process. (An observation is made by comparing the element to the quantization range and determining if it is between the maximum and minimum values.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
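For illustration only, the three-way element error determination discussed for claim 6 above may be sketched as follows. This is a minimal sketch, not the formula of FIG. 4, which is not reproduced in this action. It assumes an absolute error and uniform quantization levels, in which case the average rounding error for elements evenly distributed in the range is one quarter of the step size; the function name and num_bits parameter are hypothetical.

```python
def element_error(x, q_min, q_max, num_bits=8):
    """Error contributed by a single element of the output tensor."""
    step = (q_max - q_min) / (2 ** num_bits - 1)
    if x < q_min:
        # Below the range: distance to the minimum value of the range.
        return q_min - x
    if x > q_max:
        # Above the range: distance to the maximum value of the range.
        return x - q_max
    # Inside the range: average absolute rounding error, assuming elements are
    # evenly distributed, so the rounding error is uniform on [-step/2, step/2].
    return step / 4


errors = [element_error(x, q_min=0.0, q_max=3.0, num_bits=2) for x in [-1.0, 1.2, 5.0]]
```

Each branch is a comparison against the range bounds followed by a subtraction or a constant.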


Regarding claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
determining a minimum weight value and a maximum weight value for elements of the weight tensor;
processing the weight tensor using the minimum weight value and the maximum weight value to generate a quantized weight tensor that has the second precision;
processing the input tensor and the quantized weight tensor to generate the output tensor.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)). Determining minimum and maximum weight values corresponds to a mental process. (This is an evaluation of the weight values in order to determine the minimum and maximum values.) Processing the weight tensor is a mathematical concept and a mental process. (The quantization of the weight tensor is effectively just normalizing the data to fit within the bounds of the quantization range. This quantization process is a mathematical transformation done via mathematical calculations.) Processing the input tensor and quantized weight tensor is also a mathematical concept. (The processing is done via matrix multiplication. This matrix multiplication is a mathematical calculation/equation.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The processes of “receiving an input tensor” and “obtaining a weight tensor” are considered insignificant extra-solution activities in the form of mere data gathering. These additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements of “receiving an input tensor” and “obtaining a weight tensor” amount to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
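For illustration only, the weight quantization and subsequent multiplication discussed for claim 7 above may be sketched as follows. This is a minimal sketch assuming uniform quantization between the observed minimum and maximum weight values and a plain matrix-vector product; the function names and num_bits parameter are hypothetical, and the bias and activation function mentioned in the specification are omitted.

```python
def quantize_weights(weights, num_bits=8):
    """Quantize a 2-D weight tensor using its own minimum and maximum values."""
    flat = [w for row in weights for w in row]
    w_min, w_max = min(flat), max(flat)
    levels = 2 ** num_bits - 1
    # Guard against a constant tensor, where the range would have zero width.
    step = (w_max - w_min) / levels if w_max > w_min else 1.0
    return [[w_min + round((w - w_min) / step) * step for w in row] for row in weights]


def matvec(weights, x):
    """Multiply a 2-D weight tensor by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


weight_tensor = [[0.0, 1.1],
                 [2.9, 3.0]]
quantized_weights = quantize_weights(weight_tensor, num_bits=2)
output_tensor = matvec(quantized_weights, [1.0, 2.0])
```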



Regarding claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein determining an update to the quantization range using the determined error comprises determining the update using gradient descent.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). The use of gradient descent as a method of performing an abstract idea constitutes an abstract idea. The use of gradient descent to determine the update corresponds to mathematical concepts. (Corresponds specifically to mathematical calculations detailed in Fig. 4)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
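For illustration only, a gradient descent update of the quantization range as discussed for claim 8 above may be sketched as follows. This is a simplified sketch, not the calculation of FIG. 4: it assumes the error is the mean squared clipping distance of out-of-range elements and ignores the in-range rounding error. The function name and the loss are hypothetical.

```python
def range_gradient_step(values, q_min, q_max, lr):
    """One gradient descent step on (q_min, q_max).

    Assumed loss: (1/n) * [sum over x < q_min of (q_min - x)**2
                           + sum over x > q_max of (x - q_max)**2].
    """
    n = len(values)
    # d(loss)/d(q_min): only elements below the range contribute.
    grad_min = sum(2 * (q_min - x) for x in values if x < q_min) / n
    # d(loss)/d(q_max): only elements above the range contribute.
    grad_max = sum(-2 * (x - q_max) for x in values if x > q_max) / n
    return q_min - lr * grad_min, q_max - lr * grad_max


new_min, new_max = range_gradient_step([-1.0, 0.5, 2.0, 5.0], q_min=0.0, q_max=3.0, lr=0.5)
```

Under this assumed loss, the step widens the range toward the clipped elements: the minimum moves down and the maximum moves up.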

Regarding claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein a learning rate for determining the update to the quantization range is proportional to a learning rate for determining an update to a weight tensor of the neural network layer.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). The limitation that the learning rates be proportional corresponds to a mathematical concept. (This is a mathematical relation between the two learning rates.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.


Regarding claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein a learning rate for determining the update to the quantization range is inversely proportional to a size of the input tensor for the neural network layer.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). The limitation that the learning rate be inversely proportional to the size of the input tensor corresponds to a mathematical concept. (This is a mathematical relation between the learning rate and the size of the input tensor.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
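For illustration only, the learning rate relations discussed for claims 9 and 10 above may be sketched as follows. This is a minimal sketch that combines both relations in one expression purely for convenience; the function name and the scale constant are hypothetical and are not taken from the application.

```python
def range_learning_rate(weight_lr, input_size, scale=1.0):
    """Learning rate for the quantization range update.

    Proportional to the weight learning rate (claim 9) and inversely
    proportional to the size of the input tensor (claim 10).
    """
    return scale * weight_lr / input_size


base = range_learning_rate(weight_lr=0.1, input_size=100)
doubled_lr = range_learning_rate(weight_lr=0.2, input_size=100)
doubled_size = range_learning_rate(weight_lr=0.1, input_size=200)
```

Doubling the weight learning rate doubles the result, and doubling the input size halves it.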

Regarding claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a method, which is a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein determining an update to the quantization range using the determined error comprises determining the update using exponential moving average smoothing.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). The use of exponential moving average smoothing is a well-known statistical technique. This corresponds to a mathematical concept. (Application of exponential moving average smoothing corresponds to the use of mathematical calculations)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The claim is not patent eligible.
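For illustration only, an exponential moving average update of the quantization range as discussed for claim 11 above may be sketched as follows. This is a minimal sketch assuming the range is smoothed toward the minimum and maximum values observed at the current training step; the function names and the decay parameter are hypothetical.

```python
def ema_update(current, observed, decay=0.9):
    """Standard exponential moving average: blend the old value with the new observation."""
    return decay * current + (1 - decay) * observed


def update_range(q_range, values, decay=0.9):
    """Smooth (q_min, q_max) toward the observed minimum and maximum of `values`."""
    q_min, q_max = q_range
    return (ema_update(q_min, min(values), decay),
            ema_update(q_max, max(values), decay))


new_range = update_range((0.0, 4.0), [-2.0, 1.0, 6.0], decay=0.5)
```

Each update is a weighted sum of two numbers, applied once to the minimum and once to the maximum.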

Regarding claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 12 is directed to a system, which is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for each of a plurality of neural network layers and at each of a plurality of training time steps
Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision;
Obtaining a current quantization range for output tensors of the neural network layer;
Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;
Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)). The above limitations, in the context of this claim, encompass the following for each of a plurality of neural network layers and at each of a plurality of training time steps:
“processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision” (Corresponds to mathematical calculations in the form of matrix multiplication and addition. See Pg. 11, lines 7-12: “the layer execution engine 140 might multiply the quantized layer input 112 and the quantized weight tensor 132 and, optionally, add a bias vector to the product to generate an initial activation tensor, and then process the initial activation tensor using an activation function.”)
“obtaining a current quantization range for output tensors of the neural network layer” (Corresponds to a mental process. The specification states (Pg. 16, lines 11-15) that it is sufficient to use the minimum and maximum scalar values observed as the minimum and maximum values for the quantization range. This is making an observation of the minimum and maximum values, which is a mental process.)
“processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision” (Corresponds to mathematical calculations. Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes.)
determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range (Corresponds to mathematical formulas and calculations. FIG. 4 details the formulas and calculations necessary to determine the error and the associated update to the quantization range.)
	
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional elements of “one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations,” as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g). The claim is not patent eligible.

Regarding claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a system, which is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts (mathematical relations between the quantization range and the minimum and maximum scalar values) and mental processes. (Using the minimum and maximum scalar values in order to determine the quantization range is capable of being performed reasonably by the human mind with pen and paper.)

Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional elements of “one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).

Regarding claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a system, which in the context of this limitation is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts. (This is a mathematical relation stating that the number of elements of each of the maximum and minimum tensors is equal to the number of elements of the output tensors of the neural network layer.)
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional elements of “one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).


Regarding claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a system, which in the context of this limitation is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). Thus, this limitation corresponds to mathematical concepts. (mathematical relations between the minimum/maximum tensors and the channels of the output tensors)
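For illustration only, the three range granularities recited in claims 13 through 15 (a scalar minimum and maximum, minimum and maximum tensors with the same number of elements as the output tensors, and minimum and maximum tensors with one element per channel) can be contrasted in a short Python sketch. The nested-list layout, the use of a small batch of output tensors as the source of observed values, and the function names are assumptions for this sketch and are not drawn from the specification.

```python
def per_tensor_range(batch):
    """One scalar minimum and one scalar maximum (claim 13 style)."""
    flat = [v for tensor in batch for channel in tensor for v in channel]
    return min(flat), max(flat)


def per_element_range(batch):
    """Min/max tensors shaped like a single output tensor (claim 14 style)."""
    channels, width = len(batch[0]), len(batch[0][0])
    mins = [[min(t[c][i] for t in batch) for i in range(width)]
            for c in range(channels)]
    maxs = [[max(t[c][i] for t in batch) for i in range(width)]
            for c in range(channels)]
    return mins, maxs


def per_channel_range(batch):
    """One minimum and one maximum per channel (claim 15 style)."""
    channels = len(batch[0])
    mins, maxs = [], []
    for c in range(channels):
        vals = [v for tensor in batch for v in tensor[c]]
        mins.append(min(vals))
        maxs.append(max(vals))
    return mins, maxs


# Hypothetical data: two output tensors, each with 2 channels of 3 elements.
batch = [
    [[0.0, 1.0, -2.0], [4.0, 5.0, 6.0]],
    [[3.0, -1.0, 0.5], [2.0, 8.0, 7.0]],
]

print(per_tensor_range(batch))   # (-2.0, 8.0)
print(per_channel_range(batch))  # ([-2.0, 2.0], [3.0, 8.0])
print(per_element_range(batch))
```

In each case the range is obtained by comparing observed values and retaining the smallest and largest; only the grouping of the values differs.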
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional elements of “one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).


Regarding claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 16 is directed to a system, which in the context of this limitation is a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein determining an error between the output tensor and the quantized output tensor comprises determining, for each element of the output tensor, an element error between the element of the output tensor and the corresponding element of the quantized output tensor.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). Determining the error between the two output tensors corresponds to mathematical concepts as well as a mental process. (This limitation also corresponds to mathematical formulas and calculations used to determine the error as detailed in FIG.4.)
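For illustration only, an element-wise error of the kind recited in this limitation can be sketched as follows. The signed difference used here and the function name are assumptions for this sketch; the actual error formula of FIG. 4 is not reproduced.

```python
def element_errors(output, quantized):
    """Return one error value per element (assumed: signed difference)."""
    if len(output) != len(quantized):
        raise ValueError("tensors must have the same number of elements")
    return [o - q for o, q in zip(output, quantized)]


# Hypothetical flattened output tensor and its quantized counterpart.
output = [-1.0, -0.25, 0.0, 0.75, 2.0]
quantized = [-1.0, 0.0, 0.0, 1.0, 2.0]

errors = element_errors(output, quantized)
print(errors)  # [0.0, -0.25, 0.0, -0.25, 0.0]
```

Each element error is a single subtraction between corresponding elements.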
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional elements of “one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of “receiving an input tensor” amounts to receiving data, which is an insignificant extra-solution activity that is well-understood, routine, conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).

Regarding claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to one or more non-transitory computer storage media, which is an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for each of a plurality of neural network layers and at each of a plurality of training time steps
Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision;
Obtaining a current quantization range for output tensors of the neural network layer;
Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;
Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)). The above limitations in the context of this claim encompass, for each of a plurality of neural network layers and at each of a plurality of training time steps: “processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision;” (corresponds to mathematical calculations in the form of, at the layer execution engine 140, multiplying the quantized layer input 112 and the quantized weight tensor 132 to generate the unquantized output tensor.); “processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;” (corresponds to mathematical calculations and a mental process. In the quantization process, values are divided by their original precisions and then multiplied by the maximum value of the quantization range in order to generate an approximate value in a lower precision. This approximate value is rounded to the nearest value that lies in the set of numbers within the quantization range. This division and rounding are mathematical calculations and are also capable of being performed within the human mind with the assistance of pen and paper.); and “obtaining a current quantization range for output tensors of the neural network layer;” (corresponds to a mental process. It is stated in the specification that it is sufficient to use the minimum and maximum scalar values observed in the unquantized tensor as the minimum and maximum values for the quantization range. This is making an observation of the minimum and maximum values, which is a mental process.)
and determining an error between the output tensor and quantized output tensor and determining an update to the quantization range (Corresponds to mathematical formulas and calculations as well as mental process. FIG.4 details the formulas and calculations necessary to determine the error and the associated update to the quantization range. These formulas are capable of being performed by the human mind with pen and paper.) 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional element of “One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations for training a quantized neural network”, as drafted, recites a generic computer component. The generic computer component is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, with respect to integration of the abstract idea into a practical application, the additional element of “receiving an input tensor” is mere data gathering and thus, insignificant extra-solution activity. The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).

Regarding claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to one or more non-transitory computer storage media, which is an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts (mathematical relations between the quantization range and the minimum and maximum scalar values) and mental processes. (Using the minimum and maximum scalar values in order to determine the quantization range is capable of being performed reasonably by the human mind with pen and paper.)

Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional element of “One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations for training a quantized neural network”, as drafted, recites a generic computer component. The generic computer component is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, with respect to integration of the abstract idea into a practical application, the additional element of “receiving an input tensor” is mere data gathering and thus, insignificant extra-solution activity. The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).

Regarding claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to one or more non-transitory computer storage media, which is an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). This limitation corresponds to mathematical concepts. (This is a mathematical relation stating that the number of elements of each of the maximum and minimum tensors is equal to the number of elements of the output tensors of the neural network layer.)

Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. The process of “receiving an input tensor” is considered an insignificant extra-solution activity in the form of mere data gathering. This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. All other additional elements within the claim are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f), 2106.05(g). The additional element of “One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations for training a quantized neural network”, as drafted, recites a generic computer component. The generic computer component is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, with respect to integration of the abstract idea into a practical application, the additional element of “receiving an input tensor” is mere data gathering and thus, insignificant extra-solution activity. The additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. The claim is not patent eligible. Mere instructions to apply the exception using a generic computer component cannot provide an inventive concept. See MPEP 2106.05(f), 2106.05(g).

Regarding claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to one or more non-transitory computer storage media, which is an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors.
As drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations). Thus, this limitation corresponds to mathematical concepts. (mathematical relations between the minimum/maximum tensors and the channels of the output tensors)

Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations for training a quantized neural network”, as drafted, recites a generic computer component. The generic computer component is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f), MPEP 2106.05(g).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-8, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al. (INTEGER QUANTIZATION FOR DEEP LEARNING INFERENCE: PRINCIPLES AND EMPIRICAL EVALUATION) in view of Choi et al. (PACT: PARAMETERIZED CLIPPING ACTIVATION FOR QUANTIZED NEURAL NETWORKS).

Regarding Claim 1, 
Wu teaches A method of training a quantized neural network (Pg.10, Section 5.2: “Quantization Aware Training (QAT) describes the technique of inserting quantization operations in to the neural network before training” teaches inserting quantization operations into the neural network (this corresponds to quantizing the neural network) before training is being done. This means the training is being done on a quantized neural network.) comprising, for each of a plurality of neural network layers and at each of a plurality of training time steps(Pg. 6, Second paragraph: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications.” Teaches the use of linear (fully-connected) layers and projection layers (corresponds to a plurality of neural network layers);Pg. 6, second paragraph: “Furthermore, consecutive operations can be executed with a fused implementation, avoiding memory reads and writes for the intermediate values.” Teaches a plurality of training time steps because the use of “consecutive operations” shows that the processes are performed in series and thus in different time steps.), Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision(Pg. 5, First paragraph: “Y = XW”, “X = … is the input activation tensor”, “Y = … is the output tensor” teaches processing the input activation tensor (corresponds to the input tensor) using a linear (fully-connected) layer (corresponds to the neural network layer) to generate an output tensor, wherein the output tensor has a first precision.(The output tensor has yet to be quantized and thus has first precision));
 Obtaining a current quantization range for output tensors of the neural network layer (Pg. 4, Section 3.1.2, Paragraph 2: “Equation 6 and 7 define scale quantization of a real value x, with a chosen representable range [-α,α], producing a b-bit integer value” Teaches that the quantization range is obtained by using the variable “α” to define the maximum and minimum bounds.); Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;(Pg.6, Section 4, Paragraph 2: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications. … Therefore we leave quantization of the output activations to the input of the next operation” teaches that the output activations (corresponds to output tensor) are quantized in the input of the next operation. This quantization at the input of the next operation is still being performed on the output activations of the previous operation. This quantization step is reducing the precision from the first precision (Pg. 3, Section 3: “Quantize: convert a real number to a quantized integer representation” the integer representation of the number must be a smaller precision/bit-width than the real number representation.)) 
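For reference, the scale quantization cited above from Wu (a real value x, a chosen representable range [-α, α], and a b-bit integer result) can be sketched as follows. This is a minimal illustration under stated assumptions, not Wu's implementation: the function names, the round-to-nearest step, the symmetric integer bounds of ±(2^(b-1) - 1), and the sample values are all assumptions made for the example.

```python
def scale_quantize(x, alpha, bits=8):
    """Map a real value to a signed integer using only a scale factor.

    Assumed form: scale = (2**(bits-1) - 1) / alpha, then round and clip.
    """
    q_max = 2 ** (bits - 1) - 1          # 127 for 8 bits
    scale = q_max / alpha                # real-to-integer scale factor
    q = round(x * scale)                 # round to the nearest integer
    return max(-q_max, min(q_max, q))    # values outside [-alpha, alpha] are clipped


def dequantize(q, alpha, bits=8):
    """Approximate inverse: map the integer back to a real value."""
    q_max = 2 ** (bits - 1) - 1
    return q * alpha / q_max


# Hypothetical output tensor at the first (higher) precision.
output_tensor = [0.25, -1.0, 2.0, 0.0]
alpha = 1.0                               # current quantization range is [-1.0, 1.0]
quantized = [scale_quantize(v, alpha) for v in output_tensor]
```

In this sketch the third value lies outside [-α, α] and is clipped to the upper integer bound, which is the behavior the range is meant to control.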
Wu does not appear to explicitly teach “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” However, Choi teaches “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” (Pg. 3, Section 4: “α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0, α].” teaches dynamically adjusting the parameter α in order to minimize accuracy degradation from quantizing the output tensor to the quantized output tensor. (Accuracy degradation corresponds to error.)
Pg. 3, Section 4, Equation (1): y = PACT(x) = 0.5(|x| - |x - α| + α), which equals 0 for x in (-∞, 0), x for x in [0, α), and α for x in [α, +∞).
Pg. 4, first sentence: “where α limits the range of activation to [0, α]” Teaches that α is used in determining the quantization range. Because α is dynamically adjusted in order to minimize accuracy degradation (corresponds to error) and α is a component of the quantization range then the quantization range is being updated according to the determined error. (accuracy degradation))
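The behavior relied on from Choi, in which α bounds the activation range to [0, α] and is itself adjusted by gradient descent, can be pictured with the following sketch. It is illustrative only and rests on several assumptions: the clamp is written with `min`/`max`, rounding is treated with a straight-through estimate so that the gradient with respect to α is 1 where the input is clipped at α and 0 elsewhere, and the upstream loss gradients, learning rate, and function names are hypothetical.

```python
def pact(x, alpha):
    """Clipping activation: limits the activation to [0, alpha]."""
    return min(max(x, 0.0), alpha)


def pact_quantize(x, alpha, bits=4):
    """Clip to [0, alpha], then round onto 2**bits - 1 uniform steps."""
    levels = 2 ** bits - 1
    return round(pact(x, alpha) * levels / alpha) * alpha / levels


def grad_alpha(x, alpha):
    """Assumed straight-through gradient of the quantized output w.r.t. alpha."""
    return 1.0 if x >= alpha else 0.0


def update_alpha(alpha, activations, upstream_grads, lr=0.1):
    """One gradient-descent step on alpha from hypothetical upstream gradients."""
    g = sum(u * grad_alpha(x, alpha) for x, u in zip(activations, upstream_grads))
    return alpha - lr * g


activations = [-1.0, 0.5, 3.0, 4.0]
upstream = [0.2, 0.2, -0.5, -0.5]    # hypothetical loss gradients, one per activation
alpha = 2.0
new_alpha = update_alpha(alpha, activations, upstream)
```

Only the two activations at or above α contribute to the update here, so α moves from 2.0 toward a wider range.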

Wu and Choi are analogous art to the claimed invention because they are both directed to methods of quantizing layers of neural networks.

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)

Regarding Claim 2,
	Wu in view of Choi teaches the method of claim 1.
	Wu further teaches wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value. (Pg. 3, Section 3.1: “Uniform quantization transforms the input value … to lie within [-2^(b-1), 2^(b-1) - 1], where inputs outside the range are clipped to the nearest bound.” Teaches that the range of quantization is defined by the minimum scalar value of “-2^(b-1)” and the maximum scalar value of “2^(b-1) - 1”)
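As a small worked example of the two scalar bounds discussed above, the following sketch computes the minimum value -2^(b-1) and the maximum value 2^(b-1) - 1 for a signed b-bit integer and clips out-of-range inputs to the nearest bound. The function names are hypothetical and chosen only for this illustration.

```python
def integer_range(bits):
    """Return the (minimum, maximum) scalar bounds of a signed b-bit integer range."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def clip_to_range(value, bits):
    """Clip a value that falls outside the range to the nearest bound."""
    lo, hi = integer_range(bits)
    return max(lo, min(hi, value))


bounds_8bit = integer_range(8)
```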


Regarding claim 3,

Wu in view of Choi teaches the method of claim 1.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer.(Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” teaches that the parameter α is used in determining the quantization range and can define a different quantization maximum (α) and therefore a different quantization range to individual elements (“individual α for each neuron activation”). This means that there is a distinct max of α and minimum 0 per element.) 

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)





Regarding claim 4,

Wu in view of Choi teaches the method of claim 1.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors (Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” teaches that the parameter α is used in determining the quantization range and can define a different quantization maximum (α) and therefore a different quantization range to individual elements (“Shared α among neurons within the same output channel”). This means that there is a distinct max of α and minimum 0 per output channel.)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors.” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)
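The three scopes for α quoted from Choi (one α per neuron activation, one α shared within an output channel, and one α shared within a layer) differ only in how many elements the maximum bound has. A minimal sketch follows, assuming an output tensor laid out as a list of channels, a minimum bound of 0 everywhere, and hypothetical function names and values.

```python
def clip_per_element(output, alpha_per_element):
    """Scope (a): the maximum tensor has the same number of elements as the output."""
    return [[min(max(v, 0.0), a) for v, a in zip(channel, alphas)]
            for channel, alphas in zip(output, alpha_per_element)]


def clip_per_channel(output, alpha_per_channel):
    """Scope (b): the maximum tensor has one element per output channel."""
    return [[min(max(v, 0.0), a) for v in channel]
            for channel, a in zip(output, alpha_per_channel)]


def clip_per_layer(output, alpha):
    """Scope (c): a single scalar maximum shared by the whole layer."""
    return [[min(max(v, 0.0), alpha) for v in channel] for channel in output]


# Hypothetical output tensor with 2 channels and 3 elements per channel.
output = [[-1.0, 0.5, 3.0], [2.0, 6.0, 0.1]]
max_tensor = [1.0, 4.0]                  # one alpha per channel
min_tensor = [0.0] * len(max_tensor)     # matching minimum bound per channel
per_channel = clip_per_channel(output, max_tensor)
```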


Regarding claim 5,

Wu in view of Choi teaches the method of claim 1.
Wu further teaches wherein determining an error between the output tensor and the quantized output tensor comprises determining, for each element of the output tensor, an element error between the element of the output tensor and the corresponding element of the quantized output tensor. (Pg. 19, Section B: “Neural networks are trained by minimizing a loss function with stochastic gradient descent. … δL/δw, are computed and weights are iteratively updated in the direction of the negative gradient until the model converges to some minimum.” and “By training with quantization, we may potentially avoid these narrow minima by computing gradients with respect to the quantized weights, as shown in Figure 6b. In doing so, narrow minima will result in larger gradients, potentially allowing the model to explore the loss landscape for “wide” [22] or “flat” [16, 30] minima, where quantization points have lower loss, and thus higher accuracy.” teaches that the training of the neural network involves minimizing a loss function. The output of this loss function corresponds to the element error between the element of the output tensor and the corresponding element of the quantized output tensor.)
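For reference, the per-element error recited in the limitation can be pictured as an element-by-element difference between an output tensor and its quantized counterpart. The sketch below is only one possible reading, with assumptions labeled: the quantizer is a simple uniform quantizer over a hypothetical range [lo, hi], the values are invented, and the squared-error total is included merely as one assumed way of reducing the element errors to a single loss value.

```python
def fake_quantize(x, lo, hi, bits=8):
    """Clip x to [lo, hi], round it onto 2**bits - 1 uniform steps, and map it back."""
    steps = 2 ** bits - 1
    step = (hi - lo) / steps
    x = min(max(x, lo), hi)
    return lo + round((x - lo) / step) * step


def element_errors(output, quantized):
    """Error between each output element and the corresponding quantized element."""
    return [o - q for o, q in zip(output, quantized)]


output = [0.2, 1.6, 2.9, 4.0]                              # hypothetical output tensor
quantized = [fake_quantize(v, 0.0, 3.0, bits=2) for v in output]
errors = element_errors(output, quantized)
total_squared_error = sum(e * e for e in errors)          # assumed scalar reduction
```

The last element shows the clipping contribution: 4.0 lies outside the range and is mapped to 3.0, so its element error is the largest.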


Regarding claim 7,
	Wu in view of Choi teaches the method of claim 1.
	Wu further teaches wherein processing the input tensor using the neural network layer comprises: obtaining a weight tensor for the neural network layer (Pg. 5, First paragraph: “Consider a linear (fully-connected) layer that performs a matrix multiplication Y = XW, where … W = … is the weight tensor”), wherein the weight tensor has the first precision (Pg. 5, First paragraph: “The result of the real-valued matrix multiplication Y = XW can be approximated with quantized tensors … by first dequantizing them, and then performing the matrix multiplication.” This process of first dequantizing them ensures that the multiplication of XW is done with unquantized weight and input vectors (vectors of the first precision)); determining a minimum weight value and a maximum weight value for elements of the weight tensor and processing the weight tensor using the minimum weight value and the maximum weight value to generate a quantized weight tensor that has the second precision (Pg. 6, Section 3.4: “Calibration is the process of choosing α and β for model weights and activations … Max: Use the maximum absolute value seen during calibration” and “Scale quantization performs range mapping with only a scale transformation … Equation 6 and 7 define scale quantization of a real value x, with a chosen representable range [-α, α]” teaches that the absolute max value (the absolute value of the maximum and/or minimum values) is used in determining the range for quantization. The reference also teaches the use of scale quantization, allowing the quantization range to be defined by the greater absolute value between the minimum and maximum values. This is using the minimum and maximum weight values (these minimum and maximum weight values determine the absolute max value) as the α parameter for quantizing the weight tensor to the quantized weight tensor. Pg. 4, Section 3.1.2, Paragraph 2: “Equation 6 and 7 define scale quantization of a real value x, with a chosen representable range [-α,α], producing a b-bit integer value” teaches that the weights are quantized using the parameter α.); processing the input tensor and the quantized weight tensor to generate the output tensor. (Pg. 5, First paragraph: “The result of the real-valued matrix multiplication Y = XW can be approximated with quantized tensors Xq = … and Wq = … by first dequantizing them, and then performing the matrix multiplication.” teaches processing the input tensor (corresponds to the input tensor) and the quantized weight tensor (corresponds to Wq) to generate an output tensor. (The output tensor has yet to be quantized and thus has the first precision.))
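The sequence discussed above (max calibration from the minimum and maximum weight values, scale quantization of the weights, and a product computed from the dequantized weights) can be sketched as follows. This is a hedged illustration rather than Wu's code: the helper names, the 8-bit width, the single-row input, and the sample values are assumptions.

```python
def max_calibrate(values):
    """Max calibration: alpha is the larger of |min| and |max| over the values."""
    return max(abs(min(values)), abs(max(values)))


def quantize(values, alpha, bits=8):
    """Scale-quantize each value to a signed integer in [-q_max, q_max]."""
    q_max = 2 ** (bits - 1) - 1
    return [max(-q_max, min(q_max, round(v * q_max / alpha))) for v in values]


def dequantize(q_values, alpha, bits=8):
    """Map quantized integers back to approximate real values."""
    q_max = 2 ** (bits - 1) - 1
    return [q * alpha / q_max for q in q_values]


weights = [0.5, -2.0, 1.5]      # hypothetical weight column W at the first precision
inputs = [1.0, 2.0, 3.0]        # hypothetical input row X

alpha_w = max_calibrate(weights)            # minimum -2.0 and maximum 1.5 give alpha = 2.0
w_q = quantize(weights, alpha_w)            # quantized weight tensor (second precision)
w_hat = dequantize(w_q, alpha_w)            # dequantized before the multiplication
y_approx = sum(x * w for x, w in zip(inputs, w_hat))
y_exact = sum(x * w for x, w in zip(inputs, weights))
```

Comparing `y_approx` with `y_exact` shows how closely the quantized weights reproduce the real-valued product for these sample values.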


Regarding claim 8,
Wu in view of Choi teaches the method of claim 1.
Wu further teaches wherein determining an update to the quantization range using the determined error comprises determining the update using gradient descent. (Pg. 19, Section B: “Neural networks are trained by minimizing a loss function with stochastic gradient descent.” and Pg. 11, Section 5.3: “While the techniques described in the previous sections relied on quantization parameters calibrated on the pre-trained network, it is also possible to jointly learn the quantization parameters along with the model weights.” teaches using gradient descent to train the neural network based on the determined error (the determined error corresponds to the result of the loss function). This also teaches that the quantization range can be learned along with the model weights as part of training.)
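A toy sketch of learning a range parameter jointly with a weight by gradient descent is given below. It is not drawn from Wu: the single-weight layer, the squared-error objective, the subgradient convention at the clipping boundary, the learning rate, and all starting values are assumptions made for illustration, and rounding is omitted for brevity so that only the clipping range is learned.

```python
def forward(w, alpha, x):
    """Layer output w * x clipped to the range [-alpha, alpha]."""
    return max(-alpha, min(alpha, w * x))


def train_step(w, alpha, x, target, lr=0.1):
    """One gradient-descent step on both the weight and the range parameter."""
    z = w * x
    y = forward(w, alpha, x)
    dloss_dy = 2.0 * (y - target)                 # derivative of (y - target) ** 2
    clipped = abs(z) >= alpha
    # When the output is clipped, the gradient flows to alpha; otherwise to w.
    grad_w = 0.0 if clipped else dloss_dy * x
    grad_alpha = dloss_dy * (1.0 if z > 0 else -1.0) if clipped else 0.0
    return w - lr * grad_w, alpha - lr * grad_alpha


w, alpha = 1.5, 1.0                               # hypothetical starting values
for _ in range(200):
    w, alpha = train_step(w, alpha, x=1.0, target=2.0)
final_output = forward(w, alpha, 1.0)
```

In this setup both parameters move in the direction of the negative gradient, and the clipped output approaches the target as the range widens and the weight grows.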


Regarding Claim 12, 
Wu teaches A method of training a quantized neural network (Pg.10, Section 5.2: “Quantization Aware Training (QAT) describes the technique of inserting quantization operations in to the neural network before training” teaches inserting quantization operations into the neural network (this corresponds to quantizing the neural network) before training is being done. This means the training is being done on a quantized neural network.) comprising, for each of a plurality of neural network layers and at each of a plurality of training time steps(Pg. 6, Second paragraph: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications.” Teaches the use of linear (fully-connected) layers and projection layers (corresponds to a plurality of neural network layers);Pg. 6, second paragraph: “Furthermore, consecutive operations can be executed with a fused implementation, avoiding memory reads and writes for the intermediate values.” Teaches a plurality of training time steps because the use of “consecutive operations” shows that the processes are performed in series and thus in different time steps.), Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision(Pg. 5, First paragraph: “Y = XW”, “X = … is the input activation tensor”, “Y = … is the output tensor” teaches processing the input activation tensor (corresponds to the input tensor) using a linear (fully-connected) layer (corresponds to the neural network layer) to generate an output tensor, wherein the output tensor has a first precision.(The output tensor has yet to be quantized and thus has first precision));
 Obtaining a current quantization range for output tensors of the neural network layer (Pg. 4, Section 3.1.2, Paragraph 2: “Equation 6 and 7 define scale quantization of a real value x, with a chosen representable range [-α,α], producing a b-bit integer value” Teaches that the quantization range is obtained by using the variable “α” to define the maximum and minimum bounds.); Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;(Pg.6, Section 4, Paragraph 2: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications. … Therefore we leave quantization of the output activations to the input of the next operation” teaches that the output activations (corresponds to output tensor) are quantized in the input of the next operation. This quantization at the input of the next operation is still being performed on the output activations of the previous operation. This quantization step is reducing the precision from the first precision (Pg. 3, Section 3: “Quantize: convert a real number to a quantized integer representation” the integer representation of the number must be a smaller precision/bit-width than the real number representation.)) 
Wu does not appear to explicitly teach “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” However, Choi teaches “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” (Pg. 3, Section 4: “α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0, α].” teaches dynamically adjusting the parameter α in order to minimize accuracy degradation from quantizing the output tensor to the quantized output tensor. (Accuracy degradation corresponds to error.)
Pg. 3, Section 4, Equation (1): y = PACT(x) = 0.5(|x| - |x - α| + α), which equals 0 for x in (-∞, 0), x for x in [0, α), and α for x in [α, +∞).
Pg. 4, first sentence: “where α limits the range of activation to [0, α]” Teaches that α is used in determining the quantization range. Because α is dynamically adjusted in order to minimize accuracy degradation (corresponds to error) and α is a component of the quantization range then the quantization range is being updated according to the determined error. (accuracy degradation))

Wu and Choi are analogous art to the claimed invention because they are both directed to methods of quantizing layers of neural networks.

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)

Regarding Claim 13,
	Wu in view of Choi teaches the system of claim 12.
	Wu further teaches wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value. (Pg. 3, Section 3.1: “Uniform quantization transforms the input value … to lie within [-2^(b-1), 2^(b-1) - 1], where inputs outside the range are clipped to the nearest bound.” Teaches that the range of quantization is defined by the minimum scalar value of “-2^(b-1)” and the maximum scalar value of “2^(b-1) - 1”)


Regarding claim 14,

Wu in view of Choi teaches the system of claim 12.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer.(Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” teaches that the parameter α is used in determining the quantization range and can define a different quantization maximum (α) and therefore a different quantization range to individual elements (“individual α for each neuron activation”). This means that there is a distinct max of α and minimum 0 per element.) 

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)




Regarding claim 15,

Wu in view of Choi teaches the system of claim 12.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors (Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” teaches that the parameter α is used in determining the quantization range and can define a different quantization maximum (α) and therefore a different quantization range to individual elements (“Shared α among neurons within the same output channel”). This means that there is a distinct max of α and minimum 0 per output channel.)

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors.” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1)


Regarding claim 16,

Wu in view of Choi teaches the system of claim 12.
Wu further teaches wherein determining an error between the output tensor and the quantized output tensor comprises determining, for each element of the output tensor, an element error between the element of the output tensor and the corresponding element of the quantized output tensor. (Pg. 19, Section B: “Neural networks are trained by minimizing a loss function with stochastic gradient descent. … δL/δw, are computed and weights are iteratively updated in the direction of the negative gradient until the model converges to some minimum.” and “By training with quantization, we may potentially avoid these narrow minima by computing gradients with respect to the quantized weights, as shown in Figure 6b. In doing so, narrow minima will result in larger gradients, potentially allowing the model to explore the loss landscape for “wide” [22] or “flat” [16, 30] minima, where quantization points have lower loss, and thus higher accuracy.” teaches that the training of the neural network involves minimizing a loss function. The output of this loss function corresponds to the element error between the element of the output tensor and the corresponding element of the quantized output tensor.)

Regarding Claim 17, 
Wu teaches A method of training a quantized neural network (Pg.10, Section 5.2: “Quantization Aware Training (QAT) describes the technique of inserting quantization operations in to the neural network before training” teaches inserting quantization operations into the neural network (this corresponds to quantizing the neural network) before training is being done. This means the training is being done on a quantized neural network.) comprising, for each of a plurality of neural network layers and at each of a plurality of training time steps(Pg. 6, Second paragraph: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications.” Teaches the use of linear (fully-connected) layers and projection layers (corresponds to a plurality of neural network layers);Pg. 6, second paragraph: “Furthermore, consecutive operations can be executed with a fused implementation, avoiding memory reads and writes for the intermediate values.” Teaches a plurality of training time steps because the use of “consecutive operations” shows that the processes are performed in series and thus in different time steps.), Processing the input tensor using the neural network layer to generate an output tensor, wherein the output tensor has a first precision(Pg. 5, First paragraph: “Y = XW”, “X = … is the input activation tensor”, “Y = … is the output tensor” teaches processing the input activation tensor (corresponds to the input tensor) using a linear (fully-connected) layer (corresponds to the neural network layer) to generate an output tensor, wherein the output tensor has a first precision.(The output tensor has yet to be quantized and thus has first precision));
 Obtaining a current quantization range for output tensors of the neural network layer (Pg. 4, Section 3.1.2, Paragraph 2: “Equation 6 and 7 define scale quantization of a real value x, with a chosen representable range [-α,α], producing a b-bit integer value” Teaches that the quantization range is obtained by using the variable “α” to define the maximum and minimum bounds.); Processing the output tensor using the current quantization range to generate a quantized output tensor that has a second precision that is lower than the first precision;(Pg.6, Section 4, Paragraph 2: “We focus on quantizing the computationally intensive operations, including convolutions, linear (fully-connected) layers, LSTM cells, projection layers and other matrix multiplications. … Therefore we leave quantization of the output activations to the input of the next operation” teaches that the output activations (corresponds to output tensor) are quantized in the input of the next operation. This quantization at the input of the next operation is still being performed on the output activations of the previous operation. This quantization step is reducing the precision from the first precision (Pg. 3, Section 3: “Quantize: convert a real number to a quantized integer representation” the integer representation of the number must be a smaller precision/bit-width than the real number representation.)) 
Wu does not appear to explicitly teach “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” However, Choi teaches “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error.” (Pg. 3, Section 4: “α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0, α].” teaches dynamically adjusting the parameter α in order to minimize accuracy degradation from quantizing the output tensor to the quantized output tensor. (Accuracy degradation corresponds to error.)
Pg. 3, Section 4, Equation (1): 
[Image: Choi, Equation (1); not reproduced here]
Pg. 4, first sentence: “where α limits the range of activation to [0, α]” Teaches that α is used in determining the quantization range. Because α is dynamically adjusted in order to minimize accuracy degradation (corresponding to error), and α is a component of the quantization range, the quantization range is updated according to the determined error (accuracy degradation).)
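For illustration only, the relationship described above between a clipping level α, the range [0, α], and an error-driven update of α can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `pact_quantize` and `alpha_step`, the squared-error objective, the bit-width `k`, and the learning rate `lr` are hypothetical and are not taken from Choi, Wu, or the application.

```python
def pact_quantize(x, alpha, k):
    """Clip x to the range [0, alpha], then quantize it to k bits on that range."""
    y = min(max(x, 0.0), alpha)
    levels = 2 ** k - 1
    return round(y * levels / alpha) * alpha / levels


def alpha_step(xs, alpha, k, lr):
    """Take one gradient step on alpha for the mean squared error (q - x)^2.

    Assumption: a straight-through estimate is used, so the quantized value
    depends on alpha only where x is clipped (x >= alpha).
    """
    grad = 0.0
    for x in xs:
        q = pact_quantize(x, alpha, k)
        if x >= alpha:
            grad += 2.0 * (q - x)
    grad /= len(xs)
    return alpha - lr * grad


# Hypothetical activations; 3.0 lies outside the current range [0, 2.0].
new_alpha = alpha_step([0.5, 1.0, 3.0], alpha=2.0, k=2, lr=0.1)
print(new_alpha)
```

In this sketch the clipped activation produces a nonzero error, so the step widens the range; activations already inside [0, α] do not move α.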

Wu and Choi are analogous art to the claimed invention because they are both directed to methods of quantizing layers of neural networks.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “Determining an error between the output tensor and the quantized output tensor, and determining an update to the quantization range using the determined error” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1).

Regarding Claim 18,
	Wu in view of Choi teaches the non-transitory computer storage media of claim 17.
	Wu further teaches wherein the quantization range for output tensors of the neural network layer is defined by a minimum scalar value and a maximum scalar value. (Pg. 3, Section 3.1: “Uniform quantization transforms the input value … to lie within [-2^(b-1), 2^(b-1) - 1], where inputs outside the range are clipped to the nearest bound.” Teaches that the range of quantization is defined by the minimum scalar value of “-2^(b-1)” and the maximum scalar value of “2^(b-1) - 1”.)
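As an illustration of a range defined by a minimum and a maximum scalar value, the signed b-bit integer range [-2^(b-1), 2^(b-1) - 1] and clipping to the nearest bound can be sketched as follows. The function names `int_range` and `quantize` and the `scale` argument are hypothetical and are used here only for illustration.

```python
def int_range(b):
    """Return the signed b-bit integer range (-2^(b-1), 2^(b-1) - 1)."""
    return -(2 ** (b - 1)), 2 ** (b - 1) - 1


def quantize(x, scale, b):
    """Scale a real value, round it, and clip it to the signed b-bit range."""
    lo, hi = int_range(b)
    q = round(x * scale)  # Python's round() rounds halves to the nearest even integer
    return max(lo, min(hi, q))


print(int_range(8))             # (-128, 127)
print(quantize(3.7, 100.0, 8))  # 127, clipped to the upper bound
```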


Regarding claim 19,

Wu in view of Choi teaches the non-transitory computer storage media of claim 17.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer. (Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” Teaches that the parameter α is used in determining the quantization range and can define a different α, and therefore a different quantization range, for individual elements (“individual α for each neuron activation”).)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a same number of elements as the output tensors of the neural network layer and ii) a maximum tensor having a same number of elements as the output tensors of the neural network layer” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1).

Regarding claim 20,

Wu in view of Choi teaches the non-transitory computer storage media of claim 17.
Choi further teaches wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors (Pg. 11, Section A.1: “One of key questions is the optimal scope for α… We considered 3 possible choices: (a) Individual α for each neuron activation, (b) Shared α among neurons within the same output channel, and (c) Shared α within a layer.” and Pg. 3, Section 4: “Building on these insights, we introduce PACT, a new activation quantization scheme in which the ActFn has a parameterized clipping level, α. α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0,α].” teaches that the parameter α is used in determining the quantization range and can define a different quantization maximum (α) and therefore a different quantization range to individual elements (“Shared α among neurons within the same output channel”). This means that there is a distinct max of α and minimum 0 per output channel.)
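For illustration only, the three scopes for α quoted above (per neuron activation, per output channel, per layer) differ in how many range parameters are kept. The sketch below counts them and applies a per-channel range [0, α]; the helper names `alpha_count` and `clip_per_channel` and the nested-list layout (one inner list per output channel) are hypothetical.

```python
def alpha_count(scope, channels, per_channel):
    """Number of alpha values kept for a layer output of channels x per_channel elements."""
    if scope == "element":
        return channels * per_channel
    if scope == "channel":
        return channels
    if scope == "layer":
        return 1
    raise ValueError("unknown scope: " + scope)


def clip_per_channel(acts, alphas):
    """Clip each channel's activations to [0, alpha], using that channel's own alpha."""
    return [[min(max(v, 0.0), a) for v in row] for row, a in zip(acts, alphas)]


acts = [[-1.0, 0.5, 3.0], [2.0, 5.0, 0.1]]  # 2 channels, 3 elements each
print(clip_per_channel(acts, [1.0, 4.0]))   # [[0.0, 0.5, 1.0], [2.0, 4.0, 0.1]]
```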

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “wherein the quantization range for output tensors of the neural network layer is defined by i) a minimum tensor having a number of elements equal to a number of channels of the output tensors and ii) a maximum tensor having a number of elements equal to the number of channels of the output tensors” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Choi and further in view of Park (Value-aware Quantization for Training and Inference of Neural Networks).

Regarding Claim 9,
	Wu in view of Choi teaches the method of claim 8.
	Wu in view of Choi does not explicitly teach “wherein a learning rate for determining the update to the quantization range is proportional to a learning rate for determining an update to a weight tensor of the neural network layer.” However, Park teaches “wherein a learning rate for determining the update to the quantization range is proportional to a learning rate for determining an update to a weight tensor of the neural network layer.” (Pg. 11, 3rd full paragraph: "Table 5 shows the impact of RV-Quant configurations on training accuracy of ResNet50. We change the configurations when the learning rate changes (with the initial value of 0.1) at 0.01 and 0.001. For instance, (F)-(3,2)-(2,0) represents the case that, as the initial configuration, we use full-precision activation (F) during back-propagation. After 30 epochs, the configuration is changed to 3-bit 2% RV-Quant. Then, after 60 epochs, it is changed to 2-bit 0% RV-Quant." Teaches changing the configuration (which includes the type of quantization, in this case 3-bit or 2-bit) as the learning rate changes. This change between 3-bit and 2-bit also involves changing the quantization range/intervals correspondingly.)

Wu in view of Choi and Park are analogous art to the claimed invention because they are all directed to methods of quantizing layers of neural networks.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “wherein a learning rate for determining the update to the quantization range is proportional to a learning rate for determining an update to a weight tensor of the neural network layer” as taught by Park into the system of Wu in view of Choi in order to “have smaller memory cost of activations.” (Park, Pg. 12, 1st paragraph)

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Wu in view of Choi and further in view of Jacob (Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference).

Regarding claim 11,
Wu in view of Choi teaches the method of claim 1.
Choi teaches “wherein determining an update to the quantization range using the determined error” (Pg. 3, Section 4: “α is dynamically adjusted via gradient descent-based training with the objective of minimizing the accuracy degradation arising from quantization. … α limits the range of activation to [0, α].” Teaches dynamically adjusting the parameter α in order to minimize accuracy degradation from quantizing the output tensor to the quantized output tensor. (Accuracy degradation corresponds to error.)
Pg. 3, Section 4, Equation (1): 
[Image: Choi, Equation (1); not reproduced here]
Pg. 4, first sentence: “where α limits the range of activation to [0, α]” Teaches that α is used in determining the quantization range. Because α is dynamically adjusted in order to minimize accuracy degradation (corresponding to error), and α is a component of the quantization range, the quantization range is updated according to the determined error (accuracy degradation).)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “wherein determining an update to the quantization range using the determined error” as taught by Choi into the system of Wu in order to leverage “a new parameter α that is used to represent the clipping level in the activation function and is learned via back-propagation. α sets the quantization scale smaller than ReLU to reduce the quantization error” (Choi et al, Pg. 2, Bullet 1).


	Wu in view of Choi does not teach “comprises determining the update using exponential moving average smoothing.” however, Jacob teaches “comprises determining the update using exponential moving average smoothing.” (Jacob, Pg. 2708, Section 3.1: “Quantization ranges are treated differently for weight quantization vs. activation quantization: ... For activations, ranges depend on the inputs to the network. To estimate the ranges, we collect [a; b] ranges seen on activations during training and then aggregate them via exponential moving averages (EMA) with the smoothing parameter being close to 1 so that observed ranges are smoothed across thousands of training steps.” Teaches the utilization of exponential moving averages along with a smoothing parameter so that ranges are smoothed.)
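For illustration only, aggregating observed activation ranges [a; b] with an exponential moving average and a smoothing parameter close to 1 can be sketched as follows. The function name `ema_update`, the parameter name `decay`, and the sample values are hypothetical and are not taken from Jacob.

```python
def ema_update(current, observed, decay):
    """Blend the running (min, max) range with a newly observed (min, max) range.

    A decay close to 1 gives most of the weight to the running range,
    so each single observation moves the range only slightly.
    """
    lo, hi = current
    obs_lo, obs_hi = observed
    return (decay * lo + (1.0 - decay) * obs_lo,
            decay * hi + (1.0 - decay) * obs_hi)


# Hypothetical per-step observed ranges for one layer's activations.
running = (0.0, 1.0)
for observed in [(-0.2, 1.4), (-0.1, 1.2), (-0.3, 1.6)]:
    running = ema_update(running, observed, decay=0.99)
print(running)
```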

Wu in view of Choi and Jacob are analogous art to the claimed invention because they are all directed to methods of quantizing layers of neural networks.

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the idea of “wherein determining an update to the quantization range using the determined error comprises determining the update using exponential moving average smoothing” as taught by Jacob into the system of Wu in view of Choi “so that observed ranges are smoothed across thousands of training steps.” (Jacob, Pg. 2708, Section 3.1)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BARRETT DOUGLAS CALHOUN whose telephone number is (571)272-6513. The examiner can normally be reached 8:30-5:00 Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BARRETT DOUGLAS CALHOUN/Examiner, Art Unit 4173
/YING YU CHEN/Primary Examiner, Art Unit 2125