DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1, 2, 5, 8, 9-12, 15, 16, and 19 are objected to because of the following informalities:  
Line 7 of claim 1 recites “weights values”, which should read “weight values” for grammatical correctness. Similar terminology appears in claims 8 and 15, which should be corrected similarly. 
Line 13 of claim 1 recites “the quantized weights”, which appears to be referring back to the weight values for the layer of the neural network that have been quantized. While this does not rise to an issue of lack of clarity, since it is clear what is meant, applicant should be consistent with claim terms, i.e. consistently call the values “quantized weights” or “quantized weight values for the layer of the neural network”. 
Similar analysis applies to going from “adjust the quantized weights” to “the adjusted quantized weight values”; the term can be “adjusted quantized weights” or “adjusted quantized weight values” but should not be both. Similar analysis also applies to the terms “the quantized weights for the layer”, “the quantized weights”, and “the adjusted quantized weight values” in claim 2. Similar analysis applies to claims 8, 9, 15, and 16.
Line 6 of claim 5 recites “diving” which appears to be a typographical error for “dividing”. Similar terminology appears in claims 12 and 19, which should be corrected similarly.
Line 1 of claims 10-12 recites “The system of method of Claim 8” which appears to be a typographical error for “The system of claim 8”.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 3 recites the limitation "the extrema weight value" in line 1. There is insufficient antecedent basis for this limitation in the claim since there are multiple extrema weight values in claim 1, on which claim 3 depends. Similar limitations with similar issues exist in claims 4, 10, 11, 17, and 18.
Claim 3 recites the limitation “the quantized value” in lines 2-3. There is insufficient antecedent basis for this limitation in the claim since there are multiple quantized values in claim 1, on which claim 3 depends. Similar limitations with similar issues exist in claims 4, 10, 11, 17, and 18.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over de Vangel (U.S. Publication 2020/0242473) in view of Vantrease (U.S. Publication 2019/0294413).

As to claim 1, de Vangel discloses a computer-implemented method for quantization for a neural network (p. 1, section 0011) comprising:
identifying a set of extrema weight values from weight values for a layer of the neural network, the set of extrema weight values comprising a maximum weight value and a minimum weight value (p. 3, section 0036; p. 4, section 0055-p. 5, section 0057;  p. 5, section 0070-p. 6, section 0074; a range of values from a minimum to a maximum for a layer is identified; note that the reference discusses the specifics with regard to quantization of input values, but also uses the quantization to quantize weights); 
obtaining a scaling factor for quantizing the weight values of the layer of the neural network using the set of extrema weight values and a number of bits that will be used to represent the weights values in quantized form (p. 4, section 0052; p. 5-6, section 0070; as part of the quantization process, a scaling factor is used to position the values from the min-max range to a particular range in a particular bit space, for example an 8-bit space); 
using one of the extrema weight values and the scaling factor to quantize the weight values for the layer of the neural network (p. 4, section 0052; p. 5-6, section 0070; as part of the quantization process, a scaling factor is used to position the values from the min-max range to a particular range in a particular bit space, for example an 8-bit space); 
obtaining an offset value for the layer that is an integer value using the scaling factor and the extreme value from the set of extrema weight values that was used to quantize the weight values for the layer (p. 4, section 0052; p. 5-6, section 0070; as part of the quantization process, a shift value D, which reads on an offset value, is derived based on the value that needs to be added to convert the min-max range to the desired min-max range after the scaling factor is applied); 
obtaining an output for the layer comprises using only integer operations to multiply the adjusted quantized weight values with input values for the layer (fig. 3b, element 345; p. 4, section 0045; p. 4, sections 0052-0054; p. 5, section 0058; the inference to obtain an output for the layer is performed by multiplying integer input values and integer weights).
De Vangel discloses quantized values that are weights, as noted above. De Vangel does not explicitly disclose, but Vantrease does disclose, for the layer, storing the scaling factor, the offset value, and the quantized values, to be used during inference, and using only integer operations to adjust the quantized values by the offset value (fig. 8, elements 820, 825, 830; p. 10, section 0086; p. 11, section 0095; p. 12, sections 0102-0104; p. 13, section 0107; the offset zero-point subtraction operation subtracts an integer from an integer; all of the scaling, offset/zero-point, and quantized values are stored; scaling and zero-point shifting can also be performed for weight values). The motivation for this is to reduce the size of data, improve the efficiency of computation, and make the quantization and dequantization process less complex (p. 1, section 0017-p. 2, section 0018). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify De Vangel to, for each layer, store the scaling factor, the offset value, and the quantized values, to be used during inference, and to use only integer operations to adjust the quantized values by the offset value, in order to reduce the size of data, improve the efficiency of computation, and make the quantization and dequantization process less complex, as taught by Vantrease.
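For illustration only, the limitations of claim 1 as mapped above can be sketched in Python. The sketch is not drawn from either reference or from the application; the function names, the choice of the minimum weight value as the extreme value, and rounding to nearest are all assumptions.

```python
def quantize_layer(weights, num_bits=8):
    """Hypothetical helper: returns (scale, offset, quantized) for one layer."""
    w_min, w_max = min(weights), max(weights)   # set of extrema weight values
    levels = (1 << num_bits) - 1                # e.g. 255 for an 8-bit space
    scale = (w_max - w_min) / levels or 1.0     # fall back to 1.0 for a constant layer
    # Quantize relative to one extreme value (assumed here: the minimum).
    quantized = [round((w - w_min) / scale) for w in weights]
    # Integer offset derived from the scaling factor and the same extreme value.
    offset = round(w_min / scale)
    return scale, offset, quantized


def layer_output(offset, quantized, inputs):
    """Integer-only adjustment and multiply-accumulate (inputs assumed integer)."""
    adjusted = [q + offset for q in quantized]  # integer addition only
    return sum(a * x for a, x in zip(adjusted, inputs))


# The three stored items (scale, offset, quantized) are all that inference needs.
scale, offset, quantized = quantize_layer([-1.0, 0.0, 1.0, 2.0])
acc = layer_output(offset, quantized, [1, 2, 3, 4])
approx_real_output = acc * scale  # the scaling factor is applied once, at the end
```

Under these assumptions the accumulator stays an integer throughout, and the floating-point scaling factor is applied only once, after the integer multiply-accumulate.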

As to claim 2, De Vangel discloses computing an output for the layer of the neural network using the scaling factor for the layer, the offset value for the layer, the quantized weights for the layer, and input values, wherein integer operations are used to multiply the adjusted quantized weight values with the input values (fig. 3b, element 345; p. 4, section 0045; p. 4, sections 0052-0054; p. 5, section 0058; p. 5-6, section 0070; the inference to obtain an output for the layer is performed by multiplying integer input values and integer weights that have been adjusted using the scaling and offset values). De Vangel discloses quantized values that are weights, as noted above. De Vangel does not explicitly disclose, but Vantrease does disclose integer operations are used to adjust the quantized weights by the offset value (fig. 8, elements 820, 825, 830; p. 10, section 0086; p. 12, sections 0102-0104; the offset zero-point subtraction operation is an integer and integer subtraction). Motivation for the combination is given in the rejection to claim 1.

As to claim 3, as best understood, De Vangel discloses wherein the extrema weight value is the maximum weight value (p. 5-6, section 0070; as part of the quantization process, a shift value D, which reads on an offset value, is derived based on the value that needs to be added to convert the min-max range to the desired min-max range after the scaling factor is applied; both min and max are used to determine the shift). De Vangel discloses that the quantized value is a weight value, as noted above. De Vangel does not disclose, but Vantrease does disclose that an adjusted quantized value is obtained by subtracting the quantized value from the offset value using an integer operation (fig. 8, elements 820, 825, 830; p. 10, section 0086; p. 12, sections 0102-0104; the offset zero-point subtraction operation is an integer and integer subtraction). Motivation for the combination is given in the rejection to claim 1.

As to claim 4, as best understood, De Vangel discloses wherein the extrema weight value is the minimum weight value and an adjusted quantized weight value is obtained by adding the quantized value to the offset value (p. 5-6, section 0070; as part of the quantization process, a shift value D, which reads on an offset value, is derived based on the value that needs to be added to convert the min-max range to the desired min-max range after the scaling factor is applied; both min and max are used to determine the shift and the value D is added to match the desired range). De Vangel does not explicitly disclose, but Vantrease does disclose that the offset value operation occurs using an integer operation (fig. 8, elements 820, 825, 830; p. 10, section 0086; p. 12, sections 0102-0104; the offset zero-point subtraction operation is an integer and integer operation). Motivation for the combination is given in the rejection to claim 1.
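As a hedged illustration of the two alternatives addressed in claims 3 and 4, the following Python sketch shows one way the adjustment could differ depending on which extreme value anchors the quantization. The function names and the quantization formulas are assumptions made for this example and are not taken from the claims or the references.

```python
def adjust_from_max(weights, scale):
    """Assumed max-anchored variant: adjusted = offset - quantized."""
    w_max = max(weights)
    offset = round(w_max / scale)
    quantized = [round((w_max - w) / scale) for w in weights]
    return [offset - q for q in quantized]      # integer subtraction


def adjust_from_min(weights, scale):
    """Assumed min-anchored variant: adjusted = offset + quantized."""
    w_min = min(weights)
    offset = round(w_min / scale)
    quantized = [round((w - w_min) / scale) for w in weights]
    return [offset + q for q in quantized]      # integer addition


weights = [-1.0, 0.0, 1.0, 2.0]
from_max = adjust_from_max(weights, 0.5)
from_min = adjust_from_min(weights, 0.5)
```

For this input both variants recover the same integer approximation of each weight divided by the scaling factor; in general the two can differ by a rounding step.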

As to claim 5, De Vangel does not disclose, but Vantrease does disclose wherein the step of obtaining an offset value for the layer that is an integer value using the scaling factor and the extreme value from the set of extrema weight values that was used to quantize the weight values for the layer comprises: obtaining a quotient by dividing the extreme value from the set of extrema weight values by the scaling factor; and converting the quotient to an integer value (p. 11, sections 0092-0094; Xq represents a quantized value and the extrema values -0.5 and 3.5 are divided by the scale factor Sx and an integer value is obtained by rounding, for example rounding 32 + 64 * 3.499 to 255). Motivation for the combination is given in the rejection to claim 1.
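The two steps recited in claim 5 (forming a quotient, then converting it to an integer) can be sketched as follows. The function name is hypothetical, the scaling factor of 1/64 is an assumption inferred from the factor of 64 in the example cited above, and rounding to nearest is only one possible integer conversion.

```python
def integer_offset(extreme, scale):
    """Hypothetical helper: divide the extreme value by the scale, then make it an integer."""
    quotient = extreme / scale      # step 1: obtain the quotient
    return int(round(quotient))     # step 2: convert to an integer (assumed: round to nearest)


offset_from_min = integer_offset(-0.5, 1 / 64)
offset_from_max = integer_offset(3.5, 1 / 64)
```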

As to claim 6, De Vangel discloses using the method of claim 1 for each layer of two or more layers of the neural network (p. 3, section 0036; p. 5, sections 0057-0059; the method is done on each layer on a per layer basis). 

As to claim 7, De Vangel discloses wherein at least a plurality of the two or more layers are consecutive layers (p. 4, section 0045; neurons can be output from a layer to a previous or next layer, which in either case would mean the plurality of layers are consecutive) and the method further comprises: 
computing an output for the consecutive layers of the neural network by performing the steps comprising: using integer operations (fig. 3b, element 345; p. 4, section 0045; p. 4, sections 0052-0054; p. 5, section 0058; the inference to obtain an output for the layer is performed by multiplying integer input values and integer weights) to: 
and multiply together the sets of adjusted quantized weight values and input values for the first layer of the consecutive layers to obtain an intermediate product (fig. 3b, element 345; p. 4, section 0045; p. 4, sections 0052-0054; p. 5, section 0058; the inference to obtain an output for the layer is performed by multiplying integer input values and integer weights; this can be done for multiple consecutive layers, and a first of two consecutive layers would give an intermediate product).
De Vangel discloses obtaining, for each layer of the consecutive layers, a set of adjusted quantized weight values for the layer by adjusting the quantized weight values of the layer by the offset value of the layer (p. 3, section 0036; p. 4, section 0052; p. 5, sections 0057-0059; p. 5-6, section 0070; as part of the quantization process, a shift value D, which reads on an offset value, is derived based on the value that needs to be added to convert the min-max range to the desired min-max range after the scaling factor is applied; the method is done on each layer on a per layer basis). De Vangel does not explicitly disclose, but Vantrease does disclose that this is an integer operation (fig. 8, elements 820, 825, 830; p. 10, section 0086; p. 11, section 0095; p. 12, sections 0102-0104; p. 13, section 0107; the offset zero-point subtraction operation is an integer and integer subtraction). Further, De Vangel discloses quantization at each layer of consecutive layers, and obtaining an output for the consecutive layers from an intermediate product as noted above. De Vangel does not disclose, but Vantrease does disclose multiplying together the scaling factors for each layer and the intermediate product to obtain the output (p. 12, sections 0099-0100; p. 12, section 0105; the two scaling factors for each layer are multiplied by an intermediate quantized product to obtain an output).  Motivation for the combination is given in the rejection to claim 1.
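The consecutive-layer computation discussed for claim 7 can be illustrated with a minimal Python sketch. It assumes integer inputs, already-adjusted integer weight matrices, and no nonlinearity between the layers; the function names are hypothetical and the sketch is not taken from either reference.

```python
def matvec(matrix, vector):
    """Integer matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def consecutive_layers_output(layers, inputs):
    """layers: list of (scale, adjusted_integer_weights), first layer first."""
    intermediate = inputs
    combined_scale = 1.0
    for scale, adjusted in layers:
        intermediate = matvec(adjusted, intermediate)  # integer operations only
        combined_scale *= scale                        # multiply the scaling factors together
    # The scaling factors are applied once, to the final integer product.
    return [combined_scale * v for v in intermediate]


layers = [
    (0.5, [[2, -2], [0, 4]]),   # stands for real weights [[1, -1], [0, 2]]
    (0.25, [[4, 8]]),           # stands for real weights [[1, 2]]
]
output = consecutive_layers_output(layers, [3, 1])
```

Deferring the scaling factors in this way keeps every per-layer multiplication in integer arithmetic.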

As to claim 8, see the rejection to claim 1. Further, De Vangel discloses a system comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed (p. 6, section 0081-p. 7, section 0092).

As to claim 9, see the rejection to claim 2.

As to claim 10, see the rejection to claim 3.

As to claim 11, see the rejection to claim 4.

As to claim 12, see the rejection to claim 5.

As to claim 13, see the rejection to claim 6.

As to claim 14, see the rejection to claim 7.

As to claim 15, see the rejection to claims 1 and 8.

As to claim 16, see the rejection to claim 2.

As to claim 17, see the rejection to claim 3.

As to claim 18, see the rejection to claim 4.

As to claim 19, see the rejection to claim 5.

As to claim 20, see the rejection to claim 7.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON M RICHER whose telephone number is (571)272-7790. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON M RICHER/Primary Examiner, Art Unit 2612