DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 06/13/2018. Claims 1-20 are pending and have been examined.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 06/13/2018.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
Claims 19-20 recite “computer readable storage medium.” Specification [0098] provides the following, “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire” (emphasis added). Therefore, “computer readable storage medium” in claims 19-20 has been interpreted as “non-transitory computer readable storage medium.”




Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-7, 9-10, 13-15, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 3 recites “estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value” (emphasis added). It is unclear whether the emphasized limitation requires “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”, or if the limitation requires that the first and second statistical functions are both elements of the “linear or a non-linear function”. For examination purposes, the emphasized limitation has been interpreted as requiring “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”.
The term "most other quantization errors" (emphasis added) in claim 6 is a relative term which renders the claim indefinite. The term "most other quantization errors" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, what constitutes “most other”: is it over 75% or 90% of the other quantization errors? For examination purposes, “most other quantization errors” has been interpreted as “other quantization errors.”
The recitation of “the quantization scale value is within a defined value distance of a theoretical quantization scale value” (emphasis added) in claim 7 lacks clarity because it is unclear what a “theoretical quantization scale value” means and because under broadest reasonable interpretation, a “theoretical” value does not exist. One of ordinary skill in the art would not be able to ascertain the metes and bounds of a claimed “theoretical” value. For examination purposes, “the quantization scale value is within a defined value distance of a theoretical quantization scale value” has been interpreted as “the quantization scale value is within a defined value distance of any quantization scale value.”
Claim 10 recites the limitation "the deep learning system" in line 3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, "the deep learning system" has been interpreted as "a deep learning system".
Claim 13 recites “estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value” (emphasis added). It is unclear whether the emphasized limitation requires “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”, or if the limitation requires that the first and second statistical functions are both elements of the “linear or a non-linear function”. For examination purposes, the emphasized limitation has been interpreted as requiring “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”.
The term "most other quantization scale values" (emphasis added) in claim 14 is a relative term which renders the claim indefinite. The term "most other quantization scale values" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.
The term "substantially close" in claim 15 is a relative term which renders the claim indefinite.  The term "substantially close" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, quantization error of any value can be considered as “substantially close to being same as a minimum quantization error.” 
The recitation of “a minimum quantization error of the weights associated a theoretical quantization scale value” (emphasis added) in claim 15 lacks clarity because it is unclear what a “theoretical quantization scale value” means and because under broadest reasonable interpretation, a “theoretical” value does not exist. One of ordinary skill in the art would not be able to ascertain the metes and bounds of a claimed “theoretical” value. For examination purposes, “a minimum quantization error of the weights associated a theoretical quantization scale value” has been interpreted as “a minimum quantization error of the weights associated with any quantization scale value.”
The term "most other quantization errors" (emphasis added) in claim 20 is a relative term which renders the claim indefinite. The term "most other quantization errors" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, what constitutes “most other”: is it over 75% or 90% of the other quantization errors? For examination purposes, “most other quantization errors” has been interpreted as “other quantization errors.”
Claim 20 recites “estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value” (emphasis added). It is unclear whether the emphasized limitation requires “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”, or if the limitation requires that the first and second statistical functions are both elements of the “linear or a non-linear function”. For examination purposes, the emphasized limitation has been interpreted as requiring “linear or a non-linear function of a first statistical function of a weight value of the weight” AND “linear or a non-linear function of a second statistical function of the weight value”.
Each dependent claim of the above claims is rejected based on the same rationale as the claim from which it depends.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-7, 9, 11-15, and 19-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 1 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for a set of weights, determining...a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values;
determining...a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; and
quantizing...weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass for a set of weights, determining a quantization scale as a function of a bit precision level (corresponds to evaluation with assistance of pen and paper), determining...a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error (corresponds to evaluation and judgement with assistance of pen and paper), quantizing...weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value (corresponds to evaluation and judgement with assistance of pen and paper as “quantizing,” under broadest reasonable interpretation, refers to the process of constraining an input from a continuous or otherwise large set of values to a discrete set).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
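For illustration only, the three limitations recited in claim 1 can be sketched in Python as follows. The sketch is not taken from the application or from the record; the function names, the candidate grid of scale values, and the mean-squared-error criterion are assumptions chosen to make the recited steps concrete.

```python
def quantization_levels(bits):
    """Largest integer code on a signed, symmetric grid with `bits` bits."""
    return 2 ** (bits - 1) - 1


def candidate_scales(weights, bits, steps=50):
    """Quantization scale as a function of bit precision (assumed form):
    `steps` evenly spaced fractions of max|w| / qmax.
    Assumes at least one weight is non-zero."""
    qmax = quantization_levels(bits)
    top = max(abs(w) for w in weights) / qmax
    return [top * (i + 1) / steps for i in range(steps)]


def quantize(weights, scale, bits):
    """Round each weight to the nearest multiple of `scale`, clipped to the grid.
    Python's round() resolves exact ties to the nearest even integer."""
    qmax = quantization_levels(bits)
    quantized = []
    for w in weights:
        code = max(-qmax, min(qmax, round(w / scale)))
        quantized.append(code * scale)
    return quantized


def mean_squared_error(weights, quantized):
    """Assumed quantization criterion: mean squared difference."""
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def best_scale(weights, bits):
    """Pick the candidate scale value with the lowest quantization error."""
    return min(
        candidate_scales(weights, bits),
        key=lambda s: mean_squared_error(weights, quantize(weights, s, bits)),
    )


layer_weights = [0.9, -0.4, 0.05, 0.31, -0.77]  # hypothetical layer weights
scale = best_scale(layer_weights, bits=4)
quantized_weights = quantize(layer_weights, scale, bits=4)
```

Each function performs arithmetic and a comparison over a short list of numbers, which is consistent with the characterization above that the recited steps can be carried out with pen and paper.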
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 2 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantizing the weights further comprises, for the bit precision level, at least one of symmetrically or uniformly quantizing the weights to generate the quantized weights based on the quantization scale value, and 
wherein rounding is utilized to generate the quantized weights.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass quantizing the weights further comprises, for the bit precision level, at least one of 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 3 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the determining the quantization scale value comprises estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass determining the quantization scale value (corresponds to evaluation and judgement with assistance of pen and paper), and estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function (estimating values based on a linear or non-linear function of a first and second statistical functions amounts to mathematical relationships and calculations).
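Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.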
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
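The two readings of the claim 3 limitation discussed in the rejection under 35 U.S.C. 112(b) can be contrasted with a short Python sketch. The choice of mean absolute value and maximum absolute value as the first and second statistical functions, the linear form, and the coefficient values are all hypothetical; the claim does not name them.

```python
from statistics import fmean


def mean_abs(weights):
    """Hypothetical first statistical function: mean of |w|."""
    return fmean([abs(w) for w in weights])


def max_abs(weights):
    """Hypothetical second statistical function: max of |w|."""
    return max(abs(w) for w in weights)


def linear(x, coeff):
    """A linear function of a single input."""
    return coeff * x


def separate_reading(weights):
    """Reading adopted for examination: a function of the first statistic
    AND a function of the second statistic, giving two separate values."""
    return linear(mean_abs(weights), 2.0), linear(max_abs(weights), 0.5)


def joint_reading(weights, c1=2.0, c2=0.5):
    """Alternative reading: one function whose inputs are both statistics."""
    return c1 * mean_abs(weights) + c2 * max_abs(weights)


example = [0.5, -1.0, 0.25, -0.25]
print(separate_reading(example))  # (1.0, 0.5)
print(joint_reading(example))     # 1.5
```

Under either reading the computation reduces to evaluating statistics of the weight values and combining them arithmetically.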
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 4 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
determining...a first coefficient value associated with the first statistical function based on measurement data relating to weight quantization; and
determining...a second coefficient value associated with the second statistical function based on the measurement data.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“a system operatively coupled to a processor”, “by 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
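As a purely illustrative sketch of determining two coefficient values from measurement data, the following Python fits the coefficients by ordinary least squares. Claim 4 does not state how the coefficients are determined; the least-squares method, the row layout (first statistic, second statistic, measured scale value), and the sample numbers are assumptions.

```python
def fit_coefficients(rows):
    """Least-squares fit of scale ~ c1 * s1 + c2 * s2.

    `rows` holds hypothetical measurement data as (s1, s2, measured_scale)
    tuples. The 2x2 normal equations are solved with Cramer's rule.
    """
    a11 = sum(s1 * s1 for s1, _, _ in rows)
    a12 = sum(s1 * s2 for s1, s2, _ in rows)
    a22 = sum(s2 * s2 for _, s2, _ in rows)
    b1 = sum(s1 * t for s1, _, t in rows)
    b2 = sum(s2 * t for _, s2, t in rows)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("measurement rows do not determine both coefficients")
    c1 = (b1 * a22 - a12 * b2) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return c1, c2


# Hypothetical measurements generated from c1 = 2.0 and c2 = 0.5.
measurements = [(1.0, 2.0, 3.0), (2.0, 1.0, 4.5), (3.0, 5.0, 8.5)]
first_coefficient, second_coefficient = fit_coefficients(measurements)
```

Under the same assumptions, calling `fit_coefficients` again on the original rows plus additional rows would yield updated coefficient values of the kind recited in claim 5.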


Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 5 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
updating...at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass updating at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization (corresponds to evaluation with assistance of pen and paper based on mathematical operations).
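Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.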
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 6 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization scale value is associated with a lower quantization error than all or at least most other quantization errors associated with other quantization scale values.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 7 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a theoretical quantization scale value that is able to minimize the quantization error associated with quantizing the weights
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a theoretical quantization scale value that is able to minimize the quantization error associated with quantizing the weights (corresponds to evaluation and judgement with assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 9 is directed to a computer-implemented method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
reducing...a bit precision of the weights of the set of weights based on the applying of the quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“a system operatively coupled to a processor”, “by the system”, “computer-implemented”). For example, the above limitations in the context of this claim encompass reducing a bit precision of the weights of the set of weights based on the applying of the quantization scale value (corresponds to evaluation with assistance of pen and paper based on mathematical calculations of number of bits required to represent value).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements of “a system operatively coupled to a processor”, “by the system”, and “computer-implemented”, as drafted, recite generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
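The calculation of the number of bits required to represent a value, referred to in the Prong One analysis of claim 9, can be sketched as follows. The sample weights, the scale value, and the helper names are hypothetical, and a signed, symmetric integer grid is assumed.

```python
def to_codes(weights, scale):
    """Map each weight to the nearest integer multiple of `scale`."""
    return [round(w / scale) for w in weights]


def bits_for_signed_codes(codes):
    """Bits needed for the largest magnitude, plus one sign bit.
    Assumes a symmetric code range such as -7..7."""
    largest = max(abs(c) for c in codes)
    return largest.bit_length() + 1


weights = [0.9, -0.4, 0.05, 0.31, -0.77]  # hypothetical weights
scale = 0.9 / 7                            # hypothetical scale value
codes = to_codes(weights, scale)
print(codes)                         # [7, -3, 0, 2, -6]
print(bits_for_signed_codes(codes))  # 4
```

In this example, weights that would typically be stored as 32-bit floating-point values are represented by 4-bit integer codes together with a single shared scale value.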
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 11 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for a set of weights: determines a quantization scale based on a number of quantization levels, wherein the quantization scale comprises quantization scale values; and
determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights; and
quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the mere instructions to apply language (“a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”). For example, the above limitations in the context of this claim encompass for a set of weights: determines a quantization scale based on a number of quantization levels (corresponds to evaluation with assistance of pen and paper), determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights (corresponds to evaluation and judgement with assistance of pen and paper), quantizes weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value (corresponds to evaluation and judgement with assistance of pen and paper as “quantizing,” under broadest reasonable interpretation, refers to the process of constraining an input from a continuous or otherwise large set of values to a discrete set).
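Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.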
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 12 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
at least one of symmetrically or uniformly quantizes the weights, and 
utilizes rounding to generate the quantized weights based on the quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the mere instructions to apply language (“a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”). For example, the above limitations in the context of this claim 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 13 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the set of weights comprises a weight...estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the mere instructions to apply language (“a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”). For example, the above limitations in the context of this claim encompass wherein the set of weights comprises a weight...estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function (estimating values based on a linear or non-linear function of a first and second statistical functions amounts to mathematical relationships and calculations).
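Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.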
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 14 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
wherein the quantization scale value is associated with a smaller quantization error as compared to other quantization errors associated with all or at least most other quantization scale values.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the mere instructions to apply language (“a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 15 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be substantially close to being same as a minimum quantization error of the weights associated a theoretical quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the mere instructions to apply language (“a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”). For example, the above limitations in the context of this claim encompass estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be substantially close to being same as a minimum quantization error of the weights associated a theoretical quantization scale value (corresponds to evaluation and judgement with assistance of pen and paper).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising:...a quantizer management component...a quantizer component”, as drafted, amount to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 19 is directed to a computer program product, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
for a set of weights, determine a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values;
determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error; and
quantize weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the generic computer components language (“A computer program product that facilitates quantizing weights, the 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “A computer program product that facilitates quantizing weights, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions are executable by a processor to cause the processor to”, as drafted, is/are reciting a generic computer component. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 20 is directed to a computer program product, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: Each of the following limitations:
estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, wherein the first statistical function is different from the second statistical function, and 
wherein the quantization scale value is associated with a lower quantization error than at least most other quantization errors associated with other quantization scale values of the quantization scale values.
as drafted, under the broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the generic computer components language (“A computer program product that facilitates quantizing weights, the computer program product comprising a computer readable storage medium 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “A computer program product that facilitates quantizing weights, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions are executable by a processor to cause the processor to”, as drafted, is/are reciting a generic computer component. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over WANG et al. (US 2019/0050710 A1) in view of CHOI et al. (US 2019/0138882 A1).
Regarding Claim 1,
WANG et al. teaches A computer-implemented method, comprising: for a set of weights, determining, by a system operatively coupled to a processor, a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values (pg. 4 [0027] teaches computer-implemented and processor; Fig. 2 teaches a system in the memory coupled to the processor; pg. 8 [0066] teaches determining a quantization scale as a function of a bit precision level; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale); also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters);
...quantizing, by the system, weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value (pg. 4 [0027] teaches computer-implemented and processor; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches quantizing weights based on quantization bit-width (scale) values; also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters; Fig. 4 teaches the neural network has multiple layers).
WANG et al. does not appear to explicitly teach determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error.
However, CHOI et al. teaches determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error (pg. 1 [0005]: “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach a system for weight quantization).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate determining, by the system, a quantization scale value that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error as taught by CHOI et al. into the disclosed invention of WANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
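For illustration only, the combination relied upon above (a scale value selected to reduce a mean square quantization error, then used to quantize the weights) might be sketched as follows. The sketch is not taken from WANG et al. or CHOI et al.: CHOI et al. [0018] describes a scaling factor that is learned during training, whereas the sketch substitutes a simple grid search over candidate scale values. The function names, the candidate grid, and the sample weights are hypothetical.

```python
def quantize(weights, scale, bits):
    """Symmetric uniform quantization: round w/scale to an integer level, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 127 for 8 bits
    quantized = []
    for w in weights:
        level = round(w / scale)            # Python rounds exact halves to even
        level = max(-qmax, min(qmax, level))
        quantized.append(level * scale)
    return quantized


def msqe(weights, scale, bits):
    """Mean square quantization error between the weights and their quantized values."""
    quantized = quantize(weights, scale, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def best_scale(weights, bits, num_candidates=100):
    """Grid search (an assumption of this sketch) for the candidate with the lowest MSQE."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        raise ValueError("all weights are zero; any scale gives zero error")
    qmax = 2 ** (bits - 1) - 1
    top = max_abs / qmax  # largest candidate: no weight is clipped at this scale
    candidates = [top * k / num_candidates for k in range(1, num_candidates + 1)]
    return min(candidates, key=lambda s: msqe(weights, s, bits))


if __name__ == "__main__":
    sample = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical weights
    s = best_scale(sample, bits=2)
    print(s, msqe(sample, s, 2))
```

In this sketch the set of candidates plays the role of the "quantization scale values", and the MSQE is the "defined quantization criterion"; other criteria or search strategies could be substituted.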
Regarding Claim 2,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
WANG et al. further teaches wherein the quantizing the weights further comprises, for the bit precision level, at least one of symmetrically or uniformly quantizing the weights to generate the quantized weights based on the quantization scale value (pg. 6 [0057]: “When applying the 8-bit uniform quantization to the weight parameters, the weight parameters are still expressed in full-precision format, but the total number of such response levels are bound by 2^8 - 1, with half in the positive and half in the negative. In each iteration of the training phase, this quantization function is applied in each layer in the forward pass” teaches that the quantizing can further comprise, for a specific bit precision level, applying uniform quantization and symmetric quantization (“with half in the positive and half in the negative”) based on the 8-bit quantization (quantization scale value)).
CHOI et al. further teaches wherein rounding is utilized to generate the quantized weights (pg. 4 [0051] teaches rounding is utilized to generate the quantized weights).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein rounding is utilized to generate the quantized weights as taught by CHOI et al. into the disclosed invention of WANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
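As a hedged illustration of the passage quoted from WANG et al. [0057], one plausible reading of "2^8 - 1" response levels "with half in the positive and half in the negative" is the set of integer levels from -127 to 127 (127 negative, 127 positive, plus zero). The sketch below assumes that reading and adds rounding to the nearest level; the function names and the sample scale are hypothetical and are not drawn from either reference.

```python
def symmetric_levels(bits):
    """Integer levels -qmax..qmax with qmax = 2**(bits - 1) - 1 (assumed reading)."""
    qmax = 2 ** (bits - 1) - 1
    return list(range(-qmax, qmax + 1))


def quantize_to_level(weight, scale, bits):
    """Round weight/scale to the nearest integer level and clip to the symmetric range."""
    qmax = 2 ** (bits - 1) - 1
    level = round(weight / scale)
    return max(-qmax, min(qmax, level))


if __name__ == "__main__":
    levels = symmetric_levels(8)
    print(len(levels))                       # number of levels for 8 bits
    print(quantize_to_level(0.26, 0.1, 8))   # in-range weight
    print(quantize_to_level(-50.0, 0.1, 8))  # out-of-range weight is clipped
```

The levels are uniformly spaced (uniform quantization) and mirror each other about zero (symmetric quantization); multiplying a level by the scale recovers the quantized weight value.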
Regarding Claim 3,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the determining the quantization scale value comprises estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function (pg. 1 [0005]: “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:
[Image: media_image1.png, excerpt of CHOI et al. pg. 4 [0053]]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image: media_image2.png, excerpt of CHOI et al. pg. 5 [0059]]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the determining the quantization scale value comprises estimating the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the second statistical function is different from the first statistical function as taught by CHOI et al. into the disclosed invention of WANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
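Purely as an illustration of the claim language addressed above, "a linear function of a first statistical function of a weight value and a second statistical function of the weight value" could take a form such as the one sketched below. The choice of the root mean square and the mean absolute value as the two statistics, the coefficient names c1 and c2, and the sample values are all assumptions of this sketch; neither the claim language quoted in this action nor the cited passages of CHOI et al. specify them.

```python
import math


def estimate_scale(weights, c1, c2):
    """Estimate a scale as c1 * (first statistic) + c2 * (second statistic)."""
    n = len(weights)
    rms = math.sqrt(sum(w * w for w in weights) / n)  # first statistic (assumed)
    mean_abs = sum(abs(w) for w in weights) / n       # second statistic (assumed)
    return c1 * rms + c2 * mean_abs


if __name__ == "__main__":
    sample = [1.0, -1.0, 1.0, -1.0]  # hypothetical weights
    print(estimate_scale(sample, c1=2.0, c2=-1.0))
```

Because the two statistics differ, the coefficients can be fitted separately, which is the kind of arrangement the dependent claims on first and second coefficient values appear to contemplate.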
Regarding Claim 4,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 3.
CHOI et al. further teaches further comprising: determining, by the system, a first coefficient value associated with the first statistical function based on measurement data relating to weight quantization (pg. 4 [0053]:  
[Image: media_image1.png, excerpt of CHOI et al. pg. 4 [0053]]
teaches determining a regularization coefficient (first coefficient) associated with the first statistical function (Equation (6)) based on weight quantization data); and
determining, by the system, a second coefficient value associated with the second statistical function based on the measurement data (pg. 5 [0059]:  

[Image: media_image2.png, excerpt of CHOI et al. pg. 5 [0059]]
 teaches determining a regularization coefficient (second coefficient) associated with the second statistical function (Equation (10)) based on weight quantization data).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising: determining, by the system, a first coefficient value associated with the first statistical function based on measurement data relating to weight quantization; and determining, by the system, a second coefficient value associated with the second statistical function based on the measurement data as taught by CHOI et al. into the disclosed invention of WANG et al.
One of ordinary skill in the art would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 5,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 4.
CHOI et al. further teaches further comprising: updating, by the system, at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization (pg. 4 [0053]:  
[Image: media_image1.png]
teaches increasing (updating) the regularization coefficient (first coefficient) associated with the first statistical function (Equation (6)) based on weight quantization data to generate a third coefficient (the increased coefficient); pg. 5 [0059]:  

[Image: media_image2.png]
 teaches learning (updating) the regularization coefficient (second coefficient) associated with the second statistical function (Equation (10)) based on weight quantization data to generate a fourth coefficient (the learned/updated coefficient)).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate further comprising: updating, by the system, at least one of the first coefficient value or the second coefficient value to generate at least one of a third coefficient value associated with the first statistical function or a fourth coefficient value associated with the second statistical function based on additional measurement data relating to the weight quantization as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 6,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the quantization scale value is associated with a lower quantization error than all or at least most other quantization errors associated with other quantization scale values (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights, minimizing corresponds to determining an error that is lower than other errors).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantization scale value is associated with a lower quantization error than all or at least most other quantization errors associated with other quantization scale values as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
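For illustration only, and not taken from either cited reference, the concept relied upon above (a weight scaling factor chosen so that the mean square quantization error is lower than for other candidate scaling factors) can be sketched as follows. The function names, the candidate list, and the use of an exhaustive search are assumptions made for this sketch; CHOI et al. describes the scaling factor as learnable during training rather than selected by search.

```python
def quantize(weights, scale, num_bits=8):
    """Symmetric uniform quantization: round w/scale, clip to the level range, rescale."""
    qmax = 2 ** (num_bits - 1) - 1  # largest integer level on each side of zero
    return [scale * max(-qmax, min(qmax, round(w / scale))) for w in weights]


def msqe(weights, scale, num_bits=8):
    """Mean square quantization error of the weights for a given scale."""
    quantized = quantize(weights, scale, num_bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def best_scale(weights, candidates, num_bits=8):
    """Hypothetical helper: return the candidate scale with the lowest MSQE."""
    return min(candidates, key=lambda s: msqe(weights, s, num_bits))


# Hypothetical example values, chosen only to exercise the functions.
example_weights = [0.5, -1.0, 0.25, 1.0]
example_candidates = [0.25, 0.5, 1.0]
chosen = best_scale(example_weights, example_candidates, num_bits=2)
```

In this sketch the selected scale has an error no greater than that of every other candidate, which corresponds to the "lower quantization error than all or at least most other quantization errors" language discussed above.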
Regarding Claim 7,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
CHOI et al. further teaches wherein the determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a theoretical quantization scale value that is able to minimize the quantization error associated with quantizing the weights (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a learnable scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights wherein learning the scaling factor includes determining, in accordance with the MSQE (defined quantization criterion), that the scaling factor selected is, or is close to (within a distance), the scaling factor that would minimize the quantization error; the scaling factor being “learnable” means there is an optimization process to learn which scaling factor would result in the minimization of error).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the determining the quantization scale value comprises estimating the quantization scale value based on the defined quantization criterion, wherein, in accordance with the defined quantization criterion, the quantization scale value is within a defined value distance of a theoretical quantization scale value that is able to minimize the quantization error associated with quantizing the weights as taught by CHOI et al. to the disclosed invention of WANG et al.
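One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).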
Regarding Claim 8,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 1.
WANG et al. further teaches further comprising: applying, by the system, the quantization scale value to generate the quantized weights to facilitate training a deep learning system or producing an inference by the deep learning system (pg. 2 [0019]: “during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits)” teaches quantizing weights based on a quantization scale value to facilitate training a deep learning model; pg. 2 [0020] teaches producing inference by the deep learning system; Fig. 2 teaches a system in the memory coupled to the processor).
Regarding Claim 9,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 6.
WANG et al. further teaches further comprising: reducing, by the system, a bit precision of the weights of the set of weights based on the applying of the quantization scale value (pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches reducing the bit-width (bit precision) based on applying the quantization scale value (for example, reducing to 6-bit from the original bit-width); Fig. 2 teaches a system in the memory coupled to the processor).
Regarding Claim 10,
WANG et al. in view of CHOI et al. teaches the computer-implemented method of claim 6.
WANG et al. further teaches further comprising: reducing, by the system, at least one of a memory usage or a communication overhead utilized to transfer data between layers of the deep learning system based on the applying of the quantization scale value (pg. 8 [0072]: “The device reduces (504) a footprint (e.g., memory and computation cost) of the first neural network model on the computing device (e.g., both during storage, and, optionally, during deployment of the model) by using respective reduced bit-widths for storing the respective sets of parameters of different layers of the first neural network model, wherein: preferred values (e.g., optimal bit-width values that have been identified using the techniques described herein) of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold (e.g., as measured by the Jensen-Shannon Divergence described herein) is met by respective response statistics of the two or more layers” teaches reducing footprint in the form of memory usage utilized in the deployment of the neural network; pg. 3 [0024]: “As shown in FIG. 1, once the reduced, adaptive bit-width model 112 is provided to a deployment platform 116 on the model deployment system 104, real-world input data or testing data 114 is fed to the reduced, adaptive bit-width model 112, and final prediction result 118 is generated by the reduced, adaptive bit-width” teaches the reduced adaptive bit-width model (neural network) based on the bit-width (quantization scale value), when deployed, uses the model to make predictions (which would require passing input throughout the layers of the network, and thus requiring transferring of data between layers, see Fig. 4 for deep learning model); Fig. 2 teaches a system in the memory coupled to the processor).
Regarding Claim 11,
WANG et al. teaches A system, comprising: a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising (Fig. 2 teaches various computer-executable components stored in memory wherein a processor is coupled to a memory):
a quantizer management component that, for a set of weights: determines a quantization scale based on a number of quantization levels, wherein the quantization scale comprises quantization scale values (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer management component); pg. 8 [0066] teaches determining a quantization scale based on a number of quantization levels; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale); also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters);
Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches quantizing weights based on quantization bit-width (scale) values; also see pg. 6 [0057] for 8-bit uniform quantization to the weight parameters; Fig. 4 teaches the neural network has multiple layers).
WANG et al. does not appear to explicitly teach determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights.
However, CHOI et al. teaches determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach system for weight quantization).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate determines, based on a defined quantization criterion relating to the quantization error, a quantization scale value that reduces a quantization error of weights of the set of weights as taught by CHOI et al. to the disclosed invention of WANG et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 12,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
WANG et al. further teaches wherein the quantizer component at least one of symmetrically or uniformly quantizes the weights (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 6 [0057]: “When applying the 8-bit uniform quantization to the weight parameters, the weight parameters are still expressed in full-precision format, but the total number of such response levels are bound by 2^8 - 1, with half in the positive and half in the negative. In each iteration of the training phase, this quantization function is applied in each layer in the forward pass” teaches that the quantizing can comprise, for a specific bit precision level, applying uniform quantization and symmetric quantization (“with half in the positive and half in the negative”) based on the 8-bit quantization (quantization scale value)).
CHOI et al. further teaches utilizes rounding to generate the quantized weights based on the quantization scale value (pg. 4 [0051] teaches rounding is utilized to generate the quantized weights based on scaling factor (quantization scale value)).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.

It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate utilizes rounding to generate the quantized weights based on the quantization scale value as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 13,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the set of weights comprises a weight, and wherein the quantizer management component estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:  
[Image: media_image1.png]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image: media_image2.png]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the set of weights comprises a weight, and wherein the quantizer management component estimates the quantization scale value to apply to the weight as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, and wherein the first statistical function is different from the second statistical function as taught by CHOI et al. to the disclosed invention of WANG et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 14,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the quantization scale value is associated with a smaller quantization error as compared to other quantization errors associated with all or at least most other quantization scale values (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights, minimizing corresponds to determining an error that is lower/smaller than other errors).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantization scale value is associated with a smaller quantization error as compared to other quantization errors associated with all or at least most other quantization scale values as taught by CHOI et al. to the disclosed invention of WANG et al.

One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 15,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
CHOI et al. further teaches wherein the quantizer management component estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be substantially close to being same as a minimum quantization error of the weights associated a theoretical quantization scale value (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a learnable scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights wherein learning the scaling factor includes determining, in accordance with the MSQE (defined quantization criterion), that the scaling factor selected would minimize the quantization error (thus rendering the error to be “close” to being the 
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate wherein the quantizer management component estimates the quantization scale value based on the defined quantization criterion, wherein the quantization scale value reduces the quantization error of the weights to have the quantization error be substantially close to being same as a minimum quantization error of the weights associated a theoretical quantization scale value as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 16,
WANG et al. in view of CHOI et al. teaches the system of claim 11.
WANG et al. further teaches wherein the quantizer component applies the quantization scale value to generate the quantized weights to facilitate training a deep learning model or generating an inference by the deep learning model (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 2 [0019]: “during training, integer (INT) weight regularization and 8-bit quantization techniques are applied to push the values of the full-precision parameters of the deep learning model 106 toward their corresponding integer values, and reduce the value ranges of the parameters such that they fall within the dynamic range of a predefined reduced maximum bit-width (e.g., 8 bits)” teaches quantizing weights based on a quantization scale value to facilitate training a deep learning model; pg. 2 [0020] teaches producing inference by the deep learning system).
Regarding Claim 17,
WANG et al. in view of CHOI et al. teaches the system of claim 16.
WANG et al. further teaches wherein the quantizer component reduces a bit precision associated with the weights based on the application of the quantization scale value (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches reducing the bit-width (bit precision) based on applying the quantization scale value (for example, reducing to 6-bit from the original bit-width)).
Regarding Claim 18,
WANG et al. in view of CHOI et al. teaches the system of claim 16.
WANG et al. further teaches wherein the quantizer component reduces at least one of a memory usage or a communication overhead used to transfer data between layers of the deep learning model based on the application of the quantization scale value (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 8 [0072]: “The device reduces (504) a footprint (e.g., memory and computation cost) of the first neural network model on the computing device (e.g., both during storage, and, optionally, during deployment of the model) by using respective reduced bit-widths for storing the respective sets of parameters of different layers of the first neural network model, wherein: preferred values (e.g., optimal bit-width values that have been identified using the techniques described herein) of the respective reduced bit-widths are determined through multiple iterations of forward propagation through the first neural network model using a validation data set while each of two or more layers of the first neural network model is expressed with different degrees of quantization corresponding to different reduced bit-widths until a predefined information loss threshold (e.g., as measured by the Jensen-Shannon Divergence described herein) is met by respective response statistics of the two or more layers” teaches reducing footprint in the form of memory usage utilized in the deployment of the neural network; pg. 3 [0024]: “As shown in FIG. 1, once the reduced, adaptive bit-width model 112 is provided to a deployment platform 116 on the model deployment system 104, real-world input data or testing data 114 is fed to the reduced, adaptive bit-width model 112, and final prediction result 118 is generated by the reduced, adaptive bit-width” teaches the reduced adaptive bit-width model (neural network) based on the bit-width (quantization scale value), when deployed, uses the model to make predictions (which would require passing input throughout the layers of the network, and thus requiring transferring of data between layers, see Fig. 4 for deep learning model)).
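For illustration only, the footprint reduction relied upon above can be worked as simple arithmetic: storing each layer's parameters at a reduced bit-width lowers total storage in proportion to the bit-widths used. The sketch below is not taken from WANG et al.; the function name and the per-layer parameter counts are hypothetical, and only the bit-widths (32-bit original, 4-bit and 6-bit reduced) follow the examples quoted from [0074].

```python
def footprint_bits(param_counts, bit_widths):
    """Total storage in bits when layer i holds param_counts[i] values of bit_widths[i] bits each."""
    if len(param_counts) != len(bit_widths):
        raise ValueError("one bit-width is required per layer")
    return sum(n * b for n, b in zip(param_counts, bit_widths))


# Hypothetical parameter counts for two layers (not from the cited reference).
layer_param_counts = [1000, 2000]

full_precision_bits = footprint_bits(layer_param_counts, [32, 32])  # original 32-bit storage
reduced_bits = footprint_bits(layer_param_counts, [4, 6])           # per-layer reduced bit-widths
reduction_factor = full_precision_bits / reduced_bits
```

Under these assumed counts the reduced model needs 16,000 bits instead of 96,000 bits, a six-fold reduction in storage for the two layers.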
Regarding Claim 19,
WANG et al. teaches A computer program product that facilitates quantizing weights, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions are executable by a processor to cause the processor to (pg. 1 [0006] teaches non-transitory computer readable storage medium with instructions executable by processor; pg. 8 [0067] teaches weight quantization):
for a set of weights, determine a quantization scale as a function of a bit precision level, wherein the quantization scale comprises quantization scale values (pg. 4 [0027] teaches computer-implemented and processor; pg. 8 [0066] teaches determining a quantization scale as a function of a bit precision level; pg. 9 [0074]: “In some embodiments, a first layer (e.g., i=2) of the plurality of layers in the reduced neural network model (e.g., model 112) has a first reduced bit-width (e.g., 4-bit) that is smaller than the original bit-width (e.g., 32-bit) of the first neural network model, a second layer (e.g., i=3) of the plurality of layers in the reduced neural network model (e.g., model 112) has a second reduced bit-width (e.g., 6-bit) that is smaller than the original bit-width of the first neural network model, and the first reduced bit-width is distinct from the second reduced bit-width in the reduced neural network model” teaches various quantization scale values for a set of weights; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches obtaining quantized weights based on quantization bit-width (scale));
...quantize weights of at least a portion of a layer of the set of weights to generate quantized weights based on the quantization scale value (pg. 4 [0027] teaches computer-implemented and processor; pg. 8 [0067]: “The result of the above process is the set of quantized weights Wopt,i with the optimal quantization bit-width(s) for each layer i, and the set of quantized bias bopt,i with the optimal quantization bit-width(s) for each layer i. The adaptive bit-width model 112 is thus obtained” teaches quantizing weights based on quantization bit-width (scale) values; Fig. 4 teaches the neural network has multiple layers).
WANG et al. does not appear to explicitly teach determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error.
However, CHOI et al. teaches determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights in accordance with a defined quantization criterion relating to the quantization error, which is the particular measure of mean square quantization error (MSQE) for weights; Figs. 1-2 teach system for weight quantization).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate determine a quantization scale value of the quantization scale values that reduces a quantization error of the set of weights in accordance with a defined quantization criterion relating to the quantization error as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
Regarding Claim 20,
WANG et al. in view of CHOI et al. teaches the computer program product of claim 19.
WANG et al. further teaches wherein to facilitate the determining the quantization scale value, the program instructions are executable by a processor to cause the processor to (pg. 1 [0006] teaches pg. 8 [0066] teaches determining a quantization scale).
CHOI et al. further teaches estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, wherein the first statistical function is different from the second statistical function (Fig. 2 element 218 teaches Model Generation Module (corresponds to quantizer component); pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) and how that factor affects the mean square quantization error (MSQE), which corresponds to estimating the quantization scale value; pg. 4 [0053]:  
[Image: media_image1.png, excerpt of CHOI et al. pg. 4 [0053]]
teaches applying a non-linear function of a first statistical function (cost function Equation (6)) of a weight value; pg. 5 [0059]:  

[Image: media_image2.png, excerpt of CHOI et al. pg. 5 [0059]]
 teaches applying a non-linear function of a second statistical function (cost function Equation (10)) of a weight value; Equation (6) and Equation (10) are different), and


wherein the quantization scale value is associated with a lower quantization error than at least most other quantization errors associated with other quantization scale values of the quantization scale values (pg. 1 [0005] “optimizing a weight scaling factor for the quantized weights and an activation scaling factor for quantized activations, and wherein the quantized weights are quantized using the optimized weight scaling factor” and pg. 2 [0018]: “During training, a scaling factor for weights in each layer is learnable as well such that the present system optimizes the scaling factor to minimize the MSQE” teach determining a scaling factor for weights (quantization scale value) that minimizes (reduces) the quantization error for weights; minimizing corresponds to determining an error that is lower than other errors).
WANG et al. and CHOI et al. are analogous art to the claimed invention because they are directed to weight quantization.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate estimate the quantization scale value to apply to a weight of the set of weights as a linear or a non-linear function of a first statistical function of a weight value of the weight and a second statistical function of the weight value, wherein the first statistical function is different from the second statistical function, and wherein the quantization scale value is associated with a lower quantization error than at least most other quantization errors associated with other quantization scale values of the quantization scale values as taught by CHOI et al. to the disclosed invention of WANG et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to “[optimize] the scaling factor to minimize the MSQE” to “[obtain] low-precision neural networks having quantized weights” because “low-precision weights and activations are preferred and sometimes necessary for efficient processing with reduced power consumption when computation and power budgets are limited” (CHOI et al. pg. 1 [0004] & pg. 2 [0017]-[0018]).
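For illustration only, the claim 20 limitation of estimating the quantization scale value as a linear function of two different statistical functions of the weight values can be sketched as follows. This is a hypothetical sketch, not the method of the present application or of CHOI et al.: the choice of mean absolute value and maximum absolute value as the two statistics, the coefficients `a` and `b`, and the 4-bit width are all assumptions.

```python
# Hypothetical sketch: estimate a scale value as a linear combination of two
# different statistics of the weights. The statistics and coefficients are
# assumptions chosen for illustration, not taken from the record.

def mean_abs(weights):
    """First statistical function (assumed): mean of absolute weight values."""
    return sum(abs(w) for w in weights) / len(weights)

def max_abs(weights):
    """Second statistical function (assumed): maximum absolute weight value."""
    return max(abs(w) for w in weights)

def estimate_scale(weights, a=0.5, b=0.5, num_bits=4):
    """Linear function a*mean_abs + b*max_abs, normalized by the integer range."""
    qmax = 2 ** (num_bits - 1) - 1
    return (a * mean_abs(weights) + b * max_abs(weights)) / qmax

scale = estimate_scale([1.0, -2.0, 3.0, -4.0])
```

With the example weights, the mean absolute value is 2.5 and the maximum absolute value is 4.0, so the estimated scale is (1.25 + 2.0) / 7.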
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Baum et al. (US 2018/0285736 A1) teaches data driven quantization optimization of weights and input data in an artificial neural network (ANN).
LEE et al. (US 2018/0341857 A1) teaches adjusting quantization levels assigned to the data values based on the weighted entropy.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 






/YING YU CHEN/               Examiner, Art Unit 2125