DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
	Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/30/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: 
In paragraph [0002], line 1, “relates to r the quantization” should read “relates to the quantization”
In paragraph [0003], line 12, “that they are implement” should read “that they are implementing.”
In paragraph [00273], line 2, “initializes a weigh of a current layer” should read “initializes a weight of a current layer”
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 4 and 6-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 4 recites the limitation “the quantized weight” in line 4. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the quantized weight” has been interpreted as “the weight” in reference to “a weight” in line 5 of claim 1.
Claim 6 recites the limitation “the calculated loss” in line 12. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated loss” has been interpreted as “the loss” in reference to “a loss” in line 7.
Claim 7 recites the limitation “the quantized weight” in line 5. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the quantized weight” has been interpreted as “a quantized weight”.
Claim 7 recites the limitation “the quantized activation map” in line 8. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the quantized activation map” has been interpreted as “a quantized activation map”.
Claim 10 recites the limitation “the updated first representation bit number” in line 8. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the updated first representation bit number” has been interpreted as “an updated first representation bit number”.
Claim 10 recites the limitation “the updated second representation bit number” in line 9. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the updated second representation bit number” has been interpreted as “an updated second representation bit number”.
Claim 10 recites the limitation “the calculated loss” in line 12. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated loss” has been interpreted as “the loss” in reference to “a loss” in line 8.
Claim 10 recites the limitation “the updated weight” in line 14. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the updated weight” has been interpreted as “an updated weight”.
Claim 10 recites the limitation “the updated weight quantization parameter” in line 14. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the updated weight quantization parameter” has been interpreted as “an updated weight quantization parameter”.
Claim 10 recites the limitation “the updated activation quantization parameter” in lines 14-15. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the updated activation quantization parameter” has been interpreted as “an updated activation quantization parameter”.
Claim 16 recites the limitation “the calculated loss” in line 9. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated loss” has been interpreted as “the loss” in reference to “a loss” in line 6.
Claim 19 recites the limitation “the calculated loss” in line 11. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the calculated loss” has been interpreted as “the loss” in reference to “a loss” in line .
Claim 20 recites the limitation “the quantized weight” in line 4. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the quantized weight” has been interpreted as “a quantized weight”.
Dependent claims 7-9 are rejected based on being directly or indirectly dependent on rejected claim 6.
Dependent claims 11-15 are rejected based on being directly or indirectly dependent on rejected claim 10.
Dependent claims 17 and 18 are rejected based on being directly or indirectly dependent on rejected claim 16.
Dependent claim 20 is rejected based on being directly or indirectly dependent on rejected claim 19.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“outputting an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer”
“outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter”
As drafted, under their broadest reasonable interpretations, cover mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitations in the context of this claim encompass outputting the output activation map by performing a convolution operation with respect to the input activation map and a weight quantized with a first representation bit number (corresponds to mathematical calculations); and outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity (See MPEP 2106.05(g)). The limitation:
“processor-implemented”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). The limitation:
“inputting an input activation map into a current layer of the neural network”
As drafted, is an additional element that corresponds to insignificant extra-solution activity. In particular, the additional elements are merely directed towards data gathering. See MPEP 2106.05(g). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
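For illustration only, the two calculations identified in the Prong One analysis of claim 1 (a convolution with a quantized weight, followed by quantization of the output activation map) can be sketched in a few lines of Python. The uniform quantizer, the symmetric weight range, and the names quantize_uniform, conv2d_valid, and layer_forward are assumptions made for this sketch; neither the claim language quoted above nor this action specifies a particular quantizer.

```python
def quantize_uniform(x, bits, lower, upper):
    """Clip x to [lower, upper], then snap it to one of 2**bits evenly spaced levels.

    Assumed (hypothetical) quantizer: uniform levels, bits >= 1.
    """
    if upper <= lower:
        return lower  # degenerate section: only one representable value
    levels = 2 ** bits - 1  # number of steps between lower and upper
    step = (upper - lower) / levels
    clipped = min(max(x, lower), upper)
    return lower + round((clipped - lower) / step) * step


def conv2d_valid(activation, kernel):
    """2-D 'valid' convolution (cross-correlation form) by multiply-accumulate."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(activation) - kh + 1
    out_w = len(activation[0]) - kw + 1
    return [
        [
            sum(activation[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def layer_forward(input_map, weight, weight_bits, act_bits, act_lower, act_upper):
    """Quantize the weight, convolve, then quantize the output activation map."""
    # Assumption: weights are quantized over a symmetric range [-w_max, w_max].
    w_max = max(abs(w) for row in weight for w in row)
    q_weight = [[quantize_uniform(w, weight_bits, -w_max, w_max) for w in row]
                for row in weight]
    output_map = conv2d_valid(input_map, q_weight)
    # act_lower / act_upper play the role of the activation quantization parameter.
    return [[quantize_uniform(v, act_bits, act_lower, act_upper) for v in row]
            for row in output_map]
```

Calling layer_forward on a small input with act_lower = 0.0 and act_upper = 3.0 clips any response above 3.0 to the upper limit before it is snapped to a level.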

Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 1 of outputting a quantized activation map by quantizing the output activation map. The limitation of claim 2 further limits the limitation of claim 1 by further defining what the activation quantization parameter comprises. The above limitation in the context of this claim encompasses outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter, where the activation quantization parameter comprises a first threshold and a second threshold related to the output activation map, the first threshold indicating an upper limit of an activation map section with respect to the output activation map, and the second threshold indicating a lower limit of the activation map section (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The recitation of the additional element in claim 1 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “inputting an input activation map …” amounts to no more than insignificant extra-solution activity for mere data gathering. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 1 of outputting a quantized activation map by quantizing the output activation map. The limitation of claim 3 further limits the limitation of claim 1 by further defining what the activation quantization parameter comprises. The above limitation in the context of this claim encompasses outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter, where the activation quantization parameter comprises a first median value and a first difference value with respect to the output activation map, the first difference value indicating a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The recitation of the additional element in claim 1 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “inputting an input activation map …” amounts to no more than insignificant extra-solution activity for mere data gathering. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
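As a brief illustration of the arithmetic recited in claim 3, the median value and the difference value are simple functions of the two thresholds, and the thresholds can be recovered from them. The sketch below shows that relationship; the function names to_center_form and to_threshold_form are hypothetical and are not taken from the application.

```python
def to_center_form(upper, lower):
    """Convert (upper, lower) thresholds to (median value, difference value)."""
    median = (upper + lower) / 2      # middle value of the two thresholds
    difference = (upper - lower) / 2  # half of the difference between them
    return median, difference


def to_threshold_form(median, difference):
    """Recover the (upper, lower) thresholds from the median and difference values."""
    return median + difference, median - difference
```

The two parameterizations carry the same information: applying to_threshold_form to the result of to_center_form returns the original thresholds, up to floating-point rounding.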

Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the outputting of the output activation map comprises performing the convolution operation by performing a multiplication operation and an accumulation operation, or performing a bit-wise operation with respect to the input activation map and the quantized weight”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 1 of outputting an output activation map by performing a convolution operation. The limitation of claim 4 further limits the limitation of claim 1 by further defining how the convolution operation is performed. The above limitation in the context of this claim encompasses outputting an output activation map by performing the convolution operation with respect to the input activation map and the quantized weight by performing a multiplication operation and an accumulation operation, or by performing a bit-wise operation (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The recitation of the additional element in claim 1 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “inputting an input activation map …” amounts to no more than insignificant extra-solution activity for mere data gathering. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
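To illustrate why both alternatives recited in claim 4 amount to the same calculation, the sketch below computes one inner product of a convolution two ways: by multiplication and accumulation, and by a bit-wise operation. It assumes 1-bit values restricted to -1 and +1 and uses the widely known XNOR-and-popcount formulation; that choice, and the names dot_mac, pack_signs, and dot_bitwise, are assumptions for this sketch and are not drawn from the claim or the specification.

```python
def dot_mac(activations, weights):
    """Inner product by multiplication and accumulation."""
    total = 0
    for a, w in zip(activations, weights):
        total += a * w
    return total


def pack_signs(values):
    """Encode a sequence of +1/-1 values as an integer: +1 -> bit 1, -1 -> bit 0."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def dot_bitwise(a_bits, w_bits, n):
    """Inner product of two packed +1/-1 vectors of length n using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    # Each match contributes +1 and each mismatch contributes -1.
    return 2 * matches - n
```

For any pair of equal-length +1/-1 vectors, dot_bitwise(pack_signs(a), pack_signs(w), len(a)) returns the same value as dot_mac(a, w).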

Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the first representation bit number and the second representation bit number are equal”
As drafted, under its broadest reasonable interpretation, is part of the abstract ideas of claim 1 of outputting an output activation map by performing a convolution operation and outputting a quantized activation map by quantizing the output activation map. The limitation of claim 5 further limits the limitations of claim 1 by further defining the first representation bit number and the second representation bit number. The above limitation in the context of this claim encompasses outputting the output activation map by performing a convolution operation with respect to the input activation map and a weight quantized with a first representation bit number and outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter, where the first representation bit number and the second representation bit number are equal (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The recitation of the additional element in claim 1 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “inputting an input activation map …” amounts to no more than insignificant extra-solution activity for mere data gathering. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter”
“calculating a loss based on the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter with training data”
“updating the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter based on the calculated loss”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter); calculating a loss using training data based on the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter (corresponds to mathematical calculations); and using the calculated loss to update the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the calculated loss to update the values of the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“processor-implemented”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
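For illustration only, the three steps of claim 6 (initializing, calculating a loss, and updating based on the loss) can be sketched on a toy one-weight layer. Claim 6 as quoted above does not say how the loss is defined or how the update is computed, so everything specific here is an assumption: the uniform quantizer, the mean-squared-error loss with a hypothetical per-bit penalty bit_cost, the greedy trial-and-error update (chosen only because it handles integer bit numbers without gradients), and all names and values.

```python
def quantize(x, bits, lower, upper):
    """Clip x to [lower, upper] and snap it to one of 2**bits evenly spaced levels."""
    step = (upper - lower) / (2 ** bits - 1)
    clipped = min(max(x, lower), upper)
    return lower + round((clipped - lower) / step) * step


def loss(p, data, bit_cost=0.01):
    """Mean squared error of a one-weight layer plus a hypothetical per-bit penalty."""
    q_weight = quantize(p["weight"], p["weight_bits"],
                        -p["weight_limit"], p["weight_limit"])
    error = 0.0
    for x, target in data:
        q_act = quantize(q_weight * x, p["act_bits"], 0.0, p["act_limit"])
        error += (q_act - target) ** 2
    return error / len(data) + bit_cost * (p["weight_bits"] + p["act_bits"])


def is_valid(p):
    """Bit numbers must be at least 1 and quantization limits must be positive."""
    return (p["weight_bits"] >= 1 and p["act_bits"] >= 1
            and p["weight_limit"] > 0 and p["act_limit"] > 0)


def train_step(p, data, deltas):
    """Try +/- delta on each parameter in turn; keep a change only if the loss drops."""
    best, best_loss = dict(p), loss(p, data)
    for name, delta in deltas.items():
        for sign in (1, -1):
            trial = dict(best)
            trial[name] += sign * delta
            if not is_valid(trial):
                continue
            trial_loss = loss(trial, data)
            if trial_loss < best_loss:
                best, best_loss = trial, trial_loss
    return best, best_loss


# Initialization step: illustrative starting values for all five quantities.
params = {"weight": 0.3, "weight_bits": 4, "weight_limit": 1.0,
          "act_bits": 4, "act_limit": 2.0}
# Step sizes for the update; bit numbers change in whole-bit increments.
deltas = {"weight": 0.05, "weight_bits": 1, "weight_limit": 0.1,
          "act_bits": 1, "act_limit": 0.1}
# Toy training data as (input, target) pairs.
data = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
```

By construction, the loss returned by train_step(params, data, deltas) is never higher than loss(params, data). A gradient-based update would be the more usual choice in practice; the greedy search is used here only to keep the sketch short.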

Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the calculating of the loss comprises: quantizing the weight based on the first representation bit number and the weight quantization parameter”
“outputting the output activation map by performing a convolution operation with respect to the quantized weight and an input activation map input into the current layer”
“quantizing the output activation map with the second representation bit number and the activation quantization parameter”
“calculating the loss based on the quantized activation map”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass quantizing the weight based on the first representation bit number and the weight quantization parameter (corresponds to mathematical calculations); outputting the output activation map by performing a convolution operation with respect to the quantized weight and an input activation map (corresponds to mathematical calculations); quantizing the output activation map with the second representation bit number and the activation quantization parameter (corresponds to mathematical calculations); and calculating the loss based on the quantized activation map (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 6 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 6 of initializing a weight, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter. The limitation of claim 8 further limits the limitation of claim 6 by further defining what the activation quantization parameter and the weight quantization parameter comprise. The above limitation in the context of this claim encompasses initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter, where the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter, where the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 6 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
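For illustration only, the threshold language of claim 8 can be read as defining a section (a lower limit and an upper limit) to which a value is confined before it is represented with a given bit number. The following is a minimal sketch under that reading. The function names, the uniform spacing of levels, and the sign handling for weights are assumptions made for this example and are not taken from the claims or the specification.

```python
def quantize_to_section(value, lower, upper, bits):
    """Clip value to [lower, upper], then snap it to one of 2**bits evenly spaced levels.

    Hypothetical helper: the uniform level spacing is an assumption, not claim language.
    """
    if upper <= lower:
        raise ValueError("upper threshold must exceed lower threshold")
    if bits < 1:
        raise ValueError("representation bit number must be at least 1")
    levels = 2 ** bits - 1                    # number of steps between the two thresholds
    clipped = min(max(value, lower), upper)   # confine the value to the section
    step = (upper - lower) / levels
    return lower + round((clipped - lower) / step) * step


def quantize_weight(weight, lower, upper, bits):
    """Apply the weight section to the absolute value of the weight, then restore the sign.

    Assumption: the claim only says the thresholds bound the absolute value of the weight;
    restoring the sign afterwards is one plausible way to use them.
    """
    magnitude = quantize_to_section(abs(weight), lower, upper, bits)
    return magnitude if weight >= 0 else -magnitude
```

Under these assumptions, the first and second thresholds would be passed as `upper` and `lower` to `quantize_to_section` for an activation value, and the third and fourth thresholds would be passed to `quantize_weight`.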

Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 6 of initializing a weight, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter. The limitation of claim 9 further limits the limitation of claim 6 by further defining what the activation quantization parameter and the weight quantization parameter comprise. The above limitation in the context of this claim encompasses initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter, where the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter, where the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 6 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
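For illustration only, the arithmetic recited in claim 9 can be sketched as follows, taking the quoted limitation at face value: the median value is the middle value of the two thresholds, and the difference value is half of the difference between them. The function names are hypothetical and do not appear in the application.

```python
def to_median_and_difference(upper, lower):
    """Convert an (upper, lower) threshold pair to (median value, difference value)."""
    median = (upper + lower) / 2       # middle value of the two thresholds
    difference = (upper - lower) / 2   # half of the difference between the thresholds
    return median, difference


def to_thresholds(median, difference):
    """Recover the (upper, lower) threshold pair from (median value, difference value)."""
    return median + difference, median - difference
```

The two representations carry the same information: applying `to_thresholds` to the result of `to_median_and_difference` returns the original pair of thresholds, which is consistent with the observation above that the quantities can be written down and converted by hand.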

Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“initializing a weight of a current layer of a first neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter”
“updating the first representation bit number and the second representation bit number based on an accuracy calculated in a previous iteration”
“calculating a loss based on the weight, the updated first representation bit number, the weight quantization parameter, the updated second representation bit number, and the activation quantization parameter based on training data”
“updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss”
“calculating an accuracy to be implemented in a subsequent iteration with validity data based on the updated weight, the updated weight quantization parameter, and the updated activation quantization parameter”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter); updating the first representation bit number and second representation bit number based on a previously calculated accuracy (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the previously calculated accuracy to update the values of the first and second representation bit numbers); calculating a loss using training data based on the weight, the updated first representation bit number, the weight quantization parameter, the updated second representation bit number, and the activation quantization parameter (corresponds to mathematical calculations); using the calculated loss to update the weight, the weight quantization parameter, and the activation quantization parameter (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the calculated loss to update the values of the weight, the weight quantization parameter, and the activation quantization parameter); and calculating an accuracy to be implemented in a subsequent iteration using validity data based on the updated weight, the updated weight quantization parameter, and the updated activation quantization parameter (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“processor-implemented”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
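For illustration only, the ordering of the steps recited in claim 10 within a single iteration can be sketched as follows. This is a skeleton, not an implementation: the claim does not specify how the bit numbers, the loss, the update, or the accuracy are computed, so each of those is left to a caller-supplied function. All names (`QuantState`, `run_iteration`, and the four callables) are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class QuantState:
    """Values initialized in the first recited step (scalars used for brevity)."""
    weight: float
    weight_bits: int          # first representation bit number
    weight_qparam: float      # weight quantization parameter
    act_bits: int             # second representation bit number
    act_qparam: float         # activation quantization parameter
    accuracy: Optional[float] = None  # None until a first iteration has run


def run_iteration(
    state: QuantState,
    update_bits: Callable[[int, int, Optional[float]], Tuple[int, int]],
    compute_loss: Callable[[QuantState], float],
    apply_update: Callable[[QuantState, float], Tuple[float, float, float]],
    compute_accuracy: Callable[[QuantState], float],
) -> float:
    """Run one iteration in the order recited in claim 10 and return the loss."""
    # Update both bit numbers based on the accuracy from the previous iteration.
    state.weight_bits, state.act_bits = update_bits(
        state.weight_bits, state.act_bits, state.accuracy
    )
    # Calculate a loss (the caller's function is assumed to use training data).
    loss = compute_loss(state)
    # Update the weight and both quantization parameters based on that loss.
    state.weight, state.weight_qparam, state.act_qparam = apply_update(state, loss)
    # Calculate an accuracy (assumed to use validation data) for the next iteration.
    state.accuracy = compute_accuracy(state)
    return loss
```

The accuracy stored at the end of one call is the "accuracy calculated in a previous iteration" consumed by `update_bits` at the start of the next call, which is the only coupling between iterations in this sketch.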

Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the updating of the first representation bit number and the second representation bit number comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 10 of updating the first representation bit number and the second representation bit number. The limitation of claim 11 further limits the limitation of claim 10 by further defining what the updating the first representation bit number and the second representation bit number comprises. The above limitation in the context of this claim encompasses updating the first representation bit number and second representation bit number based on a previously calculated accuracy by increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the previously calculated accuracy to update the values of the first and second representation bit numbers by increasing or decreasing the first and second representation bit numbers by a preset bit number based on the calculated loss).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 10 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
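For illustration only, increasing or decreasing a representation bit number by a preset bit number based on a calculated loss can be sketched as follows. The claim does not state the rule that chooses between increasing and decreasing, so the comparison against a target loss, the bounds on the bit number, and all names below are assumptions made for this example.

```python
def step_bits(bits, loss, loss_target, preset_step=1, min_bits=1, max_bits=32):
    """Move a representation bit number up or down by a preset step.

    Assumed rule (not from the claim): a loss above the target adds precision,
    and a loss at or below the target removes precision.
    """
    if loss > loss_target:
        bits += preset_step
    else:
        bits -= preset_step
    # Keep the result within assumed bounds so the bit number stays usable.
    return max(min_bits, min(max_bits, bits))
```

Under this assumed rule the same function would be applied to both the first and the second representation bit numbers, for example `step_bits(weight_bits, loss, 0.5)` and `step_bits(act_bits, loss, 0.5)`, where the variable names and the target value 0.5 are placeholders.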

Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 12 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the updating of the first representation bit number and the second representation bit number comprises updating the first representation bit number and the second representation bit number with a second neural network based on the calculated loss”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 10 of updating the first representation bit number and the second representation bit number. The limitation of claim 12 further limits the limitation of claim 10 by further defining what the updating the first representation bit number and the second representation bit number comprises. The above limitation in the context of this claim encompasses updating the first representation bit number and second representation bit number based on a previously calculated accuracy by using a second neural network to update the first representation bit number and the second representation bit number based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the previously calculated accuracy to update the values of the first and second representation bit numbers by using a second neural network to update the first and second representation bit values based on the calculated loss).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 10 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.


Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the updating of the first representation bit number and the second representation bit number comprises updating the first representation bit number and the second representation bit number based on a state and a reward including the accuracy calculated in the previous iteration, and wherein the state includes a quantization error of each of the activation quantization parameter and the weight quantization parameter, a distribution of the weight, and a distribution of the output activation map, and the reward includes the accuracy”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 10 of updating the first representation bit number and the second representation bit number. The limitation of claim 13 further limits the limitation of claim 10 by further defining what the updating the first representation bit number and the second representation bit number comprises. The above limitation in the context of this claim encompasses updating the first representation bit number and second representation bit number based on a previously calculated accuracy by updating the first representation bit number and the second representation bit number based on a state and a reward including the accuracy calculated in the previous iteration, and wherein the state includes a quantization error of each of the activation quantization parameter and the weight quantization parameter, a distribution of the weight, and a distribution of the output activation map, and the reward includes the accuracy  (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the previously calculated accuracy to update the values of the first and second representation bit numbers by updating the first and second representation bit numbers based on a state and a reward, wherein the state includes a quantization error of each of the activation quantization parameter and the weight quantization parameter, a distribution of the weight, and a distribution of the output activation map, and the reward includes the accuracy).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 10 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 10 of initializing a weight, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter. The limitation of claim 14 further limits the limitation of claim 10 by further defining what the activation quantization parameter and the weight quantization parameter comprise. The above limitation in the context of this claim encompasses initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter, where the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter, where the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 10 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 10 of initializing a weight, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter. The limitation of claim 15 further limits the limitation of claim 10 by further defining what the activation quantization parameter and the weight quantization parameter comprise. The above limitation in the context of this claim encompasses initializing a weight of a current layer of the neural network, a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter, where the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight of the current layer, a first representation bit number, a weight quantization parameter, a second representation bit number, and an activation quantization parameter, where the activation quantization parameter includes a first median value and a first difference value with respect to the output activation map, wherein the first difference value indicates a half of a difference between a first threshold and a second threshold, and the first median value indicates a middle value of the first threshold and the second threshold, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and the weight quantization parameter includes a second median value and a second difference value of an absolute value of the weight of the current layer, wherein the second difference value indicates a half of a difference between a third threshold and a fourth threshold, and the second median value indicates a middle value of the third threshold and the fourth threshold, wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 10 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (i.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 16 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“initializing a first representation bit number related to a weight of a current layer of a pre-trained first neural network and a second representation bit number related to an output activation map output from the current layer”
“calculating a loss based on the pre-trained first neural network, the first representation bit number, and the second representation bit number based on training data”
“updating the first representation bit number and the second representation bit number based on the calculated loss”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass initializing a first representation bit number related to a weight of a current layer of a pre-trained neural network and a second representation bit number related to an output activation map from the current layer (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a first representation bit number and a second representation bit number); using training data to calculate a loss based on the pre-trained neural network, the first representation bit number, and the second representation bit number (corresponds to mathematical calculations); and updating the first representation bit number and the second representation bit number based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the calculated loss to update the first and second representation bit numbers).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“processor-implemented”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the updating comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 16 of updating the first representation bit number and the second representation bit number. The limitation of claim 17 further limits the limitation of claim 16 by further defining what the updating of the first representation bit number and the second representation bit number comprises. The above limitation in the context of this claim encompasses updating the first representation bit number and the second representation bit number by increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can update the first and second representation bit numbers by increasing or decreasing the first and second representation bit numbers by a preset bit number based on the calculated loss).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 16 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
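For illustration only, one way the claim 17 limitation could operate is sketched below. The claim does not state when the bit numbers are increased or decreased, so the comparison against a loss tolerance, the lower bound of one bit, and all names in the sketch are assumptions made for this example rather than features of the claimed method.

```python
def update_bit_numbers(weight_bits, activation_bits, loss,
                       loss_tolerance, preset_step=1, min_bits=1):
    """Raise or lower both representation bit numbers by a preset step.

    Assumed rule (not from the claims): a loss above the tolerance
    increases both bit numbers; otherwise both are decreased, but never
    below min_bits.
    """
    if loss > loss_tolerance:
        return weight_bits + preset_step, activation_bits + preset_step
    return (max(min_bits, weight_bits - preset_step),
            max(min_bits, activation_bits - preset_step))
```

Each call performs one comparison and one addition or subtraction per bit number.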

Regarding Claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the updating comprises updating the first representation bit number and the second representation bit number with a second neural network based on the calculated loss”
As drafted, under its broadest reasonable interpretation, is part of the abstract idea of claim 16 of updating the first representation bit number and the second representation bit number. The limitation of claim 18 further limits the limitation of claim 16 by further defining what the updating the first representation bit number and the second representation bit number comprises. The above limitation in the context of this claim encompasses updating the first representation bit number and second representation bit number by using a second neural network to update the first representation bit number and the second representation bit number based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use a second neural network to update the first and second representation bit values based on the calculated loss).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply language (See MPEP 2106.05(f)). The recitation of the additional element in claim 16 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“initializing a weight of a current layer of the neural network, a weight quantization parameter related to the weight, and an activation quantization parameter related to an output activation map output from the current layer”
“calculating a loss based on a pre-trained first representation bit number related to the weight, a pre-trained second representation bit number related to the output activation map, the weight, the weight quantization parameter, and the activation quantization parameter based on training data”
“updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)). The above limitations in the context of this claim encompass initializing a weight of a current layer of a neural network, a weight quantization parameter related to the weight, and an activation quantization parameter related to an output activation map (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can initialize (i.e. set/write initial values) a weight, a weight quantization parameter, and an activation quantization parameter); using training data to calculate a loss based on a pre-trained first representation bit number, a pre-trained second representation bit number, the weight, the weight quantization parameter, and the activation quantization parameter (corresponds to mathematical calculations); and updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss (corresponds to evaluation and judgement; in particular, a human, with the assistance of pen and paper, can use the calculated loss to update the weight, the weight quantization parameter, and the activation quantization parameter).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)). The limitation:
“processor-implemented”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to a method of training a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“quantizing the weight based on the updated weight quantization parameter and the pre-trained first representation bit number”
As drafted, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitation in the context of this claim encompasses quantizing the weight based on the updated weight quantization parameter and the pre-trained first representation bit number (corresponds to mathematical calculations).
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The limitation:
“storing the quantized weight and the activation quantization parameter”
As drafted, is an additional element that corresponds to insignificant extra-solution activity. In particular, the additional element is merely directed towards data gathering. See MPEP 2106.05(g). Furthermore, the recitation of the additional element in claim 19 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “storing …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity… iv. Storing and retrieving information in memory”).

Regarding Claim 21,
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 21 is directed to a data processing method in a neural network, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“perform the data processing method of claim 1”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitations in the context of this claim encompass
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitation:
“A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor”
As drafted, is an additional element that amounts to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). Furthermore, the recitation of the additional element in claim 1 of a processor, as drafted, is reciting mere instructions to apply language such that it amounts to no more than mere instructions to apply the exceptions. In addition, the additional element of “inputting an input activation map …” amounts to no more than insignificant extra-solution activity for mere data gathering. Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe a processor for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “inputting an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.

Regarding Claim 22,
Claim 22 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 22 is directed to a data processing apparatus, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer”
“output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter”
As drafted, under their broadest reasonable interpretations, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply language (See MPEP 2106.05(f)) and insignificant extra-solution activity (See MPEP 2106.05(g)). The above limitations in the context of this claim encompass
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply (See MPEP 2106.05(f)) or insignificant extra-solution activity. The limitations:
“at least one processor”
“at least one memory”
As drafted, are additional elements that amount to no more than mere instructions to apply the exception for the abstract ideas. See MPEP 2106.05(f). The limitations:
“store instructions to be executed by the at least one processor and a neural network”
“input an input activation map into a current layer included in the neural network”
As drafted, are additional elements that correspond to insignificant extra-solution activity. In particular, the additional elements are merely directed towards data gathering.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply an exception” (I.e. the additional elements describe at least one processor and at least one memory for applying the abstract ideas) or insignificant extra-solution activity (i.e. mere data gathering). Furthermore, the “store …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity… iv. Storing and retrieving information in memory”). Additionally, the “input an input activation map …” limitation is insignificant extra-solution activity that is well-understood, routine, and conventional according to MPEP 2106.05(d) as shown by Ferdman et al. (WO 2018/071546 A1) in specification paragraph [0054]: “the convolutional layer 114 reads the second intermediate feature maps as the input feature maps, and processes these input feature maps in order to produce output feature maps as the output 104 according to, for example, conventional CNN layer-by-layer processing” (Also see [0062]). Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Singh et al. (US 2018/0189981 A1) in view of Yao (US 2018/0046894 A1).
Regarding Claim 1,
	Singh et al. teaches a processor-implemented data processing method in a neural network ([0152]: "FIG. 27 is a flow diagram illustrating a method of performing CNN operations, according to an embodiment. In one embodiment the method of FIG. 27 is performed via the compute architecture 1900 of FIG. 19, although differing compute architectures can be configured to perform the illustrated method" teaches a method for performing CNN operations (data processing method in a neural network) implemented by a compute architecture (processor)), the data processing method comprising: 
inputting an input activation map into a current layer of the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer).
	Singh et al. does not appear to explicitly teach outputting an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer; and outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter.
	However, Yao teaches outputting an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)); and 
outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
	Singh et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate outputting an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer; and outputting a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter as taught by Yao to the disclosed invention of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
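For illustration only, the combination discussed above (a convolution using weights quantized with a first bit number, followed by quantization of the output feature map with a second bit number over a chosen numerical range) can be sketched as follows. This is a minimal one-dimensional sketch under stated assumptions: the uniform quantizer, the function names, and the parameter names are hypothetical and are not taken from Singh et al., Yao, or the claims.

```python
def quantize_uniform(x, bits, lower, upper):
    """Clip x to [lower, upper] and snap it to one of 2**bits levels."""
    levels = 2 ** bits - 1              # number of steps between levels
    step = (upper - lower) / levels
    clipped = min(max(x, lower), upper)
    return lower + round((clipped - lower) / step) * step


def conv1d_valid(inputs, kernel):
    """Sliding multiply-accumulate over positions where the kernel fits."""
    out_len = len(inputs) - len(kernel) + 1
    return [sum(inputs[pos + k] * kernel[k] for k in range(len(kernel)))
            for pos in range(out_len)]


def quantized_layer(inputs, weights, weight_bits, weight_range,
                    act_bits, act_range):
    """Quantize weights, convolve, then quantize the output map."""
    q_weights = [quantize_uniform(w, weight_bits, *weight_range)
                 for w in weights]
    output_map = conv1d_valid(inputs, q_weights)
    return [quantize_uniform(v, act_bits, *act_range) for v in output_map]
```

In this sketch, `weight_range` and `act_range` are (lower, upper) pairs playing the role of the numerical range of quantization, and `weight_bits` and `act_bits` play the role of the weight bits and data bits.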
Regarding Claim 2,
	Singh et al. in view of Yao teaches the method of claim 1.
	Additionally, Yao further teaches wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has two thresholds related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), 
wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has an upper limit threshold of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
the second threshold indicates a lower limit of the activation map section ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has a lower limit threshold of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)).
	Singh et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section as taught by Yao to the disclosed invention of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
Regarding Claim 4,
Singh et al. in view of Yao teaches the method of claim 1.
	Additionally, Yao further teaches wherein the outputting of the output activation map comprises performing the convolution operation by performing a multiplication operation and an accumulation operation, or performing a bit-wise operation with respect to the input activation map and the quantized weight ([0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Equation 1; [0045]-[0046]: "The CONV layer can be expressed with Equation 1:

[Equation 1: image media_image1.png, not reproduced]

where gi,j is the convolutional kernel applied to j-th input feature map and i-th output feature map" teaches that the convolution operation involves a multiplication operation and a summation (accumulation operation) with respect to the input feature map and the convolutional kernel (i.e. the weights optimized during weight quantization)).
	Singh et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the outputting of the output activation map comprises performing the convolution operation by performing a multiplication operation and an accumulation operation, or performing a bit-wise operation with respect to the input activation map and the quantized weight as taught by Yao to the disclosed invention of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
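For illustration only, the multiply-and-accumulate reading of the convolution operation relied on above can be sketched as follows. The sketch is not taken from Yao or Singh et al.; the function name `conv2d_mac`, the single input feature map, the unit stride, and the absence of padding and bias are assumptions made for brevity.

```python
def conv2d_mac(feature_map, kernel):
    """Convolve one 2-D feature map with one 2-D kernel.

    Hypothetical helper: stride 1, no padding, no kernel flip (the
    cross-correlation form commonly called "convolution" in CNNs).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    output = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0  # accumulator for this output position
            for i in range(kh):
                for j in range(kw):
                    # multiplication operation followed by accumulation operation
                    acc += feature_map[r + i][c + j] * kernel[i][j]
            output[r][c] = acc
    return output


result = conv2d_mac([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
```

Each output value is a sum of products of feature map values and kernel weights, which is the sense in which the mapping above treats the summation as an accumulation operation.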
Regarding Claim 5,
Singh et al. in view of Yao teaches the method of claim 1.
	Additionally, Yao further teaches wherein the first representation bit number and the second representation bit number are equal (Table 2; teaches that the weight bits (first representation bit number) and the data bits (second representation bit number) can be equal).
	Singh et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the first representation bit number and the second representation bit number are equal as taught by Yao to the disclosed invention of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
Regarding Claim 21,
Singh et al. in view of Yao teaches the method of claim 1.
	Additionally, Singh et al. further teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the data processing method of claim 1 ([0169]: "Embodiments may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein" teaches a non-transitory machine-readable medium storing machine-executable instructions that, when executed by a computer or other electronic device (processor), cause the computer or other electronic device to perform the embodied method).
Regarding Claim 22,
Singh et al. teaches a data processing apparatus (Fig. 1; [0036]: "FIG. 1 is a block diagram of a processing system 100, according to an embodiment. In various embodiments, the system 100 includes one or more processors 102 and one or more graphics processors 108, and may be a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors 102 or processor cores 107" teaches a data processing system (data processing apparatus)), comprising: 
at least one processor (Fig. 1; [0036]: "FIG. 1 is a block diagram of a processing system 100, according to an embodiment. In various embodiments, the system 100 includes one or more processors 102 and one or more graphics processors 108, and may be a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors 102 or processor cores 107" teaches that the data processing system comprises one or more processors 102); and 
at least one memory configured to store instructions to be executed by the at least one processor and a neural network (Fig. 1; [0041]: "Memory device 120 can be a dynamic random-access memory (DRAM) device, a static random-access memory (SRAM) device, flash memory device, phase-change memory device, or some other memory device having suitable performance to serve as process memory. In one embodiment the memory device 120 can operate as system memory for the system 100, to store data 122 and instructions 121 for use when the one or more processors 102 executes an application or process" teaches a memory device 120 (memory) that stores instructions 121 for execution by the processors 102. [0177]: "a data processing system configured to perform operations to enable a convolutional neural network (CNN)" teaches that the data processing system can be used for implementing operations of a neural network), 
wherein the at least one processor is configured to, based on the instructions: input an input activation map into a current layer included in the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer).
	Singh et al. does not appear to explicitly teach output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
	However, Yao teaches output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)), and 
output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
	Singh et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter as taught by Yao to the disclosed invention of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
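For illustration only, the following sketch shows how a chosen bit number and numerical range jointly fix an upper limit and a lower limit for quantized values, and how a floating-point value is converted to a fixed-point value within that range. It is not code from Yao or Singh et al. The names `fixed_point_range` and `quantize_fixed_point`, the signed two's-complement layout, and the use of a bit width `bw` with a fractional length `fl` are assumptions.

```python
def fixed_point_range(bw, fl):
    """Return (lower limit, upper limit) representable with bw bits and fl fractional bits.

    Assumes a signed two's-complement layout (an assumption of this sketch).
    """
    step = 2.0 ** -fl
    return (-(2 ** (bw - 1)) * step, (2 ** (bw - 1) - 1) * step)


def quantize_fixed_point(x, bw, fl):
    """Round x to the nearest representable value and clamp it to the range."""
    step = 2.0 ** -fl
    code = round(x / step)                                      # nearest integer code
    code = max(-(2 ** (bw - 1)), min(2 ** (bw - 1) - 1, code))  # saturate at the limits
    return code * step


lower, upper = fixed_point_range(8, 4)
inside = quantize_fixed_point(0.3, 8, 4)
saturated = quantize_fixed_point(100.0, 8, 4)
```

Values outside the range saturate at the upper or lower limit, consistent with the reading above that a numerical range of quantization has both an upper limit and a lower limit.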

Claims 6-8, 10, 11, 14, 16, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hah et al. (US 2018/0307783 A1) in view of Yao (US 2018/0046894 A1).
Regarding Claim 6,
Hah et al. teaches a processor-implemented method of training a neural network ([0002]: "This disclosure relates to systems and methods for implementing learned parameter systems on programmable integrated circuits" teaches a method for implementing a learned parameter system on programmable integrated circuits (e.g. a processor). [0004]: "Learned parameter systems include systems that learn and/or adjust values associated with parameters in a training or tuning phase, and then apply the learned values (which do not change or change very slowly and/or rarely and thus may be referred to as “stable”) in a use phase. References to training phases should be understood to include tuning phases, as well as any other suitable phases that adjust the values to become more suited to perform a desired function, such as retraining phases, fine-tuning phases, search phases, exploring phases, or the like. A use instance of a learned parameter system and/or its parameters is an instance of the same that has stable parameter values and may be employed in the use phase of the learned parameter system. For example, learned parameter systems may include Deep Learning systems, Deep Neural Networks, Neuromorphic systems, Spiking Networks, and the like" teaches that learned parameter systems implement neural networks and include a training phase for training the neural network), the method comprising: 
initializing a weight of a current layer of the neural network … (Fig. 2; [0028]: "FIG. 2 is a diagram of a convolution layer 40 of a CNN that may be programmed into the programmable integrated circuit 12 of FIG. 1, according to an embodiment of the present disclosure. As illustrated, each convolution layer 40 may convolve a set of N input feature maps 42 with M sets of N K×K learned parameters (also referred to as filter matrices or weights 44)" teaches an initial set of weights 44 for a convolution layer 40 (current layer) of a neural network).
	Hah et al. does not appear to explicitly teach initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter; calculating a loss based on the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter with training data; and updating the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter based on the calculated loss.
	However, Yao teaches initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer; data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that, for each layer of a neural network, a numerical range of quantization of the weights (weight quantization parameter) is dynamically chosen (i.e. initialized), and a numerical range of quantization of the respective feature (activation) map (activation quantization parameter) output from the layer is dynamically chosen (i.e. initialized). Table 2; teaches that weight bits (first representation bit number) are used to set (initialize) the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number), and that data bits (second representation bit number) are used to set (initialize) the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number)); 
calculating a loss based on the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter with training data ([0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (training data). Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

[Equation 10: image media_image2.png, not reproduced]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

[Equation 12: image media_image3.png, not reproduced]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that the output of the neural network is based on the parameters of the weight quantization phase (which includes the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter)) and the data quantization phase (which includes the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter))); and 
updating the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter based on the calculated loss (Fig. 5; [0082]-[0084]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

[Equation 10: image media_image2.png, not reproduced]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

[Equation 12: image media_image3.png, not reproduced]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter) to achieve a higher accuracy; and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter) to achieve a higher accuracy (i.e. lower the calculated loss)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter; calculating a loss based on the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter with training data; and updating the weight, the first representation bit number, the weight quantization parameter, the second representation bit number, and the activation quantization parameter based on the calculated loss as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
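For illustration only, the weight quantization phase quoted above (initialize fl to avoid overflow, then search the adjacent values of fl) might be sketched as follows. This is not Yao's implementation. Because Equation 10 is available here only as an image, the error measure used below (sum of absolute differences between each weight and its fixed-point value) is an assumption, as are the helper names `quantize`, `initial_fl`, and `search_fl`, the signed two's-complement layout, and the search radius of 1.

```python
def quantize(x, bw, fl):
    """Fixed-point value of x for bit width bw and fractional length fl (signed, saturating)."""
    step = 2.0 ** -fl
    code = max(-(2 ** (bw - 1)), min(2 ** (bw - 1) - 1, round(x / step)))
    return code * step


def initial_fl(values, bw, max_fl=32):
    """Largest fl (up to max_fl) whose upper limit still covers max |value|, avoiding overflow."""
    max_abs = max(abs(v) for v in values)
    fl = max_fl
    while fl > -max_fl and (2 ** (bw - 1) - 1) * 2.0 ** -fl < max_abs:
        fl -= 1
    return fl


def search_fl(values, bw, radius=1):
    """Pick the fl near the initial fl with the smallest total absolute error (assumed objective)."""
    start = initial_fl(values, bw)

    def total_error(fl):
        return sum(abs(v - quantize(v, bw, fl)) for v in values)

    return min(range(start - radius, start + radius + 1), key=total_error)


weights = [0.5, -0.25, 0.8, 0.1]
best_fl = search_fl(weights, bw=8)
```

With the sample weights, a larger fl saturates the largest weight and a smaller fl coarsens the step, so the search settles on the intermediate value.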
Regarding Claim 7,
Hah et al. in view of Yao teaches the method of claim 6.
	Additionally, Yao further teaches wherein the calculating of the loss comprises: quantizing the weight based on the first representation bit number and the weight quantization parameter ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)); 
outputting the output activation map by performing a convolution operation with respect to the quantized weight and an input activation map input into the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map); 
quantizing the output activation map with the second representation bit number and the activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps); and 
calculating the loss based on the quantized activation map ([0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (training data). [0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the neural network (output activation map) is quantized via data quantization (i.e. the output of the neural network is a quantized feature (activation) map)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the calculating of the loss comprises: quantizing the weight based on the first representation bit number and the weight quantization parameter; outputting the output activation map by performing a convolution operation with respect to the quantized weight and an input activation map input into the current layer; quantizing the output activation map with the second representation bit number and the activation quantization parameter; and calculating the loss based on the quantized activation map as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
Regarding Claim 8,
Hah et al. in view of Yao teaches the method of claim 7.
	Additionally, Yao further teaches wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has two thresholds related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has two thresholds related to the weights (since a range of values has an upper limit and a lower limit)), 
wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has an upper limit threshold (first threshold) of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
the second threshold indicates a lower limit of the activation map section ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has a lower limit threshold (second threshold) of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has an upper limit threshold (third threshold) of the range related to the weight (since a range of values has an upper limit and a lower limit)), and 
the fourth threshold indicates a lower limit of the weight section ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has a lower limit threshold (fourth threshold) of the range related to the weight (since a range of values has an upper limit and a lower limit)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
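	For illustration only, the upper-limit and lower-limit thresholds discussed for claim 8 can be pictured as two clipping ranges: one applied to the output activation map and one applied to the absolute value of the weight. The sketch below is an assumption about how such thresholds might be applied; the function names, the sign-preserving treatment of weights, and the choice to raise small magnitudes to the lower limit are hypothetical and are not taken from Yao or from the claims.

```python
import math


def clip_activation(a, lower, upper):
    """Restrict an activation to the section [lower, upper] (second and first thresholds)."""
    return min(max(a, lower), upper)


def clip_weight_magnitude(w, lower, upper):
    """Restrict |w| to the section [lower, upper] (fourth and third thresholds), keeping the sign."""
    magnitude = min(max(abs(w), lower), upper)
    return math.copysign(magnitude, w)
```

	For example, a weight of -3.0 with a weight section of [0.5, 2.0] maps to -2.0, because only its magnitude is compared against the thresholds.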
Regarding Claim 10,
Hah et al. teaches a processor-implemented method ([0002]: "This disclosure relates to systems and methods for implementing learned parameter systems on programmable integrated circuits" teaches a method for implementing a learned parameter system on programmable integrated circuits (e.g. a processor)) comprising: 
initializing a weight of a current layer of a first neural network … (Fig. 2; [0028]: "FIG. 2 is a diagram of a convolution layer 40 of a CNN that may be programmed into the programmable integrated circuit 12 of FIG. 1, according to an embodiment of the present disclosure. As illustrated, each convolution layer 40 may convolve a set of N input feature maps 42 with M sets of N K×K learned parameters (also referred to as filter matrices or weights 44)" teaches an initial set of weights 44 for a convolution layer 40 (current layer) of a neural network).
	Hah et al. does not appear to explicitly teach initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter; updating the first representation bit number and the second representation bit number based on an accuracy calculated in a previous iteration; calculating a loss based on the weight, the updated first representation bit number, the weight quantization parameter, the updated second representation bit number, and the activation quantization parameter based on training data; updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss; and calculating an accuracy to be implemented in a subsequent iteration with validity data based on the updated weight, the updated weight quantization parameter, and the updated activation quantization parameter.
	However, Yao teaches initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer; data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that, for each layer of a neural network, a numerical range of quantization of the weights (weight quantization parameter) is dynamically chosen (i.e. initialized), and a numerical range of quantization of the respective feature (activation) map (activation quantization parameter) output from the layer is dynamically chosen (i.e. initialized). Table 2 teaches that weight bits (first representation bit number) are used to set (initialize) the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number), and that data bits (second representation bit number) are used to set (initialize) the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number)); 
updating the first representation bit number and the second representation bit number based on an accuracy calculated in a previous iteration (Fig. 5; [0082]-[0084]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network, and that the neural network is updated when the accuracy (loss) calculated in a previous iteration is not within an acceptable threshold. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image media_image2.png, not reproduced)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image media_image3.png, not reproduced)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter) to achieve a higher accuracy; and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter) to achieve a higher accuracy (i.e. lower the calculated loss)); 
calculating a loss based on the weight, the updated first representation bit number, the weight quantization parameter, the updated second representation bit number, and the activation quantization parameter based on training data (Fig. 5; [0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (training data). Fig. 6A; Fig. 6B; [0100]-[0111]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network, and that the neural network is updated when the accuracy (loss) calculated in a previous iteration is not within an acceptable threshold. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image media_image2.png, not reproduced)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image media_image3.png, not reproduced)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that the output of the neural network is based on the parameters of the weight quantization phase (which includes the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter)) and the data quantization phase (which includes the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter))); 
updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss (Fig. 5; [0082]-[0084]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network. Fig. 6A; Fig. 6B; [0100]-[0111]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network, and that the neural network is updated when the accuracy (loss) calculated in a previous iteration is not within an acceptable threshold. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image media_image2.png, not reproduced)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image media_image3.png, not reproduced)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter) to achieve a higher accuracy; and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter) to achieve a higher accuracy (i.e. lower the calculated loss)); and 
calculating an accuracy to be implemented in a subsequent iteration with validity data based on the updated weight, the updated weight quantization parameter, and the updated activation quantization parameter (Fig. 5; [0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (validity data), where the accuracy is used for further training (updates) of the neural network in a subsequent iteration if an accuracy threshold is not met. Fig. 6A; Fig. 6B; [0100]-[0111]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network, and that the neural network is updated when the accuracy (loss) calculated in a previous iteration is not within an acceptable threshold. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image media_image2.png, not reproduced)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image media_image3.png, not reproduced)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that the output of the neural network is based on the parameters of the weight quantization phase (which includes the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter)) and the data quantization phase (which includes the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter))).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate initializing … a first representation bit number related to the weight, a weight quantization parameter, a second representation bit number related to an output activation map output from the current layer, and an activation quantization parameter; updating the first representation bit number and the second representation bit number based on an accuracy calculated in a previous iteration; calculating a loss based on the weight, the updated first representation bit number, the weight quantization parameter, the updated second representation bit number, and the activation quantization parameter based on training data; updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss; and calculating an accuracy to be implemented in a subsequent iteration with validity data based on the updated weight, the updated weight quantization parameter, and the updated activation quantization parameter as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
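	For illustration only, the weight quantization phase quoted from Yao (step 610: analyze the dynamic range, initialize fl to avoid overflow, then search the adjacent values of fl) can be sketched as below. This is a sketch under stated assumptions: Equation 10 is not reproduced in this action, so the sum of absolute differences between W and W(bw, fl) is assumed as the error measure, and the signed fixed-point layout, the search radius, and all function names are hypothetical.

```python
import math


def to_fixed_point(x, bw, fl):
    """Round x to a signed bw-bit value with fl fractional bits, saturating on overflow."""
    scale = 2.0 ** fl
    q = round(x * scale)
    q_min, q_max = -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    return min(max(q, q_min), q_max) / scale


def initial_fl(weights, bw):
    """Largest fl at which the biggest magnitude still fits, up to saturation of the top code."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return bw - 1
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bw - 1 - int_bits


def search_fl(weights, bw, radius=1):
    """Pick the fl near the initial value that minimizes the assumed quantization error."""
    fl0 = initial_fl(weights, bw)

    def error(fl):
        return sum(abs(w - to_fixed_point(w, bw, fl)) for w in weights)

    return min(range(fl0 - radius, fl0 + radius + 1), key=error)
```

	Under these assumptions, a larger fl gives finer resolution but saturates the largest weights, while a smaller fl avoids saturation at the cost of coarser steps; the search balances the two for a fixed bw.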
Regarding Claim 11,
Hah et al. in view of Yao teaches the method of claim 10.
	Additionally, Yao further teaches wherein the updating of the first representation bit number and the second representation bit number comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss (Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image media_image2.png, not reproduced)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image media_image3.png, not reproduced)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weight bit width (first representation bit number), and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number) by adjusting (increasing or decreasing) the bit width (bit number), to achieve a higher accuracy (i.e. lower the calculated loss). Table 2 teaches that the weight bits bit width (first representation bit number) and the data bits bit width (second representation bit number) are adjusted by a preset bit number).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the updating of the first representation bit number and the second representation bit number comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
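	For illustration only, the limitation of claim 11 (increasing or decreasing both representation bit numbers by a preset bit number based on the calculated loss) can be sketched as below. The decision rule is an assumption: neither the claim language nor the cited passages of Yao specify a threshold comparison, and the names loss_threshold, step, min_bits, and max_bits are hypothetical.

```python
def update_bit_numbers(w_bits, a_bits, loss, loss_threshold, step=1, min_bits=1, max_bits=16):
    """Raise both bit numbers by `step` when the loss is too high, otherwise lower them."""
    # Assumed rule: a loss above the threshold calls for more precision, otherwise less.
    delta = step if loss > loss_threshold else -step

    def clamp(bits):
        return min(max(bits, min_bits), max_bits)

    return clamp(w_bits + delta), clamp(a_bits + delta)
```

	Called once per iteration, such a rule would move the first and second representation bit numbers together in fixed increments, which is the behavior the rejection attributes to the bit-width settings of Table 2.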
Regarding Claim 14,
Hah et al. in view of Yao teaches the method of claim 10.
	Additionally, Yao further teaches wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has two thresholds related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has two thresholds related to the weights (since a range of values has an upper limit and a lower limit)), 
wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has an upper limit threshold (first threshold) of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
the second threshold indicates a lower limit of the activation map section ([0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps, meaning that the activation quantization parameter has a lower limit threshold (second threshold) of the range related to the feature (activation) map (since a range of values has an upper limit and a lower limit)), and 
wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has an upper limit threshold (third threshold) of the range related to the weight (since a range of values has an upper limit and a lower limit)), and 
the fourth threshold indicates a lower limit of the weight section ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights, meaning that the weight quantization parameter has a lower limit threshold (fourth threshold) of the range related to the weight (since a range of values has an upper limit and a lower limit)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the activation quantization parameter includes a first threshold and a second threshold related to the output activation map, and the weight quantization parameter includes a third threshold and a fourth threshold of an absolute value of the weight, wherein the first threshold indicates an upper limit of an activation map section with respect to the output activation map, and the second threshold indicates a lower limit of the activation map section, and wherein the third threshold indicates an upper limit of a weight section with respect to the absolute value of the weight, and the fourth threshold indicates a lower limit of the weight section as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
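	For illustration only, the reading applied above (that a numerical range of quantization necessarily has an upper limit and a lower limit) can be sketched as follows. This is a minimal sketch and is not taken from Yao or Hah et al.; the function name `clamp_to_range` and the example values and limits are hypothetical.

```python
def clamp_to_range(values, lower, upper):
    """Clip each value into [lower, upper], the two limits of a numerical range."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    return [min(max(v, lower), upper) for v in values]


# Hypothetical activation-map values and range limits (assumed for illustration).
activations = [-0.5, 0.2, 1.7, 3.9]
clipped = clamp_to_range(activations, lower=0.0, upper=2.0)
```

	Values outside the range saturate at the nearer limit, which is the sense in which the range supplies two thresholds.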
Regarding Claim 16,
Hah et al. teaches a processor-implemented method of training a neural network ([0002]: "This disclosure relates to systems and methods for implementing learned parameter systems on programmable integrated circuits" teaches a method for implementing a learned parameter system on programmable integrated circuits (e.g. a processor). [0004]: "Learned parameter systems include systems that learn and/or adjust values associated with parameters in a training or tuning phase, and then apply the learned values (which do not change or change very slowly and/or rarely and thus may be referred to as “stable”) in a use phase. References to training phases should be understood to include tuning phases, as well as any other suitable phases that adjust the values to become more suited to perform a desired function, such as retraining phases, fine-tuning phases, search phases, exploring phases, or the like. A use instance of a learned parameter system and/or its parameters is an instance of the same that has stable parameter values and may be employed in the use phase of the learned parameter system. For example, learned parameter systems may include Deep Learning systems, Deep Neural Networks, Neuromorphic systems, Spiking Networks, and the like" teaches that learned parameter systems implement neural networks and include a training phase for training the neural network), the method comprising: 
… a weight of a current layer of a pre-trained first neural network … (Fig. 10; [0046]: "The computing engine 16 may receive (process block 152) an input set of learned parameters for an input learned parameter system. Each learned parameter may be previously determined through a learning algorithm, such as back propagation. That is, each learned parameter may be trained or tuned repeatedly over time, and thus may be have use instances or values that are stable or fixed (e.g., unchanging). In some embodiments, the computing engine 16 may quantize the set of learned parameters to generate a set of quantized learned parameters" teaches that the learned parameter system may receive learned parameters that have been already trained (pre-trained) by a learning algorithm, which are then quantized).
	Hah et al. does not appear to explicitly teach initializing a first representation bit number related to a weight of a current layer … and a second representation bit number related to an output activation map output from the current layer; calculating a loss based on the pre-trained first neural network, the first representation bit number, and the second representation bit number based on training data; and updating the first representation bit number and the second representation bit number based on the calculated loss.
	However, Yao teaches initializing a first representation bit number related to a weight of a current layer … and a second representation bit number related to an output activation map output from the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer; data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that, for each layer of a neural network, a numerical range of quantization of the weights (weight quantization parameter) is dynamically chosen (i.e. initialized), and a numerical range of quantization of the respective feature (activation) map (activation quantization parameter) output from the layer is dynamically chosen (i.e. initialized). Table 2 teaches that weight bits (first representation bit number) are used to set (initialize) the bit number during quantization (i.e. weights are quantized according to the selected number of weight bits), and that data bits (second representation bit number) are used to set (initialize) the bit number during quantization (i.e. feature/activation maps are quantized according to the selected number of data bits)); 
calculating a loss based on the pre-trained first neural network, the first representation bit number, and the second representation bit number based on training data (Fig. 5; [0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (training data). Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image: media_image2.png)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image: media_image3.png)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that the output of the neural network is based on the parameters of the weight quantization phase (which includes the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter)) and the data quantization phase (which includes the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter))); and 
updating the first representation bit number and the second representation bit number based on the calculated loss (Fig. 5; [0082]-[0084]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network, and that the neural network is updated when the accuracy (loss) calculated in a previous iteration is not within an acceptable threshold. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image: media_image2.png)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image: media_image3.png)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter) to achieve a higher accuracy; and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter) to achieve a higher accuracy (i.e. lower the calculated loss)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate initializing a first representation bit number related to a weight of a current layer … and a second representation bit number related to an output activation map output from the current layer; calculating a loss based on the pre-trained first neural network, the first representation bit number, and the second representation bit number based on training data; and updating the first representation bit number and the second representation bit number based on the calculated loss as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
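	For illustration only, the weight quantization phase quoted above from Yao (a fixed-point format W(bw; fl), an initial fl chosen to avoid overflow, and a search for the optimal fl in the adjacent domain of the initial fl) can be sketched as follows. Equation 10 appears only as an image in this action, so the error measure used here (the sum of absolute differences between each weight and its fixed-point value) is an assumption, as are the names `to_fixed_point` and `search_fl`, the `radius` parameter, and the example weights.

```python
def to_fixed_point(x, bw, fl):
    """Round x to a signed bw-bit fixed-point value with fl fractional bits."""
    scale = 2.0 ** fl
    q_min = -(2 ** (bw - 1))
    q_max = 2 ** (bw - 1) - 1
    q = round(x * scale)
    q = max(q_min, min(q_max, q))  # saturate rather than overflow
    return q / scale


def search_fl(weights, bw, fl_init, radius=2):
    """Return the fl within `radius` of fl_init with the smallest total error.

    The error measure (sum of absolute differences) is assumed for illustration.
    """
    def error(fl):
        return sum(abs(w - to_fixed_point(w, bw, fl)) for w in weights)

    candidates = range(fl_init - radius, fl_init + radius + 1)
    return min(candidates, key=error)


# Hypothetical weights of one layer (assumed for illustration).
weights = [0.75, -0.5, 0.125, 0.3]
best_fl = search_fl(weights, bw=8, fl_init=5)
```

	The bit width bw is held fixed during the search in this sketch; only the fractional length fl is varied.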
Regarding Claim 17,
Hah et al. in view of Yao teaches the method of claim 16.
	Additionally, Yao further teaches wherein the updating comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss (Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image: media_image2.png)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image: media_image3.png)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weight bit width (first representation bit number), and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number) by adjusting (increasing or decreasing) the bit width (bit number), to achieve a higher accuracy (i.e. lower the calculated loss). Table 2 teaches that the bit width of the weight bits (first representation bit number) and the bit width of the data bits (second representation bit number) are adjusted by a preset bit number).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the updating comprises increasing or decreasing the first representation bit number and the second representation bit number by a preset bit number based on the calculated loss as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
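	For illustration only, the limitation of claim 17 as recited (increasing or decreasing a representation bit number by a preset bit number based on the calculated loss) can be sketched as follows. This is a sketch of the claim language as interpreted for examination, not of any procedure disclosed in Yao or Hah et al.; the function name `update_bit_number`, the `target_loss` comparison, and the bounds `min_bits` and `max_bits` are hypothetical.

```python
def update_bit_number(bits, loss, target_loss, step=1, min_bits=2, max_bits=16):
    """Raise `bits` by the preset `step` if the loss is too high, else lower it.

    The result is kept within [min_bits, max_bits]; all bounds are assumed.
    """
    if loss > target_loss:
        bits += step
    else:
        bits -= step
    return max(min_bits, min(max_bits, bits))


# Hypothetical starting bit numbers for the weight and the activation map.
weight_bits = update_bit_number(8, loss=0.5, target_loss=0.1)
activation_bits = update_bit_number(8, loss=0.05, target_loss=0.1)
```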
Regarding Claim 19,
Hah et al. teaches a processor-implemented method of training a neural network ([0002]: "This disclosure relates to systems and methods for implementing learned parameter systems on programmable integrated circuits" teaches a method for implementing a learned parameter system on programmable integrated circuits (e.g. a processor). [0004]: "Learned parameter systems include systems that learn and/or adjust values associated with parameters in a training or tuning phase, and then apply the learned values (which do not change or change very slowly and/or rarely and thus may be referred to as “stable”) in a use phase. References to training phases should be understood to include tuning phases, as well as any other suitable phases that adjust the values to become more suited to perform a desired function, such as retraining phases, fine-tuning phases, search phases, exploring phases, or the like. A use instance of a learned parameter system and/or its parameters is an instance of the same that has stable parameter values and may be employed in the use phase of the learned parameter system. For example, learned parameter systems may include Deep Learning systems, Deep Neural Networks, Neuromorphic systems, Spiking Networks, and the like" teaches that learned parameter systems implement neural networks and include a training phase for training the neural network), the method comprising: 
initializing a weight of a current layer of the neural network … (Fig. 2; [0028]: "FIG. 2 is a diagram of a convolution layer 40 of a CNN that may be programmed into the programmable integrated circuit 12 of FIG. 1, according to an embodiment of the present disclosure. As illustrated, each convolution layer 40 may convolve a set of N input feature maps 42 with M sets of N K×K learned parameters (also referred to as filter matrices or weights 44)" teaches an initial set of weights 44 for a convolution layer 40 (current layer) of a neural network).
	Hah et al. does not appear to explicitly teach initializing … a weight quantization parameter related to the weight, and an activation quantization parameter related to an output activation map output from the current layer; calculating a loss based on a pre-trained first representation bit number related to the weight, a pre-trained second representation bit number related to the output activation map, the weight, the weight quantization parameter, and the activation quantization parameter based on training data; and updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss.
	However, Yao teaches initializing … a weight quantization parameter related to the weight, and an activation quantization parameter related to an output activation map output from the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer; data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that, for each layer of a neural network, a numerical range of quantization of the weights (weight quantization parameter) is dynamically chosen (i.e. initialized), and a numerical range of quantization of the respective feature (activation) map (activation quantization parameter) output from the layer is dynamically chosen (i.e. initialized)); 
calculating a loss based on a pre-trained first representation bit number related to the weight, a pre-trained second representation bit number related to the output activation map, the weight, the weight quantization parameter, and the activation quantization parameter based on training data (Fig. 5; [0084]: "The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that accuracy (loss) can be measured (calculated) based on the results output from the neural network when using benchmark test data (training data). Fig. 5 further teaches a pruning step based on updating the already trained (pre-trained) neural network parameters. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image: media_image2.png)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image: media_image3.png)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that the output of the neural network is based on the parameters of the weight quantization phase (which includes the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter)) and the data quantization phase (which includes the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter))); and 
updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss (Fig. 5; [0082]-[0084]: "In step 505, training said ANN by adjusting weights of ANN until the accuracy of ANN reaches a predetermined level … The accuracy of ANN can be measured by, for example, inputting a benchmark test data to the ANN and decide how accurate the prediction results of said ANN is" teaches that the weights of the neural network are updated based on the accuracy (calculated loss) of the neural network. Fig. 6A; Fig. 6B; [0100]-[0111]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10 (image: media_image2.png)]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl … In step 620, the data quantization phase aims to find the optimal fl for a set of feature maps between two layers. In this phase, the intermediate data of the fixed-point CNN model and the floating-point CNN model are compared layer by layer using a greedy algorithm to reduce the accuracy loss. For each layer, the optimization target is shown in Equation 12:

    [Equation 12 (image: media_image3.png)]

In Equation 12, x+ represents the result of a layer when we denote the computation of a layer as x+=Ax. It should be noted, for either CONV layer or FC layer, the direct result x+ has longer bit width than the given standard. Consequently, truncation is needed when optimizing fl selection. Finally, the entire data quantization configuration is generated" teaches that weight quantization and data quantization phases are updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number), and the dynamic range of the weights (weight quantization parameter) to achieve a higher accuracy; and the data quantization phase updates the bit-width for the feature (activation) maps (second representation bit number), and the range of the output feature (activation) map data (activation quantization parameter) to achieve a higher accuracy (i.e. lower the calculated loss)).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate initializing … a weight quantization parameter related to the weight, and an activation quantization parameter related to an output activation map output from the current layer; calculating a loss based on a pre-trained first representation bit number related to the weight, a pre-trained second representation bit number related to the output activation map, the weight, the weight quantization parameter, and the activation quantization parameter based on training data; and updating the weight, the weight quantization parameter, and the activation quantization parameter based on the calculated loss as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
Regarding Claim 20,
Hah et al. in view of Yao teaches the method of claim 19.
	Additionally, Yao teaches further comprising: quantizing the weight based on the updated weight quantization parameter and the pre-trained first representation bit number ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that weights are quantized during weight quantization according to a numerical range (weight quantization parameter) for the respective weights. Table 2 teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected number of weight bits). Fig. 6A; Fig. 6B; [0100]-[0107]: "As shown in FIG. 6A, to convert floating-point numbers into fixed-point ones while achieving the highest accuracy, we propose a dynamic-precision data quantization strategy and an automatic workflow … As shown in FIG. 6B, the proposed quantization flow mainly consists of two phases: Step 610: the weight quantization phase, and Step 620: the data quantization phase … In step 610, the weight quantization phase aims to find the optimal fl for weights in one layer, as shown in Equation 10:

    [Equation 10: image media_image2.png]

where W is a weight and W(bw; fl) represents the fixed-point format of W under the given bw and fl … the dynamic ranges of weights in each layer is analyzed first, for example, by sampling. After that, the fl is initialized to avoid data overflow. Furthermore, we search for the optimal fl in the adjacent domains of the initial fl " teaches that weight quantization phase is updated based on the accuracy (calculated loss); where the weight quantization phase updates the weights, the weight bit width (first representation bit number) to achieve a higher accuracy (i.e. lower the calculated loss)); and 
storing the quantized weight and the activation quantization parameter ([0167]: "The external memory 8120 stores all the ANN model parameters, data, and instructions are stored" teaches that all neural network parameters and data (including quantized weights and quantization parameters) are stored in memory).
	Hah et al. and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate further comprising: quantizing the weight based on the updated weight quantization parameter and the pre-trained first representation bit number; and storing the quantized weight and the activation quantization parameter as taught by Yao to the disclosed invention of Hah et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
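For illustration only, the weight quantization phase quoted above from Yao [0100]-[0107] (analyze the dynamic range of the weights, initialize fl to avoid overflow, then search adjacent values of fl) can be sketched as follows. This is a minimal sketch and not Yao's implementation: the function names, the search radius, and the use of a summed absolute error as the search objective are assumptions, since Equation 10 is reproduced only as an image in this action.

```python
import math

def to_fixed_point(x, bw, fl):
    """Round x to a signed fixed-point value with bw total bits and fl fractional bits."""
    scale = 2.0 ** fl
    q = round(x * scale)
    # Saturate to the representable signed integer range for bw bits.
    q_min, q_max = -(2 ** (bw - 1)), 2 ** (bw - 1) - 1
    q = max(q_min, min(q_max, q))
    return q / scale

def quantization_error(weights, bw, fl):
    """Assumed objective: summed absolute difference between float and fixed-point weights."""
    return sum(abs(w - to_fixed_point(w, bw, fl)) for w in weights)

def search_fl(weights, bw, radius=2):
    """Pick fl for one layer: start from an overflow-free guess, then search its neighbours."""
    max_abs = max(abs(w) for w in weights)
    # Integer bits needed so that the largest magnitude does not overflow.
    int_bits = math.floor(math.log2(max_abs)) + 1 if max_abs > 0 else 0
    fl_init = bw - 1 - int_bits
    candidates = range(fl_init - radius, fl_init + radius + 1)  # hypothetical search window
    return min(candidates, key=lambda fl: quantization_error(weights, bw, fl))

layer_weights = [0.5, -0.25, 0.125, 0.75]  # made-up example weights
best_fl = search_fl(layer_weights, bw=8)
quantized = [to_fixed_point(w, 8, best_fl) for w in layer_weights]
```

In this sketch, bw plays the role of the representation bit number and fl plays the role of the per-layer quantization parameter; the pair (quantized, best_fl) is what would be stored.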

Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 2017/0076195 A1) in view of Singh et al. (US 2018/0189981 A1) and further in view of Yao (US 2018/0046894 A1).
Regarding Claim 23,
Yang et al. teaches a face recognition apparatus (Fig. 7; [0071]: "FIG. 7 is an illustrative diagram of an example system 700 for implementing at least a portion of a neural network, arranged in accordance with at least some implementations of the present disclosure. As shown in FIG. 7, system 700 may include a central processor 701, a graphics processor 702, a memory 703, a communications interface 501, and/or a transmitter 503. Also, as shown, central processor 701 may include or implement lower level layer modules 521 and fully connected portion modules 522. In the example of system 700, memory 703 may store sensor data, image data, video data, or related content such as input layer data, feature maps, sub-sampled feature maps, neural network parameters or models, object labels, and/or any other data as discussed herein" teaches a system (apparatus) for implementing the embodied neural network. [0031]: "Furthermore, in some embodiments, multiple fully connected portions may be implemented based on sub-sampled feature maps 115 with each fully connected portion performing a particular object detection such as face detection, pedestrian detection, auto detection, license plate detection, and so on" teaches that the layers of the embodied neural network may be used for performing face detection (facial recognition)), comprising: 
at least one processor (Fig. 7; [0071]: "FIG. 7 is an illustrative diagram of an example system 700 for implementing at least a portion of a neural network, arranged in accordance with at least some implementations of the present disclosure. As shown in FIG. 7, system 700 may include a central processor 701, a graphics processor 702, a memory 703, a communications interface 501, and/or a transmitter 503. Also, as shown, central processor 701 may include or implement lower level layer modules 521 and fully connected portion modules 522. In the example of system 700, memory 703 may store sensor data, image data, video data, or related content such as input layer data, feature maps, sub-sampled feature maps, neural network parameters or models, object labels, and/or any other data as discussed herein" teaches that the system comprises a processor); and 
… an input image comprising a facial image of a user ([0025]: "As shown, neural network 100 may receive an input layer (IL) 111, which may include any suitable input data such as sensor data or image data or the like. As used herein, sensor data may include any data generated via a sensor such as area monitoring data, environmental monitoring data, industrial monitoring data, or the like. As used herein, image data may include any suitable still image data, video frame data, or the like in any suitable format" teaches that the input to the neural network is an image. [0045]: "Such a common or shared format may provide for a set of neurons or the like implemented via cameras 301 that may serve multiple different applications using the same data (e.g., each set of sets sub-sampled feature maps 351) and thereby provide common building blocks for object recognition tasks or the like. Such data may be used for multiple types of object detection (e.g., via specific lower level layers and/or specific fully connected portions implemented via gateway 302 and/or cloud computing resources 303) … In some embodiments, the same lower level data may be used to perform face detection, pedestrian detection, automobile detection, license plate detection, and so on" teaches that the lower level input data may be from a camera and used for face detection (i.e. an image comprising a user's face used for facial recognition)), and
perform a user recognition process by processing the quantized activation map ([0022]: "In embodiments where the device transmits the feature maps to a gateway, the gateway may implement one or more additional lower level neural network layers to generate feature maps (e.g., convolutional neural network feature maps or the like) and transmit the feature maps to a cloud computing resource or the like. In either embodiment, the cloud computing resource may receive the feature maps (e.g., from the distributed device or the gateway) and the cloud computing resource may optionally implement one or more additional lower level neural network layers to generate feature maps and the cloud computing resource may implement a fully connected portion (e.g., a fully connected multilayer perceptron portion of the neural network) to the received or internally generated feature maps to generate output labels (e.g., object detection labels) or similar data" teaches that the generated output feature map (quantized activation map) may be used for object detection. [0023]: "Such feature maps having the same format may be used for different types of object detection or output labeling or the like based on implementation of a specialized fully connected portion of the neural network. For example, multiple object detections (e.g., attempting to detect a variety of objects such as automobiles, faces, human bodies, and so on)" teaches that the object detection includes face recognition of a user).
Yang et al. does not appear to explicitly teach at least one memory configured to store instructions to be executed by the at least one processor and a neural network, wherein the at least one processor is configured to, based on the instructions: output an input activation map from an input …, input the input activation map into a current layer included in the neural network, output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
	However, Singh et al. teaches at least one memory configured to store instructions to be executed by the at least one processor and a neural network (Fig. 1; [0041]: "Memory device 120 can be a dynamic random-access memory (DRAM) device, a static random-access memory (SRAM) device, flash memory device, phase-change memory device, or some other memory device having suitable performance to serve as process memory. In one embodiment the memory device 120 can operate as system memory for the system 100, to store data 122 and instructions 121 for use when the one or more processors 102 executes an application or process" teaches a memory device 120 (memory) that stores instructions 121 for execution by the processors 102. [0177]: "a data processing system configured to perform operations to enable a convolutional neural network (CNN)" teaches that the data processing system can be used for implementing operations of a neural network), 
wherein the at least one processor is configured to, based on the instructions: output an input activation map from an input … ([0125]: "an original image 1502 having some data to be analyzed is processed by a set of convolution kernels that apply each apply a different filter 1504A, 1504B to the original image 1502. The filters 1504A, 1504B are learnable and typically much smaller than the original image to which the filters will be applied. The convolution kernels output a set of feature maps" teaches that a feature map (input activation map) may be computed (output) from an input), and
input the input activation map into a current layer included in the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer).
Yang et al. and Singh et al. are analogous to the claimed invention because they are directed to neural network implementation techniques.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate at least one memory configured to store instructions to be executed by the at least one processor and a neural network, wherein the at least one processor is configured to, based on the instructions: output an input activation map from an input …, and input the input activation map into a current layer included in the neural network as taught by Singh et al. to the disclosed invention of Yang et al.
One of ordinary skill in the art would have been motivated to make this modification to "preserve memory bus bandwidth and reduce system memory access power requirements when performing CNN operations" (Singh et al. [0032]).
Yang et al. in view of Singh et al. does not appear to explicitly teach output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
However, Yao teaches output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)), and
output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
Yang et al., Singh et al., and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter as taught by Yao to the disclosed invention of Yang et al. in view of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
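For illustration only, the two limitations mapped to Yao above (a convolution between the input activation map and a weight quantized with a first bit number, followed by quantization of the output activation map with a second bit number) can be sketched as a single layer. This is a minimal sketch under stated assumptions, not the implementation of Yao or of the claimed invention: all function and parameter names are hypothetical, the fractional lengths stand in for the weight and activation quantization parameters, and the convolution is the single-channel, unpadded, stride-1 sliding dot product commonly used in CNNs.

```python
def quantize(x, bits, fl):
    """Signed fixed-point rounding with `bits` total bits and `fl` fractional bits."""
    scale = 2.0 ** fl
    q = round(x * scale)
    q = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, q))  # saturate
    return q / scale

def conv2d_valid(feature_map, kernel):
    """Single-channel sliding dot product (no padding, stride 1), as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(feature_map[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def quantized_layer(input_map, weights, weight_bits, weight_fl, act_bits, act_fl):
    # Weight quantized with the first representation bit number.
    q_weights = [[quantize(w, weight_bits, weight_fl) for w in row] for row in weights]
    # Output activation map from the convolution.
    output_map = conv2d_valid(input_map, q_weights)
    # Quantized activation map, using the second representation bit number.
    return [[quantize(v, act_bits, act_fl) for v in row] for row in output_map]

input_map = [[1.0, 2.0], [3.0, 4.0]]      # made-up input activation map
weights = [[0.5, 0.25], [0.25, 0.5]]      # made-up kernel
quantized_map = quantized_layer(input_map, weights, 8, 6, 8, 4)
```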

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Pennington et al. (US 10,438,131 B1) in view of Singh et al. (US 2018/0189981 A1) and further in view of Yao (US 2018/0046894 A1).
Regarding Claim 24,
Pennington et al. teaches a speech recognition apparatus (Col. 1, lines 45-63: "According to one aspect, a computing system includes at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the computing system to perform operations. The operations may include generating an approximation of polynomial kernel as a sum of Gaussian kernels and storing the sample of the vector values as a nonlinear randomized feature map … The operations may also include generating input vectors for a kernel-based machine learning system using the nonlinear randomized feature map and training the machine learning system using the input vectors" teaches a computing system (apparatus). Col. 3, lines 53-55: "The system 100 may use a machine learning engine 120 to perform image searches, speech recognition, etc., on the data items 130" teaches that the computing system may be used for speech recognition), comprising: 
at least one processor (Col. 1, lines 45-63: " According to one aspect, a computing system includes at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the computing system to perform operations. The operations may include generating an approximation of polynomial kernel as a sum of Gaussian kernels and storing the sample of the vector values as a nonlinear randomized feature map … The operations may also include generating input vectors for a kernel-based machine learning system using the nonlinear randomized feature map and training the machine learning system using the input vectors" teaches a computing system (apparatus). Col. 3, lines 53-55: "The system 100 may use a machine learning engine 120 to perform image searches, speech recognition, etc., on the data items 130" teaches the computing system comprising at least one processor); and 
at least one memory configured to store instructions to be executed by the at least one processor and a neural network (Col. 1, lines 45-63: " According to one aspect, a computing system includes at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the computing system to perform operations. The operations may include generating an approximation of polynomial kernel as a sum of Gaussian kernels and storing the sample of the vector values as a nonlinear randomized feature map … The operations may also include generating input vectors for a kernel-based machine learning system using the nonlinear randomized feature map and training the machine learning system using the input vectors" teaches a computing system (apparatus). Col. 3, lines 53-55: "The system 100 may use a machine learning engine 120 to perform image searches, speech recognition, etc., on the data items 130" teaches the computing system comprising a memory storing instruction for a processor and a neural network), and
… speech data representing a word (Col. 3, lines 48-51: "The data items 130 may be a database, for example of files or search items. For instance, the data items 130 may be any kind of file, such as documents, images, sound files, video files, etc., and the feature vectors may be extracted from the file" teaches that the input is a sound file (speech data representing a word)), and
perform a word recognition process by processing the quantized activation map (Col. 6, lines 54-62: "The system 100 may also include machine learning engine 120. The machine learning engine 120 may be any type of kernel-based machine-learning system, such as a long short-term memory (LSTM) neural network, feed-forward neural network, a support vector machine (SVM) classifier etc., that can predict one thing given the data item approximations 134 as input. For example, the machine learning engine 120 may take as input a data item and may use the feature map 136 to generate a transformation of the data item that is used to provide, as output, a classification for the data item. The data item can be an image and the classification may be a label for the image or a description of something identified in the image. The data item can also be sound file and the classification may be a word or words recognized in the sound file" teaches that the output feature map (quantized activation map) is used to perform a word classification (recognition) for a sound file).
	Pennington et al. does not appear to explicitly teach wherein the at least one processor is configured to, based on the instructions: output an input activation map from speech data representing a word, input the input activation map into a current layer included in the neural network, output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
	However, Singh et al. teaches wherein the at least one processor is configured to, based on the instructions: output an input activation map from … data … ([0125]: "an original image 1502 having some data to be analyzed is processed by a set of convolution kernels that apply each apply a different filter 1504A, 1504B to the original image 1502. The filters 1504A, 1504B are learnable and typically much smaller than the original image to which the filters will be applied. The convolution kernels output a set of feature maps" teaches that a feature map (input activation map) may be computed (output) from input data), and
input the input activation map into a current layer included in the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer). 
Pennington et al. and Singh et al. are analogous to the claimed invention because they are directed to neural network implementation techniques.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the at least one processor is configured to, based on the instructions: output an input activation map from … data …, and input the input activation map into a current layer included in the neural network as taught by Singh et al. to the disclosed invention of Pennington et al.
One of ordinary skill in the art would have been motivated to make this modification to "preserve memory bus bandwidth and reduce system memory access power requirements when performing CNN operations" (Singh et al. [0032]).
Pennington et al. in view of Singh et al. does not appear to explicitly teach output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
However, Yao teaches output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)), and 
output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
Pennington et al., Singh et al., and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter as taught by Yao to the disclosed invention of Pennington et al. in view of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Radosavljevic et al. (US 2019/0094858 A1) in view of Singh et al. (US 2018/0189981 A1) and further in view of Yao (US 2018/0046894 A1).
Regarding Claim 25,
Radosavljevic et al. teaches an autonomous driving control apparatus (Fig. 1; teaches a parking prediction system 102 (autonomous driving control apparatus) used for controlling an autonomous vehicle 106), comprising: 
at least one processor (Fig. 1; Fig. 3; [0077]: "Referring now to FIG. 3, FIG. 3 is a diagram of example components of a device 300. Device 300 corresponds to one or more devices of parking prediction system 102, one or more devices of image database 104, and/or one or more devices (e.g., one or more devices of a system of) autonomous vehicle 106. In some non-limiting embodiments, one or more devices of parking prediction system 102, one or more devices of image database 104, and/or one or more devices (e.g., one or more devices of a system of) autonomous vehicle 106 include at least one device 300 and/or at least one component of device 300. As shown in FIG. 3, device 300 includes bus 302, processor 304, memory 306, storage component 308, input component 310, output component 312, and communication interface 214" teaches that the device 300 (corresponding to the parking prediction system 102) includes a processor 304); and 
at least one memory configured to store instructions to be executed by the at least one processor and a neural network (Fig. 1; Fig. 3; [0082]: "device 300 performs one or more processes described herein. In some non-limiting embodiments, device 300 performs these processes based on processor 304 executing software instructions stored by a computer-readable medium, such as memory 306 and/or storage component 308" teaches a memory 306 storing instructions to be executed by the processor 304), 
… input data representing driving environment information of a vehicle (Fig. 5; [0143]: "As shown in FIG. 5, at step 502, process 500 includes processing one or more images (e.g., one or more feature maps, one or more vehicle maps, one or more geographic location images, etc.) of one or more roads to produce one or more artificial neurons associated with one or more convolution layers of a convolutional neural network model" teaches that the input is image data comprising location/road (environment) information of the vehicle), and 
control a driving operation of the vehicle by processing the quantized activation map ([0127]: "For example, parking prediction system 102 may determine whether each element of the matrix of the vehicle map and/or the feature map includes a prediction score indicating that element is associated with a parking location" teaches that the parking prediction system uses the image data to generate an output feature map (quantized activation map). [0133]: "In some non-limiting embodiments, parking prediction system 102 provides the map to autonomous vehicle 106 and autonomous vehicle 106 travels (e.g., navigate, travels on a route, navigates a route, etc.) based on the map … the autonomous vehicle 106 performs vehicle control actions (e.g., braking, steering, accelerating) and plans a route based on a parking location of the map" teaches that an autonomous vehicle performs control actions (driving operation) based on the generated map (quantized activation map) from the parking prediction system).
	Radosavljevic et al. does not appear to explicitly teach wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data …, input the input activation map into a current layer included in the neural network, output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
	However, Singh et al. teaches wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data … ([0125]: "an original image 1502 having some data to be analyzed is processed by a set of convolution kernels that apply each apply a different filter 1504A, 1504B to the original image 1502. The filters 1504A, 1504B are learnable and typically much smaller than the original image to which the filters will be applied. The convolution kernels output a set of feature maps" teaches that a feature map (input activation map) may be computed (output) from the input data), and
input the input activation map into a current layer included in the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer).
Radosavljevic et al. and Singh et al. are analogous to the claimed invention because they are directed to neural network implementation techniques.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data …, and input the input activation map into a current layer included in the neural network as taught by Singh et al. to the disclosed invention of Radosavljevic et al.
One of ordinary skill in the art would have been motivated to make this modification to "preserve memory bus bandwidth and reduce system memory access power requirements when performing CNN operations" (Singh et al. [0032]).
Radosavljevic et al. in view of Singh et al. does not appear to explicitly teach output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter.
However, Yao teaches output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)), and 
output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
Radosavljevic et al., Singh et al., and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number with an activation quantization parameter as taught by Yao to the disclosed invention of Radosavljevic et al. in view of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
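For illustration only, the operations mapped above (a convolution between an input activation map and a weight quantized with a first bit number, followed by quantization of the output activation map with a second bit number) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Yao or of the claimed invention: the function names, the signed fixed-point format, and the use of a fractional bit length as the quantization parameter are all hypothetical choices made for the example.

```python
def to_fixed_point(x, bits, frac_bits):
    """Round x to a signed fixed-point value with `bits` total bits,
    `frac_bits` of them fractional, saturating at the representable range.
    (Assumed format; Python's round() rounds exact halves to even.)"""
    scale = 2 ** frac_bits
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(q_min, min(q_max, round(x * scale)))  # round, then saturate
    return q / scale


def quantize_map(values, bits, frac_bits):
    """Quantize every element of a 2D list."""
    return [[to_fixed_point(v, bits, frac_bits) for v in row] for row in values]


def conv2d_valid(input_map, kernel):
    """Stride-1, no-padding 2D convolution in the CNN sense (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(input_map) - kh + 1
    out_w = len(input_map[0]) - kw + 1
    return [
        [
            sum(input_map[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def quantized_layer(input_map, weight, weight_bits, weight_frac, act_bits, act_frac):
    """Hypothetical single layer: quantize the weight (first bit number),
    convolve, then quantize the output activation map (second bit number)."""
    q_weight = quantize_map(weight, weight_bits, weight_frac)
    output_map = conv2d_valid(input_map, q_weight)
    return quantize_map(output_map, act_bits, act_frac)


input_map = [[1.0, 2.0], [3.0, 4.0]]
weight = [[0.26, 0.5], [-0.24, 1.0]]  # quantizes to [[0.25, 0.5], [-0.25, 1.0]]
print(quantized_layer(input_map, weight, 4, 2, 8, 1))  # [[4.5]]
print(quantized_layer(input_map, weight, 4, 2, 4, 1))  # [[3.5]] (saturated)
```

In the second call the 4-bit activation format cannot represent 4.5, so the value saturates at 3.5, which shows how the second bit number and its associated parameter bound the quantized activation map.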

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Asif et al. (US 2019/0163955 A1) in view of Singh et al. (US 2018/0189981 A1) and further in view of Yao (US 2018/0046894 A1).
Regarding Claim 26,
Asif et al. teaches a robot control apparatus (Fig. 1; [0021]: "FIG. 1 is a schematic diagram of a system for classifying and localizing mammographic images, according to embodiments of the disclosure. Referring now to the figure, an initial 2-dimensional (2D) mammography scan 10 received from a mammographic device 15 is provided to a geometric embedding application 11 and a deep fusion CNN 12 according to an embodiment of the disclosure. The geometric embedding application 11 processes the 1D mammography scan 10 into geometric embedding vector, and provides it to the deep fusion CNN 12, which combines the geometric embedding vector with the 2D mammography scan 10 to output a 3-channel feature map 13 that includes the tumor detection confidence and localizes the tumor in each of the three channels. The 3-channel feature map 13 can be effectively be used within a closed control loop of a robotic surgical device 14" teaches a system for classifying and localizing images, where the output of the system is used to control a robotic surgical device), comprising: 
at least one processor ([0011]: “According to another aspect of the invention, there is provided a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for digital image classification and localization” teaches that the image classification and localization system (apparatus) can be a non-transitory program storage device comprising instructions for execution by a computer (processor)); and 
at least one memory configured to store instructions to be executed by the at least one processor and a neural network ([0011]: “According to another aspect of the invention, there is provided a non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform the method steps for digital image classification and localization” teaches that the image classification and localization system (apparatus) can be a non-transitory program storage device (memory) comprising instructions for execution by a computer (processor)), 
… input data representing environment information of a robot ([0021]: "Referring now to the figure, an initial 2-dimensional (2D) mammography scan 10 received from a mammographic device 15 is provided to a geometric embedding application 11 and a deep fusion CNN 12 according to an embodiment of the disclosure" teaches that the input is an image comprising information about the surgical environment in which the robotic surgical device is operating), and 
perform a control operation of the robot by processing the quantized activation map ([0021]: "The geometric embedding application 11 processes the 1D mammography scan 10 into geometric embedding vector, and provides it to the deep fusion CNN 12, which combines the geometric embedding vector with the 2D mammography scan 10 to output a 3-channel feature map 13 that includes the tumor detection confidence and localizes the tumor in each of the three channels. The 3-channel feature map 13 can be effectively be used within a closed control loop of a robotic surgical device 14" teaches that the output feature map (quantized activation map) can be used for control of a robotic surgical device).
	Asif et al. does not appear to explicitly teach wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data …, input the input activation map into a current layer included in the neural network, output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter.
	However, Singh et al. teaches wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data … ([0125]: "an original image 1502 having some data to be analyzed is processed by a set of convolution kernels that apply each apply a different filter 1504A, 1504B to the original image 1502. The filters 1504A, 1504B are learnable and typically much smaller than the original image to which the filters will be applied. The convolution kernels output a set of feature maps" teaches that a feature map (input activation map) may be computed (output) from the input data), and 
input the input activation map into a current layer included in the neural network ([0153]: "Next, compute logic (e.g., the compute block, GPGPU logic, etc.) can be configured to generate feature map data for a CNN layer based on the kernel data, as shown at 2704. The feature map data for the CNN layer is then encoded during a write to memory, as shown at 2706. Computational logic can then read the encoded feature map data from memory and decode the encoded feature map data during the read, as shown at 2708. The computational logic can then process the feature map data as input feature map data for the next CNN layer, as shown at 2710" teaches that compute logic (e.g. the processor) can be used for inputting input feature map (input activation map) data into a CNN layer).
Asif et al. and Singh et al. are analogous to the claimed invention because they are directed to neural network implementation techniques.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate wherein the at least one processor is configured to, based on the instructions: output an input activation map from input data …, and input the input activation map into a current layer included in the neural network as taught by Singh et al. to the disclosed invention of Asif et al.
One of ordinary skill in the art would have been motivated to make this modification to "preserve memory bus bandwidth and reduce system memory access power requirements when performing CNN operations" (Singh et al. [0032]).
Asif et al. in view of Singh et al. does not appear to explicitly teach output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter.
However, Yao teaches output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer ([0013]: "fix-point quantization step for converting floating-point numbers into fixed-point numbers, including: weight quantization step, for converting weights of said convolutional layers CONV 1, CONV 2, . . . CONV n, and fully connected layers FC 1, FC 2, . . . , FC m of the compressed ANN from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different layers while remains static in one layer" teaches that a neural network is optimized by using quantized weights (converted from floating-point to fixed-point), where the input weights are quantized during a weight quantization step (i.e. prior to performing neural network operations, such as convolution, the layers of the neural network are optimized by having the corresponding weights quantized). [0043]: "A CONV layer takes a series of feature maps as input and convolves with convolutional kernels to obtain the output feature map" teaches that a convolutional layer convolves (convolution operation) an input feature (activation) map with convolutional kernels (i.e. the weights optimized during weight quantization) to output an output feature (activation) map. Table 2; teaches that weight bits (first representation bit number) are used to set the bit number during quantization (i.e. weights are quantized according to the selected weight bits bit number)), and 
output a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter ([0116]: "Specifically, for example, it conducts weight quantization for one of said CONV layers and FC layers in sequence; after conducting weight quantization for the present layer, but before conducting weight quantization for next layer of said CONV layers and FC layers, it conducts data quantization of feature map set output from said present layer" teaches that the feature map output from the current layer (output activation map) is quantized via data quantization. Table 2; teaches that data bits (second representation bit number) are used to set the bit number during quantization (i.e. feature/activation maps are quantized according to the selected data bits bit number). [0013]: "data quantization step, for converting data of feature map sets j from floating-point numbers into fixed-point numbers, wherein the numerical range of quantization is dynamically chosen for different feature map sets while remains static in one feature map set, wherein said feature map sets j are output by said CONV layers and FC layers of said ANN; compiling step, for compiling said compressed ANN to generate instructions to be executed by an ANN accelerator, so as to implement said ANN on said ANN accelerator; wherein the compiling step is conducted on the basis of the quantized weights of CONV and FC layers of said ANN, and the chosen quantization numerical range for respective feature map sets output by said CONV and FC layers" teaches that feature (activation) maps are quantized during data quantization according to a numerical range (activation quantization parameter) for the respective feature (activation) maps).
Asif et al., Singh et al., and Yao are analogous to the claimed invention because they are directed to neural network implementation techniques.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate output an output activation map by performing a convolution operation between the input activation map and a weight quantized with a first representation bit number of the current layer, and output a quantized activation map by quantizing the output activation map with a second representation bit number based on an activation quantization parameter as taught by Yao to the disclosed invention of Asif et al. in view of Singh et al.
	One of ordinary skill in the art would have been motivated to make this modification "to optimize a CNN from the algorithm perspective, in order to reduce both memory and computation resources it requires to implement a CNN, while suffer minimum loss of accuracy" (Yao [0072]).
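For illustration only, the passage of Yao [0013] cited above, in which the numerical range of quantization is chosen separately for each layer or feature map set and held static within it, can be sketched as follows. The cited passages do not state how the range is selected, so the selection rule here (pick the fractional bit length that minimizes total absolute rounding error) is an assumption made for the example, and the names `quantize` and `choose_frac_bits` are hypothetical.

```python
def quantize(x, bits, frac_bits):
    """Signed fixed-point rounding with saturation (assumed format)."""
    scale = 2 ** frac_bits
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(q_min, min(q_max, round(x * scale))) / scale


def choose_frac_bits(values, bits, candidates=None):
    """Pick one fractional bit length for a whole set of values.

    Assumed criterion: minimize the summed absolute quantization error.
    Ties resolve to the smallest candidate (min() returns the first minimum).
    """
    if candidates is None:
        candidates = range(0, bits)

    def total_error(frac_bits):
        return sum(abs(v - quantize(v, bits, frac_bits)) for v in values)

    return min(candidates, key=total_error)


# Two hypothetical layers with very different value ranges.
layers = {
    "conv1_weights": [0.5, 0.25, -0.75],
    "conv1_feature_map": [100.0, -50.0],
}
per_layer_frac_bits = {
    name: choose_frac_bits(values, bits=8) for name, values in layers.items()
}
print(per_layer_frac_bits)  # {'conv1_weights': 2, 'conv1_feature_map': 0}
```

The small-valued set receives more fractional bits while the large-valued set receives none, so each set uses the same bit number but a different numerical range.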
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN J HALES whose telephone number is (571)272-0878. The examiner can normally be reached M-Th 8:00am - 5:00pm and F 8:00am - 2:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRIAN J HALES/Examiner, Art Unit 2125

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125