Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is the initial office action that has been issued in response to patent application 16/160,444 filed on 10/15/2018. Claims 1-19, as originally filed, are currently pending and have been considered below. Claims 1 and 10 are independent claims.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 06/10/2019 and 06/14/2019. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Document number 2 under the NPL section of the IDS submitted on 06/10/2019 is not considered because a concise explanation of relevance for the non-English language information is not provided. See MPEP 609.04(a), section III.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Applicant cannot rely upon the certified copy of the foreign priority application to overcome this rejection because a translation of said application has not been made of record in accordance with 37 CFR 1.55. See MPEP §§ 215 and 216.
In particular, Applicant is reminded of the requirements set forth in 37 C.F.R. 1.55(g)(3)-(4), Claim for foreign priority:
“(3) An English language translation of a non-English language foreign application is not required except:
(i) When the application is involved in an interference (see § 41.202 of this chapter) or derivation (see part 42 of this chapter) proceeding;
(ii) When necessary to overcome the date of a reference relied upon by the examiner; or
(iii) When specifically required by the examiner.
(4) If an English language translation of a non-English language foreign application is required, it must be filed together with a statement that the translation of the certified copy is accurate” (emphasis added).
Since an English language translation of Application No. KR10-2017-0135868 has not been made of record, the Examiner notes that prior art references with a filing date or publication date prior to the instant Application’s filing date of 10/15/2018 are considered applicable prior art references.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1,

Step 1 Analysis: Claim 1 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
obtaining a parameter for the neural network in a floating-point format
applying a fractional length of a fixed-point format to the parameter in the floating-point format
performing an operation with an integer… to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process
performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the...
as drafted, claim 1 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses obtaining a parameter for the neural network in a floating-point format (corresponds to evaluation). Further, the claim encompasses applying a fractional length of a fixed-point format to 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
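Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).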
Regarding claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 2 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
extracting a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format
calculating a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point
calculating a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value

as drafted, claim 2 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses extracting a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format (corresponds to evaluation). Further, the claim encompasses calculating a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point (corresponds to mathematical calculations). Further, the claim encompasses calculating a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value (corresponds to mathematical calculations).
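For illustration only, the field extraction recited above can be sketched in Python for the single-precision case. The function name `extract_fields` and the use of the standard `struct` module are assumptions of this sketch; neither is taken from the claims or the specification.

```python
import struct


def extract_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of x as an IEEE 754 single."""
    # Reinterpret the 32-bit float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, still biased
    mantissa = bits & 0x7FFFFF       # 23 stored mantissa bits, no leading 1
    return sign, exponent, mantissa


print(extract_fields(-2.5))  # (1, 128, 2097152)
```

Here -2.5 is 1.25 × 2^1, so the biased exponent is 128 and the stored mantissa is 0.25 × 2^23.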
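Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.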
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
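Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):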
performing an integer operation of subtracting, from the first exponent value, the bias constant
calculating the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant

as drafted, claim 3 is a process that, under its broadest reasonable interpretation, covers mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses performing an integer operation of subtracting, from the first exponent value, the bias constant (corresponds to mathematical calculations). Further, the claim encompasses calculating the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant (corresponds to mathematical calculations). 
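The two integer operations recited above reduce to one subtraction and one addition. A minimal sketch follows, assuming the single-precision bias constant of 127 recited in claim 8; the function and parameter names are hypothetical.

```python
SINGLE_PRECISION_BIAS = 127  # bias constant for single precision, per claim 8


def second_exponent(first_exponent: int, frac_len: int,
                    bias: int = SINGLE_PRECISION_BIAS) -> int:
    """Second exponent = (first exponent - bias) + fractional length."""
    unbiased = first_exponent - bias  # integer subtraction of the bias constant
    return unbiased + frac_len        # integer addition of the fractional length


print(second_exponent(128, 4))  # 5
```

For example, a parameter of 2.5 has a biased exponent of 128; with a fractional length of 4 the second exponent is (128 - 127) + 4 = 5.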
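Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.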
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
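Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):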
updating the first mantissa value by adding a bit value of 1 to a position before the first mantissa value
comparing a number of bits of the first mantissa value with a number of bits of the second mantissa value
shifting the updated first mantissa value to the right, based on a result of the comparing of the number of bits of the first mantissa value with the number of bits of the second mantissa value.

as drafted, claim 4 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses updating the first mantissa value by adding a bit value of 1 to a position before the first mantissa value (corresponds to mathematical calculations). Further, the claim encompasses comparing a number of bits of the first mantissa value with a number of bits of the second mantissa value (corresponds to evaluation). Further, the claim encompasses shifting the updated first mantissa value to the right, based on a result of the comparing of the number of bits of the first mantissa value with the number of bits of the second mantissa value (corresponds to evaluation).
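As a rough illustration of the mantissa update recited above, the sketch below prepends the implicit leading 1 to a 23-bit single-precision mantissa. The helper names are hypothetical. The comparison helper follows the wording of claim 5 (second exponent value versus the number of mantissa bits), which is an interpretive assumption rather than the literal language of claim 4.

```python
MANTISSA_BITS = 23  # single-precision mantissa width, per claim 8


def with_leading_one(mantissa: int) -> int:
    """Place a 1 bit immediately before the 23 stored mantissa bits."""
    return (1 << MANTISSA_BITS) | mantissa


def needs_right_shift(second_exponent: int) -> bool:
    """Assumed reading: a right shift is needed when e2 < mantissa width."""
    return second_exponent < MANTISSA_BITS


print(hex(with_leading_one(0x200000)))  # 0xa00000
```

The updated mantissa is 24 bits wide and represents the significand 1.f scaled by 2^23.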
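Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.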
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 5,

Step 1 Analysis: Claim 5 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
shifting the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating-point format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point
extracting a least significant bit (LSB) value from the shifted first mantissa value
calculating the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value
wherein the LSB value is a factor that determines whether to round off the fixed point.

as drafted, claim 5 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
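For reference, the rounding limitations of claim 5 (shift right by the predetermined number minus the second exponent, extract the LSB, shift right once more, add the LSB) can be sketched as below. This is a sketch under stated assumptions and not a statement of how the specification implements the claim: the function name is hypothetical, the predetermined number 22 is the single-precision value recited in claim 8, and the input is assumed to be the 24-bit mantissa that already includes the leading 1.

```python
PREDETERMINED = 22  # single-precision value recited in claim 8


def round_by_discarded_msb(updated_mantissa: int, second_exponent: int) -> int:
    """Integer-only rounding; requires second_exponent <= PREDETERMINED."""
    # Shift so that exactly one bit below the final result is still kept.
    shifted = updated_mantissa >> (PREDETERMINED - second_exponent)
    # That kept bit is the most significant of the bits about to be discarded.
    lsb = shifted & 1
    # Drop it with one more shift, then round up when it was 1.
    return (shifted >> 1) + lsb


print(round_by_discarded_msb(0xA00000, 5))  # 40, i.e. 2.5 * 2**4
print(round_by_discarded_msb(0x99999A, 2))  # 5, i.e. 0.3 * 2**4 = 4.8 rounded
```

In the second call, 0x99999A is the single-precision mantissa of 0.3 with the leading 1 attached, and the second exponent is (125 - 127) + 4 = 2.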
Regarding claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
tuning a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format
quantizing the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value.

as drafted, claim 6 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses tuning a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format (corresponds to evaluation). Further, the claim encompasses quantizing the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value (corresponds to mathematical calculations). 
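One possible reading of the tuning and sign-application limitations above is sketched below. The claim language quoted in this action does not state how the sign is encoded; the two's-complement encoding, the default 16-bit width, and the function name are all assumptions made for this sketch.

```python
def apply_sign_and_width(magnitude: int, sign: int, bit_width: int = 16) -> int:
    """Return the bit pattern of the signed result in a bit_width-wide word."""
    signed = -magnitude if sign else magnitude
    # Assumption: two's-complement storage. Masking keeps exactly bit_width bits.
    return signed & ((1 << bit_width) - 1)


print(hex(apply_sign_and_width(40, 1)))  # 0xffd8
```

With a sign bit of 1 and a magnitude of 40, the 16-bit pattern is 65536 - 40 = 0xFFD8.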
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
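Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):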
comparing the second exponent value with a value obtained by subtracting 2 from a bit width of the fixed point when it is determined that the second exponent value is equal to or greater than the number of bits of the first mantissa value
changing the format of the fixed point and then re-performing the operation when the second exponent value is greater than the value obtained by subtracting 2 from the bit width of the fixed point
shifting the updated first mantissa value to the left by a difference between the second exponent value and the number of bits of the first mantissa value and applying the sign to the left-shifted first mantissa value to quantize the parameter in the floating-point format to the fixed-point format when the second exponent value is less than or equal to the value obtained by subtracting 2 from the bit width of the fixed point.

as drafted, claim 7 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses comparing the second exponent value with a value obtained by subtracting 2 from a bit width of the fixed point when it is determined that the second exponent value is equal to or greater than the number of bits of the first mantissa value (corresponds to evaluation). Further, the claim encompasses changing the format of the fixed point and then re-performing 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
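The branch recited in claim 7 (second exponent not less than the number of mantissa bits) can be sketched as follows. The function name is hypothetical, and raising `OverflowError` is only a stand-in for the recited step of "changing the format of the fixed point and then re-performing the operation", whose details are not given in the claim language quoted in this action.

```python
MANTISSA_BITS = 23  # single-precision mantissa width, per claim 8


def quantize_by_left_shift(updated_mantissa: int, second_exponent: int,
                           sign: int, bit_width: int) -> int:
    """Left-shift path: no bits are discarded, so no rounding is needed."""
    if second_exponent < MANTISSA_BITS:
        raise ValueError("this path applies only when e2 >= mantissa width")
    if second_exponent > bit_width - 2:
        # Stand-in for changing the fixed-point format and retrying.
        raise OverflowError("value does not fit in the fixed-point format")
    magnitude = updated_mantissa << (second_exponent - MANTISSA_BITS)
    return -magnitude if sign else magnitude


# 12.0 with a fractional length of 20: e2 = (130 - 127) + 20 = 23.
print(quantize_by_left_shift(0xC00000, 23, 0, 32))  # 12582912 == 12 * 2**20
```

The bound of bit width minus 2 follows from the 24-bit mantissa being less than 2^24: after the shift the magnitude is below 2^(e2 + 1), which fits beside a sign bit only when e2 + 1 is at most the bit width minus 1.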
Regarding claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22
when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51

as drafted, claim 8 is a process that, under its broadest reasonable interpretation, covers mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22 (corresponds to mathematical calculation, expanding on the mathematical calculation of claim 2). Further, the claim encompasses when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51 (corresponds to mathematical calculation, expanding on the mathematical calculation of claim 2).
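The constants recited above can be collected in a small table, sketched here for illustration. The `FloatFormat` type and the `exponent_bits` field are assumptions added for the sketch; the exponent widths of 8 and 11 bits come from the IEEE 754 formats and are not recited in the claim.

```python
from typing import NamedTuple


class FloatFormat(NamedTuple):
    exponent_bits: int   # from IEEE 754, not recited in claim 8
    bias: int            # bias constant recited in claim 8
    mantissa_bits: int   # number of bits of the first mantissa value
    predetermined: int   # predetermined number recited in claim 8


SINGLE = FloatFormat(exponent_bits=8, bias=127, mantissa_bits=23, predetermined=22)
DOUBLE = FloatFormat(exponent_bits=11, bias=1023, mantissa_bits=52, predetermined=51)

for fmt in (SINGLE, DOUBLE):
    # The recited values follow fixed arithmetic relationships.
    assert fmt.bias == 2 ** (fmt.exponent_bits - 1) - 1
    assert fmt.predetermined == fmt.mantissa_bits - 1
```

In both formats the predetermined number is one less than the mantissa width, and the bias is 2^(exponent bits - 1) - 1.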
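Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.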
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
converting the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network
providing the parameter in the floating-point format to the first layer
performing the operation with the integer… to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format.

as drafted, claim 9 is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses converting the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network (corresponds to evaluation). Further, the claim encompasses providing the parameter in the floating-point format to the first layer (corresponds to evaluation). Further, the claim encompasses performing the operation with the integer… to quantize the parameter in 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit”, as drafted, is reciting generic computer components at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to mere instruction to apply a neural network to implement the abstract idea as indicated above in Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
obtain a parameter for the neural network in a floating-point format
apply a fractional length of a fixed-point format to the parameter in the floating-point format
perform an operation with an integer… to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process
quantize the parameter in the floating-point format to a parameter in the fixed-point format, based on a result of the operation with the...

as drafted, claim 10 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “which includes the processor” and “arithmetic logic unit”, as drafted, are generic computer components recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to a mere instruction to apply a neural network to implement the abstract idea identified above in the Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
extract a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format
calculate a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point
calculate a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value

as drafted, claim 11 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses extracting a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format (corresponds to evaluation). Further, the claim encompasses calculating a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point (corresponds to mathematical calculations). Further, the claim encompasses calculating a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value (corresponds to mathematical calculations).
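For illustration only, the field extraction recited above can be sketched in Python as follows. The function name extract_fields and the restriction to the IEEE 754 single-precision format are assumptions made for this sketch; they are not taken from the claim or the specification.

```python
import struct


def extract_fields(value: float):
    """Split a single-precision float into (sign, exponent, mantissa).

    Hypothetical helper: the 32-bit layout assumed here is the IEEE 754
    single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits).
    """
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, still biased
    mantissa = bits & 0x7FFFFF         # 23 bits, implicit leading 1 not included
    return sign, exponent, mantissa
```

Under these assumptions, extract_fields(-2.5) yields a sign of 1, a biased exponent of 128, and a mantissa of 0x200000, using only bit manipulation on an integer.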
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
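Regarding claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.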
Step 1 Analysis: Claim 12 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
perform an integer operation of subtracting, from the first exponent value, the bias constant
calculate the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant

as drafted, claim 12 is a machine that, under its broadest reasonable interpretation, covers mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses performing an integer operation of subtracting, from the first exponent value, the bias constant (corresponds to mathematical calculations). Further, the claim encompasses calculating the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant (corresponds to mathematical calculations). 
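For illustration only, the two integer operations identified above reduce to one subtraction and one addition. The sketch below assumes the single-precision bias of 127 as the default; the function and parameter names are hypothetical.

```python
BIAS_SINGLE = 127  # exponent bias of the single-precision format (assumed default)


def second_exponent(first_exponent: int, frac_length: int,
                    bias: int = BIAS_SINGLE) -> int:
    """Compute the second exponent value with integer arithmetic only."""
    # Integer operation 1: subtract the bias constant from the first exponent.
    unbiased = first_exponent - bias
    # Integer operation 2: add the fractional length of the fixed-point format.
    return unbiased + frac_length
```

For a first exponent value of 128 and a fractional length of 4, this sketch returns 5.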
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
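Regarding claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.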
Step 1 Analysis: Claim 13 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
update the first mantissa value by adding a bit value of 1 to a position before the first mantissa value
compare a number of bits of the first mantissa value with a number of bits of the second mantissa value
shift the updated first mantissa value to the right, based on a result of the comparing of the number of bits of the first mantissa value with the number of bits of the second mantissa value.

as drafted, claim 13 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses updating the first mantissa value by adding a bit value of 1 to a position before the first mantissa value (corresponds to mathematical calculations). Further, the claim encompasses comparing a number of bits of the first mantissa value with a number of bits of the 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “which includes the processor” and “arithmetic logic unit”, as drafted, are generic computer components recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to a mere instruction to apply a neural network to implement the abstract idea identified above in the Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer 
Regarding claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
shift the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating-point format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point
extract a least significant bit (LSB) value from the shifted first mantissa value
calculate the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value
wherein the LSB value is a factor that determines whether to round off the fixed point.

as drafted, claim 14 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses shifting the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating-point format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point (corresponds to evaluation based on math calculation). Further, the claim encompasses extracting a least significant bit (LSB) value from the shifted first mantissa value (corresponds to evaluation). Further, the claim encompasses calculating the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value (corresponds to mathematical calculation). Further, the claim encompasses wherein the LSB value is a factor that determines whether to round off the fixed point (corresponds to evaluation).
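For illustration only, the shift, LSB extraction, and rounding steps identified above can be sketched as follows. The sketch assumes the single-precision constants recited in claim 17 (23 mantissa bits, predetermined number 22) and restores the leading 1 bit as recited in claim 13; the function name and the error handling are hypothetical.

```python
def round_mantissa(first_mantissa: int, second_exponent: int,
                   mantissa_bits: int = 23, predetermined: int = 22) -> int:
    """Compute the second mantissa value with rounding, using integer ops only."""
    if second_exponent >= mantissa_bits:
        # This sketch covers only the branch where bits are discarded.
        raise ValueError("second exponent must be less than the mantissa width")
    # Add a bit value of 1 to the position before the first mantissa value.
    updated = (1 << mantissa_bits) | first_mantissa
    # Shift right so the most significant discarded bit lands in the LSB.
    shifted = updated >> (predetermined - second_exponent)
    # The LSB decides whether the fixed point is rounded off.
    lsb = shifted & 1
    # Shift right by 1 one more time and add the extracted LSB value.
    return (shifted >> 1) + lsb
```

As a worked check under these assumptions: the value 0.15625 with a fractional length of 4 has a first mantissa of 0x200000 and a second exponent of 1. The sketch shifts 0xA00000 right by 21 to obtain 5, extracts an LSB of 1, and returns 3, which matches rounding 0.15625 × 16 = 2.5 upward.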
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
tune a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format
quantize the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value.

as drafted, claim 15 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses tuning a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format (corresponds to evaluation). Further, the claim encompasses quantizing the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value (corresponds to mathematical calculations).
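For illustration only, one possible reading of the tuning and sign-application steps is sketched below. The claim does not define "tune"; the sketch assumes it means keeping only the low-order magnitude bits that fit the fixed-point bit width, and it assumes a two's-complement fixed-point representation. Both assumptions, and all names, are hypothetical.

```python
def apply_sign(second_mantissa: int, sign: int, bit_width: int = 16) -> int:
    """Return the fixed-point bit pattern for a signed, quantized value."""
    # Assumption: "tuning" keeps the low (bit_width - 1) magnitude bits.
    magnitude = second_mantissa & ((1 << (bit_width - 1)) - 1)
    # Apply the extracted sign (1 means negative).
    signed_value = -magnitude if sign else magnitude
    # Assumption: encode as a two's-complement pattern of bit_width bits.
    return signed_value & ((1 << bit_width) - 1)
```

Under these assumptions, a second mantissa value of 40 with a sign of 1 and a 16-bit format yields the bit pattern 0xFFD8.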
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
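Regarding claim 16,
Claim 16 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.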
Step 1 Analysis: Claim 16 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
compare the second exponent value with a value obtained by subtracting 2 from a bit width of the fixed point when it is determined that the second exponent value is equal to or greater than the number of bits of the first mantissa value
change the format of the fixed point and then re-performing the operation when the second exponent value is greater than the value obtained by subtracting 2 from the bit width of the fixed point
shift the updated first mantissa value to the left by a difference between the second exponent value and the number of bits of the first mantissa value and applying the sign to the left-shifted first mantissa value to quantize the parameter in the floating-point format to the fixed-point format when the second exponent value is less than or equal to the value obtained by subtracting 2 from the bit width of the fixed point.

as drafted, claim 16 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “which includes the processor” and “arithmetic logic unit”, as drafted, are generic computer components recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
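Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):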
wherein when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22
when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51

as drafted, claim 17 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for mere instruction to apply language. For example, but for mere instruction to apply language, the above limitation in the context of this claim encompasses wherein when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22 (corresponds to mathematical calculation, expanding on the mathematical calculation of claim 11). Further, the claim encompasses when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “which includes the processor” and “arithmetic logic unit”, as drafted, are generic computer components recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to a mere instruction to apply a neural network to implement the abstract idea identified above in the Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or 
Regarding claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to a neural network apparatus, which is directed to a machine, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: The claim recites a method for neural network parameter quantization. Each of the following limitation(s):
convert the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network
provide the parameter in the floating-point format to the first layer
perform the operation with the integer ALU to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format.

as drafted, claim 18 is a machine that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including observation, evaluation, judgement, opinion)) and mathematical concept (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional elements “which includes the processor” and “arithmetic logic unit”, as drafted, are generic computer components recited at a high level of generality (i.e., as generic computer components performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component. The recitation of “performing training or an inference operation with a neural network” amounts to a mere instruction to apply a neural network to implement the abstract idea identified above in the Step 2A Prong One Analysis. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).
Regarding claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to a method, which is directed to a process, one of the statutory categories. See MPEP 2106.03.
Step 2A Prong One Analysis: Please see analysis of claim 1 above.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The recitation of additional element(s) of “arithmetic logic unit” and “A non-transitory computer-readable recording 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a mere instruction to apply language cannot provide an inventive concept. The recitation of “performing training or an inference operation with a neural network” does not amount to more than a recitation of the words "apply it" (or an equivalent) or more than mere instructions to implement an abstract idea. See MPEP 2106.05(f).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9-10, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 10373050 B2; hereinafter “Lin-1”) in view of Lin et al. (“Fixed Point Quantization of Deep Convolutional Networks”; hereinafter “Lin-2”) and further in view of Kum et al. (“Combined word-length optimization and high-level synthesis of digital signal processing systems”).
Regarding Claim 1,
Lin-1 teaches a processor implemented method, the method comprising (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
performing training or an inference operation with a neural network (Lin-1, Col. 1 Lines 35-40, “Theses weight values are determined by the iterative flow of training data through the network (e.g., weight values are established during a training phase in which the network learns how to identify particular classes by their typical input data characteristics)” teaches training through the neural network).
obtaining a parameter for the neural network in a floating-point format (Lin-1, Fig. 8 and Col. 14 Lines 2-12, “In block 802, at least one moment of an input distribution of a floating point machine learning network is selected. The at least one moment of the input distribution of the floating point machine learning network may include a mean, a variance or other like moment of the input distribution. In block 804, quantizer parameters for quantizing values of the floating point machine learning network are determined based on the selected moment of the input distribution of the floating point machine learning network” teaches obtaining the parameters of the network in floating point values).
… performing an operation of quantizing the parameter in the floating-point format to a parameter in the fixed-point format (Lin-1, Col. 2 Lines 15-21, “The method may also include determining quantizer parameters for quantizing values of the floating point machine learning network based at least in part on the at least one selected moment of the input distribution of the floating point machine learning network to obtain corresponding values of the fixed point machine learning network” teaches determining quantized parameters for the floating point machine learning network to obtain corresponding values of the fixed point machine learning network).
Lin-1 does not appear to explicitly teach applying a fractional length of a fixed-point format to the parameter in the floating-point format
However, Lin-2 teaches applying a fractional length of a fixed-point format to the parameter in the floating-point format (Lin-2, pg. 4 Section 3.3, “Note that determining the fixed point format is equivalent to determining the resolution, which in turn means identifying the number of fractional bits it requires to represent the number. The following equations can be used to compute the number of fractional bits: • Determine the effective standard deviation of the quantity being quantized: ξ. • Calculate step size via Table 1: s = ξ · Stepsize(β). • Compute number of fractional bits: n = −[log2 s]” teaches computing the number of fractional bits (corresponds to fractional length) of a fixed-point format. Pg. 3 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations” teaches a parameter in floating-point format and converting floating point to fixed point).
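For illustration only, the fractional-bit computation quoted from Lin-2 above can be sketched as follows. The sketch assumes the bracket in n = −[log2 s] denotes a ceiling; the function names and the sample Stepsize(β) value are hypothetical and are not taken from Lin-2 or its Table 1.

```python
import math


def fractional_bits(xi: float, stepsize_beta: float) -> int:
    """Number of fractional bits n = -ceil(log2(s)), with s = xi * Stepsize(beta).

    xi is the effective standard deviation of the quantity being quantized.
    stepsize_beta stands in for the Table 1 lookup, which is not reproduced here.
    Reading the bracket as a ceiling is an assumption.
    """
    s = xi * stepsize_beta
    return -math.ceil(math.log2(s))


def to_fixed(x: float, n: int) -> int:
    """Apply a fractional length of n bits to a floating-point value."""
    return round(x * 2 ** n)


# Hypothetical values: xi = 0.5 and Stepsize(beta) = 0.25 give s = 0.125.
n = fractional_bits(0.5, 0.25)
q = to_fixed(0.75, n)
```

With these hypothetical inputs the step size is 2^-3, so three fractional bits are selected and 0.75 is stored as the integer 6.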
Lin-1 and Lin-2 are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 with Lin-2, with motivation to apply a fractional length of a fixed-point format to the parameter in the floating-point format. “We show that the naive method of quantizing all the layers in the DCN with uniform bit-width value results in DCN networks with subpar performance in terms of error rates relative to our proposed 
Lin-1 in view of Lin-2 does not appear to explicitly teach performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process and based on a result of the operation with the ALU
However, Kum et al. teaches performing an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process (Kum et al., Fig. 9 and pg. 927 Section IV. B, “For a right shift, the most significant bits (MSBs) are sign extended and the MSB of truncated bits is used as the carry-in signal of the adders for rounding. For a left shift, the least significant bits (LSBs) are filled with zeros and the MSBs are thrown away, but overflows do not occur because the IWLs are carefully determined throughout the range estimation” teaches the arithmetic logic unit (see Fig. 9) and further teaches rounding of the adders based on the most significant bits and the MSBs being thrown away (corresponds to discarded) after quantization).
… based on a result of the operation with the ALU (Kum et al., Fig. 9 and pg. 928, teaches the operation with the arithmetic logic unit and its results).
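For illustration only, the rounding behavior quoted from Kum et al. (a sign-extending right shift in which the MSB of the truncated bits serves as the carry-in) can be sketched with integer operations alone. The function name is hypothetical, and the sketch models the arithmetic rather than the hardware of Fig. 9.

```python
def shift_right_round(x: int, k: int) -> int:
    """Arithmetic right shift by k bits, rounded using the MSB of the discarded bits.

    Python's >> on integers is sign extending, which matches the quoted
    description of the right shift. The carry-in is the highest-order bit
    among the k bits that the shift throws away.
    """
    if k <= 0:
        return x
    carry_in = (x >> (k - 1)) & 1  # MSB of the truncated bits
    return (x >> k) + carry_in


rounded_up = shift_right_round(0b1011, 2)    # 11 / 4 = 2.75
rounded_down = shift_right_round(0b1001, 2)  # 9 / 4 = 2.25
```

The first call discards the bits 11, whose MSB is 1, so the result is incremented; the second discards 01, so the shifted value is kept unchanged.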

Regarding Claim 9,
Lin-1 in view of Lin-2 in view of Kum et al. teaches the method of claim 1, further comprising
Lin-1 further teaches to quantize the parameter in the floating- point format processed in the first layer back to a parameter in the fixed-point format (Lin-1, Col. 2 Lines 15-21, “The method may also include determining quantizer parameters for quantizing values of the floating point machine learning network based at least in part on the at least one selected moment of the input distribution of the floating point machine learning network to obtain corresponding values of the fixed point machine learning network” teaches determining quantized parameters for the floating point machine learning network to obtain corresponding values of the fixed point machine learning network).
Lin-2 further teaches converting the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network (Lin-2, pg. 2 Section 3, “In this section, we will propose an algorithm to convert a floating point DCN to fixed point. For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point. This can be seen as a process of quantization” teaches a quantization process that converts the parameters of a given layer (corresponds to the first layer of the neural network) from floating point to fixed point. Pg. 3-4 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations. • Collect the statistics of weights, biases and activations for each layer. • Determine the fixed point formats of the weights, biases and activations for each layer” teaches the inputs (corresponds to input to the first layer of the neural network) which consist of weights, biases and activation (corresponds to the parameter) for the neural network being in floating point).
providing the parameter in the floating-point format to the first layer (Lin-2, pg. 3-4 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations. • Collect the statistics of weights, biases and activations for each layer. • Determine the fixed point formats of the weights, biases and activations for each layer” teaches the inputs (corresponds to input to the first layer of the neural network) which consist of weights, biases and activation (corresponds to the parameter) for the neural network being in floating point).
Lin-1 in view of Lin-2 in view of Kum et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 and Lin-2 with Kum et al., with motivation to convert the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network and provide the parameter in the floating-point format to the first layer. “We show that the naive method of quantizing all the layers in the DCN with uniform bit-width value results in DCN networks with subpar performance in terms of error rates relative to our proposed approach of SQNR based optimization of bit-widths. Specifically, we present results for a floating point DCN trained CIFAR-10 benchmark, which on conversion to its fixed point counter-part results in >20 % reduction in model size without any loss in accuracy” (Lin-2, Conclusion). The proposed teaching is beneficial in that it helps in reduction of the model size without any loss in accuracy.
Kum et al. further teaches performing the operation with the integer ALU (Kum et al., Fig. 9 and pg. 928, teaches the operation with the arithmetic logic unit).  
Lin-1 in view of Lin-2 in view of Kum et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 and Lin-2 with Kum et al., with motivation of performing the operation with the integer ALU. “A combined WL optimization and high-level synthesis approach that results in a more efficient or cost-effective design when compared with the previous WL optimization followed by high-level synthesis approaches. The developed method also requires less time for optimization since the use of the hardware sharing information for signal grouping results in fewer signal groups” (Kum et al., Conclusion). The proposed teaching is beneficial in that it results in a more efficient or cost-effective design that also requires less time for optimization.
Regarding Claim 10,
Lin-1 teaches a neural network apparatus, the apparatus comprising: a processor configured to (Lin-1, FIG. 1 and Col. 4 Lines 52-58, “FIG. 1 illustrates an example implementation of the aforementioned reduction of computation complexity by quantizing a floating point neural network to obtain a fixed point neural network using a system-on-a-chip (SOC) 100, which may include a general-purpose processor (CPU) or multi-core general-purpose processors (CPUs) 102 in accordance with certain aspects of the present disclosure” teaches a neural network apparatus comprising a processor).
perform training or an inference operation with a neural network, which includes the processor being further configured to (Lin-1, Col. 1 Lines 35-40, “These weight values are determined by the iterative flow of training data through the network (e.g., weight values are established during a training phase in which the network learns how to identify particular classes by their typical input data characteristics)” teaches training through the neural network).
obtain a parameter for the neural network in a floating-point format (Lin-1, Fig. 8 and Col. 14 Lines 2-12, “In block 802, at least one moment of an input distribution of a floating point machine learning network is selected. The at least one moment of the input distribution of the floating point machine learning network may include a mean, a variance or other like moment of the input distribution. In block 804, quantizer parameters for quantizing values of the floating point machine learning network are determined based on the selected moment of the input distribution of the floating point machine learning network” teaches obtaining the parameters of the network in floating point values).
… quantize the parameter in the floating-point format to a parameter in the fixed-point format (Lin-1, Col. 2 Lines 15-21, “The method may also include determining quantizer parameters for quantizing values of the floating point machine learning network based at least in part on the at least one selected moment of the input distribution of the floating point machine learning network to obtain corresponding values of the fixed point machine learning network” teaches 
Lin-1 does not appear to explicitly teach apply a fractional length of a fixed-point format to the parameter in the floating-point format 
However, Lin-2 teaches apply a fractional length of a fixed-point format to the parameter in the floating-point format (Lin-2, pg. 4 Section 3.3, “Note that determining the fixed point format is equivalent to determining the resolution, which in turn means identifying the number of fractional bits it requires to represent the number. The following equations can be used to compute the number of fractional bits: • Determine the effective standard deviation of the quantity being quantized: ξ. • Calculate step size via Table 1: s = ξ · Stepsize(β). • Compute number of fractional bits: n = −[log2 s]” teaches computing the number of fractional bits (corresponds to fractional length) of a fixed-point format. Pg. 3 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations” teaches a parameter in floating-point format and converting floating point to fixed point).
Lin-1 and Lin-2 are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 with Lin-2, with motivation to apply a fractional length of a fixed-point format to the parameter in the floating-point format. “We show that the naive method of quantizing all the layers in the DCN with uniform bit-width value results in 
Lin-1 in view of Lin-2 does not appear to explicitly teach perform an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process and based on a result of the operation with the ALU
However, Kum et al. teaches perform an operation with an integer arithmetic logic unit (ALU) to determine whether to round off a fixed point based on a most significant bit among bit values to be discarded after a quantization process (Kum et al., Fig. 9 and pg. 927 Section IV. B, “For a right shift, the most significant bits (MSBs) are sign extended and the MSB of truncated bits is used as the carry-in signal of the adders for rounding. For a left shift, the least significant bits (LSBs) are filled with zeros and the MSBs are thrown away, but overflows do not occur because the IWLs are carefully determined throughout the range estimation” teaches the arithmetic logic unit (see Fig. 9) and further teaches rounding of the adders based on the most significant bits and the MSBs being thrown away (corresponds to discarded) after quantization).
… based on a result of the operation with the ALU (Kum et al., Fig. 9 and pg. 928, teaches the operation with the arithmetic logic unit and its results).

Regarding Claim 18,
Lin-1 in view of Lin-2 teaches the neural network apparatus of claim 10, wherein the processor is further configured to
Lin-1 further teaches to quantize the parameter in the floating-point format processed in the first layer back to a parameter in the fixed-point format (Lin-1, Col. 2 Lines 15-21, “The method may also include determining quantizer parameters for quantizing values of the floating point machine learning network based at least in part on the at least one selected moment of the input distribution of the floating point machine learning network to obtain corresponding values of the fixed point machine learning network” teaches determining quantized parameters for the floating point machine learning network to obtain corresponding values of the fixed point machine learning network).
Lin-2 further teaches convert the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network (Lin-2, pg. 2 Section 3, “In this section, we will propose an algorithm to convert a floating point DCN to fixed point. For a given layer of DCN the goal of conversion is to represent the input activations, the output activations, and the parameters of that layer in fixed point. This can be seen as a process of quantization” teaches a quantization process that converts the parameters of a given layer (corresponds to the first layer of the neural network) from floating point to fixed point. Pg. 3-4 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations. • Collect the statistics of weights, biases and activations for each layer. • Determine the fixed point formats of the weights, biases and activations for each layer” teaches the inputs (corresponds to input to the first layer of the neural network) which consist of weights, biases and activation (corresponds to the parameter) for the neural network being in floating point).
provide the parameter in the floating-point format to the first layer (Lin-2, pg. 3-4 Section 3.3, “Any floating point DCN model can be converted to fixed point by following these steps: • Run a forward pass in floating point using a large set of typical inputs and record the activations. • Collect the statistics of weights, biases and activations for each layer. • Determine the fixed point formats of the weights, biases and activations for each layer” teaches the inputs (corresponds to input to the first layer of the neural network) which consist of weights, biases and activation (corresponds to the parameter) for the neural network being in floating point).
Lin-1 and Lin-2 are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 with Lin-2, with motivation for converting the quantized parameter in the fixed-point format to the floating-point format based on processing conditions of a first layer of the neural network that receives the parameter in the floating-point format, from among layers of the neural network and providing the parameter in the floating-point format to the first layer. “We show that the naive method of quantizing all the layers in the DCN with uniform bit-width value results in DCN networks with subpar performance in terms of error rates relative to our proposed approach of SQNR based optimization of bit-widths. Specifically, we present results for a floating point DCN trained CIFAR-10 benchmark, which on conversion to its fixed point counter-part results in >20 % reduction in model size without any loss in accuracy” (Lin-2, Conclusion). The proposed teaching is beneficial in that it helps in reduction of the model size without any loss in accuracy.
Lin-1 in view of Lin-2 does not appear to explicitly teach perform the operation with the integer ALU
However, Kum et al. teaches perform the operation with the integer ALU (Kum et al., Fig. 9 and pg. 928, teaches the operation with the arithmetic logic unit).
Lin-1 in view of Lin-2 in view of Kum et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 and Lin-2 with Kum et al., with motivation to perform the operation with the integer ALU. “A combined WL optimization and high-level synthesis approach that results in a more efficient or cost-effective design when compared with the previous WL optimization followed by high-level synthesis approaches. The developed method also requires less time for optimization since the use of the hardware sharing information for signal grouping results in fewer signal groups” (Kum et al., Conclusion). The proposed teaching is beneficial in that it results in a more efficient or cost-effective design that also requires less time for optimization.
Regarding Claim 19,
Lin-1 in view of Lin-2 in view of Kum et al. teaches the method of claim 1
Lin-1 further teaches non-transitory computer-readable recording medium having recorded thereon a computer program, which, when executed by a computer, performs the method… (Lin-1, Col. 2 Lines 46-59, “A non-transitory computer-readable medium having program code recorded thereon for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer when executed by a processor may include program code to select at least one moment of an input distribution of the floating point machine learning network. The non-transitory computer-readable medium may further include program code to determine quantizer parameters for quantizing values of the floating point machine learning network based at least in part on the at least one selected moment of the input distribution of the floating point machine learning network to obtain corresponding values of the fixed point machine learning network” teaches a non-transitory computer-readable medium containing program code).
Claims 2-4 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Lin-1 in view of Lin-2 in view of Kum et al. and in further view of Rao et al. (“IMPLEMENTATION OF THE STANDARD FLOATING POINT MAC USING IEEE 754 FLOATING POINT ADDER”)
Regarding Claim 2,
Lin-1 in view of Lin-2 in view of Kum et al. teaches the method of claim 1, 
Kum et al. further teaches wherein the performing of the operation with the ALU comprises (Kum et al., Fig. 9 and pg. 928, teaches the operation with the arithmetic logic unit).  
Lin-1 in view of Lin-2 in view of Kum et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the 
Lin-1 in view of Lin-2 in view of Kum et al. does not appear to explicitly teach extracting a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format, calculating a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point, and calculating a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value
However, Rao et al. teaches extracting a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format (Rao et al., Fig. 1 and pg. 719 Section III.A, “In half precision, the field of IEEE 754 standard can be represented as, for the sign, 1-bit; for the exponent, 4-bits; and for the mantissa, 11-bits” teaches the sign, an exponent value, and a mantissa value being extracted from the field of IEEE 754 standard (corresponds to floating-point format)).
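For illustration only, extracting a sign, an exponent field, and a mantissa field from a floating-point value can be sketched with integer bit operations. The sketch assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 mantissa bits), which differs from the field widths quoted from Rao et al.; the function name is hypothetical.

```python
import struct


def decompose_binary32(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa field) of x as an IEEE 754 binary32.

    The binary32 layout is an assumption made for this sketch.
    """
    # Reinterpret the 4-byte float encoding as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


fields = decompose_binary32(-6.5)  # -6.5 = -1.625 * 2**2
```

For -6.5 the sign bit is 1, the biased exponent is 2 + 127 = 129, and the mantissa field holds the fraction 0.625 scaled by 2^23.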
calculating a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point (Rao et al., pg. 721 Section III.C, “The exponents, E1 and E2 of the two operands of N1 and N2 are added up from which the bias of 7 for half precision is subtracted and the exponent value is finalized based on the carry propagated from the result of multiplication of the two mantissas” teaches determining the finalized exponent value (corresponds to second exponent value) based on the exponents, E1 and E2 of the two operands of N1 and N2 (corresponds to the first exponent value). Pg. 717 Section II, “The fixed point MAC is constituted by the fixed point adder, fixed point multiplier and a shifter. The sampled values which are x(n) will be given as input to the shifter. The shifter will shift the value of ‘n’ for different samples starting from first sample n=0 to the last sample i.e., n=N-1 where ‘n’ indicates number of samples and ‘N’ indicates length of the filter” teaches determining the filter length (corresponds to fractional length) of the fixed-point format. Pg. 719 Section III.A, “IEEE 754 uses biased representation for exponent which is nothing but, Value of exponent = Val(E) = E-Bias, where Bias is a constant” teaches determining a bias constant based on IEEE 754 (corresponds to floating-point format)).
calculating a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value (Rao et al., pg. 720 Section III.B, “While adding the two floating point numbers the smallest number is to be identified so that eight bit subtractor is required for the exponent. Similarly, it requires one 2×1 multiplexer to select the input data depending upon the status of the select line; one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches determining the highest mantissa (corresponds to the second mantissa) with respect to the smallest mantissa (corresponds to the one mantissa) based upon the value of the two exponents (corresponds to the second exponent value)).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, and Kum et al. with Rao et al., with motivation to extract a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format, calculate a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point, and calculate a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value. “Hence, to improve the performance of the traditional fixed point MAC, in this work we implemented the standard floating point MAC using IEEE 754 floating point adder. This can be used to design all floating point DSP processors through the 
Regarding Claim 3,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the method of claim 2, 
Rao et al. further teaches wherein the calculating of the second exponent value comprises: performing an integer operation of subtracting, from the first exponent value, the bias constant (Rao et al., pg. 719 Section III.A, “IEEE 754 uses biased representation for exponent which is nothing but, Value of exponent = Val(E) = E-Bias, where Bias is a constant” teaches calculating the value of exponent (corresponds to the second exponent) by subtracting the Bias (corresponds to the bias constant) from the exponent (corresponds to the first exponent)).
calculating the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant (Rao et al., Figure 3 and pg. 720 Section III.B, “Let S1, E1, M1 are the sign, exponent and mantissa of the first floating point operands of N1 and S2, E2, M2 are the sign, exponent and mantissa of the second floating point operands N2, then for the standard floating point adder, the explanation of the algorithm is as follows: a) Initially, the system reads the two operands of N1 and N2 for denormalization and infinity. Set the hidden bit of the fraction to 0 if numbers are denormalized otherwise set to 1. b) Using the 4-bit subtractor, the two exponents E1, E2 are compared. If E1 is less than E2, N1 and N2 are swapped which means that previous M2 is now referred to as M1 and vice versa. c) The smaller fraction, M2 is shifted right by the absolute difference result of the two exponents’ subtraction. Now both the numbers have the same exponent. d) Now the two mantissas of M1 and M2 are added. e) For the normalization, after addition the result is then passed through a leading one detector. f) Using the results from the leading one detector, if it is needed, the result is then shifted right by 1 bit to complete the normalization process. g) After normalization, using the default rounding mode the result is rounded to the nearest value. h) The exponent is adjusted using the results from the leading one detector. i) The sign is computed depending on the value of exponents of E1 and E2 which means that whichever the exponent is the maximum, that sign is computed. The result is registered after the overflow and underflow check” teaches an addition algorithm that determines the adjusted exponent (corresponds to second exponent) by adding the fractions of M1 and M2 (corresponds to fractional length)).
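For illustration only, the two integer operations recited in this limitation (subtracting the bias constant from the first exponent value, then adding the fractional length) can be sketched as follows. The sketch assumes the IEEE 754 binary32 bias of 127 rather than the half-precision bias mentioned in Rao et al.; the names are hypothetical, and the code illustrates the recited arithmetic, not any reference's implementation.

```python
BIAS_BINARY32 = 127  # assumption: binary32 exponent bias


def second_exponent(first_exponent: int, frac_len: int,
                    bias: int = BIAS_BINARY32) -> int:
    """Second exponent = (first exponent - bias) + fractional length."""
    unbiased = first_exponent - bias  # Val(E) = E - Bias
    return unbiased + frac_len


# Hypothetical example: 6.5 has biased exponent 129 in binary32; with a
# fractional length of 4 bits the second exponent is (129 - 127) + 4.
e2 = second_exponent(129, 4)
```

In this example the result is 6, which agrees with 6.5 × 2^4 = 104 = 1.625 × 2^6.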
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, and Kum et al. with Rao et al., with motivation wherein the calculating of the second exponent value comprises: performing an integer operation of subtracting, from the first exponent value, the bias constant and calculating the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant. “Hence, to improve the performance of the 
Regarding Claim 4,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the method of claim 2, 
Rao et al. further teaches wherein the calculating of the second mantissa value comprises (Rao et al., pg. 720 Section III.B, “While adding the two floating point numbers the smallest number is to be identified so that eight bit subtractor is required for the exponent. Similarly, it requires one 2×1 multiplexer to select the input data depending upon the status of the select line; one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches determining the highest mantissa (corresponds to the second mantissa) with respect to the smallest mantissa (corresponds to the one mantissa) based upon the value of the two exponents).
updating the first mantissa value by adding a bit value of 1 to a position before the first mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “Let S1, E1, M1 are the sign, exponent and mantissa of the first floating point operands of N1 and S2, E2, M2 are the sign, exponent and mantissa of the second floating point operands N2, then for the standard floating point adder, the explanation of the algorithm is as follows: a) Initially, the system reads the two operands of N1 and N2 for denormalization and infinity. Set the hidden bit of the fraction to 0 if numbers are denormalized otherwise set to 1. b) Using the 4-bit subtractor, the two exponents E1, E2 are compared. If E1 is less than E2, N1 and N2 are swapped which means that previous M2 is now referred to as M1 and vice versa. c) The smaller fraction, M2 is shifted right by the absolute difference result of the two exponents’ subtraction. Now both the numbers have the same exponent. d) Now the two mantissas of M1 and M2 are added. e) For the normalization, after addition the result is then passed through a leading one detector. f) Using the results from the leading one detector, if it is needed, the result is then shifted right by 1 bit to complete the normalization process. g) After normalization, using the default rounding mode the result is rounded to the nearest value. h) The exponent is adjusted using the results from the leading one detector. i) The sign is computed depending on the value of exponents of E1 and E2 which means that whichever the exponent is the maximum, that sign is computed. The result is registered after the overflow and underflow check” teaches updating the mantissa (corresponds to the first mantissa value) by shifting the result right by 1 bit (corresponds to adding a bit value of 1 to a position) to complete the normalization process).
comparing a number of bits of the first mantissa value with a number of bits of the second mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches comparing the smallest mantissa (corresponds to the first mantissa value) with the number of bits of the biggest mantissa value (corresponds to the second mantissa value)).
shifting the updated first mantissa value to the right, based on a result of the comparing of the number of bits of the first mantissa value with the number of bits of the second mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches shifting the smallest mantissa (corresponds to the first mantissa value) to the right side bit by bit after comparing the bits to the biggest mantissa value (corresponds to the second mantissa value) to make equal).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, and Kum et al. with Rao et al., with motivation of updating the first mantissa value by adding a bit value of 1 to a position before the first mantissa value, comparing a number of bits of the first mantissa value with a number of bits of the second mantissa value, and shifting the updated first mantissa value to the right, based on a result of the comparing 
Regarding Claim 11,
Lin-1 in view of Lin-2 in view of Kum et al. teaches the neural network apparatus of claim 10, 
Lin-1 further teaches wherein the processor is further configured to (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
Lin-1 in view of Lin-2 in view of Kum et al. does not appear to explicitly teach extract a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format, calculate a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point, and calculate a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value.
However, Rao et al. teaches extract a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format (Rao et al., Fig. 1 and pg. 719 Section III.A, “In half precision, the field of IEEE 754 standard can be represented as, for the sign, 1-bit; for the exponent, 4-bits; and for the mantissa, 11-bits” teaches the sign, an exponent value, and a mantissa value being extracted from the field of IEEE 754 standard (corresponds to floating-point format)).
calculate a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point (Rao et al., pg. 721 Section III.C, “The exponents, E1and E2 of the two operands of N1 and N2 are added up from which the bias of 7 for half precision is subtracted and the exponent value is finalized based on the carry propagated from the result of multiplication of the two mantissas” teaches determining the finalized exponent value (corresponds to second exponent value) based on the exponents, E1and E2 of the two operands of N1 and N2 (corresponds to the first exponent value). Pg. 717 Section II, “The fixed point MAC is constituted by the fixed point adder, fixed point multiplier and a shifter. The sampled values which are x(n) will be given as input to the shifter. The shifter will shift the value of ‘n’ for different samples starting from first sample n=0 to the last sample i.e., n=N-1 where ‘n’ indicates number of samples and ‘N’ indicates length of the filter” teaches determining the filter length (corresponds to fractional length) of the fixed-point format. Section III.A, “IEEE 754 uses biased representation for exponent which is nothing but, Value of exponent = Val(E) = E-Bias, where Bias is a constant” teaches determining a bias constant based on IEEE 754 (corresponds to floating-point format)).
calculate a second mantissa value by performing a bit manipulation operation and an integer operation with respect to the first mantissa value, based on the second exponent value (Rao et al., pg. 720 Section III.B, “While adding the two floating point numbers the smallest number is to be identified so that eight bit subtractor is required for the exponent. Similarly, it requires one 2×1 multiplexer to select the input data depending upon the status of the select line; one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches determining the highest mantissa (corresponds to the second mantissa) with respect to the smallest mantissa (corresponds to the one mantissa) based upon the value of the two exponents (corresponds to the second exponent value)).
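For illustration only, the field extraction and second-exponent calculation recited in this limitation can be sketched as follows. The sketch assumes an IEEE 754 single-precision input (bias constant 127, 23 stored mantissa bits); the function names `decompose_float32` and `second_exponent` are hypothetical and are not taken from the application or from any cited reference.

```python
import struct

def decompose_float32(x):
    """Split a value, stored as IEEE 754 single precision, into its raw fields."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (first exponent value)
    mantissa = bits & 0x7FFFFF       # 23-bit stored fraction (first mantissa value)
    return sign, exponent, mantissa

def second_exponent(first_exponent, frac_length, bias=127):
    """Integer-only step: subtract the bias constant, then add the fractional length."""
    return first_exponent - bias + frac_length
```

Under these assumptions, a parameter of 1.0 has a biased exponent of 127, so a fractional length of 4 gives a second exponent value of 4.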
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, and Kum et al. with Rao et al., with motivation to extract a sign, a first exponent value, and a first mantissa value from the parameter in the floating-point format, calculate a second exponent value based on the first exponent value, the fractional length of the fixed-point format, and a bias constant that is determined based on a format of the floating-point, and calculate a second mantissa value by performing a bit manipulation operation and 
Regarding Claim 12,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the neural network apparatus of claim 11, 
Lin-1 further teaches wherein the processor is further configured to (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
Rao et al. further teaches perform an integer operation of subtracting, from the first exponent value, the bias constant (Rao et al., pg. 719 Section III.A, “IEEE 754 uses biased representation for exponent which is nothing but, Value of exponent = Val(E) = E-Bias, where Bias is a constant” teaches calculating the value of exponent (corresponds to the second exponent) by subtracting the Bias (corresponds to the bias constant) from the exponent (corresponds to the first exponent)).
calculate the second exponent value by performing an integer operation of adding the fractional length to a result of the integer operation of subtracting the bias constant (Rao et al., Figure 3 and pg. 720 Section III.B, “Let S1, E1, M1 are the sign, exponent and mantissa of the first floating point operands of N1 and S2, E2, M2 are the sign, exponent and mantissa of the second floating point operands N2, then for the standard floating point adder, the explanation of the algorithm is as follows: a) Initially, the system reads the two operands of N1 and N2 for denormalization and infinity. Set the hidden bit of the fraction to 0 if numbers are denormalized otherwise set to 1. b) Using the 4-bit subtractor, the two exponents E1, E2 are compared. If E1 is less than E2, N1 and N2 are swapped which means that previous M2 is now referred to as M1 and vice versa. c) The smaller fraction, M2 is shifted right by the absolute difference result of the two exponents’ subtraction. Now both the numbers have the same exponent. d) Now the two mantissas of M1 and M2 are added. e) For the normalization, after addition the result is then passed through a leading one detector. f) Using the results from the leading one detector, if it is needed, the result is then shifted right by 1 bit to complete the normalization process. g) After normalization, using the default rounding mode the result is rounded to the nearest value. h) The exponent is adjusted using the results from the leading one detector. i) The sign is computed depending on the value of exponents of E1 and E2 which means that whichever the exponent is the maximum, that sign is computed. The result is registered after the overflow and underflow check” teaches an addition algorithm that determines the adjusted exponent (corresponds to second exponent) by adding the fractions of M1 and M2 (corresponds to fractional length)).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem 
Regarding Claim 13,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the neural network apparatus of claim 11, 
Lin-1 further teaches wherein the processor is further configured to (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
Rao et al. further teaches update the first mantissa value by adding a bit value of 1 to a position before the first mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “Let S1, E1, M1 are the sign, exponent and mantissa of the first floating point operands of N1 and S2, E2, M2 are the sign, exponent and mantissa of the second floating point operands N2, then for the standard floating point adder, the explanation of the algorithm is as follows: a) Initially, the system reads the two operands of N1 and N2 for denormalization and infinity. Set the hidden bit of the fraction to 0 if numbers are denormalized otherwise set to 1. b) Using the 4-bit subtractor, the two exponents E1, E2 are compared. If E1 is less than E2, N1 and N2 are swapped which means that previous M2 is now referred to as M1 and vice versa. c) The smaller fraction, M2 is shifted right by the absolute difference result of the two exponents’ subtraction. Now both the numbers have the same exponent. d) Now the two mantissas of M1 and M2 are added. e) For the normalization, after addition the result is then passed through a leading one detector. f) Using the results from the leading one detector, if it is needed, the result is then shifted right by 1 bit to complete the normalization process. g) After normalization, using the default rounding mode the result is rounded to the nearest value. h) The exponent is adjusted using the results from the leading one detector. i) The sign is computed depending on the value of exponents of E1 and E2 which means that whichever the exponent is the maximum, that sign is computed. The result is registered after the overflow and underflow check” teaches updating the mantissa (corresponds to the first mantissa value) by shifting the result right by 1 bit (corresponds to adding a bit value of 1 to a position) to complete the normalization process).
compare a number of bits of the first mantissa value with a number of bits of the second mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches comparing the smallest mantissa (corresponds to the first mantissa value) with the number of bits of the biggest mantissa value (corresponds to the second mantissa value)).
shift the updated first mantissa value to the right, based on a result of the comparing of the number of bits of the first mantissa value with the number of bits of the second mantissa value (Rao et al., Figure 3 and pg. 720 Section III.B, “one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches shifting the smallest mantissa (corresponds to the first mantissa value) to the right side bit by bit after comparing the bits to the biggest mantissa value (corresponds to the second mantissa value) to make equal).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, and Kum et al. with Rao et al., with motivation to update the first mantissa value by adding a bit value of 1 to a position before the first mantissa value, compare a number of bits of .
Claims 5-6, 8, 14-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. and in further view of Lutz et al. (US 20160092168 A1) and Guardia et al. (“FPGA implementation of a binary32 floating point cube root”).
Regarding Claim 5,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the method of claim 4,
Rao et al. further teaches wherein the calculating of the second mantissa value further comprises (Rao et al., pg. 720 Section III.B, “While adding the two floating point numbers the smallest number is to be identified so that eight bit subtractor is required for the exponent. Similarly, it requires one 2×1 multiplexer to select the input data depending upon the status of the select line; one 32 bit swap register to swap the smallest mantissa with the highest mantissa; one 16 bit register to shift right the smallest mantissa to the right side bit by bit to make equal the smallest mantissa to the biggest mantissa value depending upon the difference value of the two exponents” teaches determining the highest mantissa (corresponds to the second mantissa) with respect to the smallest mantissa (corresponds to the one mantissa) based upon the value of the two exponents). 
Lin-1 in view of Lin-2 in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1 and Lin-2 with Rao et al., with motivation wherein the calculating of the second mantissa value further comprises. “Hence, to improve the performance of the traditional fixed point MAC, in this work we implemented the standard floating point MAC using IEEE 754 floating point adder. This can be used to design all floating point DSP processors through the standard floating point MAC” (Rao et al., Abstract). The proposed teaching is beneficial in that it helps improve the performance of the traditional MAC.
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. does not appear to explicitly teach shifting the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating point-format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point, calculating the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value and wherein the LSB value is a factor that determines whether to round off the fixed point.
Lutz et al., teaches shifting the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating point-format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point (Lutz et al., Para. [0028], “The exponent is biased, which means that the true exponent differs from the one stored in the number. For example, biased SP exponents are 8-bits long and range from 0 to 255. Exponents 0 and 255 are special cases, but all other exponents have bias 127, meaning that the true exponent is 127 less than the biased exponent. The smallest biased exponent is 1, which corresponds to a true exponent of −126. The maximum biased exponent is 254, which corresponds to a true exponent of 127. HP and DP exponents work the same way, with the biases indicated in the table above” teaches determining when true exponent (corresponds to the second exponent value) being less than the number of bits of the biased exponent (corresponds to the first mantissa value)).
… calculating the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value (Lutz et al., Para [0045], “If we convert an FP number to integer or fixed-point we also have to round. The concept is basically the same as FP rounding” teaches rounding off the fixed-point number. Fig. 2-3 and Para. [0048], “The first floating-point value is placed in a register 22. A 3-input multiplexer selects the appropriate 64 bits to be input to the right shifter 12, according to one of the formats shown in FIG. 3” teaches a right shifter. Fig. 8 and Para [0076], “FIG. 8 is a flow diagram showing how to determine the rounding increment at step 130 of FIG. 6. At step 200 the control circuitry 14 obtains the least significant bit L0, guard bit G0 and sticky bit S0 from the shifter 12. At step 202 it is determined whether the conversion is to an integer or fixed-point format and the first value is negative. If the second value is a floating-point value or the value is positive, then the least, guard and sticky bits L, G, S used for the rounding are the same as the bits L0, G0, S0 generated by the shifter 12 (step 204)” teaches obtaining the least significant bit (LSB) to determine conversion to an integer or fixed-point format (corresponds to adding the obtained LSB)).
wherein the LSB value is a factor that determines whether to round off the fixed point (Lutz et al., Fig. 8 and Para [0076], “FIG. 8 is a flow diagram showing how to determine the rounding increment at step 130 of FIG. 6. At step 200 the control circuitry 14 obtains the least significant bit L0, guard bit G0 and sticky bit S0 from the shifter 12. At step 202 it is determined whether the conversion is to an integer or fixed-point format and the first value is negative. If the second value is a floating-point value or the value is positive, then the least, guard and sticky bits L, G, S used for the rounding are the same as the bits L0, G0, S0 generated by the shifter 12 (step 204). If the conversion is to an integer or fixed-point value and the first value is negative, then at step 206 the bits LGS used for the rounding and determination are set according to: S=S0, G=(G0 ^ S0), L=(L0 ^ (G0|S0)). At step 208, the control circuitry 214 determines the rounding increment from L, G, S based on the rules set out in Table 1 above” teaches determining round increment from obtaining the least significant bit).
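As an illustrative sketch only, the shift-and-round sequence recited in claim 5 can be written with integer and bit operations as shown below. The sketch assumes single precision (bias constant 127, 23 mantissa bits, predetermined number 22, as recited in claim 8). The function name `quantize_magnitude`, the flushing of zero and denormal inputs to 0, the rejection of infinity and NaN, and the left shift used when the second exponent value is not less than 23 are all assumptions made for illustration and are not taken from the application or any cited reference.

```python
import struct

BIAS = 127           # bias constant for single precision
MANTISSA_BITS = 23   # number of stored mantissa bits
PREDETERMINED = 22   # MANTISSA_BITS - 1, keeps one extra bit for rounding

def quantize_magnitude(x, frac_length):
    """Return round(|x| * 2**frac_length) using only integer and bit operations."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF   # first exponent value
    mantissa = bits & 0x7FFFFF       # first mantissa value
    if exponent == 0:
        return 0                     # assumption: zero and denormals map to 0
    if exponent == 0xFF:
        raise ValueError("infinity and NaN are not handled in this sketch")
    e2 = exponent - BIAS + frac_length        # second exponent value
    m = mantissa | (1 << MANTISSA_BITS)       # add a 1 bit before the mantissa
    if e2 < MANTISSA_BITS:
        shifted = m >> (PREDETERMINED - e2)   # keep one bit below the result
        lsb = shifted & 1                     # extracted LSB decides the rounding
        return (shifted >> 1) + lsb           # shift once more, then add the LSB
    return m << (e2 - MANTISSA_BITS)          # assumption: no fractional bits remain
```

For example, with a fractional length of 4, an input of 0.3 yields a shifted value of 9, an LSB of 1, and a rounded magnitude of 5, which agrees with rounding 0.3 × 16 = 4.8 to the nearest integer.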
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., and Rao et al. with Lutz et al., with motivation of shifting the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined based on a type of a floating point-format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point, calculating the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value and wherein the LSB value is a factor that determines whether to round off the fixed point. “Also, a value may have an integer format, representing an integer value with no fractional bits, or a fixed-point format, representing a numeric value using a fixed number of integer-valued bits and a fixed number of fractional-valued bits. In an apparatus supporting more than one format, it may be desirable to convert between the different formats and so a conversion operation may be performed. The present technique seeks to provide an improved apparatus and method for converting from a floating-point value to a value of a different format” (Lutz et 
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. does not appear to explicitly teach extracting a least significant bit (LSB) value from the shifted first mantissa value
However, Guardia et al. teaches extracting a least significant bit (LSB) value from the shifted first mantissa value (Guardia et al., pg. 3-4 Section V, “The least significant bit (LSB), guard (G), round (R) and sticky (STK) bit are generated capturing the LSB and subsequent bits of Q. The 3-bit MSB of the captured data represents the LSB, G, and R respectively. The STK is obtained by means of or-chain operations of the remaining bits of the captured data… In the rounding stage, special cases as overflow and underflow are tested again. In compliance with IEEE 754–2008 standard the mantissa is defined into [1, 2]. Therefore a left-shift operation by 1-bit on Fcr is executed if its respective MSB is zero” teaches obtaining the least significant bit value from the 3-bit MSB of the capture data from the two shifted mantissa).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., Rao et al., and Lutz et al. with Guardia et al., with motivation of extracting a least significant bit (LSB) value from the shifted first 
Regarding Claim 6,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Guardia et al. in view of Lutz et al. teaches the method of claim 5, wherein the quantizing comprises: 
Lutz et al. further teaches tuning a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format (Lutz et al., Fig. 4 and Para [0049], “FIG. 4 shows different examples of second value formats that can be generated by the conversion circuitry. It will be appreciated that other formats could also be supported. The top two rows of FIG. 4 show examples where the second value is another floating-point value with a smaller significand than the first value (e.g. single or half precision compared to double or single precision for the first value). The last three examples show 64-bit, 32-bit and 16-bit fixed-point or integer values. If a fixed-point value is to be generated, a radix position parameter 24 is input to the first adder 10 as shown in FIG. 2, to indicate the number of fractional bits in the fixed-point value” teaches the second value (corresponds to second mantissa value) having the same bit as the fixed-point value).
quantizing the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value (Lutz et al., Para. [0021], “In this case, then the conversion circuitry may have shift control circuitry which determines the shift amount based on at least one control parameter which specifies one or both of the formats of the first and second values. For example, a conversion instruction which triggers the conversion circuitry to perform the conversion operation may specify the at least one control parameter for controlling the shift control circuitry to determine the appropriate shift amount” teaches the parameter specifying the format of the first and second value. Para. [0023], “The conversion circuitry may comprise inverting circuitry to invert the significand of the first floating-point value or the output of the shift circuitry if the first floating-point value represents a negative value and the second value is a fixed-point or integer value. Floating-point values are represented using sign-magnitude representation, while fixed-point or integer values are represented using two's complement representation. Therefore, when converting between floating-point values and fixed-point or integer values, an inversion may be applied to preserve the sign of the value” teaches converting the floating point value to fixed-point or integer value (corresponds to quantization) by applying the preserved sign of the value to the second value (corresponds to the tuned second mantissa value)).
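For illustration only, the two quantizing steps recited in claim 6 can be sketched as follows. The sketch assumes that "tuning" the number of bits means saturating the magnitude to the range of a signed fixed-point word, and that the sign is applied in two's complement form (the representation Lutz et al. describes for fixed-point values in Para. [0023]). The function name `apply_sign_and_fit` and the default word width of 16 bits are hypothetical.

```python
def apply_sign_and_fit(magnitude, sign, total_bits=16):
    """Fit a rounded magnitude to total_bits and apply the extracted sign."""
    # Assumption: values too large for the word are saturated, not wrapped.
    max_magnitude = (1 << (total_bits - 1)) - 1
    magnitude = min(magnitude, max_magnitude)
    # Apply the extracted sign bit (1 means negative).
    value = -magnitude if sign else magnitude
    # Return the two's complement bit pattern of the signed value.
    return value & ((1 << total_bits) - 1)
```

Under these assumptions, a magnitude of 5 with a sign bit of 1 in an 8-bit word produces the bit pattern 0xFB, the two's complement encoding of -5.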
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., Rao et al., and Lutz et al. with Guardia et al., 
Regarding Claim 8,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. teaches the method of claim 5, wherein 

Lutz et al. further teaches when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22 (Lutz et al., Para. [0027]: [table of floating-point formats, reproduced in the original as image media_image1.png] teaches for the SP (corresponds to single-precision floating-point format), the bias constant is 127, number of bits of the first mantissa value is 23 bits, and the predetermined number is 22).
when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51 (Lutz et al., Para. [0027], teaches for the DP (corresponds to double-precision floating-point format), the bias constant is 1023, number of bits of the first mantissa value is 52 bits, and the predetermined number is 51).
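As an illustrative sketch, the constants recited in claim 8 can be derived from the IEEE 754 field widths (8 exponent bits and 23 mantissa bits for single precision; 11 and 52 for double precision). Treating the predetermined number as one less than the mantissa width is an inference from the recited values (22 and 51), not a statement made in any cited reference; the names `FORMATS` and `format_constants` are hypothetical.

```python
# IEEE 754 field widths for the two formats recited in claim 8.
FORMATS = {
    "single": {"exponent_bits": 8, "mantissa_bits": 23},
    "double": {"exponent_bits": 11, "mantissa_bits": 52},
}

def format_constants(name):
    """Return (bias constant, mantissa bit count, predetermined number)."""
    fmt = FORMATS[name]
    bias = (1 << (fmt["exponent_bits"] - 1)) - 1   # 127 or 1023
    mantissa_bits = fmt["mantissa_bits"]           # 23 or 52
    predetermined = mantissa_bits - 1              # assumption: 22 or 51
    return bias, mantissa_bits, predetermined
```

Calling `format_constants("single")` gives (127, 23, 22) and `format_constants("double")` gives (1023, 52, 51), matching the decimal numbers recited in the claim.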
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., Rao et al., and Lutz et al. with Guardia et al., with motivation of when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal 
Regarding Claim 14,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. teaches the neural network apparatus of claim 13,
Lin-1 further teaches wherein the processor is further configured to (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. does not appear to explicitly teach shift the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number 
However, Lutz et al., teaches shift the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined depending on the type of a floating-point format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point (Lutz et al., Para. [0028], “The exponent is biased, which means that the true exponent differs from the one stored in the number. For example, biased SP exponents are 8-bits long and range from 0 to 255. Exponents 0 and 255 are special cases, but all other exponents have bias 127, meaning that the true exponent is 127 less than the biased exponent. The smallest biased exponent is 1, which corresponds to a true exponent of −126. The maximum biased exponent is 254, which corresponds to a true exponent of 127. HP and DP exponents work the same way, with the biases indicated in the table above” teaches determining when true exponent (corresponds to the second exponent value) being less than the number of bits of the biased exponent (corresponds to the first mantissa value)).
… calculate the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time (Lutz et al., Para [0045], “If we convert an FP number to integer or fixed-point we also have to round. The concept is basically the same as FP rounding” teaches rounding off the fixed-point number. Fig. 2-3 and Para. [0048], “The first floating-point value is placed in a register 22. A 3-input multiplexer selects the appropriate 64 bits to be input to the right shifter 12, according to one of the formats shown in FIG. 3” teaches a right shifter (corresponds to shifting value to the right by 1 or more) that shifts the shifted first floating point-value (corresponds to the shifted first mantissa value). Fig. 8 and Para [0076], “FIG. 8 is a flow diagram showing how to determine the rounding increment at step 130 of FIG. 6. At step 200 the control circuitry 14 obtains the least significant bit L0, guard bit G0 and sticky bit S0 from the shifter 12. At step 202 it is determined whether the conversion is to an integer or fixed-point format and the first value is negative. If the second value is a floating-point value or the value is positive, then the least, guard and sticky bits L, G, S used for the rounding are the same as the bits L0, G0, S0 generated by the shifter 12 (step 204)” teaches obtaining the least significant bit (LSB) to determine conversion to an integer or fixed-point format (corresponds to adding the obtained LSB)).
wherein the LSB value is a factor that determines whether to round off the fixed point (Lutz et al., Fig. 8 and Para [0076], “FIG. 8 is a flow diagram showing how to determine the rounding increment at step 130 of FIG. 6. At step 200 the control circuitry 14 obtains the least significant bit L0, guard bit G0 and sticky bit S0 from the shifter 12. At step 202 it is determined whether the conversion is to an integer or fixed-point format and the first value is negative. If the second value is a floating-point value or the value is positive, then the least, guard and sticky bits L, G, S used for the rounding are the same as the bits L0, G0, S0 generated by the shifter 12 (step 204). If the conversion is to an integer or fixed-point value and the first value is negative, then at step 206 the bits LGS used for the rounding and determination are set according to: S=S0, G=(G0 ^ S0), L=(L0 ^ (G0|S0)). At step 208, the control circuitry 214 determines the rounding increment from L, G, S based on the rules set out in Table 1 above” teaches determining round increment from obtaining the least significant bit).
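As a reading aid, the bit adjustment quoted from Lutz et al., Para [0076] (steps 202 through 206) can be sketched as below. The function name adjust_lgs and its boolean parameter are hypothetical. The rounding-increment rules of Table 1 (step 208) are not reproduced in this action, so the sketch stops at the adjusted L, G, S bits and does not compute an increment.

```python
def adjust_lgs(l0: int, g0: int, s0: int, negative_to_fixed: bool):
    """Return the (L, G, S) bits used for rounding, per the quoted steps 202-206.

    negative_to_fixed is True when the conversion targets an integer or
    fixed-point format AND the first value is negative (hypothetical flag name).
    """
    if not negative_to_fixed:
        # Step 204: bits from the shifter are used unchanged.
        return l0, g0, s0
    # Step 206: S = S0, G = G0 XOR S0, L = L0 XOR (G0 OR S0).
    s = s0
    g = g0 ^ s0
    l = l0 ^ (g0 | s0)
    return l, g, s


print(adjust_lgs(0, 1, 0, negative_to_fixed=True))
print(adjust_lgs(1, 1, 1, negative_to_fixed=True))
```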
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., and Rao et al. with Lutz et al., with motivation to shift the updated first mantissa value to the right by a value obtained by subtracting the second exponent value from a predetermined number determined depending on the type of a floating-point format when it is determined that the second exponent value is less than the number of bits of the first mantissa value, in order to determine whether to round off the fixed point, calculate the second mantissa value by determining whether to round off the fixed point by shifting the shifted first mantissa value to the right by 1 one more time and adding the extracted LSB value and wherein the LSB value is a factor that determines whether to round off the fixed point. “Also, a value may have an integer format, 
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. does not appear to explicitly teach extract a least significant bit (LSB) value from the shifted first mantissa value.
However, Guardia et al., teaches extract a least significant bit (LSB) value from the shifted first mantissa value (Guardia et al., pg. 3-4 Section V, “The least significant bit (LSB), guard (G), round (R) and sticky (STK) bit are generated capturing the LSB and subsequent bits of Q. The 3-bit MSB of the captured data represents the LSB, G, and R respectively. The STK is obtained by means of or-chain operations of the remaining bits of the captured data… In the rounding stage, special cases as overflow and underflow are tested again. In compliance with IEEE 754–2008 standard the mantissa is defined into [1, 2]. Therefore a left-shift operation by 1-bit on Fcr is executed if its respective MSB is zero” teaches obtaining the least significant bit value from the 3-bit MSB of the captured data from the two shifted mantissa).
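The shift, LSB-extraction, and add steps recited in the limitations above can be illustrated with the following minimal sketch for the single-precision case. It is not taken from the application or from any cited reference. Two points are assumptions made only for this illustration: the "second exponent value" is taken to be the stored exponent minus the bias plus a fractional length frac_len, and the "updated first mantissa value" is taken to be the stored mantissa with the implicit leading 1 restored. The function name quantize_magnitude_sp is hypothetical.

```python
import struct

SP_BIAS = 127          # bias constant for single precision
SP_MANTISSA_BITS = 23  # number of bits of the first mantissa value
SP_PREDETERMINED = 22  # predetermined number for single precision


def quantize_magnitude_sp(x: float, frac_len: int):
    """Return (sign, second_mantissa) for x using the recited shift-and-round steps.

    Hypothetical helper; frac_len (number of fractional bits of the
    fixed-point format) is an assumed parameter.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    first_exp = (bits >> SP_MANTISSA_BITS) & 0xFF
    first_mant = bits & ((1 << SP_MANTISSA_BITS) - 1)
    if first_exp == 0:
        # Zero and subnormal inputs are treated as 0 in this sketch.
        return sign, 0
    # ASSUMPTION: second exponent = stored exponent - bias + fractional length.
    second_exp = first_exp - SP_BIAS + frac_len
    # ASSUMPTION: "updated" mantissa restores the implicit leading 1.
    updated = first_mant | (1 << SP_MANTISSA_BITS)
    if second_exp >= SP_MANTISSA_BITS:
        raise ValueError("sketch only covers second exponent < 23")
    # Shift right by (predetermined number - second exponent).
    shifted = updated >> (SP_PREDETERMINED - second_exp)
    # Extract the LSB, shift right by 1 once more, then add the LSB.
    lsb = shifted & 1
    second_mant = (shifted >> 1) + lsb
    return sign, second_mant


print(quantize_magnitude_sp(1.5, 4))   # 1.5 * 2**4 = 24
print(quantize_magnitude_sp(0.3, 4))   # 0.3 * 2**4 = 4.8
print(quantize_magnitude_sp(-2.5, 0))  # magnitude 2.5 sits exactly halfway
```

Under these assumptions the extracted LSB is the first discarded fractional bit, so adding it after the final shift rounds the magnitude half up.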

Regarding Claim 15,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. teaches the neural network apparatus of claim 14.
Lin-1 further teaches wherein the processor is further configured to (Lin-1, Col. 2 Lines 33-36, “An apparatus for quantizing a floating point machine learning network to obtain a fixed point machine learning network using a quantizer may include a memory unit and at least one processor coupled to the memory unit” teaches at least one processor).
Lutz et al. further teaches tune a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format (Lutz et al., Fig. 4 and Para [0049], “FIG. 4 shows different examples of second value formats that can be generated by the conversion circuitry. It will be appreciated that other formats could also be supported. The top two rows of FIG. 4 show examples where the second value is another floating-point value with a smaller significand than the first value (e.g. single or half precision compared to double or single precision for the first value). The last three examples show 64-bit, 32-bit and 16-bit fixed-point or integer values. If a fixed-point value is to be generated, a radix position parameter 24 is input to the first adder 10 as shown in FIG. 2, to indicate the number of fractional bits in the fixed-point value” teaches the second value (corresponds to second mantissa value) having the same number of bits as the fixed-point value).
quantize the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value (Lutz et al., Para. [0021], “In this case, then the conversion circuitry may have shift control circuitry which determines the shift amount based on at least one control parameter which specifies one or both of the formats of the first and second values. For example, a conversion instruction which triggers the conversion circuitry to perform the conversion operation may specify the at least one control parameter for controlling the shift control circuitry to determine the appropriate shift amount” teaches the parameter specifying the format of the first and second value. Para. [0023], “The conversion circuitry may comprise inverting circuitry to invert the significand of the first floating-point value or the output of the shift circuitry if the first floating-point value represents a negative value and the second value is a fixed-point or integer value. Floating-point values are represented using sign-magnitude representation, while fixed-point or integer values are represented using two's complement representation. Therefore, when converting between floating-point values and fixed-point or integer values, an inversion may be applied to preserve the sign of the value” teaches converting the floating point value to fixed-point or integer value (corresponds to quantization) by applying the preserved sign of the value to the second value (corresponds to the tuned second mantissa value)).
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “neural network” and “quantization”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lin-1, Lin-2, Kum et al., Rao et al., and Lutz et al. with Guardia et al., with motivation to tune a number of bits of the calculated second mantissa value to be equal to a number of bits of the fixed-point format and quantize the parameter in the floating-point format to the fixed-point format by applying the extracted sign to the tuned second mantissa value. “Also, a value may have an integer format, representing an integer value with no fractional bits, or a fixed-point format, representing a numeric value using a fixed number of integer-valued bits and a fixed number of fractional-valued bits. In an apparatus supporting more than one format, it may be desirable to convert between the different formats and so a conversion operation may be performed. The present technique seeks to provide an improved apparatus and method for converting from a floating-point value to a value of a different format” (Lutz et al., Abstract). The proposed teaching is beneficial in that it is capable of converting values to different formats so a conversion operation may be performed.
Regarding Claim 17,
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. teaches the neural network apparatus of claim 14, wherein 
when the floating-point format is a single-precision floating-point format, the bias constant is a decimal number of 127, the number of bits of the first mantissa value is a decimal number of 23, and the predetermined number is a decimal number of 22 (Lutz et al., Para. [0027]: 
[Image media_image1.png: table reproduced from Lutz et al., Para. [0027]]

teaches for the SP (corresponds to single-precision floating-point format), the bias constant is 127, number of bits of the first mantissa value is 23 bits, and the predetermined number is 22).
when the floating-point format is a double-precision floating-point format, the bias constant is a decimal number of 1023, the number of bits of the first mantissa value is a decimal number of 52, and the predetermined number is a decimal number of 51 (Lutz et al., Para. [0027], teaches for the DP (corresponds to double-precision floating-point format), the bias constant is 1023, number of bits of the first mantissa value is 52 bits, and the predetermined number is 51).
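For reference, the constants recited in this claim can be collected and checked in a short sketch. The dictionary FORMATS and the function decompose are hypothetical names used only for illustration. The bias, mantissa width, and predetermined number are the values recited above; the 11-bit exponent width for double precision is the standard IEEE 754 value and is not stated in this action.

```python
import struct

# Per-format constants as recited in the claim; exp_bits for "double"
# is the standard IEEE 754 width (an addition not stated in the action).
FORMATS = {
    "single": {"pack": ">f", "unpack": ">I", "exp_bits": 8,
               "mantissa_bits": 23, "bias": 127, "predetermined": 22},
    "double": {"pack": ">d", "unpack": ">Q", "exp_bits": 11,
               "mantissa_bits": 52, "bias": 1023, "predetermined": 51},
}


def decompose(x: float, fmt: str):
    """Split x into (sign, biased exponent, stored mantissa) for the given format."""
    f = FORMATS[fmt]
    bits = struct.unpack(f["unpack"], struct.pack(f["pack"], x))[0]
    mantissa = bits & ((1 << f["mantissa_bits"]) - 1)
    exponent = (bits >> f["mantissa_bits"]) & ((1 << f["exp_bits"]) - 1)
    sign = bits >> (f["mantissa_bits"] + f["exp_bits"])
    return sign, exponent, mantissa


for name, f in FORMATS.items():
    # In both recited formats the predetermined number is one less
    # than the number of mantissa bits.
    print(name, f["predetermined"] == f["mantissa_bits"] - 1)

print(decompose(1.0, "single"))
print(decompose(1.0, "double"))
```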
Lin-1 in view of Lin-2 in view of Kum et al. in view of Rao et al. in view of Lutz et al. in view of Guardia et al. are analogous art because they are from the same field of 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 7:30am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125