DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 07/12/2022. In the current amendments, claims 1-4 and 7-11 are amended. Claims 1-11 are pending and have been examined.
In response to amendments and remarks filed on 07/12/2022, the 35 U.S.C. 112(a) rejection of claims 1-11, the 35 U.S.C. 112(b) rejection of claims 2 and 7-9, the 35 U.S.C. 101 software per se rejection of claims 1-9 and 11, and the 35 U.S.C. 102(a)(1) rejection of claims 1, 4-7, and 10-11 put forth in the previous Office Action have been withdrawn.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/29/2021 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner. All documents listed on the IDS submitted on 11/29/2021 have been considered except for the following:
KUBO, Speech Recognition with Deep Neural Net: After Its First Introduction, Journal of the Japanese Society for Artificial Intelligence, March 2016, pp. 180-188, vol. 31, no. 2
KAMYIA et al., Acceleration of discrimination calculation by Binarized-DCNN and model compression, IEICE Technical Report PRMU2016-122, December 15-16, 2016, pp. 47-52, vol. 116, no. 366
TAKEDA et al., Acoustic Model Training based on Weight Boundary Model for Discrete Deep Neural Networks, JSAI Technical Report SIG-Challenge-046-02, November 9, 2016, pp. 2-11

The above documents have not been considered because they are not in the English language, and neither an English translation nor a concise explanation of relevance has been provided for each of the documents. See MPEP 609.04(a). The IDS submitted on 11/29/2021 indicates “see EP Search Report below for concise relevance”; however, the European Search Report dated October 14, 2021 does not provide any concise explanation of relevance for the above documents.

Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required:
Claim 1 recites “An information processing apparatus comprising: circuitry”; however, the Specification does not recite “circuitry”.
Claim 11 recites “computer-readable storage medium”; however, the Specification does not recite this phrase, and therefore the Specification does not provide antecedent basis for this claimed subject matter. A “removable medium 1012” that is computer readable and stores information is described in Specification paragraphs [0133]-[0136].

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-11 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.


Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 1 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution...wherein the predetermined probability distribution is determined based on a quantization target.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation)...wherein the predetermined probability distribution is determined based on a quantization target (determining a predetermined probability distribution based on a quantization target corresponds to mathematical relationships and mathematical calculations of determining a probability distribution).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
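For illustration only, the characterization of quantization relied on in the Step 2A Prong One analysis above (mapping input values from a large set to output values in a smaller set) can be sketched as follows. This is a minimal sketch of a generic uniform quantizer, not the applicant's claimed implementation; the function name `quantize_uniform` and its parameters are hypothetical and do not appear in the claims or the Specification.

```python
def quantize_uniform(values, num_levels, lo, hi):
    """Map each value to the nearest of `num_levels` evenly spaced levels in [lo, hi].

    Hypothetical helper for illustration; not taken from the application.
    """
    if num_levels < 2:
        raise ValueError("num_levels must be at least 2")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = (hi - lo) / (num_levels - 1)
    quantized = []
    for v in values:
        clipped = min(max(v, lo), hi)          # clamp into the representable range
        index = round((clipped - lo) / step)   # index of the nearest level
        quantized.append(lo + index * step)
    return quantized


# Four real-valued inputs collapse onto a three-element output set.
result = quantize_uniform([-0.9, -0.2, 0.1, 0.7], num_levels=3, lo=-1.0, hi=1.0)
```

In this sketch the output can take only three distinct values regardless of how many distinct inputs are supplied, which is the sense in which the output set is smaller than the input set.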
Regarding Claim 2,
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 2 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the machine learning operation is an operation in deep learning, and the quantization is performed on a basis of a distribution of gradients calculated by the machine learning operation based on the deep learning being based on the predetermined probability distribution.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the machine learning operation is an operation in deep learning, and the quantization is performed on a basis of a distribution of gradients calculated by the machine learning operation based on the deep learning being based on the predetermined probability distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with deep learning operation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 3 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the quantization is performed when a value obtained by machine learning 
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and generally linking the use of a judicial exception to a particular technological environment or field of use language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the quantization is performed when a value obtained by machine learning (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the additional element limitation “machine learning in one apparatus is supplied to another apparatus in distributed learning in which machine learning is performed by a plurality of apparatuses in a distributed manner” amounts to generally linking the use of a judicial exception to a particular technological environment or field of use, namely the distributed machine learning environment; therefore, the limitation does not integrate a judicial exception into a practical application. See MPEP 2106.05(h) (“limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application”). Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the additional element limitation “machine learning in one apparatus is supplied to another apparatus in distributed learning in which machine learning is performed by a plurality of apparatuses in a distributed manner” amounts to generally linking the use of a judicial exception to a particular technological environment or field of use, namely the distributed machine learning environment, therefore the limitation does not amount to significantly more than the exception itself. See MPEP 2106.05(h) (“limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not amount to significantly more than the exception itself, and cannot integrate a judicial exception into a practical application”). Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 4 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the predetermined probability distribution is a distribution that forms a left-right symmetrical graph with a peak value as a central axis of symmetry
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the predetermined probability distribution is a distribution that forms a left-right symmetrical graph with a peak value as a central axis of symmetry (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution of a specific form (“left-right symmetrical graph with a peak value as a central axis”) and calculations associated with machine learning operation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 5 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the predetermined probability distribution is a distribution for which one mean or one median is calculable
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the predetermined probability distribution is a distribution for which one mean or one median is calculable (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution of a specific form (“a distribution for which one mean or one median is calculable”) and calculations associated with machine learning operation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 6 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the predetermined probability distribution is any one of a normalized distribution, a Laplace distribution, a Cauchy distribution, and a Student-T distribution
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the predetermined probability distribution is any one of a normalized distribution, a Laplace distribution, a Cauchy distribution, and a Student-T distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution of a specific form and calculations associated with machine learning operation).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
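For illustration only, the distributions recited in claims 4-6 can be written as closed-form density functions, each of which is symmetric about its peak. The sketch below uses the standard textbook densities and assumes that the claimed “normalized distribution” refers to the normal (Gaussian) distribution; that reading, the function names, and the default parameter values are assumptions made here and are not drawn from the claims or the Specification.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu=0.0, b=1.0):
    """Laplace density with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Cauchy density with location x0 and scale gamma."""
    z = (x - x0) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def student_t_pdf(x, nu=3.0):
    """Student's t density with nu degrees of freedom, centered at zero."""
    coeff = math.gamma((nu + 1.0) / 2.0) / (
        math.sqrt(nu * math.pi) * math.gamma(nu / 2.0)
    )
    return coeff * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)


def is_symmetric_about(pdf, center, offsets=(0.1, 0.5, 1.0, 3.0)):
    """Check pdf(center + d) == pdf(center - d) at a few sample offsets."""
    return all(
        math.isclose(pdf(center + d), pdf(center - d)) for d in offsets
    )


densities = [normal_pdf, laplace_pdf, cauchy_pdf, student_t_pdf]
symmetry = [is_symmetric_about(pdf, 0.0) for pdf in densities]
```

Each density is evaluated purely by arithmetic on its argument and parameters. Note that the Cauchy distribution has no defined mean, although its median equals its location parameter, which is consistent with the “one mean or one median” alternative recited in claim 5.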


Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 7 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein a constant of a function of the predetermined probability distribution is obtained from the values calculated by the machine learning operation
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein a constant of a function of the predetermined probability distribution is obtained from the values calculated by the machine learning operation (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation and a constant of a function of the predetermined probability distribution).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that amount to mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
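For illustration only, obtaining “a constant of a function of the predetermined probability distribution” from calculated values, as recited in claim 7, can be sketched as a parameter estimate. The sketch assumes a Laplace distribution and uses the standard maximum-likelihood estimates (the sample median for the location and the mean absolute deviation from that median for the scale). The choice of distribution, the estimator, and the name `fit_laplace` are assumptions made here; the claim does not specify how the constant is obtained.

```python
import statistics


def fit_laplace(values):
    """Estimate Laplace location and scale from a sample.

    Hypothetical helper for illustration; not taken from the application.
    Location: sample median. Scale: mean absolute deviation from the median.
    """
    if not values:
        raise ValueError("values must be non-empty")
    location = statistics.median(values)
    scale = sum(abs(v - location) for v in values) / len(values)
    return location, scale


# Stand-in for values calculated by a machine learning operation.
location, scale = fit_laplace([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Under these assumptions the constant is obtained by a median and an average, both of which are arithmetic operations on the calculated values.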
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 8 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein a ratio of quantization is set, a value in the predetermined probability distribution corresponding to the ratio is set as a threshold value of the values calculated by the machine learning operation, and at least one of a value equal to or larger than the threshold value or equal to or smaller than the threshold value is extracted
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) and mental processes – concepts performed in the human mind (including an observation, evaluation, judgment, opinion) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein a ratio of quantization is set (setting a ratio of quantization corresponds to establishing a mathematical relationship between at least two numbers), a value in the predetermined probability distribution corresponding to the ratio is set as a threshold value of the values calculated by the machine learning operation (setting a threshold value corresponds to evaluation and judgment), and at least one of a value equal to or larger than the threshold value or equal to or smaller than the threshold value is extracted (extracting a value by comparing the value to a threshold corresponds to evaluation and judgment).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
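For illustration only, the threshold-setting limitation of claim 8 analyzed above can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: it assumes a normal distribution fitted to the calculated values, assumes the ratio is split evenly between the two tails, and uses the hypothetical helper names thresholds_for_ratio and extract_outside.

```python
from statistics import NormalDist, fmean, pstdev


def thresholds_for_ratio(values, ratio):
    """Return (lower, upper) thresholds for a given quantization ratio.

    Assumption: the values follow a normal distribution whose mean and
    standard deviation are estimated from the values themselves, and the
    ratio is split evenly between the lower and upper tails.
    """
    dist = NormalDist(fmean(values), pstdev(values))
    lower = dist.inv_cdf(ratio / 2.0)        # quantile of the lower tail
    upper = dist.inv_cdf(1.0 - ratio / 2.0)  # quantile of the upper tail
    return lower, upper


def extract_outside(values, lower, upper):
    """Keep (index, value) pairs at or beyond either threshold."""
    return [(i, v) for i, v in enumerate(values) if v <= lower or v >= upper]


values = [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]
lower, upper = thresholds_for_ratio(values, ratio=0.1)
extracted = extract_outside(values, lower, upper)
```

The sketch reduces to computing a quantile and comparing each value against it, which is the evaluation characterized in the Prong One analysis.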
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 9 is directed to an information processing apparatus, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
wherein the quantization is performed for a gradient itself as the quantization target or for a cumulative gradient obtained by cumulatively adding the gradients as the quantization target.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses wherein the quantization is performed for a gradient itself as the quantization target or for a cumulative gradient obtained by cumulatively adding the gradients as the quantization target (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization for the gradient itself or by adding the gradients, which correspond to mathematical calculations).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
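For illustration only, the two alternative quantization targets recited in claim 9 (a gradient itself, or a cumulative gradient obtained by adding gradients) can be sketched as follows. The ternary mapping, the threshold value, and the helper names quantize and accumulate are assumptions made for this sketch and are not taken from the claims or the specification.

```python
def quantize(values, threshold):
    """Map each value to +1, -1, or 0 by comparison with +/- threshold.

    Assumption: a ternary code stands in for whatever quantizer is used.
    """
    return [1 if v >= threshold else -1 if v <= -threshold else 0
            for v in values]


def accumulate(cumulative, gradient):
    """Element-wise cumulative addition of a new gradient."""
    return [c + g for c, g in zip(cumulative, gradient)]


grad_step1 = [0.25, -0.5]
grad_step2 = [0.5, -0.25]

# Alternative 1: the gradient itself is the quantization target.
q_gradient = quantize(grad_step1, threshold=0.5)

# Alternative 2: the cumulative gradient is the quantization target.
cumulative = accumulate(accumulate([0.0, 0.0], grad_step1), grad_step2)
q_cumulative = quantize(cumulative, threshold=0.5)
```

Both alternatives consist of additions and comparisons on numeric values.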
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 10 is directed to an information processing method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution;...wherein the predetermined probability distribution is determined based on a quantization target
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation)...wherein the predetermined probability distribution is determined based on a quantization target (determining a predetermined probability distribution based on a quantization target corresponds to mathematical relationships and mathematical calculations of determining a probability distribution).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the recitation of “transmitting a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The recitation of “transmitting a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1 Analysis: Claim 11 is directed to a non-transitory computer-readable storage medium, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The following limitation:
a method, the method comprising: performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution...wherein the predetermined probability distribution is determined based on a quantization target.
as drafted, under the broadest reasonable interpretation, covers mathematical concepts (mathematical relationships, mathematical formulas or equations, mathematical calculations) but for the recitation of mere instructions to apply an exception language and insignificant extra-solution activity language. In particular, the above limitation in the context of this claim encompasses a method, the method comprising: performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation)...wherein the predetermined probability distribution is determined based on a quantization target (determining a predetermined probability distribution based on a quantization target corresponds to mathematical relationships and mathematical calculations of determining a probability distribution).
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element of “A non-transitory computer-readable storage medium having embodied thereon a program, which when executed by a computer causes the computer to execute”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmitting a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d).  
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmitting a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”). Therefore, the claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over LIN et al. (US 2016/0328646 A1) in view of Zhou et al. (“DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS”) and further in view of Dryden et al. (“Communication Quantization for Data-parallel Training of Deep Neural Networks”).
Regarding Claim 1,
LIN et al. teaches An information processing apparatus comprising: circuitry configured to (Fig. 1 teaches an information processing apparatus; pg. 8 [0086] teaches circuitry)
perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (pg. 6 [0064]: “Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance. Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). In one configuration, both the weights and activation values are assumed to have Gaussian distributions, however, other distributions are also contemplated” teaches performing quantization to weights, bias, and activation values in a neural network assuming determination of step size based on a predetermined probability distribution (including Gaussian, Laplacian, and Gamma distributions) wherein the determination of step size is calculated with the assumption that the distribution has zero mean; pg. 6 [0069]: “modifications of the input distribution 700 of the activation value calculations are performed so that the activation values have a zero mean. In this aspect of the disclosure, modifications of the weight and/or activation value calculations is performed by absorbing the mean value(μ) into a bias of the artificial neural network, for example, as shown in FIG. 7B” teaches determination of step size based on a predetermined probability distribution assuming the distribution has zero mean includes determining a modified distribution of values (corresponds to calculating a distribution of values) by absorbing the mean value(μ) into a bias of the artificial neural network (corresponds to a machine learning operation); also see pg. 5 [0063], which teaches input distributions can be in the form of probability distributions).
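For illustration only, the relied-upon teaching of LIN et al. (a symmetric uniform quantizer whose step size is a deterministic function of the standard deviation of a zero-mean input distribution) can be sketched as follows. The coefficient coeff, the mid-rise bin layout, and the helper name uniform_quantize are assumptions for this sketch; LIN et al. is not quoted here as giving any particular coefficient or code.

```python
import math
from statistics import fmean, pstdev


def uniform_quantize(values, coeff, levels):
    """Quantize values with a symmetric uniform quantizer.

    Assumptions: the mean is removed first (standing in for absorbing the
    mean into a bias), the step size is coeff * standard deviation, and
    coeff is a caller-supplied placeholder that would depend on the assumed
    distribution and on the number of levels.
    """
    mu = fmean(values)
    step = coeff * pstdev(values)
    half = levels // 2
    quantized = []
    for v in values:
        idx = math.floor((v - mu) / step)      # bin index of the centered value
        idx = max(-half, min(half - 1, idx))   # clip to the available levels
        quantized.append((idx + 0.5) * step)   # reconstruct at the bin centre
    return mu, step, quantized


mu, step, quantized = uniform_quantize([1.0, 2.0, 3.0, 4.0, 5.0],
                                       coeff=1.0, levels=4)
```

The only distribution-dependent quantity in the sketch is coeff; the mean and standard deviation are computed from the values being quantized.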
LIN et al. does not appear to explicitly teach wherein the predetermined probability distribution is determined based on a quantization target.
However, Zhou et al. teaches wherein the predetermined probability distribution is determined based on a quantization target (pg. 5 third to fourth full paragraphs:

[media_image1.png: greyscale image reproducing the cited passage of Zhou et al., pg. 5, third to fourth full paragraphs]

teaches determining to incorporate a uniform distribution (corresponds to a predetermined probability distribution) based on the quantization target of gradient quantization).
LIN et al. and Zhou et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Zhou et al. into the disclosed invention of LIN et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage an extra noise function based on a uniform distribution because artificial noise is critical for achieving good performance in gradient quantization (Zhou et al. pg. 5 third to fourth full paragraphs).
LIN et al. in view of Zhou et al. does not appear to explicitly teach transmit a result of the performed quantization.
However, Dryden et al. teaches transmit a result of the performed quantization (pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teaches communicating (transmitting) the result of the performed quantization).
LIN et al., Zhou et al., and Dryden et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Dryden et al. into the disclosed invention of LIN et al. in view of Zhou et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “data parallelism can step in to allow continued scalability by doing separate matrix operations on fewer processors to maintain higher efficiency” and “[t]he scaling trend we see here is excellent and we achieve a 7.5-times speedup when using eight model replicas. This further validates the viability of large-scale data-parallel training” (Dryden et al. pg. 5 left Col. lines 2-5 and pg. 7 first full paragraph).
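For illustration only, the communication sequence quoted from Dryden et al. (quantize, split and scatter, unquantize and sum, quantize again, allgather, unquantize) can be simulated in a single process as follows. This is a simplified sketch: the ternary code is an assumed stand-in for threshold quantization, workers are modeled as plain lists rather than MPI ranks, and the function names are hypothetical.

```python
def quantize(values, tau):
    """Assumed stand-in for threshold quantization: codes in {-1, 0, +1}."""
    return [1 if v >= tau else -1 if v <= -tau else 0 for v in values]


def unquantize(codes, tau):
    """Reconstruct each code as a multiple of the same tau."""
    return [c * tau for c in codes]


def quantized_allreduce(worker_grads, tau):
    """Simulate the quoted sequence for a list of per-worker gradients."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    if size % n != 0:
        raise ValueError("gradient length must be divisible by worker count")
    width = size // n

    # Gradient updates are quantized before being sent.
    sent = [quantize(g, tau) for g in worker_grads]

    # Reduce-scatter: worker k receives slice k from every worker,
    # unquantizes the slices, sums them, and quantizes the result again.
    reduced = []
    for k in range(n):
        lo, hi = k * width, (k + 1) * width
        slices = [unquantize(codes[lo:hi], tau) for codes in sent]
        summed = [sum(column) for column in zip(*slices)]
        reduced.append(quantize(summed, tau))

    # Allgather: every worker receives all quantized slices and unquantizes.
    gathered = [code for part in reduced for code in part]
    return [unquantize(gathered, tau) for _ in range(n)]


result = quantized_allreduce(
    [[0.6, -0.1, 0.7, 0.2], [0.5, -0.6, -0.9, 0.1]], tau=0.5)
```

In the sketch, the only data exchanged between the simulated workers are the quantized codes, which corresponds to transmitting a result of the performed quantization.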
Regarding Claim 2,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. further teaches wherein the machine learning operation is an operation in deep learning (pg. 6 [0069]: “modifications of the input distribution 700 of the activation value calculations are performed so that the activation values have a zero mean. In this aspect of the disclosure, modifications of the weight and/or activation value calculations is performed by absorbing the mean value(μ) into a bias of the artificial neural network, for example, as shown in FIG. 7B” teaches determination of step size based on a predetermined probability distribution assuming the distribution has zero mean includes determining a modified distribution of values (corresponds to calculating a distribution of values) by absorbing the mean value(μ) into a bias of the artificial neural network (corresponds to a machine learning operation); pg. 2 [0032]: “In some artificial neural networks (ANNs), such as a deep convolutional network (DCN), quantization may be applied to activations of the normalization layer; weights, biases, and activations of the fully connected layer; and/or weights, biases, and activations of the convolution layer” teaches the artificial neural network can be a deep convolutional network, thus rendering the machine learning operation to be an operation in deep learning).
Zhou et al. further teaches and the quantization is performed on a basis of a distribution of gradients calculated by the machine learning operation based on the deep learning being based on the predetermined probability distribution (pg. 5 third to fourth full paragraphs:

[media_image1.png: greyscale image reproducing the cited passage of Zhou et al., pg. 5, third to fourth full paragraphs]

teaches performing quantization of gradients on the basis that the gradients calculated in the backward pass of training a convolutional neural network (corresponds to gradients calculated by the machine learning operation based on the deep learning; see pg. 2 second full paragraph, which teaches training a convolutional neural network) are based on a predetermined uniform distribution; Fig. 2 teaches a distribution of gradients that have been calculated).
LIN et al. and Zhou et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Zhou et al. into the disclosed invention of LIN et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage an extra noise function based on a uniform distribution because artificial noise is critical for achieving good performance in gradient quantization (Zhou et al. pg. 5 third to fourth full paragraphs).
Regarding Claim 3,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
Dryden et al. further teaches wherein the quantization is performed when a value obtained by machine learning in one apparatus is supplied to another apparatus in distributed learning in which machine learning is performed by a plurality of apparatuses in a distributed manner (Fig. 1 and caption: “Figure 1. The LBANN model- and data-parallel architecture. This shows two-way data parallelism via model replication and four-way model parallelism with distributed mini-batches in each replica. Within each model, the appropriate parameters of each mini-batch are fed to ranks, and these ranks implement training with distributed matrix operations. Once the mini-batch completes, corresponding ranks in each model communicate their parameter updates using peer-wise collective communication. This communication is quantized to reduce bandwidth requirements” and pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teach performing quantization when gradient updates (values obtained by learning) obtained from one apparatus are supplied to another apparatus in a distributed learning environment in which gradient calculation and model training (machine learning) are performed by a plurality of apparatuses; pg. 4 fourth full paragraph: “To overcome this, we implement our own allreduce operation on top of primitive MPI non-blocking send and receive calls. Our allreduce consists of two steps, a pairwise exchange-based reduce-scatter followed by a ring-based allgather as described in [18], and recommended for large messages. This results in a...bandwidth term, which is superior to recursive-doubling even at small numbers of processors, and ensures that a portion of the communication is nearest-neighbor” teaches communication between apparatuses in a distributed environment; also see pg. 4 Section 4).
LIN et al., Zhou et al., and Dryden et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Dryden et al. into the disclosed invention of LIN et al. in view of Zhou et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “data parallelism can step in to allow continued scalability by doing separate matrix operations on fewer processors to maintain higher efficiency” and “[t]he scaling trend we see here is excellent and we achieve a 7.5-times speedup when using eight model replicas. This further validates the viability of large-scale data-parallel training” (Dryden et al. pg. 5 left Col. lines 2-5 and pg. 7 first full paragraph).
Regarding Claim 4,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. further teaches wherein the predetermined probability distribution is a distribution that forms a left-right symmetrical graph with a peak value as a central axis of symmetry (Fig. 4 teaches a predetermined probability distribution that forms a left-right symmetric graph with peak value as the central axis of symmetry).
Regarding Claim 5,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. further teaches wherein the predetermined probability distribution is a distribution for which one mean or one median is calculable (pg. 6 [0069]: “FIG. 7A illustrates an input distribution 700 of activation values for an exemplary deep convolutional network also having a mean (μ) and a variance” teaches the input (predetermined) probability distribution is a distribution for which the mean can be calculated, or is calculable).
Regarding Claim 6,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. further teaches wherein the predetermined probability distribution is any one of a normalized distribution, a Laplace distribution, a Cauchy distribution, and a Student-T distribution (pg. 6 [0064]: “Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance. Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). In one configuration, both the weights and activation values are assumed to have Gaussian distributions, however, other distributions are also contemplated” teaches the predetermined probability distribution can be Gaussian (corresponds to normal) or Laplacian distribution).
Regarding Claim 7,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. further teaches wherein a constant of a function of the predetermined probability distribution is obtained from the values calculated by the machine learning operation (pg. 6 [0069]: “FIG. 7A illustrates an input distribution 700 of activation values for an exemplary deep convolutional network also having a mean (μ) and a variance” teaches obtaining mean and variance (correspond to constants) of a function of the input distribution (predetermined probability distribution, also see Fig. 4); pg. 6 [0064] teaches the mean and variance of the distribution are obtained from calculated weights or activations, which are values calculated by the artificial neural network (corresponds to machine learning operation)).
Regarding Claim 9,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 2.
Zhou et al. further teaches wherein the quantization is performed for a gradient itself as the quantization target (pg. 4-5 Section 2.5 teaches quantization is performed for the gradient as a quantization target).
LIN et al. and Zhou et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Zhou et al. into the disclosed invention of LIN et al.
One of ordinary skill in the art would have been motivated to make this modification to leverage an extra noise function based on a uniform distribution because artificial noise is critical for achieving good performance in gradient quantization (Zhou et al. pg. 5 third to fourth full paragraphs).
Dryden et al. further teaches or for a cumulative gradient obtained by cumulatively adding the gradients as the quantization target (pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teaches obtaining cumulative gradient by summing the gradients as a quantization target).
LIN et al., Zhou et al., and Dryden et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Dryden et al. into the disclosed invention of LIN et al. in view of Zhou et al.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “data parallelism can step in to allow continued scalability by doing separate matrix operations on fewer processors to maintain higher efficiency” and “[t]he scaling trend we see here is excellent and we achieve a 7.5-times speedup when using eight model replicas. This further validates the viability of large-scale data-parallel training” (Dryden et al. pg. 5 left Col. lines 2-5 and pg. 7 first full paragraph).
Regarding Claim 10,
Claim Interpretation: claim 10 is a method claim that contains the contingent limitation “assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution”, which is not given patentable weight because “[t]he broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.” See MPEP 2111.04. The prior art rejection to claim 10 provides mapping for the contingent limitation as a way to further explain the mapping of the claim. It is maintained that the contingent limitation in the method claim is not given patentable weight.
LIN et al. teaches An information processing method comprising: performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (pg. 6 [0064]: “Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance. Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). In one configuration, both the weights and activation values are assumed to have Gaussian distributions, however, other distributions are also contemplated” teaches performing quantization to weights, bias, and activation values in a neural network assuming determination of step size based on a predetermined probability distribution (including Gaussian, Laplacian, and Gamma distributions) wherein the determination of step size is calculated with the assumption that the distribution has zero mean; pg. 6 [0069]: “modifications of the input distribution 700 of the activation value calculations are performed so that the activation values have a zero mean. In this aspect of the disclosure, modifications of the weight and/or activation value calculations is performed by absorbing the mean value(μ) into a bias of the artificial neural network, for example, as shown in FIG. 7B” teaches determination of step size based on a predetermined probability distribution assuming the distribution has zero mean includes determining a modified distribution of values (corresponds to calculating a distribution of values) by absorbing the mean value(μ) into a bias of the artificial neural network (corresponds to a machine learning operation); also see pg. 5 [0063], which teaches input distributions can be in the form of probability distributions).
LIN et al. does not appear to explicitly teach wherein the predetermined probability distribution is determined based on a quantization target.
However, Zhou et al. teaches wherein the predetermined probability distribution is determined based on a quantization target (pg. 5 third to fourth full paragraphs:

[Image media_image1.png: excerpt of Zhou et al., pg. 5, third to fourth full paragraphs]

teaches determining to incorporate a uniform distribution (corresponds to a predetermined probability distribution) based on the quantization target of gradient quantization).
LIN et al. and Zhou et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Zhou et al. to the disclosed invention of LIN et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage an extra noise function based on a uniform distribution because artificial noise is critical for achieving good performance in gradient quantization (Zhou et al. pg. 5 third to fourth full paragraphs).
LIN et al. in view of Zhou et al. does not appear to explicitly teach transmitting a result of the performed quantization.
However, Dryden et al. teaches transmitting a result of the performed quantization (pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teaches communicating (transmitting) result of performed quantization).
LIN et al., Zhou et al., and Dryden et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Dryden et al. to the disclosed invention of LIN et al. in view of Zhou et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following: “data parallelism can step in to allow continued scalability by doing separate matrix operations on fewer processors to maintain higher efficiency” and “[t]he scaling trend we see here is excellent and we achieve a 7.5-times speedup when using eight model replicas. This further validates the viability of large-scale data-parallel training” (Dryden et al. pg. 5 left Col. lines 2-5 and pg. 7 first full paragraph).
Regarding Claim 11,
LIN et al. teaches A non-transitory computer-readable storage medium having embodied thereon a program, which when executed by a computer causes the computer to execute (see pg. 8 [0090])
a method, the method comprising: performing quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (pg. 6 [0064]: “Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance. Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). In one configuration, both the weights and activation values are assumed to have Gaussian distributions, however, other distributions are also contemplated” teaches performing quantization to weights, bias, and activation values in a neural network assuming determination of step size based on a predetermined probability distribution (including Gaussian, Laplacian, and Gamma distributions) wherein the determination of step size is calculated with the assumption that the distribution has zero mean; pg. 6 [0069]: “modifications of the input distribution 700 of the activation value calculations are performed so that the activation values have a zero mean. In this aspect of the disclosure, modifications of the weight and/or activation value calculations is performed by absorbing the mean value(μ) into a bias of the artificial neural network, for example, as shown in FIG. 7B” teaches determination of step size based on a predetermined probability distribution assuming the distribution has zero mean includes determining a modified distribution of values (corresponds to calculating a distribution of values) by absorbing the mean value(μ) into a bias of the artificial neural network (corresponds to a machine learning operation); also see pg. 5 [0063], which teaches input distributions can be in the form of probability distributions).
LIN et al. does not appear to explicitly teach wherein the predetermined probability distribution is determined based on a quantization target.
However, Zhou et al. teaches wherein the predetermined probability distribution is determined based on a quantization target (pg. 5 third to fourth full paragraphs:

[Image media_image1.png: excerpt of Zhou et al., pg. 5, third to fourth full paragraphs]

teaches determining to incorporate a uniform distribution (corresponds to a predetermined probability distribution) based on the quantization target of gradient quantization).
LIN et al. and Zhou et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Zhou et al. to the disclosed invention of LIN et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage an extra noise function based on a uniform distribution because artificial noise is critical for achieving good performance in gradient quantization (Zhou et al. pg. 5 third to fourth full paragraphs).
LIN et al. in view of Zhou et al. does not appear to explicitly teach transmitting a result of the performed quantization.
However, Dryden et al. teaches transmitting a result of the performed quantization (pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teaches communicating (transmitting) result of performed quantization).
LIN et al., Zhou et al., and Dryden et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Dryden et al. to the disclosed invention of LIN et al. in view of Zhou et al.
One of ordinary skill in the arts would have been motivated to make this modification because of the following: “data parallelism can step in to allow continued scalability by doing separate matrix operations on fewer processors to maintain higher efficiency” and “[t]he scaling trend we see here is excellent and we achieve a 7.5-times speedup when using eight model replicas. This further validates the viability of large-scale data-parallel training” (Dryden et al. pg. 5 left Col. lines 2-5 and pg. 7 first full paragraph).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over LIN et al. (US 2016/0328646 A1) in view of Zhou et al. (“DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS”) in view of Dryden et al. (“Communication Quantization for Data-parallel Training of Deep Neural Networks”) and further in view of Li et al. (“Ternary weight networks”).
Regarding Claim 8,
LIN et al. in view of Zhou et al. in view of Dryden et al. teaches the information processing apparatus according to claim 1.
LIN et al. in view of Zhou et al. in view of Dryden et al. does not appear to explicitly teach wherein a ratio of quantization is set, a value in the predetermined probability distribution corresponding to the ratio is set as a threshold value of the values calculated by the machine learning operation, and at least one of a value equal to or larger than the threshold value or equal to or smaller than the threshold value is extracted.
However, Li et al. teaches wherein a ratio of quantization is set, a value in the predetermined probability distribution corresponding to the ratio is set as a threshold value of the values calculated by the machine learning operation, and at least one of a value equal to or larger than the threshold value or equal to or smaller than the threshold value is extracted (pg. 3 second full paragraph: [Image media_image2.png: excerpt of Li et al., pg. 3, second full paragraph] teaches determining [Image media_image3.png], which corresponds to setting a ratio of quantization since [Image media_image4.png] is the approximation of [Image media_image5.png]; the value for [Image media_image5.png] is set as the threshold value corresponding to the ratio and [Image media_image5.png] is a value in a uniform distribution (corresponds to predetermined probability distribution) wherein [Image media_image5.png] is set as the threshold value of neural network weight values (correspond to values calculated by the machine learning operation); pg. 2 Section 2.2: [Image media_image6.png: excerpt of Li et al., pg. 2, Section 2.2] teaches extracting a value larger than the threshold [Image media_image5.png] and a value equal to or smaller than the threshold [Image media_image5.png]).
LIN et al., Zhou et al., Dryden et al., and Li et al. are analogous art to the claimed invention because they are directed to analyzing neural network operations.
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate the above limitation(s) as taught by Li et al. to the disclosed invention of LIN et al. in view of Zhou et al. in view of Dryden et al.
One of ordinary skill in the arts would have been motivated to make this modification to leverage “an approximated solution with a simple but accurate ternary function” to the “ternary weight networks optimization problem” (Li et al. pg. 4 Section 4).

Response to Arguments
Regarding documents listed on the IDS submitted on 11/29/2021 that have not been considered, Applicant asserts “Applicant respectfully submits that the references were listed in the IDS because they were included in the European Search Report, and so Applicant has provided their best understanding of the relevance of the listed references, which satisfies the requirements for a concise explanation of relevance” (Remarks, pg. 7).
Examiner’s Response:
MPEP 609.04(a)(III) provides the following:
“Each information disclosure statement must further include a concise explanation of the relevance, as it is presently understood by the individual designated in 37 CFR 1.56(c)  most knowledgeable about the content of the information listed that is not in the English language...Where the information listed is not in the English language, but was cited in a search report or other action by a foreign patent office in a counterpart foreign application, the requirement for a concise explanation of relevance can be satisfied by submitting an English-language version of the search report or action which indicates the degree of relevance found by the foreign office. This may be an explanation of which portion of the reference is particularly relevant, to which claims it applies, or merely an "X", "Y", or "A" indication on a search report.”

In accordance with MPEP 609.04(a)(III), in order to fulfill the requirement for a concise explanation of relevance through the inclusion of cited references in a search report, the search report should indicate “the degree of relevance found by the foreign office.” However, in the copy of the European Search Report (October 14, 2021) submitted by Applicant, the three references listed on the IDS submitted on 11/29/2021 that have not been considered are merely listed on the search report and no “degree of relevance found by the foreign office” has been disclosed. Moreover, Applicant’s assertion that the references “were included in the European Search Report” does not fulfill the requirement for a concise explanation of relevance because the assertion does not explain why the references are relevant. Therefore, the requirement for a concise explanation of relevance has not been satisfied, and the three lined-through references listed on the IDS submitted on 11/29/2021 are not considered.

Applicant's arguments filed on 07/12/2022 with respect to the 35 U.S.C. 101 abstract idea rejection to claims 1-11 have been fully considered but they are not persuasive. 
Applicant asserts “the present recitations of at least amended independent claim 1 amount to significantly more than merely an abstract idea such as a mental steps or processes, particularly in view of the specifically recited technical solution directed toward a particular configuration of the various technical elements set forth by the particular claim recitations...That is, Applicant submits that the presently recited claims, taken as a whole, recite significant claim elements that cannot be satisfied by mental steps or processes, are not merely abstract ideas as defined by case law, and the claimed subject matter relates to a specific improvement in a computer-related technology including technical structural elements. Even assuming, arguendo, that the claims are determined to broadly relate to an abstract idea, which Applicant does not concede, there are sufficient recitations of claim elements that amount to significantly more than an abstract idea itself” (Remarks, pg. 10-11) and “Applicant submits that the particular elements recited by independent claim 1 relate to an improvement at the time of invention to computer functionality itself, in that they are directed to a specific implementation of the recited solution to a problem in the relevant field...Applicant submits that the elements recited by independent claim 1 relate to an improvement at the time of invention to computer functionality itself, and so the claims are directed to subject matter that clearly provides significantly more than any underlying abstract idea” (Remarks, pg. 11-12).
Examiner’s Response:
The Examiner respectfully disagrees. Applicant has made general assertions that claim 1 recites claim elements that are not directed to an abstract idea and that the claimed subject matter relates to an improvement in computer-related technology. However, Applicant has not made specific arguments regarding why certain claim elements do not amount to an abstract idea or which additional element(s) in the claim reflect an improvement.
MPEP 2106.05(a) provides the following, “It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements. See the discussion of Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981)) in subsection II, below. In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception. See MPEP § 2106.04(d) (discussing Finjan, Inc. v. Blue Coat Sys., Inc., 879 F.3d 1299, 1303-04, 125 USPQ2d 1282, 1285-87 (Fed. Cir. 2018))”. 
Amended claim 1 recites perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (corresponds to mathematical relationships and mathematical calculations because quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a smaller set, and the limitation performs quantization based on other mathematical relationships and/or calculations, including probability distribution and calculations associated with machine learning operation); wherein the predetermined probability distribution is determined based on a quantization target (corresponds to mathematical relationships and mathematical calculations of determining a probability distribution). Since these limitations are directed to a judicial exception, they cannot provide any alleged improvement.
In amended claim 1, the additional element of “an information processing apparatus comprising: circuitry configured to”, as drafted, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Moreover, the recitation of “transmit a result of the performed quantization” amounts to mere data outputting by transmitting an output, which is an insignificant extra-solution activity that does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). Accordingly, the additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. See MPEP 2106.04(d). 
Furthermore, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Mere instructions to apply an exception cannot provide an inventive concept. Moreover, the recitation of “transmit a result of the performed quantization” amounts to transmitting data, which is an insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d) (“The courts have recognized the following computer functions as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. i. Receiving or transmitting data over a network”).
In other words, the additional elements recited in claim 1 are directed to mere instructions to implement an abstract idea on a computer and insignificant extra-solution activity that is well-understood, routine, and conventional. Therefore, claim 1 does not recite additional element(s) that can provide any alleged improvement. 
Applicant relies on the arguments above regarding independent claims 10-11 and dependent claims 2-9; therefore, the response above is applicable to those claims.

Applicant's arguments filed on 07/12/2022 asserting that the cited references do not teach amended claim 1 have been fully considered but they are not persuasive. 
Applicant asserts “the asserted references, alone or in combination, lack all of the claimed elements of independent claim 1”, including "to perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution, and transmit a result of the performed quantization, wherein the predetermined probability distribution is determined based on a quantization target" (Remarks, pg. 16-18).
Examiner’s Response:
The Examiner respectfully disagrees. Applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.
As discussed in the current rejection above, LIN et al. (US 2016/0328646 A1) in view of Zhou et al. (“DOREFA-NET: TRAINING LOW BITWIDTH CONVOLUTIONAL NEURAL NETWORKS WITH LOW BITWIDTH GRADIENTS”) and further in view of Dryden et al. (“Communication Quantization for Data-parallel Training of Deep Neural Networks”) teaches the limitations of claim 1. 
LIN et al. teaches perform quantization assuming that a distribution of values calculated by a machine learning operation is based on a predetermined probability distribution (pg. 6 [0064]: “Application of quantization to the weights, biases, and activation values in artificial neural networks includes the determination of a step size. For example, the step sizes of a symmetric uniform quantizer for Gaussian, Laplacian, and Gamma distributions may be calculated with a deterministic function of the standard deviation of the input distribution, if it is assumed that the distributions have zero mean and unit variance. Accordingly, aspects of the present disclosure are directed toward modifications of the weight and/or activation value calculations so that the distributions have a zero mean (e.g., an approximately zero mean). In one configuration, both the weights and activation values are assumed to have Gaussian distributions, however, other distributions are also contemplated” teaches performing quantization to weights, bias, and activation values in a neural network assuming determination of step size based on a predetermined probability distribution (including Gaussian, Laplacian, and Gamma distributions) wherein the determination of step size is calculated with the assumption that the distribution has zero mean; pg. 6 [0069]: “modifications of the input distribution 700 of the activation value calculations are performed so that the activation values have a zero mean. In this aspect of the disclosure, modifications of the weight and/or activation value calculations is performed by absorbing the mean value(μ) into a bias of the artificial neural network, for example, as shown in FIG. 7B” teaches determination of step size based on a predetermined probability distribution assuming the distribution has zero mean includes determining a modified distribution of values (corresponds to calculating a distribution of values) by absorbing the mean value(μ) into a bias of the artificial neural network (corresponds to a machine learning operation); also see pg. 5 [0063], which teaches input distributions can be in the form of probability distributions).
Zhou et al. teaches wherein the predetermined probability distribution is determined based on a quantization target (pg. 5 third to fourth full paragraphs:

[Image media_image1.png: excerpt of Zhou et al., pg. 5, third to fourth full paragraphs]

teaches determining to incorporate a uniform distribution (corresponds to a predetermined probability distribution) based on the quantization target of gradient quantization).
Dryden et al. teaches transmit a result of the performed quantization (pg. 4 fifth full paragraph: “Our data-parallel communication thus goes as follows: gradient updates are quantized and then split up and scattered. These slices are then unquantized, summed, and the result is quantized again. In the case of threshold quantization, this uses the same τ parameter; adaptive quantization uses the same π. The quantized reductions are then distributed using the allgather, and finally every model unquantizes the results” teaches communicating (transmitting) result of performed quantization).
Please see the current rejection for more information. Applicant relies on the arguments above regarding independent claims 10-11 and dependent claims 2-9; therefore, the response above is applicable to those claims.



Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484. The examiner can normally be reached Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YING YU CHEN/Examiner, Art Unit 2125