DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application, filed on 01/24/2018, claims foreign priority to JP2017-011699 filed in Japan on 01/25/2017.
Claims 1-2 are pending and have been examined.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Applicant cannot rely upon the certified copy of the foreign priority application to overcome the prior art rejections presented in this Office Action because a translation of said application has not been made of record in accordance with 37 CFR 1.55. See MPEP §§ 215 and 216. 
In particular, Applicant is reminded of requirements set forth in 37 C.F.R. 1.55(g)(3)-(4) Claim for foreign priority:
“(3) An English language translation of a non-English language foreign application is not required except: 
(i) When the application is involved in an interference (see § 41.202 of this chapter) or derivation (see part 42  of this chapter) proceeding; 
(ii) When necessary to overcome the date of a reference relied upon by the examiner; or 
(iii) When specifically required by the examiner. 
(4) If an English language translation of a non-English language foreign application is required, it must be filed together with a statement that the translation of the certified copy is accurate” (emphasis added).


Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 06/03/2018 and 12/05/2019.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
However, Document No. 3 under “Foreign Patent Documents” and Document No. 3 under “Non-Patent Literature Documents” on IDS submitted on 06/03/2018 are documents not in the English language, and a sufficient concise explanation of the relevance has not been provided for each of the documents. Therefore, these documents are not considered. See MPEP 609.04(a)(III) (“Each information disclosure statement must further include a concise explanation of the relevance, as it is presently understood by the individual designated in 37 CFR 1.56(c) most knowledgeable about the content of the information listed that is not in the English language. The concise explanation may be either separate from the specification or part of the specification. If the concise explanation is part of the specification, the IDS listing should include the page(s) or line(s) numbers where the concise explanation is located in the specification. The requirement for a concise explanation of relevance is limited to information that is not in the English language”).

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
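Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.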
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) are (generic place holders in bold):
Claim 1:
A distributed deep learning device
a communicator that exchanges the quantized gradient by communication with another learning device (Specification [0017] reiterates the function, but does not provide description of the structure)
a gradient calculator that calculates a gradient of a current parameter; (Specification [0018] reiterates the function, but does not provide description of the structure)
a quantization remainder adder that adds, to the gradient obtained by the gradient calculator, a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1; (Specification [0019] reiterates the function, but does not provide description of the structure)
a gradient quantizer that quantizes the gradient obtained by adding the remainder after the predetermined multiplication by the quantization remainder adder; (Specification [0020] reiterates the function, but does not provide description of the structure)
a gradient restorer that restores a quantized gradient received by the communicator to a gradient of an original accuracy; (Specification [0021] reiterates the function, but does not provide description of the structure)
a quantization remainder storage that stores a remainder at the time of quantizing the gradient in the gradient quantizer; (Specification [0022] reiterates the function, but does not provide description of the structure)
a gradient aggregator that aggregates gradients collected by the communicator and calculates an aggregated gradient (Specification [0023] reiterates the function, but does not provide description of the structure)
a parameter updater that updates the parameter on the basis of the gradient aggregated by the gradient aggregator. (Specification [0024] reiterates the function, but does not provide description of the structure)
Claim 2:
A distributed deep learning system that exchanges a quantized gradient among one or more master nodes and one or more slave nodes and performs deep learning in a distributed manner, (the corresponding structure is described in Specification [0011] & [0030], which reiterate the claim language without clearly identifying the structure. Specification [0030]: “distributed deep learning system may include one master node and one or more slave nodes” (emphasis added) describes that the “distributed deep learning system” may include the various nodes but does not clearly and definitively specify that the nodes provide structure for the system)
a communicator that exchanges the quantized gradient by communication with one of the slave nodes; (Specification [0017] reiterates the function, but does not provide description of the structure)
a gradient calculator that calculates a gradient of a current parameter; (Specification [0018] reiterates the function, but does not provide description of the structure)
a quantization remainder adder that adds, to the gradient obtained by the gradient calculator, a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1; (Specification [0019] reiterates the function, but does not provide description of the structure)
a gradient quantizer that quantizes the gradient obtained by adding the remainder after the predetermined multiplication by the quantization remainder adder; (Specification [0020] reiterates the function, but does not provide description of the structure)
a gradient restorer
a quantization remainder storage that stores a remainder at the time of quantizing the gradient in the gradient quantizer; (Specification [0022] reiterates the function, but does not provide description of the structure)
a gradient aggregator that aggregates gradients collected by the communicator and calculates an aggregated gradient; (Specification [0023] reiterates the function, but does not provide description of the structure)
an aggregate gradient remainder adder that adds, to the gradient aggregated in the gradient aggregator, a value obtained by multiplying an aggregate gradient remainder at the time of quantizing a previous aggregate gradient by a predetermined multiplying factor larger than 0 and smaller than 1; (Specification [0030] identifies the “aggregate gradient remainder adder”, but does not provide description of the structure)
an aggregate gradient quantizer that performs quantization on the aggregate gradient added with the remainder in the aggregate gradient remainder adder; (Specification [0030] identifies the “aggregate gradient remainder adder”, but does not provide description of the structure)
an aggregate gradient remainder storage that stores a remainder at the time of quantizing in the aggregate gradient quantizer; (Specification [0030] identifies the “aggregate gradient remainder adder”, but does not provide description of the structure)
a parameter updater that updates the parameter on the basis of the gradient aggregated by the gradient aggregator (Specification [0024] reiterates the function, but does not provide description of the structure)
a communicator
a gradient calculator that calculates a gradient of a current parameter; (Specification [0018] reiterates the function, but does not provide description of the structure)
a quantization remainder adder that adds, to the gradient obtained by the gradient calculator, a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1; (Specification [0019] reiterates the function, but does not provide description of the structure)
a gradient quantizer that quantizes the gradient obtained by adding the remainder after the predetermined multiplication by the quantization remainder adder; (Specification [0020] reiterates the function, but does not provide description of the structure)
a gradient restorer that restores the quantized aggregate gradient received by the communicator to a gradient of an original accuracy; (Specification [0021] reiterates the function, but does not provide description of the structure)
a quantization remainder storage that stores a remainder at the time of quantizing the gradient in the gradient quantizer; (Specification [0022] reiterates the function, but does not provide description of the structure)
a parameter updater that updates the parameter on the basis of the aggregate gradient restored by the gradient restorer. (Specification [0024] reiterates the function, but does not provide description of the structure)
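
For illustration only, and not as an identification of any structure disclosed in the Specification, the functions recited in the limitations above can be sketched in software as follows. The three-level quantizer with a fixed scale, the function names, and the parameters decay, scale, and lr are assumptions introduced here; neither the claim language nor the cited Specification paragraphs specify them.

```python
from typing import List, Tuple


def quantize(values: List[float], scale: float) -> List[int]:
    """Hypothetical quantizer: map each value to -1, 0, or +1 by thresholding at scale/2."""
    half = scale / 2.0
    return [1 if v >= half else -1 if v <= -half else 0 for v in values]


def restore(levels: List[int], scale: float) -> List[float]:
    """Map quantized levels back to floating-point values (the 'gradient restorer' function)."""
    return [q * scale for q in levels]


def worker_step(grad: List[float], remainder: List[float],
                decay: float, scale: float) -> Tuple[List[int], List[float]]:
    """Add the decayed previous remainder to the gradient, quantize, and return the new remainder."""
    if not 0.0 < decay < 1.0:
        # The claims recite a multiplying factor larger than 0 and smaller than 1.
        raise ValueError("decay must be larger than 0 and smaller than 1")
    corrected = [g + decay * r for g, r in zip(grad, remainder)]
    quantized = quantize(corrected, scale)
    restored = restore(quantized, scale)
    # The remainder is what quantization discarded; it is stored for the next step.
    new_remainder = [c - x for c, x in zip(corrected, restored)]
    return quantized, new_remainder


def aggregate(received: List[List[int]], scale: float) -> List[float]:
    """Restore each received quantized gradient and average them element-wise (assumed aggregation)."""
    restored = [restore(q, scale) for q in received]
    return [sum(column) / len(restored) for column in zip(*restored)]


def update(params: List[float], aggregated: List[float], lr: float) -> List[float]:
    """Plain gradient-descent update (an assumption; the claims do not name an optimizer)."""
    return [p - lr * a for p, a in zip(params, aggregated)]
```

In this sketch, worker_step corresponds to the functions of the quantization remainder adder, gradient quantizer, and quantization remainder storage; aggregate and update correspond to the functions of the gradient aggregator and parameter updater. The aggregate gradient remainder adder of Claim 2 could reuse worker_step on the aggregated gradient at the master node under the same assumptions.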

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-2 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Each of the limitations in claims 1 and 2 that contains the following generic placeholders:
communicator 
gradient calculator 
quantization remainder adder 
gradient quantizer 
gradient restorer 
quantization remainder storage
gradient aggregator 
parameter updater 
aggregate gradient remainder adder 
aggregate gradient quantizer 
aggregate gradient remainder storage 
distributed deep learning system
 invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 7 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, each of claims 1 and 2 is rejected under 35 U.S.C. 112(a) for lack of written description. See MPEP 2181 (IV) (“the means- (or step-) plus- function claim must still be analyzed to determine whether there exists corresponding adequate support for such claim limitation under 35 U.S.C. 112(a) or pre-AIA  35 U.S.C. 112, first paragraph. In considering whether there is 35 U.S.C. 112(a)  or pre-AIA  35 U.S.C. 112, first paragraph support for the claim limitation, the examiner must consider whether the specification describes the claimed invention in sufficient detail to establish that the inventor or joint inventor(s) had possession of the claimed invention as of the application's filing date”).

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claim 1 recites the limitation "the time" in line 8.  There is insufficient antecedent basis for this limitation in the claim. It is recommended that "the time" be amended to "a time" (interpretation for examination purposes).
Claim 1 recites the limitation "the parameter" in the last line.  There is insufficient antecedent basis for this limitation in the claim. It is recommended that "the parameter" be amended to "a parameter" (interpretation for examination purposes).
Claim 2 recites the limitation "the time" in line 9.  There is insufficient antecedent basis for this limitation in the claim. It is recommended that "the time" be amended to "a time" (interpretation for examination purposes).
Claim 2 recites the limitation "the parameter" in line 27.  There is insufficient antecedent basis for this limitation in the claim. It is recommended that "the parameter" be amended to "a parameter" (interpretation for examination purposes).

Each of the limitations in claims 1 and 2 that contains the following generic placeholders:
communicator 
gradient calculator 
quantization remainder adder 
gradient quantizer 
gradient restorer 
quantization remainder storage 
gradient aggregator 
parameter updater 
aggregate gradient remainder adder 
aggregate gradient quantizer
aggregate gradient remainder storage 
distributed deep learning system
 invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. In particular, the corresponding description found in the Specification (see Section 7 of the Office Action) of each of the generic placeholders listed above substantially reiterates the claim language and does not provide description of the structure that performs the corresponding functions. Therefore, each of claims 1 and 2 is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. For examination purposes, it is interpreted that each of the functions for which 35 U.S.C. 112(f) is invoked (but without sufficient description of structure in the Specification) is performed by a general purpose computer. 
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 

(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al. (US 2019/0171935 A1) in view of Alistarh et al. (US 2018/0075347 A1) and further in view of LANGFORD et al. (US 2017/0308789 A1).
Regarding Claim 1,
Agrawal et al. teaches A distributed deep learning device that exchanges a quantized gradient with at least one or more learning devices and performs deep learning in a distributed manner (pg. 8 [0074]: “At block 718, the compressed current residue vectors are exchanged among each learner of the system (e.g., by learner processing systems 610, 620, 630, 640) and/or transmitted to a parameter server” and pg. 7 [0068] “when multi-workers (e.g., plurality of learner processing systems 610, 620, 630, 640) train one neural network, each worker computes the subset of training data and communication among workers are required; this is usually called data-parallelism. To save communication bandwidth, each worker sends partial value of gradients and keep the reminding residues locally” teach a distributed system in which a deep learning worker (device) exchanges a compressed (quantized) gradient with another worker and performs training (learning) of a neural network in a distributed manner; pg. 5 [0058] and pg. 7 [0069] teach a computer with a CPU, GPU, a memory, and a storage such as a hard disk drive; also see pg. 9 [0082]):
a communicator that exchanges the quantized gradient by communication with another learning device (pg. 8 [0073]: “the compressed current residue vector that is generated at block 716 is generated based, at least in part, on dividing the residual gradient weights of the current residue vector into a plurality of bins, of a uniform size, and then quantizing a subset of the residual gradient weights” and pg. 8 [0074]: “At block 718, the compressed current residue vectors are exchanged among each learner of the system (e.g., by learner processing systems 610, 620, 630, 640) and/or transmitted to a parameter server” teach exchanging compressed residue vectors containing quantized residual gradient weights by communicating with other learners/workers; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a gradient calculator that calculates a gradient of a current parameter (pg. 7-8 [0071]: “At block 714, a current gradient vector is obtained on a layer-by-layer basis by each learner of the system (e.g., by learner processing systems 610, 620, 630, 640)...the current gradient vector for each given neural network layer includes gradient weights of parameters of the given neural network layer...the gradient weights are calculated from a mini-batch of training data” teaches calculating a gradient; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a quantization remainder adder that adds, to the gradient obtained by the gradient calculator, a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0... (pg. 8 [0077]: “At block 802, a scaled current residue vector is generated at each learner of the system. The scaled current residue vector includes scaled residual gradient weights for the given mini-batch. The scaled current residue vector is generated by multiplying the current gradient vector by a scaling parameter and then summing the prior residue vector with the multiplied gradient vector” teaches multiplying the current gradient vector (a remainder at the time) by a scaling parameter (multiplying factor) and adding the result of the multiplication to the prior residue vector; pg. 8 [0072]: “in some embodiments of the present invention, the summation of the current gradient vector and a prior residue vector results in a current residue vector being obtained that is the same as the current gradient vector” teaches that the current gradient vector can be the same as the current residue vector, so that the current gradient vector can be represented as a “residue”, or a remainder at the time; pg. 5 [0055] “Various suitable scale factors may be used in accordance with one or more embodiments of the present invention. For example, in some embodiments of the present invention the scale-factor ranges from about 1.5 to about 3. In some embodiments of the present invention, the scale factor is 2” teaches that the scaling factor is larger than 0; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a gradient quantizer that quantizes the gradient obtained by adding the remainder after the predetermined multiplication by the quantization remainder adder (pg. 8 [0079]: “At block 810, upon determining that a residual gradient weight that has a corresponding scaled residual gradient weight exceeds the local maximum of the given bin, a quantizing value for the give residual gradient weight is generated and the current residue vector is updated” teaches quantizing the gradient in the scaled pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a gradient restorer that restores a quantized gradient received by the communicator to a gradient... (pg. 8 [0074]: “At block 720, the compressed current reduce vectors are decompressed at each learner of the plurality of learns (e.g., by learner processing systems 610, 620, 630, 640)” teaches decompressing (restoring) compressed (quantized) gradient received by the computer; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a quantization remainder storage that stores a remainder at the time of quantizing the gradient in the gradient quantizer (pg. 8 [0072]: “the compressed current residue vector is a layer-wise or chunk-wise compressed current residue vector. In some embodiments of the present invention, the current residue vector includes residual gradient weights for a given layer of a mini-batch” teaches the current residue vector stores the residual gradient weights; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system);
a gradient aggregator that aggregates gradients collected by the communicator and calculates an aggregated gradient (pg. 7 [0068]: “For example, in some embodiments of the present invention, a residue that is computed for each mini-batch by summing a previous residue and a latest gradient value obtained from backpropagation. If the sum of its previous residue plus its latest gradient, with a scale-factor, exceeds the maximum in the bin, those additional residues are included in the set of values to be sent and/or centrally updated at a server, such as a parameter server” teaches summing (aggregating) gradients collected by the worker computer; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system); and
a parameter updater that updates the parameter on the basis of the gradient aggregated by the gradient aggregator (pg. 7 [0068]: “For example, in some embodiments of the present invention, a residue that is computed for each mini-batch by summing a previous residue and a latest gradient value obtained from backpropagation. If the sum of its previous residue plus its latest gradient, with a scale-factor, exceeds the maximum in the bin, those additional residues are included in the set of values to be sent and/or centrally updated at a server, such as a parameter server” teaches summing (aggregating) gradients collected by the worker computer and updating the parameters in the server; pg. 5 [0058] and pg. 7 [0069] teach a computer to implement various functions of the system).
Agrawal et al. does not appear to explicitly teach a gradient restorer that restores a quantized gradient received by the communicator to a gradient of an original accuracy.
However, Alistarh et al. teaches a gradient restorer that restores a quantized gradient received by the communicator to a gradient of an original accuracy (pg. 2 [0017]: “In some examples a loss-less integer encoding scheme is applied to the output of the lossy compression process. This further compresses the neural network data. A loss-less integer encoding scheme is a way of compressing a plurality of integers in such a manner that a decoding process recovers the complete information” and pg. 3 [0030]: “The decoder 110 acts to decode compressed stochastic gradients 104 received from peers. The processor has functionality to update the local copy of the parameter vector 106 in the light of stochastic gradients received from the peers and available at the computation node itself” teach decoding (restoring) a compressed (quantized) gradients received from peers (communicator, or computer) in which the encoding process can be a loss-less encoding scheme such that the corresponding decoding process recovers the complete information, which corresponds to restoring to a gradient of original accuracy since the complete information is recovered; Fig. 5 teaches a computer to implement various functions of the system).
Agrawal et al. and Alistarh et al. are analogous art to the claimed invention because they are directed to compression or quantization of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a gradient restorer that restores a quantized gradient received by the communicator to a gradient of an original accuracy, as taught by Alistarh et al., into the disclosed invention of Agrawal et al.
One of ordinary skill in the art would have been motivated to make this modification in order to leverage “a decoding process” that “recovers the complete information,” given that the corresponding loss-less encoding process “is mathematically shown to give further improvements in performance” (Alistarh et al. pg. 2 [0017] & pg. 6 [0057]).
Agrawal et al. in view of Alistarh et al. does not appear to explicitly teach a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1.
However, LANGFORD et al. teaches a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1 (pg. 12 [0105]:
    [image: media_image1.png]
teaches obtaining a value by multiplying M (gradients) by a predetermined factor 1/K in which K>1, so that 1/K is larger than 0 and smaller than 1; the M (gradients) can be gradients remaining (remainder) at the time of quantizing a previous gradient since the updating algorithm runs multiple computation iterations, see Fig. 5 and pg. 8 [0059]: “For instance, model updates to the DNN 304, which involve the exchange of data between processing units, are used for the computation iterations of the algorithm 220”).
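For orientation only, the claim limitation under discussion (a remainder from quantizing a previous gradient, multiplied by a predetermined factor larger than 0 and smaller than 1) can be sketched as follows. This is a minimal sketch under stated assumptions and reproduces neither the applicant's disclosure nor the algorithm of LANGFORD et al.: the function name `quantize_with_decayed_remainder`, the uniform rounding quantizer, and the parameters `alpha` and `step` are all hypothetical.

```python
# Hypothetical illustration of the limitation; not taken from any cited reference.

def quantize_with_decayed_remainder(gradient, remainder, alpha=0.5, step=0.25):
    """Quantize gradient + alpha * remainder and return (quantized, new_remainder).

    alpha is the assumed "predetermined multiplying factor" (0 < alpha < 1).
    step is the spacing of an assumed uniform quantization grid.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be larger than 0 and smaller than 1")
    # Add the scaled remainder of the previous quantization to the new gradient.
    corrected = [g + alpha * r for g, r in zip(gradient, remainder)]
    # Round each corrected value to the nearest grid point.
    quantized = [step * round(c / step) for c in corrected]
    # What the quantizer discarded becomes the remainder for the next call.
    new_remainder = [c - q for c, q in zip(corrected, quantized)]
    return quantized, new_remainder


# First iteration starts with no remainder; the second reuses the first's remainder.
q1, r1 = quantize_with_decayed_remainder([0.3], [0.0])
q2, r2 = quantize_with_decayed_remainder([0.3], r1)
```

In the second call, the remainder `r1` is multiplied by `alpha` before being added to the new gradient, which is the operation the limitation recites.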
Agrawal et al., Alistarh et al., and LANGFORD et al. are analogous art to the claimed invention because they are directed to compression or quantization of neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a value obtained by multiplying a remainder at the time of quantizing a previous gradient by a predetermined multiplying factor larger than 0 and smaller than 1, as taught by LANGFORD et al., into the disclosed invention of Agrawal et al. in view of Alistarh et al.
One of ordinary skill in the arts would have been motivated to make this modification in order to determine “the time required to perform the data exchanges...is approximately independent of the number of nodes K, when simultaneous transfers are used. This advantageously permits increasing the 

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Chilimbi et al. (US 2017/0193361 A1) teaches a neural network training tool selects from a plurality of parallelizing techniques and selects from a plurality of forward-propagation computation techniques.
He et al. (“EFFECTIVE QUANTIZATION METHODS FOR RECURRENT NEURAL NETWORKS”) teaches methods to quantize weights deterministically and adaptively to balanced distributions.

A prior art rejection has not been applied to claim 2 because none of the prior art references, either alone or in combination, discloses at least the following limitations:
“A distributed deep learning system that exchanges a quantized gradient among one or more master nodes and one or more slave nodes and performs deep learning in a distributed manner, wherein each of the master nodes comprises:...each of the slave nodes comprises: a communicator that transmits a quantized gradient to one of the master nodes and receives the aggregate gradient quantized in the aggregate gradient quantizer from the master node;...”



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YING YU CHEN whose telephone number is (571)270-1484.  The examiner can normally be reached on Monday-Friday 7:30 am-5:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/YING YU CHEN/               Examiner, Art Unit 2125