DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The present application is being examined under the claims filed 02/22/2019. 
Claims 1-15 are pending.

Claim Interpretation
Claim Rejections - 35 USC § 101 – Abstract Idea
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 are rejected under 35 U.S.C. 101 for containing an abstract idea without significantly more. 

Regarding Claim 1:
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is a process.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“determining differences between the previous values and current values of the set of parameters;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, there are no additional elements that integrate the judicial exception into a practical application. The additional elements:
“A computer-implemented method, comprising:” – This limitation is directed to merely using a generic computer as a tool (see MPEP 2106.04(d)).
“receiving, from a worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker;” – This limitation is directed to insignificant extra-solution activity (see MPEP 2106.05(g)). 
“and updating the current values based on the feedback data and the differences to obtain updated values of the set of the parameters.” – This limitation is directed to mere data gathering which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)). 
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
	No, there are no additional elements that amount to significantly more than the judicial exception.
“receiving, from a worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker;” – This limitation is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (see MPEP 2106.05(d) II.).
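For illustration only, the three steps recited in claim 1 (receiving feedback data, determining differences, updating the current values) can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the feedback data is assumed to be a gradient, the names `server_update`, `lr`, and `coeff` are hypothetical, and the particular way the feedback data and the differences are combined is an assumption, since claim 1 does not specify it.

```python
from typing import List


def server_update(
    current: List[float],
    previous: List[float],
    feedback: List[float],
    lr: float = 0.1,      # hypothetical step size, not recited in the claim
    coeff: float = 1.0,   # hypothetical weight on the differences
) -> List[float]:
    """Sketch of the claimed server-side steps.

    `feedback` is what the worker computed using the `previous` values.
    `current` holds the server's values, which may have moved on since
    the worker fetched `previous`.
    """
    # "determining differences between the previous values and current values"
    differences = [c - p for c, p in zip(current, previous)]

    # "updating the current values based on the feedback data and the
    # differences": ASSUMED rule, feedback corrected by a term linear
    # in the differences, followed by a plain gradient-style step.
    corrected = [g + coeff * d for g, d in zip(feedback, differences)]
    return [c - lr * k for c, k in zip(current, corrected)]


updated = server_update(
    current=[1.0, 2.0],
    previous=[0.5, 2.0],
    feedback=[0.2, -0.4],
)
```

In this sketch the second parameter was not changed on the server after the worker fetched it, so its difference is zero and its update reduces to an ordinary step on the feedback value.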

Regarding Claim 2:
Claim 2 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the receiving limitation which was directed to well-understood, routine, conventional activity. The additional limitation “wherein the feedback data indicate significant trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters” is directed to field of use (see MPEP 2106.05(h)) as it is merely limiting the field of the feedback data. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 3:
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the updating limitation from claim 1 which was directed to insignificant extra-solution activity. The claim recites additional abstract ideas:
“determining coefficients of a transformation based on the significant trends of change;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
“and determining differential amounts between the current values and the updated values by applying the transformation on the differences.” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 4:
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the determining coefficients of a transformation limitation from claim 3 which was directed to the abstract idea of a mental process. The additional limitation:
“wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the significant trends of change are represented by a gradient of the optimization objective with respect to the previous values of the set of parameters.” – This limitation is directed to the field of use (see MPEP 2106.05(h)) as it merely limits the fields of the transformation, coefficients, and trends of change. 
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 5:
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the determining coefficients of a transformation limitation from claim 3 which was directed to the abstract idea of a mental process. The additional limitation:
“computing a tensor product of the gradient as unbiased estimates of the linear rates of change.” – This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)) as it involves calculations of tensor products. 
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 
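For illustration only, the tensor product recited in claim 5 can be sketched as follows. This is a minimal sketch that assumes the "tensor product of the gradient" denotes the outer product of the gradient vector with itself, and that the resulting matrix supplies the coefficients of the linear transformation applied to the differences in claim 3. The helper names `outer_product` and `apply_linear` are hypothetical and do not appear in the claims or the specification.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer_product(gradient: Vector) -> Matrix:
    """Outer product g g^T of a gradient vector with itself (assumed reading
    of the claimed "tensor product of the gradient")."""
    return [[gi * gj for gj in gradient] for gi in gradient]


def apply_linear(coefficients: Matrix, differences: Vector) -> Vector:
    """Apply the linear transformation (matrix-vector product) to the
    differences between the previous and current parameter values."""
    return [
        sum(c * d for c, d in zip(row, differences))
        for row in coefficients
    ]


gradient = [1.0, 2.0]
differences = [0.5, -1.0]
rates = outer_product(gradient)              # [[1.0, 2.0], [2.0, 4.0]]
transformed = apply_linear(rates, differences)
```

Each step is elementary arithmetic on the gradient entries, which is the characterization relied upon in the rejection above.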

Regarding Claim 6:
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the determining coefficients of a transformation limitation from claim 3 which was directed to the abstract idea of a mental process. The additional limitations:
“determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
“and determining the linear rates of change based on the magnitudes of the rates of change.” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 7:
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the determining linear rates of change limitation from claim 6 which was directed to the abstract idea of a mental process. The additional limitations:
“computing squares of the magnitudes of the rates of change;” – This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)) as it involves computing squares. 
“and determining the linear rates of change based on the squares of the magnitudes of the rates of change.” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 
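For illustration only, the limitations of claims 6 and 7 can be sketched as follows. This is a minimal sketch that assumes the linear rates of change are taken per parameter as a scaled square of the gradient magnitude, and that they are applied element-wise to the differences before being added to the gradient. The names `squared_magnitude_rates`, `compensate`, and `scale` are hypothetical, and the scaling factor is an assumption not recited in the claims.

```python
from typing import List


def squared_magnitude_rates(gradient: List[float], scale: float = 1.0) -> List[float]:
    """Per-parameter linear rates of change from squared gradient magnitudes.

    `scale` is a hypothetical tuning factor, not recited in the claims.
    """
    return [scale * abs(g) ** 2 for g in gradient]


def compensate(
    gradient: List[float],
    differences: List[float],
    scale: float = 1.0,
) -> List[float]:
    """Add an element-wise correction (rate times difference) to the gradient.

    ASSUMED combination rule; the claims only require that the linear rates
    of change be determined based on the squares of the magnitudes.
    """
    rates = squared_magnitude_rates(gradient, scale)
    return [g + r * d for g, r, d in zip(gradient, rates, differences)]


gradient = [3.0, -2.0]
differences = [0.5, 0.25]
rates = squared_magnitude_rates(gradient)    # [9.0, 4.0]
corrected = compensate(gradient, differences)
```

Compared with a full outer product, this element-wise form keeps one coefficient per parameter, so its cost grows linearly with the number of parameters.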

Regarding Claim 8:
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). The additional limitations:
“receiving a request for the set of parameters from the worker;” – This limitation is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (see MPEP 2106.05(d) II.).
“and in response to the request, transmitting the updated values of the set of parameters to the worker.” – This limitation is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (see MPEP 2106.05(d) II.).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 9:
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 1 which included an abstract idea (see rejection for claim 1). This claim merely recites a further limitation on the machine learning model of the receiving limitation from claim 1 which was directed to well-understood, routine, conventional activity. The additional limitation:
“wherein the machine learning model includes a neural network model and the optimization objective is represented by a cross entropy loss function.” – This limitation is directed to the field of use (see MPEP 2106.05(h)) as it merely limits the fields of the machine learning model and optimization objective. 
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 10:
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is a product.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“determining differences between the previous values and current values of the set of parameters;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, there are no additional elements that integrate the judicial exception into a practical application. The additional elements:
“An electronic device, comprising: a processing unit; a memory coupled to the processing unit and storing instructions for execution by the processing unit, the instructions, when executed by the processing unit, causing the electronic device to perform acts comprising:” – This limitation is directed to merely using a generic computer as a tool (see MPEP 2106.04(d)).
“receiving, from a worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker;” – This limitation is directed to insignificant extra-solution activity (see MPEP 2106.05(g)). 
“and updating the current values based on the feedback data and the differences to obtain updated values of the set of the parameters.” – This limitation is directed to mere data gathering which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)). 
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
	No, there are no additional elements that amount to significantly more than the judicial exception.
“receiving, from a worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker;” – This limitation is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (see MPEP 2106.05(d) II.).

Regarding Claim 11:
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 10 which included an abstract idea (see rejection for claim 10). This claim merely recites a further limitation on the receiving limitation which was directed to well-understood, routine, conventional activity. The additional limitation “wherein the feedback data indicate significant trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters” is directed to field of use (see MPEP 2106.05(h)) as it is merely limiting the field of the feedback data. Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 12:
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 10 which included an abstract idea (see rejection for claim 10). This claim merely recites a further limitation on the updating limitation from claim 10 which was directed to insignificant extra-solution activity. The claim recites additional abstract ideas:
“determining coefficients of a transformation based on the significant trends of change;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
“and determining differential amounts between the current values and the updated values by applying the transformation on the differences.” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 13:
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 10 which included an abstract idea (see rejection for claim 10). This claim merely recites a further limitation on the determining coefficients of a transformation limitation from claim 12 which was directed to the abstract idea of a mental process. The additional limitation:
“wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the significant trends of change are represented by a gradient of the optimization objective with respect to the previous values of the set of parameters.” – This limitation is directed to the field of use (see MPEP 2106.05(h)) as it merely limits the fields of the transformation, coefficients, and trends of change. 
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 14:
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim is dependent on claim 10 which included an abstract idea (see rejection for claim 10). This claim merely recites a further limitation on the determining coefficients of a transformation limitation from claim 12 which was directed to the abstract idea of a mental process. The additional limitation:
“computing a tensor product of the gradient as unbiased estimates of the linear rates of change.” – This limitation is directed to the abstract idea of mathematical concepts (see MPEP 2106.04(a)(2)) as it involves calculations of tensor products. 
Thus, the judicial exception is not integrated into a practical application (see MPEP 2106.04(d) I.), failing step 2A Prong 2. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception under step 2B. 

Regarding Claim 15:
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is a product.
Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
“determine differences between the previous values and current values of the set of parameters;” – This limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgment, opinion) which can be performed in the human mind, or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, there are no additional elements that integrate the judicial exception into a practical application. The additional elements:
“A computer program product stored in a computer storage medium and comprising machine executable instructions which, when executed in a device, cause the device to:” – This limitation is directed to merely using a generic computer as a tool (see MPEP 2106.04(d)).
“receive, from the worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of the set of parameters of the machine learning model at the worker;” – This limitation is directed to insignificant extra-solution activity (see MPEP 2106.05(g)). 
“and update the current values based on the feedback data and the differences to obtain updated values of the set of the parameters.” – This limitation is directed to mere data gathering which has been recognized by the courts (as per Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754) as insignificant extra-solution activity (see MPEP 2106.05(g)). 
Step 2B - Does the claim recite additional elements that amount to significantly more than the judicial exception?
	No, there are no additional elements that amount to significantly more than the judicial exception.
“receive, from the worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of the set of parameters of the machine learning model at the worker;” – This limitation is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well‐understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (see MPEP 2106.05(d) II.).

Claim Rejections - 35 USC § 101 – Software per se
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 15 is rejected under 35 U.S.C. 101 as being software per se and subsequently not falling under the four statutory categories of patent eligible subject matter. 
Regarding Claim 15:
Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
No, the claim does not fall within at least one of the four categories of patent eligible subject matter. The claim does not recite the system structure that would perform the functions claimed; therefore, the claim is directed to software per se (see MPEP 2106.03 I.). The preamble recites that the computer program product is stored in a computer storage medium and is executed in a device. However, the computer storage medium and the device are external to, and not part of, the system claimed. 
Furthermore, even if the “computer storage medium” were part of the system claimed, the specification does not define a “computer storage medium.” The specification cites support for the “computer storage medium” in ¶72-73 and 79. ¶79 states, “The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include but not limited to an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.” This passage shows that the medium does not have to be hardware and that it encompasses transitory forms of signal transmission. Thus, if the “computer storage medium” were part of the system claimed, the broadest reasonable interpretation would cover signals per se. 

Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-4 and 11-13 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. The term “significant” in claims 2-4 and 11-13 is a relative term which renders the claims indefinite. The term “significant” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The specification states in ¶32-33, “For example, in some implementations, the significant trends of change can be the largest trends of change and thus may be represented by a gradient g(wt) of the optimization objective with respect to the current values Wt of the model parameters. Particularly, it should be noted that the scope of the subject matter described herein is not limited to the mathematical representation of the "significant trend of change" or other physical quantities.” It is unclear from this passage when a trend of change would be considered “significant.” 

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by De et al. (“Scaling Up Distributed Stochastic Gradient Descent Using Variance Reduction”) (hereinafter De).

Regarding Claim 1:
	De teaches:
A computer-implemented method, comprising: (De discloses methods for stochastic gradient descent implemented on distributed systems in sec. 4 ¶1-2, “We now consider the distributed setting, with a single central server and p local client servers, each of which contains a portion of the data set. […] Our goal is to derive stochastic algorithms in this distributed setting that scale linearly to high p, while remaining stable even under low communication frequencies between local and central nodes.”)
receiving, from a worker, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; (De discloses receiving data from workers in Algorithm 3, shown highlighted below. The data received, Δx, is associated with previous values of a set of parameters. 

[Image: media_image1.png, De Algorithm 3 with the cited lines highlighted]
)


determining differences between the previous values and current values of the set of parameters; (De discloses calculating differences between previous and current parameters in Algorithm 3, shown highlighted below. 

[Image: media_image2.png, De Algorithm 3 with the cited lines highlighted]
)

and updating the current values based on the feedback data and the differences to obtain updated values of the set of the parameters. (De discloses updating in Algorithm 3, shown highlighted below. 

[Image: media_image3.png, De Algorithm 3 with the cited lines highlighted]
)


Regarding Claim 2:
De teaches “The method of claim 1” as seen above.
De further teaches: 

wherein the feedback data indicate significant trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters. (Examiner notes that significant is not defined and is interpreted as any trend of change. Examiner further notes that trends of change is equivalent to gradient as gradient is rate of change. De discloses that the feedback data received from workers is used to show the trend of change (i.e. gradient) of the objective function, f(x), in sec. 4 ¶5, “Thus, when the central server receives parameters from a local node s, the updates it performs have the form [Equation image: media_image4.png] where $\hat{x}_s$ and $\hat{g}_s$ are now given by [Equation image: media_image5.png].”)

Regarding Claim 3:
De teaches “The method of claim 2” as seen above.  
De further teaches: 
wherein updating the current values comprises: (De discloses updating in Algorithm 3, shown highlighted below. 

[Image: media_image3.png, De Algorithm 3 with the cited lines highlighted]
)

determining coefficients of a transformation based on the significant trends of change; (Examiner notes that “transformation” is not well defined and is interpreted to be anything that results in a change of values. De discloses a transformation in which the coefficients Δx and Δg are based on the significant trends of change, shown highlighted below. 

    [Image: media_image6.png]
)

and determining differential amounts between the current values and the updated values by applying the transformation on the differences. (As per ¶38 lines 7-8 of the instant specification, “differential amounts” is interpreted to be the update amounts of model parameters. De discloses determining differential amounts in sec. 4 ¶6, “Sending the change in the local parameter values, rather than the local parameters themselves, ensures that when updating the central parameter, the previous contribution to the average from that local worker is just replaced by the new value.” De further discloses that the transformation (shown in Algorithm 3 line 20 below) is applied to the differences (shown in Algorithm 3 line 13 below).

    [Image: media_image7.png]
)
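
For illustration only, the passage quoted from De sec. 4 ¶6 can be sketched as follows. The sketch is not taken from De's Algorithm 3. It assumes that the central parameter is a plain average of the workers' local parameters, and the names `apply_delta`, `local`, and `central` are hypothetical. Under that assumption, adding a worker's change divided by the number of workers gives the same result as recomputing the average with that worker's old contribution replaced by the new one.

```python
def apply_delta(central, delta, num_workers):
    """Fold one worker's parameter change into the central average (assumed form)."""
    return [c + d / num_workers for c, d in zip(central, delta)]


# Two hypothetical workers, each holding a two-parameter local model.
local = [[1.0, 2.0], [3.0, 4.0]]
central = [sum(col) / len(local) for col in zip(*local)]  # average of local values

# Worker 0 trains locally and reports only the change in its parameters.
new_local_0 = [2.0, 0.0]
delta_0 = [new - old for new, old in zip(new_local_0, local[0])]
central = apply_delta(central, delta_0, len(local))
local[0] = new_local_0

# Recomputing the average from scratch gives the same central value.
recomputed = [sum(col) / len(local) for col in zip(*local)]
```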


Regarding Claim 4:
De teaches “The method of claim 3” as seen above.  
De further teaches:
wherein the transformation is a linear transformation, the coefficients are linear rates of change, (De discloses the transformations in Algorithm 3, shown highlighted below. It can be seen that the transformations are linear.  

    [Image: media_image6.png]


    [Image: media_image4.png]
)
and the significant trends of change are represented by a gradient of the optimization objective with respect to the previous values of the set of parameters. (De discloses that the significant trends of change are gradients in sec. 4 ¶5, “Thus, when the central server receives parameters from a local node s, the updates it performs have the form                          where $\hat{x}_s$ and $\hat{g}_s$ are now given by

    [Image: media_image5.png]
)

Regarding Claim 5:
De teaches “The method of claim 4,” as seen above.
De further teaches:  
wherein determining the coefficients of the transformation comprises: (Examiner notes that “transformation” is not well defined and is interpreted to be anything that results in a change of values. De discloses a transformation in which the coefficients Δx and Δg are determined based on the significant trends of change, shown highlighted below. 

    [Image: media_image6.png]
)

computing a tensor product of the gradient as unbiased estimates of the linear rates of change. (De discloses unbiased estimates in sec. 2 ¶4, “Second, in one epoch, we traverse over the dataset using a random permutation over the indices (i.e., indices are chosen without replacement), instead of a random access (with replacement, as in SVRG or SAGA). This ensures that the average gradient we accumulate over one epoch is unbiased, and thus is a good estimate of the true gradient.” De further discloses computing a tensor product (i.e. multiplication of a scalar and vector which are types of tensors) for the unbiased estimate in the highlighted section of Algorithm 3, shown below. 

    [Image: media_image8.png]
)
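
For illustration only, the tensor-product reading stated above (a scalar multiplied by a vector) can be sketched as follows, next to the general outer product of two vectors. The sketch is not taken from De or from the instant specification; the helper names `scale` and `outer` and the sample gradient `g` are hypothetical.

```python
def scale(a, v):
    """Tensor product of a scalar (rank 0) with a vector (rank 1): a vector."""
    return [a * vi for vi in v]


def outer(u, v):
    """Tensor (outer) product of two vectors: result[i][j] = u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


g = [1.0, -2.0, 0.5]      # hypothetical gradient vector
scaled = scale(2.0, g)    # rank-1 result, same shape as g
gg = outer(g, g)          # rank-2 result, a 3 x 3 symmetric matrix
```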


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over De in view of Hsu et al. (“Parallel Online Learning”) (hereinafter Hsu).

Regarding Claim 6:
De teaches “The method of claim 4,” as seen above. 
De further teaches: 
wherein determining the coefficients of the transformation comprises: (Examiner notes that “transformation” is not well defined and is interpreted to be anything that results in a change of values. De discloses a transformation in which the coefficients Δx and Δg are determined based on the significant trends of change, shown highlighted below. 

    [Image: media_image6.png]
)

De does not teach “determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters; and determining the linear rates of change based on the magnitudes of the rates of change.”
Hsu teaches:
determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters; (Hsu discloses calculating the magnitude of the gradient (i.e. rates of change) of the optimization objective (referred to as l by Hsu) in sec. 0.6.5 ¶2, “Apart from the weight vector $w_t$, nonlinear CG maintains a direction vector $d_t$ and updates are performed in the following way:
$$d_t = -g_t + \beta_t d_{t-1}$$
$$w_{t+1} = w_t + \alpha_t d_t$$
where $g_t = \sum_{\tau \in m(t)} \nabla_w l(w, x_t, y_t)\big|_{w = w_t}$ is the gradient computed on the t-th minibatch of examples, denoted by m(t). We set $\beta_t$ according to a widely used formula (Gilbert and Nocedal, 1992): $\beta_t = \max\left\{0, \frac{\langle g_t,\, g_t - g_{t-1} \rangle}{\|g_{t-1}\|^2}\right\}$”)
and determining the linear rates of change based on the magnitudes of the rates of change. (Hsu discloses determining linear rates of change in the form of $\beta_t$ in sec. 0.6.5 ¶2, “Apart from the weight vector $w_t$, nonlinear CG maintains a direction vector $d_t$ and updates are performed in the following way:
$$d_t = -g_t + \beta_t d_{t-1}$$
$$w_{t+1} = w_t + \alpha_t d_t$$
where $g_t = \sum_{\tau \in m(t)} \nabla_w l(w, x_t, y_t)\big|_{w = w_t}$ is the gradient computed on the t-th minibatch of examples, denoted by m(t). We set $\beta_t$ according to a widely used formula (Gilbert and Nocedal, 1992): $\beta_t = \max\left\{0, \frac{\langle g_t,\, g_t - g_{t-1} \rangle}{\|g_{t-1}\|^2}\right\}$”)
De, Hsu, and the instant application are analogous art because they are all directed to gradient-based optimization methods for training machine learning models.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the distributed stochastic gradient descent method disclosed by De to include the “determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters; and determining the linear rates of change based on the magnitudes of the rates of change” taught by Hsu. One would be motivated to do so to reduce training time, as suggested by Hsu (Hsu sec. 0.6.5 ¶1: “An algorithm that is slightly more sophisticated than gradient descent is the nonlinear conjugate gradient (CG) method. Nonlinear CG can be thought as gradient descent with momentum where principled ways for setting the momentum and the step sizes are used. Empirically, CG can converge much faster than gradient descent when noise does not drive it too far astray.”).
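
For illustration only, the nonlinear CG update quoted from Hsu sec. 0.6.5 ¶2 can be sketched as follows. The sketch makes several assumptions that are not in Hsu: the objective is a small quadratic $f(w) = \frac{1}{2} w^T A w - b^T w$, the gradient is computed on the full objective rather than on a minibatch, and the step size $\alpha_t$ is chosen by exact line search. The names `nonlinear_cg`, `dot`, and `matvec` are hypothetical. The momentum coefficient follows the quoted formula $\beta_t = \max\{0, \langle g_t, g_t - g_{t-1} \rangle / \|g_{t-1}\|^2\}$.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in A]


def gradient(A, b, w):
    """Gradient of 0.5 * w^T A w - b^T w, i.e. A w - b."""
    return [ai - bi for ai, bi in zip(matvec(A, w), b)]


def nonlinear_cg(A, b, w, steps=20, tol=1e-12):
    g = gradient(A, b, w)
    d = [-gi for gi in g]                         # first direction: steepest descent
    for _ in range(steps):
        if dot(g, g) < tol:                       # stop once the gradient vanishes
            break
        alpha = -dot(g, d) / dot(d, matvec(A, d))  # exact line search (assumption)
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g_new = gradient(A, b, w)
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))  # quoted beta_t formula
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return w


A = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite
b = [1.0, 1.0]
w_star = nonlinear_cg(A, b, [0.0, 0.0])
```

With this choice of A and b the minimizer is w = (0.2, 0.4), which the sketch reaches to within floating-point tolerance.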

Regarding Claim 7:
De in view of Hsu teaches “The method of claim 6,” as seen above.  
Hsu further teaches:
wherein determining the linear rates of change based on the magnitudes of the rates of change comprises: (Hsu discloses determining linear rates of change in the form of βt in sec. 0.6.5 ¶2, “Apart from the weight vector wt, nonlinear CG maintains a direction vector dt and updates are performed in the following way: 
                
                    
                        
                            d
                        
                        
                            t
                        
                    
                    =
                     
                    -
                    
                        
                            g
                        
                        
                            t
                        
                    
                    +
                    
                        
                            β
                        
                        
                            t
                        
                    
                    
                        
                            d
                        
                        
                            t
                            -
                            1
                        
                    
                
            
                
                    
                        
                            w
                        
                        
                            t
                            +
                            1
                        
                    
                    =
                    
                        
                            w
                        
                        
                            t
                        
                    
                    +
                    
                        
                            α
                        
                        
                            t
                        
                    
                    
                        
                            d
                        
                        
                            t
                        
                    
                
            
where                         
                            
                                
                                    g
                                
                                
                                    t
                                
                            
                            =
                            
                                
                                    ∑
                                    
                                        τ
                                        ∈
                                        m
                                        (
                                        t
                                        )
                                    
                                
                                
                                    
                                        
                                            ∇
                                        
                                        
                                            w
                                        
                                    
                                    l
                                    (
                                    
                                        
                                            w
                                            ,
                                            
                                                
                                                    x
                                                
                                                
                                                    t
                                                
                                            
                                        
                                    
                                    ,
                                     
                                    
                                        
                                            y
                                        
                                        
                                            t
                                        
                                    
                                    )
                                    
                                        
                                            
                                                
                                                    ​
                                                
                                            
                                        
                                        
                                            w
                                            =
                                            
                                                
                                                    w
                                                
                                                
                                                    t
                                                
                                            
                                        
                                    
                                
                            
                        
is the gradient computed on the t-th minibatch of examples, denoted by m(t). We set $\beta_t$ according to a widely used formula (Gilbert and Nocedal, 1992): $\beta_t = \max\left\{0, \frac{\langle g_t,\, g_t - g_{t-1} \rangle}{\| g_{t-1} \|^2}\right\}$”)
computing squares of the magnitudes of the rates of change; (Hsu discloses calculating the square of the magnitude of the gradient (i.e. rates of change) in sec. 0.6.5 ¶2, “Apart from the weight vector $w_t$, nonlinear CG maintains a direction vector $d_t$ and updates are performed in the following way:
$$d_t = -g_t + \beta_t d_{t-1}$$
$$w_{t+1} = w_t + \alpha_t d_t$$
where $g_t = \sum_{\tau \in m(t)} \nabla_w l(w, x_t, y_t)\big|_{w = w_t}$ is the gradient computed on the t-th minibatch of examples, denoted by m(t). We set $\beta_t$ according to a widely used formula (Gilbert and Nocedal, 1992): $\beta_t = \max\left\{0, \frac{\langle g_t,\, g_t - g_{t-1} \rangle}{\| g_{t-1} \|^2}\right\}$”)
and determining the linear rates of change based on the squares of the magnitudes of the rates of change. (Hsu discloses linear rates of change in the form of $\beta_t$ via squaring the magnitude of the gradient in sec. 0.6.5 ¶2, “Apart from the weight vector $w_t$, nonlinear CG maintains a direction vector $d_t$ and updates are performed in the following way:
$$d_t = -g_t + \beta_t d_{t-1}$$
$$w_{t+1} = w_t + \alpha_t d_t$$
where $g_t = \sum_{\tau \in m(t)} \nabla_w l(w, x_t, y_t)\big|_{w = w_t}$ is the gradient computed on the t-th minibatch of examples, denoted by m(t). We set $\beta_t$ according to a widely used formula (Gilbert and Nocedal, 1992): $\beta_t = \max\left\{0, \frac{\langle g_t,\, g_t - g_{t-1} \rangle}{\| g_{t-1} \|^2}\right\}$”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify De with the teachings of Hsu for at least the same reasons as discussed above in claim 6.
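For illustration only, the nonlinear CG update quoted from Hsu can be sketched as follows. This is a minimal sketch, not Hsu's or De's implementation: the function names, the list-based vectors, the fixed step size `alpha`, and the zero-denominator fallback are all assumptions introduced here. It shows where the square of the gradient magnitude enters the computation of $\beta_t$.

```python
# Illustrative sketch only: one nonlinear CG step using the beta_t formula
# quoted from Hsu. All names and the fixed step size are assumptions.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def beta_pr_plus(g_t, g_prev):
    """beta_t = max{0, <g_t, g_t - g_prev> / ||g_prev||^2}."""
    sq_norm = dot(g_prev, g_prev)  # square of the previous gradient's magnitude
    if sq_norm == 0.0:
        return 0.0  # assumption: fall back to a plain gradient step
    diff = [a - b for a, b in zip(g_t, g_prev)]
    return max(0.0, dot(g_t, diff) / sq_norm)


def cg_step(w_t, g_t, g_prev, d_prev, alpha):
    """Return (w_next, d_t) with d_t = -g_t + beta_t * d_prev."""
    beta_t = beta_pr_plus(g_t, g_prev)
    d_t = [-g + beta_t * d for g, d in zip(g_t, d_prev)]
    w_next = [w + alpha * d for w, d in zip(w_t, d_t)]
    return w_next, d_t
```

The `max` with zero restarts the direction at the negative gradient whenever the ratio would be negative, which matches the quoted formula.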

Claims 8 and 10-15 are rejected under 35 U.S.C. 103 as being unpatentable over De in view of Corrado et al. (US9218573) (hereinafter Corrado). 

Regarding Claim 8:
De teaches “The method of claim 1” as seen above. 
De further teaches:
[and in response to the request,] transmitting the updated values of the set of parameters to the worker. (De discloses transmitting the updated values in Algorithm 3, shown highlighted below.)

[Figure: media_image3.png, De's Algorithm 3 with the relevant step highlighted]

De does not explicitly teach “further comprising: receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker.”
Corrado teaches:
further comprising: receiving a request for the set of parameters from the worker; (Corrado discloses in col 3 lines 57-60, “The replica obtains the refreshed value of a parameter by submitting a request to the parameter server shard that maintains the values of the parameter.” The worker is referred to as the replica by Corrado.) 
and in response to the request, transmitting the updated values of the set of parameters to the worker. (Corrado discloses that the updated parameters are transmitted to the worker (i.e. the replica) in col 3 lines 55-57, “As part of the parameter updating aspect 210, the replica obtains refreshed parameter values (step 211) and overwrites current values of the parameters (data 212).”) 
De, Corrado, and the instant application are analogous art because they are all directed to asynchronous training of machine learning models.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the distributed stochastic gradient descent method disclosed by De to include the “receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker” taught by Corrado. One would be motivated to do so to efficiently train machine learning models, as suggested by Corrado (Corrado col 1 line 58 to col 2 line 3: “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Machine learning models with large numbers of parameters can be trained efficiently and effectively. […] Because model replicas operate asynchronously, problems caused by hardware failures and slow processing speeds are mitigated.”).
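For illustration only, the request-and-response exchange described in the Corrado passages can be sketched as follows. This is a minimal sketch under stated assumptions: `ParameterServer`, `Worker`, `handle_push`, `handle_pull`, and `refresh` are hypothetical names, the exchange is an in-process method call rather than a network message, and the plain gradient update is a stand-in. It is not the implementation of De or Corrado.

```python
# Illustrative sketch only: a worker (Corrado's "replica") requests parameters
# and the server responds with the updated values. All names are hypothetical.

class ParameterServer:
    def __init__(self, initial_values):
        self.values = dict(initial_values)

    def handle_push(self, gradients, learning_rate):
        """Apply a worker's gradients to the stored parameters."""
        for name, grad in gradients.items():
            self.values[name] -= learning_rate * grad

    def handle_pull(self, requested_names):
        """Respond to a worker's request with the current (updated) values."""
        return {name: self.values[name] for name in requested_names}


class Worker:
    def __init__(self, server):
        self.server = server
        self.local = {}

    def refresh(self, names):
        """Request parameters and overwrite the local copies with the response."""
        self.local.update(self.server.handle_pull(names))
```

A worker would call `refresh` before computing its next gradient, so that the values it overwrites locally are the ones most recently updated at the server.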

Regarding Claim 10:
Claim 10 is a product claim, corresponding to computer-implemented method claim 1. The only difference is that claim 10 recites an electronic device with a processor and memory. 
Corrado teaches:
An electronic device, comprising: a processing unit; (Corrado discloses in col 8 lines 16-19, “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”) 
a memory coupled to the processing unit and storing instructions for execution by the processing unit, the instructions, when executed by the processing unit, causing the electronic device to perform acts comprising: (Corrado discloses in col 8 lines 19-24, “Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”) 
De, Corrado, and the instant application are analogous art because they are all directed to asynchronous training of machine learning models.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the distributed stochastic gradient descent method disclosed by De to include the “receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker” taught by Corrado. One would be motivated to do so to efficiently train machine learning models, as suggested by Corrado (Corrado col 1 line 58 to col 2 line 3: “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Machine learning models with large numbers of parameters can be trained efficiently and effectively. […] Because model replicas operate asynchronously, problems caused by hardware failures and slow processing speeds are mitigated.”).
The rest of the limitations of claim 10 are rejected for the same reasons as claim 1.

Regarding Claim 11:
Claim 11 is a product claim, corresponding to computer-implemented method claim 2. The only difference is that claim 11 recites an electronic device with a processor and memory, taught above. Claim 11 is rejected for the same reasons as claim 2.

Regarding Claim 12:
Claim 12 is a product claim, corresponding to computer-implemented method claim 3. The only difference is that claim 12 recites an electronic device with a processor and memory, taught above. Claim 12 is rejected for the same reasons as claim 3.

Regarding Claim 13:
Claim 13 is a product claim, corresponding to computer-implemented method claim 4. The only difference is that claim 13 recites an electronic device with a processor and memory, taught above. Claim 13 is rejected for the same reasons as claim 4.

Regarding Claim 14:
Claim 14 is a product claim, corresponding to computer-implemented method claim 5. The only difference is that claim 14 recites an electronic device with a processor and memory, taught above. Claim 14 is rejected for the same reasons as claim 5.

Regarding Claim 15:
Claim 15 is a product claim, corresponding to computer-implemented method claim 1. The only difference is that claim 15 recites the product being stored in a computer storage medium. 
Corrado teaches: 
A computer program product stored in a computer storage medium and comprising machine executable instructions which, when executed in a device, cause the device to: (Corrado discloses in col 7 lines 31-36, “Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of data processing apparatus.”) 
De, Corrado, and the instant application are analogous art because they are all directed to asynchronous training of machine learning models.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the distributed stochastic gradient descent method disclosed by De to include the “receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker” taught by Corrado. One would be motivated to do so to efficiently train machine learning models, as suggested by Corrado (Corrado col 1 line 58 to col 2 line 3: “Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. Machine learning models with large numbers of parameters can be trained efficiently and effectively. […] Because model replicas operate asynchronously, problems caused by hardware failures and slow processing speeds are mitigated.”).
The rest of the limitations of claim 15 are rejected for the same reasons as claim 1.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over De in view of Strom et al. (US10152676) (hereinafter Strom). 

Regarding Claim 9:
De teaches “The method of claim 1,” as seen above. 
De does not explicitly teach “wherein the machine learning model includes a neural network model and the optimization objective is represented by a cross entropy loss function.”
Strom teaches:
wherein the machine learning model includes a neural network model (Strom teaches in col 4 lines 29-32, “aspects of the embodiments described in the disclosure will focus, for the purpose of illustration, on distributed execution of stochastic gradient descent to train neural network-based models”)
and the optimization objective is represented by a cross entropy loss function. (Strom discloses in col 9 lines 49-53, “As shown in FIG. 3, the gradient computation module may use an objective function 308 to determine the error 310 for the output vector 306 in comparison with the known correct output for the particular input vector 302. For example, L2-norm or cross entropy may be used.”) 
De, Strom, and the instant application are analogous art because they are all directed to training of machine learning models in distributed environments.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the distributed stochastic gradient descent method disclosed by De to include the “wherein the machine learning model includes a neural network model and the optimization objective is represented by a cross entropy loss function” taught by Strom. One would be motivated to do so to efficiently train machine learning models and reduce bandwidth, as suggested by Strom (Strom col 3 lines 3-42: “Aspects of this disclosure relate to efficiently distributing the training of models across multiple computing nodes (e.g., two or more separate computing devices). […] In order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices, only those updates which are expected to provide a substantive change to the model may be applied and exchanged. […] This can improve the efficiency of the distributed training process by substantially reducing the volume of data that is transmitted and the number of times a given parameter is updated.”).

Prior Art of Record
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Agarwal et al. (“Distributed Delayed Stochastic Optimization”) discloses in the abstract, “We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony.” 
Reddi et al. (“On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants”) teaches various methods for asynchronous versions of optimization algorithms (Reddi Abstract: “We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). […] Subsequently, we propose an asynchronous algorithm grounded in our framework, and prove its fast convergence. An important consequence of our general approach is that it yields asynchronous versions of variance reduction algorithms such as SVRG and SAGA as a byproduct.”) 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Somie Park whose telephone number is (571)272-1056. The examiner can normally be reached 9:00am - 5:00pm, Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SOMIE PARK/ Examiner, Art Unit 2126
/ANN J LO/ Supervisory Patent Examiner, Art Unit 2126