DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Claims 1-4, 6-21, and 23 are pending and have been examined. Claims 1, 6, 13, 16-17, 19, and 23 are amended. Claims 5 and 22 are canceled.

Claim Objections
Claim 6 is objected to because the language “further comprising” in line 1 should recite “wherein”, and the language “remote parameters in response” in lines 2-3 should recite “remote parameters is in response.” Examiner is interpreting Claim 6 as if it had recited, “The method of claim 1, wherein updating the local neural network with the remote parameters is in response to receiving the remote parameters…” Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


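Claims 1-4 and 6-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.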
Claim 1 recites the limitation “the final output” in line 10. There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the claim as if it had recited “a final output”.
Claim 13 recites the limitation "the neural network" in line 5. There is insufficient antecedent basis for this limitation in the claims. For examining purposes, examiner is interpreting the claim as if line 5 had recited “a neural network” and line 8 had recited “the neural network”. Examiner interprets all recitations of “the neural network” in Claim 13’s claim tree as referring to the one recited in line 5. 
Claim 13 recites the limitation “the final output” in line 13 and claim 18 recites “the final output” in the second-to-last line. There is insufficient antecedent basis for these limitations in the claims. Examiner interprets the claims as if claim 13 had recited “a final output”.
Claim 17 recites the limitation “the plurality of layers” in line 8. There is insufficient antecedent basis for this limitation in the claims. Examiner interprets the claim as if it had recited “a plurality of layers”. 
Claim 18 recites the limitation “the local neural network” in lines 3, 7, 10. There is insufficient antecedent basis for this limitation in the claims. Examiner interprets the claim as if line 3 had recited “a local neural network”.

Dependent claims 2-4 and 6-12 are rejected for failing to cure the deficiencies of independent claim 1.
Dependent claims 14-18 are rejected for failing to cure the deficiencies of independent claim 13.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 6-21 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over “Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines” (2015) to Zhang et al., hereinafter Zhang, in view of “S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters” (Feb. 4-8, 2017) to Awan et al., hereinafter Awan.

Regarding Claim 1, Zhang teaches: A method, comprising: 
initiating execution of a forward pass in a local neural network in a local computing node; (A local computing node is the left client in Fig. 2 (p. 6). Algorithm 1’s “Slave nodes” step 6 teaches performing a forward pass in the local node.)
receiving remote parameters from a set of one or more remote computing nodes; (A remote computing node is the middle client computing node in Fig. 2. Page 9 col. 1, top paragraph:

    [Image: excerpt from Zhang, p. 9, col. 1, top paragraph]

In Algorithm 3 step 8 (p. 9), the local client node receives initial parameters $u_i^j, v_i^j$, $j \neq p$, from all other workers.)
updating the local neural network with the remote parameters; (Page 9 col. 1, top paragraph: “In a distributed setting with P workers, on worker p, instead of directly communicating two full matrices $\nabla W_p$ and $W_p$ with the master node, we recast it to three steps: …
    [Image: excerpt from Zhang, p. 9, col. 1, listing the three steps]
”)
…
initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network; and (Algorithm 1, “Slave node” step 7 teaches initiating execution of a backward pass by following the DWBP Algorithm 2 on p. 7. The updated parameters are gradients in Algorithm 2, steps 3/6.)
prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes. (Examiner interprets a subset of the updated parameters as one or more gradients. The subset is sent out prior to the completion of the entire backward pass, as shown in Fig. 3b, in Algorithm 2 step 9, and in the top sentence on p. 7 col. 2.)
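For illustration only, the following Python sketch models the examiner's reading of the limitation “prior to completion of the backward pass, transmitting a subset of the updated parameters.” It is not code from Zhang. It assumes a toy chain of scalar layers (y = w·x) and a hypothetical sender thread standing in for the network; each layer's gradient is queued for transmission as soon as it is computed, while earlier layers still await their backward step.

```python
import queue
import threading
from typing import List, Optional, Tuple

# (layer index, gradient, number of earlier layers still awaiting their backward step)
Message = Tuple[int, float, int]


def sender(outbox: queue.Queue, sent: List[Message]) -> None:
    """Hypothetical network thread: transmits each message as soon as it is queued."""
    while True:
        msg: Optional[Message] = outbox.get()
        if msg is None:  # sentinel placed after the backward pass completes
            break
        sent.append(msg)  # stand-in for pushing the gradient to remote nodes


def forward(weights: List[float], x: float) -> List[float]:
    """Return the input to every layer plus the final output (y = w * x per layer)."""
    activations = [x]
    for w in weights:
        activations.append(w * activations[-1])
    return activations


def backward(weights: List[float], activations: List[float],
             upstream: float, outbox: queue.Queue) -> None:
    """Backpropagate from the last layer to the first, queuing each gradient immediately."""
    delta = upstream  # dLoss/d(final output)
    for layer in reversed(range(len(weights))):
        grad = delta * activations[layer]  # dLoss/dw for this layer
        delta = delta * weights[layer]     # dLoss/d(input of this layer)
        outbox.put((layer, grad, layer))   # `layer` earlier layers are still pending
    outbox.put(None)


weights = [2.0, 3.0, 4.0]
activations = forward(weights, 1.0)        # [1.0, 2.0, 6.0, 24.0]
outbox: queue.Queue = queue.Queue()
sent: List[Message] = []
net = threading.Thread(target=sender, args=(outbox, sent))
net.start()
backward(weights, activations, upstream=activations[-1] - 20.0, outbox=outbox)
net.join()
print(sent)  # [(2, 24.0, 2), (1, 32.0, 1), (0, 48.0, 0)]
```

Every message except the last is queued while at least one layer is still pending. The loss 0.5·(y − 20)² and the target value 20 are arbitrary choices for the example.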

However, Zhang does not explicitly teach: receiving parameters after initiating the execution of the forward pass, 
completing the execution of the forward pass to calculate the final output based on the remote parameters; 

But Awan teaches: receiving parameters after initiating the execution of the forward pass, (Awan Fig. 5 teaches communication overlapping with a forward pass. The parameters are received at the layers after initiating the forward pass.)
completing the execution of the forward pass to calculate the final output based on the remote parameters; (Awan Fig. 5 teaches communication overlapping with a forward pass. The last layer generates a final output.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Awan’s system into Zhang’s system by overlapping parameter updates and forward propagation, with the motivation of updating parameters to layers in an on-demand fashion (Awan, p. 198, col. 2, top paragraph).

    [Image: Awan Fig. 5]

Awan Fig. 5. Overlapped Data Propagation with Forward
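Awan Fig. 5 is relied on for overlapping the receipt of parameters with the forward pass. As an illustration only, and not as Awan's implementation (which, per its title, co-designs MPI runtimes with Caffe), the Python sketch below uses threads and `threading.Event` objects to stand in for per-layer arrival of remote parameters. The forward pass is started before any parameter arrives and waits on a given layer only when it reaches that layer; the function names and the toy layers y = w·x are hypothetical.

```python
import threading
from typing import Dict, List


def receive_remote_parameters(params: Dict[int, float], ready: List[threading.Event],
                              incoming: Dict[int, float]) -> None:
    """Hypothetical network thread: writes each layer's remote parameter, then signals it."""
    for layer in sorted(incoming):       # parameters arrive one layer at a time
        params[layer] = incoming[layer]  # update the local copy for this layer
        ready[layer].set()               # this layer may now be used by the forward pass


def forward_pass(params: Dict[int, float], ready: List[threading.Event], x: float) -> float:
    """Begin at once; block on a layer only if its parameter has not yet arrived."""
    for layer in range(len(ready)):
        ready[layer].wait()              # on-demand wait for this layer alone
        x = params[layer] * x            # toy layer: y = w * x
    return x                             # final output computed with the remote parameters


params = {0: 1.0, 1: 1.0, 2: 1.0}        # stale local parameters
ready = [threading.Event() for _ in params]
result: List[float] = []
fwd = threading.Thread(target=lambda: result.append(forward_pass(params, ready, 1.0)))
fwd.start()                              # the forward pass is initiated first
rx = threading.Thread(target=receive_remote_parameters,
                      args=(params, ready, {0: 2.0, 1: 3.0, 2: 4.0}))
rx.start()                               # remote parameters are received afterwards
fwd.join()
rx.join()
print(result[0])  # 24.0
```

Because each layer waits only on its own event, later layers' parameters can still be in flight while earlier layers compute, which is the overlap the rejection attributes to Awan.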

Regarding claim 2, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: wherein: the execution of the backward pass comprises calculating the subset of the updated parameters for a first layer in the local neural network, and (Zhang teaches in Algorithm 2 steps 3/6 computing gradients for a layer $l_L$.)
transmitting the subset of the updated parameters is performed in response to determining that the subset of updated parameters for the layer differs from a set of corresponding prior parameters for the layer. (Zhang teaches on p. 10 § 4.3.3 that the bandwidth manager “allocates network bandwidth according to the messages’ contribution to convergence.” The bandwidth manager would not allocate bandwidth for transmitting parameters that have converged.)
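The claim 2 limitation, as read above, amounts to a per-layer change test performed before transmission. A minimal Python sketch of such a test follows. It is illustrative only and does not model Zhang's bandwidth manager; the layer names, the tolerance parameter `tol`, and the helper names are assumptions.

```python
from typing import Dict, List


def differs(new: List[float], old: List[float], tol: float) -> bool:
    """True if any parameter moved by more than `tol` (or the lengths disagree)."""
    return len(new) != len(old) or any(abs(a - b) > tol for a, b in zip(new, old))


def layers_to_transmit(updated: Dict[str, List[float]],
                       prior: Dict[str, List[float]], tol: float = 0.0) -> List[str]:
    """Select the layers whose updated parameters differ from the prior parameters."""
    return [name for name, values in updated.items()
            if name not in prior or differs(values, prior[name], tol)]


updated = {"conv1": [0.5, 0.25], "fc1": [1.0, 2.0]}
prior = {"conv1": [0.5, 0.25], "fc1": [1.0, 2.5]}
print(layers_to_transmit(updated, prior))           # ['fc1']
print(layers_to_transmit(updated, prior, tol=0.6))  # []
```

With `tol=0.0` only exact changes trigger transmission; a positive tolerance suppresses layers whose parameters have effectively stopped changing.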

Regarding claim 3, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: wherein the execution of the backward pass comprises calculating the subset of the updated parameters for a first layer in the local neural network, and (Zhang teaches in Algorithm 2 step 6 on p. 7 computing gradients $\nabla A_L$ for layer $l_L$.)
wherein the transmitting the subset of the updated parameters is performed contemporaneously with executing the backward pass for one or more subsequent layers in the local neural network. (The Fig. 3b timeline shows a given layer’s gradient is “pushed” or transmitted before every layer’s error message is computed, that is, before the whole network completes backpropagation)

Regarding claim 4, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: wherein the subset of the updated parameters (a single gradient $\nabla A_L$) is one of a plurality of subsets of the updated parameters (another subset is gradient $\nabla A_{L-1}$) transmitted in response to completion of every nth layer in the execution of the backward pass (each gradient is transmitted after completion of the layer), and wherein the method further comprises dynamically adjusting a value of n during the execution of the backward pass. (According to Algorithm 3 (p. 9, col. 2), steps 2 and 11, the gradient is sent to the master node only if the layer is not fully connected. Therefore, n, which controls the frequency with which updates are sent with respect to layers, is dynamically updated during backpropagation according to the sequence of FC and non-FC layers in the network.)
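To make this reasoning concrete, the following Python sketch (hypothetical; not Zhang's Algorithm 3) walks a backward pass and records, at each push to the master node, how many layers were completed since the previous push. Under the stated assumption that only non-fully-connected layers push their gradients to the master, that interval n varies during the pass; the layer list is an arbitrary example.

```python
from typing import List, Tuple


def push_points(layer_types: List[str]) -> List[Tuple[int, int]]:
    """Walk the backward pass (last layer first) and record (layer, n) at each push.

    n is the number of layers completed since the previous push, so it changes
    during the pass as FC and non-FC layers alternate.
    """
    points = []
    since_last_push = 0
    for layer in reversed(range(len(layer_types))):
        since_last_push += 1                 # this layer's backward step is done
        if layer_types[layer] != "fc":       # assumed: only non-FC gradients go to the master
            points.append((layer, since_last_push))
            since_last_push = 0
    return points


types = ["conv", "conv", "conv", "fc", "fc"]  # layer 0 (input side) to layer 4
print(push_points(types))  # [(2, 3), (1, 1), (0, 1)]
```

Here n is 3 for the first push (two FC layers are skipped) and 1 thereafter.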

Regarding claim 6, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: further comprising: [AMD Reference No.: 170255-US-NP] updating the local neural network with the remote parameters in response to receiving the remote parameters, (As noted in the claim objection above, this limitation is interpreted as “wherein updating the local neural network with the remote parameters is in response to receiving the remote parameters.” Zhang teaches this on page 9, col. 1, top paragraph: “In a distributed setting…”) 
wherein transmitting the subset of the updated parameters is performed during execution of the backward pass in response to determining the updated parameters for one of the plurality of layers in the neural network, (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)
wherein the updating of the local neural network and the transmitting of the subset of updated parameters are performed asynchronously with respect to the completion of the backward pass. (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)
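The updating limitation is mapped to Zhang's p. 9 passage, which replaces communication of the full matrices $\nabla W_p$ and $W_p$ with a three-step exchange. Assuming that exchange sends a pair of vectors (u, v) whose outer product $u v^T$ gives a fully connected layer's gradient, a minimal Python sketch of the receiving side follows: each received pair is expanded into its rank-1 matrix and the results are summed for the local update. The single pair per worker, the helper names, and the example values are assumptions for illustration; the sketch is not Zhang's code.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def outer(u: Vector, v: Vector) -> Matrix:
    """Rank-1 gradient u v^T, the matrix that the factors (u, v) stand in for."""
    return [[ui * vj for vj in v] for ui in u]


def reconstruct_remote_gradient(received: Dict[int, Tuple[Vector, Vector]]) -> Matrix:
    """Rebuild and sum u_j v_j^T from the factors sent by every worker j != p."""
    total: Matrix = []
    for u, v in received.values():
        g = outer(u, v)
        if not total:
            total = g
        else:
            total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, g)]
    return total


def message_sizes(m: int, n: int) -> Tuple[int, int]:
    """Numbers sent per worker: the full M x N matrix versus the two factors."""
    return m * n, m + n


received = {1: ([1.0, 2.0], [3.0, 4.0, 5.0]), 2: ([0.0, 1.0], [1.0, 1.0, 1.0])}
print(reconstruct_remote_gradient(received))  # [[3.0, 4.0, 5.0], [7.0, 9.0, 11.0]]
print(message_sizes(4096, 4096))              # (16777216, 8192)
```

The size comparison shows why sending factors rather than the full matrix reduces traffic for large fully connected layers.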

Regarding claim 7, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: further comprising: in response to receiving the remote parameters, updating the local neural network with the remote parameters by performing a direct memory write of the remote parameters to a local memory in the local computing node. (P. 10, col. 2, ¶ 1: Each node has a local memory which stores the local parameters.)

Regarding claim 8, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: wherein: the updated parameters are determined for a first subset of neurons in the local neural network; and the method further comprises updating a second subset of neurons in the local neural network with the remote parameters, wherein the first subset of neurons and the second subset of neurons are mutually exclusive. (The broadest reasonable interpretation allows Examiner to classify Fig. 3b’s layers $l_L$ and $l_{L-1}$ as comprising the local neural network, where the first subset is layer $l_L$ and the second subset is layer $l_{L-1}$. Under this interpretation, the neurons are mutually exclusive.)

Regarding claim 9, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: further comprising, for each remote computing node of the set of remote computing nodes: (Algorithm 1 SN ln. 4 on p. 6: “foreach worker thread p” where each p denotes a different remote computing node)
at the remote computing node, receiving the subset of the updated parameters from the local computing node; (At Algorithm 3, step 7, the remote computing node receives $u_i^p, v_i^p$ from local node “worker p”.) 
based on the updated parameters, executing a remote forward pass in a copy of the local neural network stored in the remote computing node; (Algorithm 1 SN step 6)
determining the remote parameters during execution of a remote backward pass in the copy of the local neural network; and (Algorithm 1 SN step 7)
prior to completing the remote backward pass, transmitting the remote parameters from the remote computing node to the local computing node. (Algorithm 2 step 9)

Regarding claim 10, the combination of Zhang and Awan teaches: The method of claim 9, 
Further, Zhang teaches: further comprising: executing the forward pass in the local neural network contemporaneously with executing the remote backward pass in at least one of the set of remote computing nodes. (The Fig. 3b timeline shows a given layer’s gradient is “pushed” or transmitted before every layer’s error message is computed, that is, before the whole network completes backpropagation)


Regarding claim 11, the combination of Zhang and Awan teaches: The method of claim 1, 
Further, Zhang teaches: further comprising: executing the forward pass by, for each layer of a plurality of layers in the local neural network, calculating an intermediate output for the layer based on at least one of a set of inputs provided by a training instance, and an intermediate output of a preceding layer in the local neural network. (Algorithm 1 is titled “CNN training”. Claim 11 essentially describes propagating training data forward through a neural network with at least 2 layers (interpreting the last layer’s “intermediate output” as “output”), which is taught by Algorithm 1, S.N. step 6.)  

Regarding claim 12, the combination of Zhang and Awan teaches: The method of claim 11, 
Further, Zhang teaches: further comprising: executing the backward pass by, for each layer of the plurality of layers in the local neural network, calculating the subset of the updated parameters for the layer based on an error between… and a desired output for the training instance. (Fig. 3b and p. 7, col. 2, ¶ 2)
Awan teaches: the final output determined from the forward pass (Awan Fig. 5 teaches communication overlapping with a forward pass. The last layer generates a final output.)

Regarding claim 13, Zhang teaches: A computing node, comprising: (Fig. 2 on p. 6, the left client)
a communication network interface configured to: (Fig. 2 communication arrows)
receive remote parameters from a set of one or more remote computing nodes, and (A remote computing node is the middle client computing node in Fig. 2. Page 9 col. 1, top paragraph:

    [Image: excerpt from Zhang, p. 9, col. 1, top paragraph]

In Algorithm 3 step 8 (p. 9), the local client node receives initial parameters $u_i^j, v_i^j$, $j \neq p$, from all other workers.)
update the neural network by storing the remote parameters in a memory; (Page 9 col. 1, top paragraph: “In a distributed setting with P workers, on worker p, instead of directly communicating two full matrices $\nabla W_p$ and $W_p$ with the master node, we recast it to three steps: …
    [Image: excerpt from Zhang, p. 9, col. 1, listing the three steps]
” Memory - P. 10, col. 2, ¶ 1: Each node has a local memory which stores the local parameters.)
a processing unit configured to: (P. 6, col. 2, top: “GPUs”)
updating of the neural network with the remote parameters, (Page 9 col. 1, top paragraph: “In a distributed setting with P workers, on worker p, instead of directly communicating two full matrices $\nabla W_p$ and $W_p$ with the master node, we recast it to three steps: …
    [Image: excerpt from Zhang, p. 9, col. 1, listing the three steps]
”)
…
		execute a backward pass in the neural network to determine updated parameters for the neural network; and (Algorithm 1, “Slave node” step 7 teaches initiating execution of a backward pass by following the DWBP Algorithm 2 on p. 7. The updated parameters are gradients in Algorithm 2, steps 3/6.)
	a parameter server (contained in the local client because it transmits parameters as taught by Fig. 4b) coupled with the processing unit and configured to, prior to completion of the backward pass, transmit a subset of the updated parameters to the set of remote computing nodes. (Examiner interprets a subset of the updated parameters as one or more gradients. The subset is sent out prior to the completion of the entire backward pass, as shown in Fig. 3b, in Algorithm 2 step 9, and in the top sentence on p. 7 col. 2.)
	However, Zhang does not explicitly teach: initiate the execution of a forward pass in a neural network prior to the receiving remote parameters
	after updating, complete the execution of the forward pass to calculate the final output based on the remote parameters, and
	But Awan teaches: initiate the execution of a forward pass in a neural network prior to the receiving remote parameters (Awan Fig. 5 teaches communication overlapping with a forward pass. The parameters are received at the layers after initiating the forward pass.)
after updating, complete the execution of the forward pass to calculate the final output based on the remote parameters, and (Awan Fig. 5 teaches communication overlapping with a forward pass. The last layer generates a final output.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Awan’s system into Zhang’s system by overlapping parameter updates and forward propagation, with the motivation of updating parameters to layers in an on-demand fashion (Awan, p. 198, col. 2, top paragraph).

Regarding claim 14, the combination of Zhang and Awan teaches: The computing node of claim 13, wherein: 
Further, Zhang teaches: the processing unit is further configured to execute the backward pass by calculating the subset of the updated parameters for a layer in the neural network, and (Zhang teaches in Algorithm 2 steps 3/6 computing gradients for a layer $l_L$.)
the parameter server is further configured to transmit the subset of the updated parameters in response to determining that the subset of updated parameters differs from a set of corresponding prior parameters for the layer (Zhang teaches on p. 10 § 4.3.3 that the bandwidth manager “allocates network bandwidth according to the messages’ contribution to convergence.” The bandwidth manager would not allocate bandwidth for transmitting parameters that have converged.)

Regarding claim 15, the combination of Zhang and Awan teaches: The computing node of claim 13, 
Further, Zhang teaches: wherein: the parameter server is configured to transmit one (a single gradient $\nabla A_L$) or more of a plurality of subsets of the updated parameters (another subset is gradient $\nabla A_{L-1}$) in response to completion of every nth layer in the execution of the backward pass (each gradient is transmitted after completion of the layer), and the processing unit is further configured to dynamically adjust a value of n during the execution of the backward pass. (According to Algorithm 3 on p. 9, col. 2, steps 2 and 11, the gradient is sent to the master node only if the layer is not fully connected. Therefore, n, which controls the frequency with which updates are sent with respect to layers, is dynamically updated during backpropagation according to the sequence of FC and non-FC layers in the network.)

Regarding claim 16, the combination of Zhang and Awan teaches: The computing node of claim 13, 
Further, Zhang teaches: wherein the communication network interface is further configured to store the remote parameters in the memory via a direct memory write. (P. 10, col. 2, ¶ 1: Each node has a local memory which stores the local parameters.)

Regarding claim 17, the combination of Zhang and Awan teaches: The computing node of claim 13, wherein:
Further, Zhang teaches: the communication network interface is further configured to, in response to receiving the remote parameters, update the neural network with the remote parameters (Zhang teaches this on Page 9 col. 1, top paragraph: “In a distributed setting…”) asynchronously with respect to the completion of the backward pass, and (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)
the parameter server is further configured to, during execution of the backward pass and in response to determining the updated parameters for one of the plurality of layers in the neural network, transmit the subset of updated parameters asynchronously with respect to completion of the backward pass. (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)

Regarding claim 18, the combination of Zhang and Awan teaches: The computing node of claim 13, 
Further, Zhang teaches: wherein the processing unit is further configured to: execute the forward pass by, for each layer of a plurality of layers in the local neural network, calculating an intermediate output for the layer based at least one of a set of inputs provided by a training instance, and an intermediate output of a preceding layer in the local neural network; (Algorithm 1 is titled “CNN training”. Claim 18 essentially describes propagating training data forward through a neural network with at least two layers (interpreting the last layer’s “intermediate output” as “output”), which is taught by Algorithm 1, SN step 6.)
and execute the backward pass by, for each layer of the plurality of layers in the local neural network, calculating the subset of the updated parameters for the layer based on an error between the final output determined from the forward pass and a desired output for the training instance. (Fig. 3b and p. 7, col. 2, ¶ 2)

Regarding claim 19, Zhang teaches: A computing system, comprising: a communication network (Fig. 2 on p. 6, communication arrows); and a plurality of computing nodes (Fig. 2 left and middle clients) coupled with the communication network, wherein each computing node of the plurality of computing nodes comprises: 
a communication network interface (Fig. 2 communication arrows between left and middle clients) configured to 
receive remote parameters from the other computing nodes of the plurality of computing nodes; (A remote computing node is the middle client computing node in Fig. 2. Page 9 col. 1, top paragraph:

[Image: quoted excerpt from Zhang, p. 9, col. 1, top paragraph]

In Algorithm 3 step 8 (p. 9), the local client node receives initial parameters $u_i^j, v_i^j$, $j \neq p$, from all other workers.)
a processing unit configured to: (P. 6, col. 2, top: “GPUs”)
execute a forward pass in a neural network to calculate a final output based on the remote parameters (Algorithm 1 on p. 6: SN step 6 teaches performing a forward pass on the local node (e.g., left node) based on parameters from the remote node (e.g., middle node))
and execute a backward pass in the neural network to determine updated parameters for the neural network; and (Algorithm 1, “Slave node” step 7 teaches initiating execution of a backward pass by following the DWBP Algorithm 2 on p. 7. The updated parameters are the gradients in Algorithm 2, steps 3 and 6.)
a parameter server coupled with the processing unit and configured to, (The parameter server is contained in the local client because the client transmits parameters, as taught by Fig. 4b.)
prior to completion of the backward pass, transmit a subset of the updated parameters to the other computing nodes, (Examiner interprets a subset of the updated parameters as one or more gradients. The subset is sent out prior to the completion of the entire backward pass, as shown in Fig. 3b, in Algorithm 2 step 9, and in the top sentence on p. 7 col. 2.)
	However, Zhang does not explicitly teach: wherein for each computing node of the plurality of computing nodes, the communication network interface of the computing node is further configured to update the neural network with the remote parameters after initiation of the forward pass and prior to completion of the forward pass.
	But Awan teaches this limitation. Awan Fig. 5 teaches communication overlapping with a forward pass. The parameters are received at the layers after initiating the forward pass.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Awan’s system into Zhang’s system by overlapping parameter updates with forward propagation, with the motivation of supplying updated parameters to layers in an on-demand fashion (Awan p. 198, col. 2, top paragraph).

Regarding claim 20, the combination of Zhang and Awan teaches: The computing system of claim 19, 
Further, Zhang teaches: wherein the forward pass for a first computing node of the plurality of computing nodes is executed contemporaneously with the execution of the backward pass for a second computing node of the plurality of computing nodes. (The Fig. 3b timeline shows that a given layer’s gradient is “pushed” (transmitted) before the error messages of all layers are computed, that is, before the whole network completes backpropagation.)

Regarding claim 21, the combination of Zhang and Awan teaches: The computing system of claim 19, 
Further, Zhang teaches: wherein each computing node of the plurality of computing nodes is configured to determine the updated parameters for a different subset of neurons in the neural network. (Examiner is interpreting “different” to mean different from other nodes’ subsets of neurons, even if the neurons in the other nodes are copies. Algorithm 2, step 6 teaches computing gradients, i.e., the updated parameters.)


Regarding claim 23, the combination of Zhang and Awan teaches: The computing system of claim 19, 
Further, Zhang teaches: wherein for each computing node of the plurality of computing nodes: 
the communication network interface of the computing node is further configured to,
in response to receiving the remote parameters, update the neural network with the remote parameters by performing a direct memory write of the remote parameters to a local memory in the local computing node, (Zhang P. 10, col. 2, ¶ 1: Each node has a local memory which stores the local parameters) wherein the update of the neural network is asynchronous with respect to the completion of the backward pass; and (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)
the parameter server of the computing node is further configured to, 
during execution of the backward pass, transmit the subset of updated parameters in response to the processing unit determining the updated parameters for one of the plurality of layers in the neural network, wherein the transmitting the subset of updated parameters is asynchronous with respect to the completion of the backward pass. (Zhang P. 6, col. 2, ¶ 3: “At the beginning of training, every worker thread starts its Caffe engine to perform feedforward and then backpropagation pass for some number of times, via the distributed wait-free backpropagation algorithm, during which they communicate asynchronously” [edited by examiner].)

Response to Arguments
Information Disclosure Statements: Applicant did not respond to examiner’s contentions about the information disclosure statement. Applicant should submit an argument under the heading “Remarks” pointing out disagreements with the examiner’s contentions.  

Claim Interpretation: The term “server” in claims 13, 15, 17, 19, and 23 is no longer being interpreted under 35 U.S.C. 112(f) due to Applicant’s persuasive arguments.

Claim Rejections under 35 U.S.C. 112 (Remarks p. 13-15): The rejection of claims 13, 15, 17, 19, and 23 under 35 U.S.C. 112(b) for reciting the term “server” is withdrawn due to Applicant’s persuasive arguments. The rejections of claims 14, 16, 18, 20, and 21 for failing to cure the deficiencies of the claims upon which they depend are withdrawn.
The rejection of claims 6, 17, and 23 for reciting the relative term “asynchronously” is withdrawn due to the claim amendments filed 05/18/2021 clarifying that the term is with respect to completion of the backward pass.

Claim Rejections under 35 U.S.C. 102 and 103: Applicant's arguments filed 05/18/2021 have been fully considered but they are not persuasive.

Applicant’s Argument #1 (Remarks p. 16-17):

[Images: two excerpts from Applicant’s Remarks, p. 16-17]


Examiner’s Response #1: Applicant states “one having ordinary skill in the art would understand that the forward pass is completed prior to starting the backpropagation pass shown at line 7” and “the forward pass is also completed before any update… occurs.” This is taught by the combination of Zhang and Awan.


Applicant’s Argument #2 (Remarks p. 12):

[Image: excerpt from Applicant’s Remarks, p. 12]


Examiner’s Response #2: In Fig. 5 of Awan, the update is requested in the step Loop{iBcast(i)}, but the updates are not received until the wait step completes, since the system waits for the updates to be received before proceeding. Claim 1 only requires receiving parameters after initiating execution of the forward pass. Fig. 5 shows that the parameters, represented by the rectangles on the left side, are received at each layer $L_n$ after the forward pass starts, thus “after initiating execution.”


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER JABLON/Examiner, Art Unit 2122                                                                                                                                                                                                        
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122