DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/22/2021 has been entered.
 
Amendments
NOTE: Claim 16 has the status identifier “Currently amended” but it contains no markups or amendments. Claim 16 is being examined as it was previously presented.
Claims 1-4, 6-21, and 23 are pending and have been examined. Claims 1, 6, 13, and 17-19 are amended.

Claim Objections
Claims 13 and 18-19 are objected to because of the following informalities:  
In claim 13, on page 7, the first line is missing a punctuation mark.  
In Claim 18, lines 4-5 should read “based on
In claim 19, on p. 10, line 4, the indentation makes it unclear where the parameter server is located relative to the other components. For purposes of examination, Examiner interprets claim 19 as if line 4 were indented left one time. 
Appropriate correction is required.


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
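As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 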
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

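The following claim limitations do not use the word “means” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
“a processing unit” in claim 13, line 7; claim 14, line 2; claim 15, line 5; claim 18, line 2; and claim 19, line 8. 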
“a parameter server” in claim 13, line 19; claim 14, line 5; claim 15, line 2; claim 17, line 6; claim 19, line 13; and claim 23, line 9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


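The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
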
Claims 1-4, 6-21, and 23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
In claim 1, last 2 lines, it is unclear if the execution of the backward pass must pause until the subset of the updated parameters has been transmitted.
In claim 9, last 2 lines, it is unclear if the execution of the remote backward pass must pause until the remote parameters have been transmitted.
For purposes of examination, Examiner interprets claim 1, last 2 lines, and claim 9, last 2 lines, to mean transmitting the parameters while the backward pass executes.
Claims 2-4 and 6-12 are rejected for failing to cure the deficiencies of claim 1 upon which they depend.
	In claim 13, it is unclear whether each computing node contains a separate, complete neural network or a portion of a shared neural network. For purposes of examination, Examiner interprets claim 13 to mean the former.
In claim 13, lines 5-6, “updating” means storing remote parameters. In lines 14-15 (from “after updating” to “forward pass”), it is unclear whether “updating” means storing remote parameters or modifying the neural network’s values with the remote parameters’ values. It is unclear whether the execution of the forward pass must pause until remote parameters are received before completing execution. In lines 15-16, a final output is calculated based on the remote parameters, but the claim does not positively recite applying the stored remote parameters to a calculation. It is unclear to Examiner how to calculate an output value based on data which was never applied to the computation. 

For purposes of examination, Examiner interprets claim 13, lines 5-16 to mean executing a forward pass and applying remote parameters to the forward pass operations whenever they become available. Examiner interprets claim 13, last 3 lines to mean a server transmitting parameters while the backward pass executes.
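For illustration only, the interpretation above (a forward pass that applies remote parameters whenever they become available) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Zhang, or Sridharan: the function name, the scalar "layers", and the `arrivals` schedule (which stands in for network receipt) are all hypothetical.

```python
from typing import Dict, List, Tuple


def forward_with_remote_updates(
    x: float,
    weights: List[float],
    arrivals: Dict[int, List[Tuple[int, float]]],
) -> Tuple[float, List[float]]:
    """Toy forward pass over scalar layers.

    `arrivals[step]` lists (layer_index, value) pairs that are assumed to
    become available just before layer `step` executes. This schedule is a
    hypothetical stand-in for receiving remote parameters over a network.
    """
    weights = list(weights)  # local copy of the layer parameters
    for step in range(len(weights)):
        # Apply any remote parameters that have arrived by this point.
        for layer_index, value in arrivals.get(step, []):
            weights[layer_index] = value
        # Execute the layer with whatever parameters are current.
        x = x * weights[step]
    return x, weights


# Remote value 7.0 for layer 2 arrives after layers 0 and 1 have executed.
output, final_weights = forward_with_remote_updates(
    1.0, [2.0, 3.0, 5.0], {2: [(2, 7.0)]}
)
print(output, final_weights)
```

In this sketch the forward pass never pauses; the final output reflects a remote parameter only if it arrives before the layer that uses it executes.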
In claim 17, lines 3-5, it is unclear whether “update” means storing remote parameters or modifying the neural network’s values with the remote parameters’ values. In claim 17, lines 6-10, it is unclear which “updated parameters” these lines refer to. It is unclear how the parameter server accesses the updated parameters. It is unclear if the parameter server executes a backward pass. For purposes of examination, Examiner interprets claim 17, lines 3-5 to mean storing, and lines 6-10 to mean the parameter server transmitting the subset during execution of the backward pass.
Claims 14-18 are rejected for failing to cure the deficiencies of claim 13 upon which they depend.
	Claim 19 contains the same indefiniteness issues as claim 13 except for those regarding the “storing” limitation in claim 13. Claim 19 is rejected for the reasons set forth in the 35 U.S.C. 112(b) rejection of claim 13.
Claims 20-21 and 23 are rejected for failing to cure the deficiencies of claim 19 upon which they depend.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3, 6, 8-13, and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines”, see PTO-892 filed 08/31/2021) in view of Sridharan et al. (US 20180322386 A1).

	Regarding CLAIM 1, Zhang teaches: A method, comprising: 
initiating execution of a forward pass in a local neural network in a local computing node; (The first client in Fig. 2 on p. 6 is a local computing node. The architecture is taught on p. 6, col. 2, first full paragraph, in the sentence starting “Second and most importantly…” A forward pass is taught on P. 6, Algorithm 1, S.N. step 6. The term “data-parallelism” in the title of Algorithm 1 indicates that each node has a local neural network.)
…
initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network; and (P. 5, col. 1, second full paragraph, lines 8-14; P. 6, Algorithm 1, S.N. step 7)
prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes. (P. 6, end of col. 2 states: “The DWBP algorithm enables communication to be overlapped with the error propagation computations.” Also, p. 7, Algorithm 2, line 9 teaches pushing out a gradient from a worker node during backpropagation.)
	However, Zhang does not explicitly teach: after initiating the execution of the forward pass, receiving remote parameters from a set of one or more remote computing nodes wherein the remote parameters are calculated in the set of one or more remote computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass;
updating the local neural network with the remote parameters;
	completing the execution of the forward pass to calculate a final output based on the remote parameters;
	But Sridharan teaches: after initiating the execution of the forward pass, receiving remote parameters from a set of one or more remote computing nodes wherein the remote parameters are calculated in the set of one or more remote computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass; (¶ [0216], lines 7-13. The BRI of execution of the forward pass includes waiting to finish receiving communication of activation data to be used as input data for the forward compute.)
updating the local neural network with the remote parameters; (¶ [0216], lines 12-13)
	completing the execution of the forward pass to calculate a final output based on the remote parameters; (¶ [0216], lines 14-20)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have waited to finish receiving communication of activation data to be used as input data for the forward compute and to have performed the forward compute. A motivation for the combination is to perform distributed machine learning with the most recent activation gradients from other nodes. (Sridharan ¶ [0216])

Regarding CLAIM 3, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: wherein the execution of the backward pass comprises calculating the subset of the updated parameters for a first layer in the local neural network, and (A first layer is interpreted as a given layer i on p. 7, Algorithm 2. Steps 3 and 6 of Algorithm 2 teach computing gradients.) 
wherein the transmitting the subset of the updated parameters is performed contemporaneously with executing the backward pass for one or more subsequent layers in the local neural network. (P. 7, col. 2, last paragraph, lines 1-7.) 
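For illustration only, transmitting one layer's updated parameters while the backward pass continues through the remaining layers can be sketched as follows. This is a minimal sketch under stated assumptions and is not Zhang's DWBP implementation: the function name, the scalar per-layer "gradient", and the in-process queue standing in for a network link are all hypothetical.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple


def backward_with_overlapped_send(
    layer_scales: List[float],
) -> Tuple[Dict[int, float], List[int]]:
    """Toy backward pass that hands off each layer's update as soon as it is ready."""
    outbox: "queue.Queue[Optional[Tuple[int, float]]]" = queue.Queue()
    sent_order: List[int] = []

    def sender() -> None:
        # Hypothetical stand-in for a network transmit loop.
        while True:
            item = outbox.get()
            if item is None:  # sentinel: nothing more to transmit
                return
            layer, _update = item
            sent_order.append(layer)

    worker = threading.Thread(target=sender)
    worker.start()

    updates: Dict[int, float] = {}
    signal = 1.0  # hypothetical error signal arriving from the loss
    for layer in reversed(range(len(layer_scales))):
        signal *= layer_scales[layer]  # toy stand-in for backpropagation
        updates[layer] = signal
        # Hand off immediately; the loop moves on to the next layer
        # without waiting for the transmission to finish.
        outbox.put((layer, signal))

    outbox.put(None)
    worker.join()
    return updates, sent_order


updates, sent_order = backward_with_overlapped_send([0.5, 2.0, 4.0])
print(updates, sent_order)
```

Because the sender runs on its own thread, updates for layers processed early in the backward pass can be in flight while later layers are still being computed.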

Regarding CLAIM 6, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: wherein: updating the local neural network with the remote parameters is in response to receiving the remote parameters, (The BRI of remote parameters includes inputs to a backward pass. P. 7, Algorithm 2, step 9: pull in updated $A_i$)
wherein transmitting the subset of the updated parameters is performed during execution of the backward pass in response to determining the updated parameters for one of the plurality of layers in the neural network, and (P. 7, Algorithm 2, step 9: pull in updated $A_i$. Algorithm 2 occurs during execution of a backward pass)
wherein the updating of the local neural network and the transmitting of the subset of updated parameters are performed asynchronously with respect to the completion of the backward pass. (P. 7, all of col. 2)
	Additionally, Sridharan teaches transmitting remote parameters as inputs for a local forward compute asynchronously with completion of a local backward compute (¶ [0210] teaches overlapping compute operations; per ¶ [0216]-[0217] and Fig. 16A, step 1a in node 0 may happen asynchronously with backward compute 1624 in node 1.)

	Regarding CLAIM 8, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: wherein: the updated parameters are determined for a first subset of neurons in the local neural network; and the method further comprises updating a second subset of neurons in the local neural network with the remote parameters, wherein the first subset of neurons and the second subset of neurons are mutually exclusive. (The broadest reasonable interpretation allows Examiner to classify Fig. 3b’s layers $l_L$ and $l_{L-1}$ as comprising the local neural network, where the first subset is layer $l_L$ and the second subset is layer $l_{L-1}$. Under this interpretation, the neurons are mutually exclusive.)

Application No.: 15/898,433; Atty. Docket No.: 170255-US-NP
Regarding CLAIM 9, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: further comprising, for each remote computing node of the set of remote computing nodes: (P. 6, Algorithm 1 S.N. step 4: “foreach worker thread p” where each p denotes a different remote computing node)
at the remote computing node, receiving the subset of the updated parameters from the local computing node; (P. 9, Algorithm 3, step 7, the remote computing node receives $u_i^p$, $v_i^p$ from local node “worker p”.)
based on the updated parameters, executing a remote forward pass in a copy of the local neural network stored in the remote computing node; (P. 6, Algorithm 1 S.N. step 6. Since the architecture uses data-parallelism, each node has a local copy of the neural network.)
determining the remote parameters during execution of a remote backward pass in the copy of the local neural network; and (P. 6, Algorithm 1, S.N. step 7)
prior to completing the remote backward pass, transmitting the remote parameters from the remote computing node to the local computing node. (P.7, Algorithm 2, step 9)

Regarding CLAIM 10, the combination of Zhang and Sridharan teaches: The method of claim 9, 
However, Zhang does not explicitly teach: further comprising: executing the forward pass in the local neural network contemporaneously with executing the remote backward pass in at least one of the set of remote computing nodes.
	But Sridharan teaches: further comprising: executing the forward pass in the local neural network contemporaneously with executing the remote backward pass in at least one of the set of remote computing nodes. (¶ [0210] teaches overlapping compute operations. According to Fig. 16A (¶ [0216]-[0217]), forward compute 1612 in Node 0 may happen contemporaneously with backward compute 1624 in Node 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed forward and backward compute operations in different nodes contemporaneously. A motivation for the combination is to optimize compute and communication efficiency and throughput. (¶ [0210], lines 7-8)

Regarding CLAIM 11, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: further comprising: executing the forward pass by, for each layer of a plurality of layers in the local neural network, calculating an intermediate output for the layer based on at least one of a set of inputs provided by a training instance, and an intermediate output of a preceding layer in the local neural network. (P. 5, col. 1, second full paragraph, lines 5-8 teaches a forward pass in stochastic gradient descent; P. 6, Algorithm 1, S.N. step 6.)

Regarding CLAIM 12, the combination of Zhang and Sridharan teaches: The method of claim 11, 
Zhang teaches: further comprising: executing the backward pass by, for each layer of the plurality of layers in the local neural network, calculating the subset of the updated parameters for the layer based on an error between the final output determined from the forward pass and a desired output for the training instance. (P. 5, col. 1, second full paragraph, lines 8-14 teaches backpropagation in stochastic gradient descent; P. 6, Algorithm 1, S.N. line 7.)

	Regarding CLAIM 13, Zhang teaches: A computing node, comprising: (The first client in Fig. 2 on p. 6 is a local computing node. The architecture is taught on p. 6, col. 2, first full paragraph, in the sentence starting “Second and most importantly…”) 
a communication network interface configured to: (Fig. 2 shows server-client communication and peer-to-peer communications. The architecture is taught on p. 6, col. 2, first full paragraph, in the sentence starting “Second and most importantly…”)
receive remote parameters from a set of one or more remote computing nodes, and update a neural network by storing the remote parameters in a memory; (Remote computing nodes include the second and third clients from the left in Fig. 2. Receiving and storing are taught by p. 9, col. 1, top paragraph, lines 6-8 (step 2). Storing training data is taught by p. 10, § 4.4, lines 6-11. Memory is taught by p. 10, col. 2, lines 4-5.)
a processing unit configured to: (P. 6, last line in col. 1 to first line in col. 2)
execute a backward pass in the neural network to determine updated parameters for the neural network; and (P. 6, Algorithm 1, S.N. step 7)
a parameter server coupled with the processing unit and configured to, prior to completion of the backward pass, transmit a subset of the updated parameters to the set of remote computing nodes. (On p. 6, Fig. 2 shows a server/master at the top. P. 6, end of col. 2 states: “The DWBP algorithm enables communication to be overlapped with the error propagation computations.” Also, p. 7, Algorithm 2, line 9 teaches pushing out a gradient from a worker node during backpropagation.)
	However, Zhang does not explicitly teach: initiate the execution of a forward pass in the neural network prior to the receiving of the remote parameters, wherein the remote parameters are calculated in the set of one or more remote computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass
after updating of the neural network with the remote parameters, complete the execution of the forward pass to calculate a final output based on the remote parameters, and
	But Sridharan teaches: initiate the execution of a forward pass in the neural network prior to the receiving of the remote parameters, wherein the remote parameters are calculated in the set of one or more remote computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass (¶ [0216], lines 7-13. The BRI of execution of the forward pass includes waiting to finish receiving communication of activation data to be used as input data for the forward compute.)
	after updating of the neural network with the remote parameters, complete the execution of the forward pass to calculate a final output based on the remote parameters, and (¶ [0216], lines 12-20)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have waited to finish receiving communication of activation data to be used as input data for the forward compute and to have performed the forward compute. A motivation for the combination is to perform distributed machine learning with the most recent activation gradients from other nodes. (Sridharan ¶ [0216])

Regarding CLAIM 17, the combination of Zhang and Sridharan teaches: The computing node of claim 13, wherein: 
Zhang teaches: the communication network interface is further configured to, in response to receiving the remote parameters, update the neural network with the remote parameters asynchronously with respect to the completion of the backward pass, and (The BRI of remote parameters includes inputs to a backward pass. P. 7, col. 2, first paragraph. Asynchronous operation is also shown in Fig. 3b, where push means transmitting from worker to server and pull means transmitting from server to worker during the backpropagation.)
the parameter server is further configured to, during execution of the backward pass and in response to determining the updated parameters for one of a plurality of layers in the neural network, transmit the subset of updated parameters asynchronously with respect to completion of the backward pass. (P. 7, col. 2, first paragraph. In Fig. 3b, push means transmitting from worker to server and pull means transmitting from server to worker during the backpropagation.)
	Additionally, Sridharan teaches transmitting remote parameters as inputs for a local forward compute asynchronously with completion of a local backward compute (¶ [0210] teaches overlapping compute operations; per ¶ [0216]-[0217] and Fig. 16A, step 1a in node 0 may happen asynchronously with backward compute 1624 in node 1.)

Regarding CLAIM 18, the combination of Zhang and Sridharan teaches: The computing node of claim 13, 
Zhang teaches: wherein the processing unit is further configured to: execute the forward pass by, for each layer of a plurality of layers in the neural network, calculating an intermediate output for the layer based at least one of a set of inputs provided by a training instance, and an intermediate output of a preceding layer in the neural network; and (P. 5, col. 1, second full paragraph, lines 5-8 teaches a forward pass in stochastic gradient descent; P. 6, Algorithm 1, S.N. step 6.)
execute the backward pass by, for each layer of the plurality of layers in the neural network, calculating the subset of the updated parameters for the layer based on an error between the final output determined from the forward pass and a desired output for the training instance. (P. 5, col. 1, second full paragraph, lines 8-14 teaches backpropagation in stochastic gradient descent; P. 6, Algorithm 1, S.N. line 7.)

	Regarding CLAIM 19, Zhang teaches: A computing system, comprising: (Fig. 2 and p. 6, col. 2, first full paragraph.)
a communication network; and (Fig. 2 shows server-client communication and peer-to-peer communications. The architecture is taught on p. 6, col. 2, first full paragraph, in the sentence starting “Second and most importantly…”)
a plurality of computing nodes coupled with the communication network, wherein each computing node of the plurality of computing nodes comprises: (In Fig. 2, the 3 clients are a plurality of computing nodes. See p. 6, col. 2, first full paragraph.)
a communication network interface configured to receive remote parameters from the other computing nodes of the plurality of computing nodes; (Fig. 2 shows server-client communication and peer-to-peer communications. The architecture is taught on p. 6, col. 2, first full paragraph, in the sentence starting “Second and most importantly…”)
a processing unit configured to: (P. 6, last line in col. 1 to first line in col. 2)
execute a backward pass in the neural network to determine updated parameters for the neural network; and (P. 5, col. 1, second full paragraph, lines 8-14; P. 6, Algorithm 1, S.N. step 7)
a parameter server coupled with the processing unit and configured to, prior to completion of the backward pass, transmit a subset of the updated parameters to the other computing nodes, (On p. 6, Fig. 2 shows a server/master at the top. P. 6, end of col. 2 states: “The DWBP algorithm enables communication to be overlapped with the error propagation computations.” Also, p. 7, Algorithm 2, line 9 teaches pushing out a gradient from a worker node during backpropagation.)
	However, Zhang does not explicitly teach: execute a forward pass in a neural network to calculate a final output based on the remote parameters, and
wherein for each computing node of the plurality of computing nodes, the communication network interface of the computing node is further configured to update the neural network with the remote parameters after initiation of the forward pass and prior to completion of the forward pass, wherein the remote parameters are calculated in one or more of the other computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass.
	But Sridharan teaches: execute a forward pass in a neural network to calculate a final output based on the remote parameters, and (¶ [0216], lines 7-12 teaches a node receives remote parameters for use in a forward pass. ¶ [0216], lines 12-20 teaches calculating a final output for a forward pass.)
wherein for each computing node of the plurality of computing nodes, the communication network interface of the computing node is further configured to update the neural network with the remote parameters after initiation of the forward pass and prior to completion of the forward pass, wherein the remote parameters are calculated in one or more of the other computing nodes during the execution of the forward pass, and the remote parameters are transmitted during the execution of the forward pass. (¶ [0216], lines 7-20, where the BRI of execution of a forward pass includes waiting to finish receiving communication of activation data to be used as input data for the forward compute.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have waited to finish receiving communication of activation data to be used as input data for the forward compute and perform the forward compute. A motivation for the combination is to perform distributed machine learning with the most recent activation gradients from other nodes. (Sridharan ¶ [0216])
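For illustration only, the overlap of communication with the backward pass discussed above may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Zhang or Sridharan; the names `backward_pass_with_overlap`, `compute_layer_grad`, and `send` are hypothetical. Each layer's gradient is handed to a sender thread as soon as it is computed, so transmission can begin prior to completion of the backward pass.

```python
import queue
import threading


def backward_pass_with_overlap(compute_layer_grad, num_layers, send):
    """Run a backward pass from the last layer to the first, handing each
    layer's gradient to a sender thread as soon as it is computed."""
    outbox = queue.Queue()

    def sender():
        # Drain the queue in FIFO order until the sentinel arrives.
        while True:
            item = outbox.get()
            if item is None:
                break
            layer, grad = item
            send(layer, grad)

    worker = threading.Thread(target=sender)
    worker.start()
    for layer in range(num_layers - 1, -1, -1):
        grad = compute_layer_grad(layer)  # hypothetical per-layer gradient
        outbox.put((layer, grad))         # transmission may start immediately
    outbox.put(None)                      # signal the end of the backward pass
    worker.join()
```

In this sketch the sender thread runs concurrently with the loop over layers, which is one possible way to realize transmission of a subset of the updated parameters before the remaining layers have been processed.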

Regarding CLAIM 20, the combination of Zhang and Sridharan teaches: The computing system of claim 19, 
However, Zhang does not explicitly teach: wherein the forward pass for a first computing node of the plurality of computing nodes is executed contemporaneously with the execution of the backward pass for a second computing node of the plurality of computing nodes. 
	But Sridharan teaches: wherein the forward pass for a first computing node of the plurality of computing nodes is executed contemporaneously with the execution of the backward pass for a second computing node of the plurality of computing nodes. (¶ [0210] teaches overlapping compute operations. According to Fig. 16A (¶ [0216]-[0217]), forward compute 1612 in Node 0 may happen contemporaneously with backward compute 1624 in Node 1.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have performed forward and backward compute operations in different nodes contemporaneously. A motivation for the combination is to optimize compute and communication efficiency and throughput. (Sridharan ¶ [0210], lines 7-8)

Regarding CLAIM 21, the combination of Zhang and Sridharan teaches: The computing system of claim 19, 
Zhang teaches: wherein each computing node of the plurality of computing nodes is configured to determine the updated parameters for a different subset of neurons in the neural network. (P. 12, col. 1, paragraph “Settings” states, “We train the CNN with fully data-parallelism”. Each computing node has a local copy of the network in data-parallelism. Under the BRI of the claim, Zhang’s data-parallelism teaches the claim limitations.)

Claims 2 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines”, see PTO-892 filed 08/31/2021) in view of Sridharan et al. (US 20180322386 A1) and Zhao et al. (U.S. Patent 7,782,851).
	
	Regarding CLAIM 2, the combination of Zhang and Sridharan teaches: The method of claim 1, 
Zhang teaches: wherein: the execution of the backward pass comprises calculating the subset of the updated parameters for a first layer in the local neural network, and (On p. 7, Algorithm 2, steps 3 and 6 compute gradients for a layer $l_L$.)
However, neither Zhang nor Sridharan explicitly teaches: transmitting the subset of the updated parameters is performed in response to determining that the subset of updated parameters for the layer differs from a set of corresponding prior parameters for the layer.
	But Zhao teaches: transmitting the subset of the updated parameters is performed in response to determining that the subset of updated parameters for the layer differs from a set of corresponding prior parameters for the layer. (The BRI of this limitation is that the parameter server does not transmit redundant data. Zhao teaches this in C. 8, L. 11-12)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have configured a server not to send redundant information. A motivation for the combination is to conserve bandwidth. (Zhao, C. 8, L. 11-12)
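For illustration only, the limitation as interpreted above (transmitting a layer's updated parameters only when they differ from the corresponding prior parameters) may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Zhao or any other cited reference; `maybe_transmit`, `prior`, and `send` are hypothetical names.

```python
def maybe_transmit(layer, updated, prior, send):
    """Send a layer's updated parameters only if they differ from the
    parameters last sent for that layer; return True if a send occurred."""
    if updated != prior.get(layer):
        send(layer, updated)
        prior[layer] = list(updated)  # remember what was last transmitted
        return True
    return False                      # redundant data is not transmitted
```

Under this sketch, a second call with unchanged parameters for the same layer sends nothing, which is the bandwidth-conserving behavior relied upon in the motivation above.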

	Regarding CLAIM 14, the combination of Zhang and Sridharan teaches: The computing node of claim 13, 
Zhang teaches: wherein: the processing unit is further configured to execute the backward pass by calculating the subset of the updated parameters for a layer in the neural network, and (On p. 7, Algorithm 2, steps 3 and 6 compute gradients for a layer $l_L$.)
However, neither Zhang nor Sridharan explicitly teaches: the parameter server is further configured to transmit the subset of the updated parameters in response to determining that the subset of updated parameters differs from a set of corresponding prior parameters for the layer. 
	But Zhao teaches: the parameter server is further configured to transmit the subset of the updated parameters in response to determining that the subset of updated parameters differs from a set of corresponding prior parameters for the layer. (The BRI of this limitation is that the parameter server does not transmit redundant data. Zhao teaches this in C. 8, L. 11-12)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have configured a server not to send redundant information. A motivation for the combination is to conserve bandwidth. (Zhao, C. 8, L. 11-12)

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines”, see PTO-892 filed 08/31/2021) in view of Sridharan et al. (US 20180322386 A1) and Lilienthal et al. (U.S. Patent 10,951,683).

	Regarding CLAIM 4, the combination of Zhang and Sridharan teaches:  The method of claim 1, 
Zhang teaches: wherein the subset of the updated parameters is one of a plurality of subsets of the updated parameters transmitted in response to completion of every nth layer in the execution of the backward pass, and wherein the method further comprises… during the execution of the backward pass. (On p. 7, Algorithm 2, line 9 teaches that the updated $A_i$ is transmitted from the master node to the slave/worker node. Algorithm 2, line 1 teaches this happens after the completion of every layer in the execution of the backward pass.)
	However, neither Zhang nor Sridharan explicitly teaches: wherein the method further comprises dynamically adjusting a value of n 
	But Lilienthal teaches: wherein the method further comprises dynamically adjusting a value of n (C. 6, L. 16-18)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have dynamically adjusted the master node’s transmission rate, as taught by Lilienthal. A motivation for the combination is to accommodate the network bandwidth available to the devices. (Lilienthal, C. 6, L. 16-18)
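For illustration only, transmitting in response to completion of every nth layer, with n adjusted dynamically during the backward pass, may be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Zhang or Lilienthal; the bandwidth threshold, the policy in `choose_n`, and all function names are hypothetical.

```python
def choose_n(bandwidth_mbps, threshold_mbps=100.0):
    """Hypothetical policy: send after every layer on a fast link, and
    batch four layers per transmission on a slow link."""
    return 1 if bandwidth_mbps >= threshold_mbps else 4


def backward_with_batched_sends(num_layers, bandwidth_at, send):
    """Walk the layers from last to first and transmit the pending layers
    once n of them have completed, re-evaluating n at every layer."""
    pending = []
    for layer in range(num_layers - 1, -1, -1):
        pending.append(layer)              # this layer's update is ready
        n = choose_n(bandwidth_at(layer))  # n may change during the pass
        if len(pending) >= n:
            send(list(pending))
            pending.clear()
    if pending:                            # flush any remainder at the end
        send(list(pending))
```

With a hypothetical link that is fast for layers 5 and 4 and slow afterwards, this sketch sends layers 5 and 4 individually and then layers 3 through 0 as one batch.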

	Regarding CLAIM 15, the combination of Zhang and Sridharan teaches: The computing node of claim 13, 
Zhang teaches: wherein: the parameter server is configured to transmit one or more of a plurality of subsets of the updated parameters in response to completion of every nth layer in the execution of the backward pass, and…  during the execution of the backward pass. (On p. 7, Algorithm 2, line 9 teaches that the updated $A_i$ is transmitted from the master node to the slave/worker node. Algorithm 2, line 1 teaches this happens after the completion of every layer in the execution of the backward pass.)
	However, neither Zhang nor Sridharan explicitly teaches: the processing unit is further configured to dynamically adjust a value of n 
	But Lilienthal teaches: the processing unit is further configured to dynamically adjust a value of n (C. 6, L. 16-18)
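	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have dynamically adjusted the master node’s transmission rate, as taught by Lilienthal. A motivation for the combination is to accommodate the network bandwidth available to the devices. (Lilienthal, C. 6, L. 16-18)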

Claims 7, 16, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (“Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines”, see PTO-892 filed 08/31/2021) in view of Sridharan et al. (US 20180322386 A1) and Qian (US 20160330283 A1).

	Regarding CLAIM 7, the combination of Zhang and Sridharan teaches: The method of claim 1, further comprising: in response to receiving the remote parameters, updating the local neural network with the remote parameters (The BRI of remote parameters includes inputs to a backward pass. P. 7, col. 2, first paragraph. Asynchronous updating is also taught by p. 6, col. 2, last paragraph, line 7. In Fig. 3b, push means transmitting from worker to server and pull means transmitting from server to worker during the backpropagation.)
	However, neither Zhang nor Sridharan explicitly teaches: updating by performing a direct memory write of the remote parameters to a local memory in the local computing node.
	But Qian teaches: updating by performing a direct memory write of the remote parameters to a local memory in the local computing node. (Last 4 lines of ¶ [0026] in col. 2 teach DMA.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have stored Zhang/Sridharan’s parameter update in the system’s memory via a direct memory write, as taught by Qian. A motivation for the combination is that DMA write operation reduces the consumption of CPU resources which further reduces impact on processing rates of other application programs. (Qian, ¶ [0044], last 4 lines)

	Regarding CLAIM 16, the combination of Zhang and Sridharan teaches: The computing node of claim 13, 
	However, neither Zhang nor Sridharan explicitly teaches: wherein the communication network interface is further configured to store the remote parameters in the memory via a direct memory write. 
	But Qian teaches: wherein the communication network interface is further configured to store the remote parameters in the memory via a direct memory write. (Last 4 lines of ¶ [0026] in col. 2 teach DMA.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have stored Zhang/Sridharan’s parameter update in the system’s memory via a direct memory write, as taught by Qian. A motivation for the combination is that DMA write operation reduces the consumption of CPU resources which further reduces impact on processing rates of other application programs. (Qian, ¶ [0044], last 4 lines)

Regarding CLAIM 23, the combination of Zhang and Sridharan teaches: The computing system of claim 19, 
Zhang teaches: wherein for each computing node of the plurality of computing nodes: 
the communication network interface of the computing node is further configured to, in response to receiving the remote parameters, update the neural network with the remote parameters…, wherein the update of the neural network is asynchronous with respect to the completion of the backward pass; and  (The BRI of remote parameters includes inputs to a backward pass. P. 7, col. 2, first paragraph. Asynchronous updating is also taught by p. 6, col. 2, last paragraph, line 7. In Fig. 3b, push means transmitting from worker to server and pull means transmitting from server to worker during the backpropagation.)
the parameter server of the computing node is further configured to, during execution of the backward pass, transmit the subset of updated parameters in response to the processing unit determining the updated parameters for one of the plurality of layers in the neural network, wherein the transmitting the subset of updated parameters is asynchronous with respect to the completion of the backward pass. (P. 7, col. 2, first paragraph. In Fig. 3b, push means transmitting from worker to server and pull means transmitting from server to worker during the backpropagation.)
	Additionally, Sridharan teaches transmitting remote parameters as inputs for a local forward compute asynchronously with completion of a local backward compute (¶ [0210] teaches overlapping compute operations; per ¶ [0216]-[0217] and Fig. 16A, step 1a in node 0 may happen asynchronously with backward compute 1624 in node 1.)
	However, Zhang does not explicitly teach: update by performing a direct memory write of the remote parameters to a local memory in the local computing node
	But Qian teaches: update by performing a direct memory write of the remote parameters to a local memory in the local computing node (Last 4 lines of ¶ [0026] in col. 2 teach DMA.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have stored Zhang’s parameter update in the system’s memory via a direct memory write, as taught by Qian. A motivation for the combination is that DMA write operation reduces the consumption of CPU resources which further reduces impact on processing rates of other application programs. (Qian, ¶ [0044], last 4 lines)

Response to Arguments
	Examiner herein responds to Applicant’s remarks and claim amendments filed 11/22/2021.
Claim Objections (Remarks p. 12): The objection to claim 6 is withdrawn due to the claim amendment.

Rejections to the Claims Under 35 U.S.C. § 112 (Remarks pp. 12-13): The previous rejections of claims 1-4 and 6-18 concerning insufficient antecedent basis are withdrawn due to the claim amendments. 

Rejections to the Claims Under 35 U.S.C. § 103 (Remarks pp. 13-15): Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To 





/ASHER H. JABLON/Examiner, Art Unit 2127                                                                                                                                                                                                        

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127