DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This Office Action is in response to the communication filed on 04/01/2019.
	Claims 1-20 are being considered on the merits.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 4/01/2019, 7/02/2020, 1/05/2021, 5/20/2021, 8/19/2021, 9/03/2021, 10/21/2021, 1/19/2022, 03/28/2022, 8/02/2022 and 08/04/2022 have been considered. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, initialed and dated copies of Applicant's IDS forms 1449 filed 4/01/2019, 7/02/2020, 1/05/2021, 5/20/2021, 8/19/2021, 9/03/2021, 10/21/2021, 1/19/2022, 03/28/2022, 8/02/2022 and 08/04/2022 are attached to the instant Office action.
Drawings
	The drawings filed on 04/01/2019 are accepted. 
Claim Objections
Claim 13 is objected to because of the following informalities: the second line of the first limitation of the claim reads “…an out-of-sync indications from each of the plurality of nodes…” and should instead read “…out-of-sync indications from each of the plurality of nodes…”.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 8 recites the limitation "the fault condition" in the last line of the claim. There is insufficient antecedent basis for this limitation in the claim, as none of claims 5, 3, 2, or 1 defines or sets forth a “fault condition”.
Claim 9 recites the limitation "the distributed ledger" in the third paragraph of the claim. There is insufficient antecedent basis for this limitation in the claim, as none of claims 5, 3, 2, or 1 defines or sets forth a “distributed ledger”.
Claim 11 recites the limitation "the computer node" in the third paragraph of the claim. There is insufficient antecedent basis for this limitation in the claim, as the claim sets forth a “plurality of nodes” but does not otherwise define or set forth any specific node, including specifically a “computer node”.
Claim 19 recites the limitation "the distributed ledger" in the second-to-last paragraph of the claim. There is insufficient antecedent basis for this limitation in the claim, as none of claims 15, 13, 12, or 11 defines or sets forth a “distributed ledger”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Baird, et al. (US 2019/0020629 A1; hereinafter, “Baird”), in view of Reisizadeh, et al. (“Robust and Communication-Efficient Collaborative Learning”, 31 Oct 2019, arXiv).

Regarding Claim 1, Baird teaches a system: 
generate a blockchain transaction comprising at least an indication that the respective computer node is present on the decentralized network to share the training parameters for participating in the current iteration of training (Baird, Pg. 4-5, para. 0034 and Pg. 20, para. 0172: “the distributed database instance 114 stores a record of a synchronization event, a record of prior synchronization events with other compute devices, an order of synchronization events, an order of transactions within events, parameters and/or values associated with identifying an order of synchronization events and/or transactions (e.g., used in calculating an order using a consensus method as described herein), a value for a parameter” “any other suitable distributed database and/or distributed ledger technology can be used to implement the above-described methods to facilitate secure and anonymous transactions. For example, in other instances technologies such as blockchain, PAXOS, RAFT, Bitcoin, Ethereum and/or the like can be used to implement such methods”. Examiner notes that the broadest reasonable interpretation of “an indication” means any indication including records of synchronization events with other devices which would indicate whether or not a respective computer node is present).
a master node on the decentralized network being programmed to: receive indications from each of the plurality of computer nodes that are present on the decentralized network for participating in the current iteration of training (Baird, Pg. 4, para. 0033: “Each compute device 110, 120, 130, 140 can be any type of device configured to send data over the network 105 to send and/or receive data from one or more of the other compute devices… In some embodiments, the memory 112 stores instructions to cause the processor to execute modules, processes and/or functions associated with sending to and/or receiving from another instance of a distributed database (e.g., distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices”. Examiner notes that the broadest reasonable interpretation of “indications” means any indication including records of synchronization events with other devices which would indicate whether or not a respective computer node is present such as receiving a record from another node).
determine a number of computer nodes corresponding to a population of computer nodes that are present on the decentralized network for participating in the current iteration of training based on the received indications (Baird, pg. 21, para. 0178: “when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that in determining that the number of computer nodes is below a threshold, the system is necessarily determining the number of nodes that are present on the network. Examiner additionally notes that “present on the decentralized network for participating” is broadly interpreted to mean “present on the decentralized network to be able to participate”).
determine whether the number of computer nodes is above a predefined threshold (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold).
and upon determining that the number of computer nodes is above the predefined threshold, perform operations for the current iteration of training (Baird, pg. 21, para. 0178: “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes that the broadest reasonable interpretation of “perform operations” means executing any process or step, such as a computer randomly selecting a member for synchronization).
Baird does not explicitly disclose: 

a plurality of computer nodes on a decentralized network, each of the plurality of computer nodes being programmed to: train a respective local model based on a respective local training dataset during a current iteration of training a machine learning model
generate training parameters at a respective computer node based on the respective local model
However, Reisizadeh teaches:

a plurality of computer nodes on a decentralized network, each of the plurality of computer nodes being programmed to: train a respective local model based on a respective local training dataset during a current iteration of training a machine learning model; (Reisizadeh, pg. 2: “In this work we consider the general data-parallel setting where the data is distributed across different computing nodes, and develop decentralized optimization methods that do not rely on a central coordinator but instead only require local computation and communication among neighboring nodes…each node has computed a random number of stochastic gradients from which it aggregates and generates a stochastic gradient for its local objective…at every iteration all the nodes simultaneously start computing stochastic gradients by randomly picking data points from their local batches and evaluating the gradient function on the picked data point”)
generate training parameters at a respective computer node based on the respective local model (Reisizadeh, pg. 5: “Specifically, at time t, node i updates its decision variable according to the update: $x_{i,t+1} = (1 - \varepsilon + \varepsilon w_{ii})\, x_{i,t} + \varepsilon \sum_{j \in \mathcal{N}_i} w_{ij} z_{j,t} - \alpha \varepsilon \widetilde{\nabla} f_i(x_{i,t})$ where α and ε are positive scalars that behave [as] stepsize.” Examiner notes that the broadest reasonable interpretation of “training parameters” means any parameters related to training such as its decision variable).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve on the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
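
For illustration only, the node update quoted from Reisizadeh above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from Reisizadeh, Baird, or the application: the function name `decentralized_update` and its argument names are hypothetical, vectors are plain Python lists, and the `neighbor_z` values are assumed to be the model copies received from neighboring nodes.

```python
from typing import Dict, List, Sequence


def decentralized_update(
    x_i: Sequence[float],                   # node i's current decision variable x_{i,t}
    neighbor_z: Dict[int, Sequence[float]], # z_{j,t} received from each neighbor j (assumed)
    weights: Dict[int, float],              # mixing weights w_{ij}, keyed by neighbor j
    w_ii: float,                            # self weight w_{ii}
    grad_i: Sequence[float],                # stochastic gradient of f_i at x_{i,t}
    alpha: float,                           # positive step-size scalar
    eps: float,                             # positive step-size scalar
) -> List[float]:
    """Return x_{i,t+1} = (1 - eps + eps*w_ii) x_i + eps * sum_j w_ij z_j - alpha*eps*grad_i."""
    self_coeff = 1.0 - eps + eps * w_ii
    new_x = []
    for k in range(len(x_i)):
        # Weighted sum over neighbors for coordinate k.
        mixed = sum(weights[j] * z[k] for j, z in neighbor_z.items())
        new_x.append(self_coeff * x_i[k] + eps * mixed - alpha * eps * grad_i[k])
    return new_x
```

Under these assumptions, each node computes its next decision variable from local data and neighbor messages only, which is the sense in which the decision variable is read above as a “training parameter” generated at the respective node.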

Regarding Claim 11, Baird teaches a method of decentralized machine learning:
generating, by each of the plurality of nodes, a blockchain transaction comprising an indication that the computer node is present on the decentralized network to share the shared training parameters for participating in the current iteration of training (Baird, Pg. 4-5, para. 0034 and Pg. 20, para. 0172: “the distributed database instance 114 stores a record of a synchronization event, a record of prior synchronization events with other compute devices, an order of synchronization events, an order of transactions within events, parameters and/or values associated with identifying an order of synchronization events and/or transactions (e.g., used in calculating an order using a consensus method as described herein), a value for a parameter” “any other suitable distributed database and/or distributed ledger technology can be used to implement the above-described methods to facilitate secure and anonymous transactions. For example, in other instances technologies such as blockchain, PAXOS, RAFT, Bitcoin, Ethereum and/or the like can be used to implement such methods”. Examiner notes that the broadest reasonable interpretation of “an indication” means any indication including records of synchronization events with other devices which would indicate whether or not a respective computer node is present).
receiving, by a master node…indications from each of the plurality of computer nodes that are present on the decentralized network for participating in the current iteration of training, wherein the master node is selected from among the plurality of computer nodes participating in the current iteration of training (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis-à-vis the rest of the distributed network; Examiner additionally notes that in determining that the number of computer nodes is below a threshold, the system is necessarily determining the number of nodes that are present on the network. Examiner additionally notes that “present on the decentralized network for participating” is broadly interpreted to mean “present on the decentralized network to be able to participate”).
determining, by the master node, a number of computer nodes that corresponds to the received indications, wherein the number of computer nodes represents a population of computer nodes that are present on the decentralized network for participating in the current iteration of training (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis-à-vis the rest of the distributed network; Examiner additionally notes that in determining that the number of computer nodes is below a threshold, the system is necessarily determining the number of nodes that are present on the network. Examiner additionally notes that “present on the decentralized network for participating” is broadly interpreted to mean “present on the decentralized network to be able to participate”).
determining, by the master node, whether the number of computer nodes in the population is above a predefined population threshold (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value…Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis-à-vis the rest of the distributed network; Examiner additionally notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold).
upon determining that the number of computer nodes in the population is above the predefined population threshold, continuing, by the master node, to perform operations for the current iteration of training (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes that the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis-à-vis the rest of the distributed network; Examiner additionally notes that the broadest reasonable interpretation of “perform operations” means executing any process or step, such as randomly selecting a member for completing a synchronization step).
Baird does not explicitly disclose: 
training, by each of a plurality of nodes on a decentralized network, a local model based on a local training dataset during a current iteration of training a machine-learned model 
generating, by each of the plurality of nodes, shared training parameters based on the local model 
…decentralized network…
However, Reisizadeh teaches: 
training, by each of a plurality of nodes on a decentralized network, a local model based on a local training dataset during a current iteration of training a machine-learned model (Reisizadeh, pg. 2: “In this work we consider the general data-parallel setting where the data is distributed across different computing nodes, and develop decentralized optimization methods that do not rely on a central coordinator but instead only require local computation and communication among neighboring nodes…each node has computed a random number of stochastic gradients from which it aggregates and generates a stochastic gradient for its local objective…at every iteration all the nodes simultaneously start computing stochastic gradients by randomly picking data points from their local batches and evaluating the gradient function on the picked data point”)
generating, by each of the plurality of nodes, shared training parameters based on the local model (Reisizadeh, pg. 5: “Specifically, at time t, node i updates its decision variable according to the update: $x_{i,t+1} = (1 - \varepsilon + \varepsilon w_{ii})\, x_{i,t} + \varepsilon \sum_{j \in \mathcal{N}_i} w_{ij} z_{j,t} - \alpha \varepsilon \widetilde{\nabla} f_i(x_{i,t})$ where α and ε are positive scalars that behave [as] stepsize.” Examiner notes that the broadest reasonable interpretation of “training parameters” means any parameters related to training such as its decision variable).
…decentralized network… (Reisizadeh, pg. 2: “In this work we consider the general data-parallel setting where the data is distributed across different computing nodes, and develop decentralized optimization methods that do not rely on a central coordinator but instead only require local computation and communication among neighboring nodes”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve upon the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
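For illustration only, the per-node update that the Reisizadeh passage describes (a node combining its own decision variable, the quantized values received from its neighbors, and its local stochastic gradient, scaled by α and ε) can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the mixing form `(1 - eps + eps * w_ii) * x_i + eps * sum(w_ij * z_j)`, the weight names `w_ii` and `w_ij`, and the rounding quantizer are all hypothetical stand-ins chosen for readability.

```python
def quantize(x, step=0.1):
    """Hypothetical deterministic quantizer: round to the nearest multiple of step.

    The reference's quantizer Q is not reproduced here; this is an assumption.
    """
    return round(x / step) * step


def local_update(x_i, grad_i, neighbor_z, w_ii, w_ij, alpha, eps):
    """One scalar update for node i.

    x_i        : node i's current decision variable
    grad_i     : node i's local stochastic gradient evaluated at x_i
    neighbor_z : dict mapping neighbor id -> quantized value received from it
    w_ii, w_ij : assumed mixing weights (self weight, dict of neighbor weights)
    alpha, eps : positive scalars acting as step sizes
    """
    mix = sum(w_ij[j] * z for j, z in neighbor_z.items())
    return (1 - eps + eps * w_ii) * x_i + eps * mix - alpha * eps * grad_i


# Example: node i with a single neighbor j that sent a quantized value.
z_from_j = quantize(0.98)
x_next = local_update(1.0, 2.0, {"j": z_from_j}, 0.5, {"j": 0.5}, alpha=0.1, eps=0.2)
print(x_next)
```

When all nodes already agree and the gradient is zero, the sketch leaves the decision variable unchanged, which is the behavior one would expect from a consensus-style update.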

Regarding Claim 12, Baird and Reisizadeh teach the method of claim 11 (above). Baird further teaches: 
the predefined population threshold indicates a minimum number of computer nodes in a population for participating in an iteration of training that is required for completing the iteration (Baird, pg. 21, para. 0178: “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes that the broadest reasonable interpretation of a “predefined threshold” includes a threshold value defined and configured by a user to be the minimum required. Examiner additionally notes that the broadest reasonable interpretation of “indicates” means any indication including the presence of a threshold). 

Claims 2-10 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Baird, in view of Reisizadeh and further in view of Feng et al. (US 2017/0220949 A1; hereinafter, “Feng”).
Regarding Claim 2, Baird and Reisizadeh teach the system of claim 1 (above). Baird further teaches: 
wherein the predefined threshold indicates a minimum number of computer nodes in the population of computer nodes for participating in an iteration of training that is required for completing the iteration (Baird, pg. 21, para. 0178: “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes that the broadest reasonable interpretation of a “predefined threshold” includes a threshold value defined and configured by a user to be the minimum required).
Baird does not explicitly disclose:
“…an iteration of training…”
However, Feng teaches:
“…an iteration of training…” (Feng, Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).


Regarding Claim 3, Baird, Reisizadeh and Feng teach the system of claim 2 (above). Baird further teaches: 
prior to continuing to perform operations for the current iteration of training, receive out-of-sync indications from each of the plurality of computer nodes on the decentralized network that are out-of-sync with the current iteration of training, wherein blockchain transactions comprise the out-of-sync indications (Baird, Pg. 4, para. 0033 and 0034: “In some embodiments, the memory 112 stores instructions to cause the processor to execute modules, processes and/or functions associated with sending to and/or receiving from another instance of a distributed database (e.g., distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices” “distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices, and/or an order of synchronization events, and/or an order of transactions within events, parameters associated with identifying an order of synchronization events and/or transactions, and/or a value for a parameter”. Examiner notes that the broadest reasonable interpretation of “indications” means any indication including records of synchronization events with other devices which would indicate whether or not a respective computer node is present such as receiving a record from another node.)
determine a number of out-of-sync computer nodes that corresponds to the received out-of-sync indications, wherein the number of out-of-sync computer nodes represents a group of computer nodes that are present on the decentralized network and not ready for participating in the current iteration of training (Baird, Pg. 4, para. 0033: “In some embodiments, the memory 112 stores instructions to cause the processor to execute modules, processes and/or functions associated with sending to and/or receiving from another instance of a distributed database (e.g., distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices”. Examiner notes that the broadest reasonable interpretation of “indications” means any indication including records of synchronization events with other devices which would include indications that some computer nodes are out-of-sync).
determine whether the updated number of computer nodes in the population is above the predefined population threshold (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold). 
upon determining that the updated number of computer nodes in the population is above the predefined population threshold, continue to perform operations for the current iteration of training (Baird, pg. 21, para. 0178: “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes that the broadest reasonable interpretation of “perform operations” means executing any process or step, such as randomly selecting a member for synchronization).
Baird does not explicitly disclose:
a decentralized network
	However, Reisizadeh teaches:
a decentralized network (Reisizadeh, pg. 2: “In this work we consider the general data-parallel setting where the data is distributed across different computing nodes, and develop decentralized optimization methods that do not rely on a central coordinator but instead only require local computation and communication among neighboring nodes”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve upon the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
Neither Baird nor Reisizadeh explicitly discloses: 
“…iteration of training…” 
exclude the number of out-of-sync computer nodes from the number of computer nodes in the population to further determine an updated number of computer nodes in the population, wherein the updated number of computer nodes represents a population of computer nodes that are present on the decentralized network and ready to share their respective shared training parameters for participating in the current iteration of training 
However, Feng teaches:
“…iteration of training…” (Feng, Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets.”).
exclude the number of out-of-sync computer nodes from the number of computer nodes in the population to further determine an updated number of computer nodes in the population, wherein the updated number of computer nodes represents a population of computer nodes that are present on the decentralized network and ready to share their respective shared training parameters for participating in the current iteration of training (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0077: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets.” “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”. Examiner notes that each node in Feng that participated in training shares their parameters after each iteration such that their participation and continued connection makes them “ready to share”; examiner additionally notes that upon failure of one processing unit and allocating training to remaining processing units is necessarily a determination of the number of computer nodes present). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).
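For illustration only, the sequence of limitations addressed for claim 3 (count the out-of-sync nodes, exclude them from the population, and compare the updated count against the predefined population threshold) can be sketched as follows. The function names, node identifiers, and the strict "above" comparison are assumptions made for this sketch; they are not taken from Baird, Reisizadeh, Feng, or the claims.

```python
def ready_population(population, out_of_sync):
    """Return the nodes that remain after excluding out-of-sync nodes."""
    return set(population) - set(out_of_sync)


def may_continue_iteration(population, out_of_sync, threshold):
    """True when the updated population count is above the threshold."""
    return len(ready_population(population, out_of_sync)) > threshold


# Hypothetical example: five nodes on the network, one reports out-of-sync.
nodes = {"n1", "n2", "n3", "n4", "n5"}
lagging = {"n4"}
print(may_continue_iteration(nodes, lagging, threshold=3))
```

In this example four nodes remain after exclusion, so a threshold of three permits the iteration to continue, while a threshold of four would not.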
	
Regarding Claim 4, Baird, Reisizadeh, and Feng teach the system of claim 3 (above). Baird further teaches: 
…based on the out-of-sync indications such that their respective shared training parameters are prevented from being applied to the machine-learned model (Baird, Pg. 4, para. 0033: “the memory 112 stores instructions to cause the processor to execute modules, processes and/or functions associated with sending to and/or receiving from another instance of a distributed database (e.g., distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices”. Examiner notes that the broadest reasonable interpretation of “indications” means any indication including records of synchronization events with other devices which would indicate whether or not a respective computer node is present such as receiving a record from another node).
Baird does not explicitly disclose: 
exclude the out-of-sync computer nodes from participating in the current iteration of training…
However, Feng teaches:
exclude the out-of-sync computer nodes from participating in the current iteration of training… (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0077: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets.” “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”. Examiner notes that the exclusion of a failed GPU and reallocation of training data precludes integration of the failed GPU, including application of any of its training parameters to the machine-learned model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 5, Baird, Reisizadeh, and Feng teach the system of claim 3 (above). Baird further teaches: 
upon determining that the number of computer nodes in the population is below the predefined population threshold, wait for the population to recover by pausing from performing operations for the current iteration of training for a specified time period (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of a “specified time period” includes that time period required to establish new connections).
after the specified time period, determine whether the population is recovered to include a number of computer nodes that is above the predefined population threshold (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold. Examiner also notes that the broadest reasonable interpretation of “specified time period” includes that time period occurring after detection of a failure).  
upon determining that the population is recovered, continue to perform operations for the current iteration of training (Baird, pg. 21, para. 0178: “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700... Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of “perform operations” means executing any process or step, such as randomly selecting a member for synchronization).
Baird does not explicitly disclose:
“…iteration of training”
However, Feng teaches:
“…iteration of training” (Feng, Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).
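For illustration only, the pause-and-recheck behavior addressed for claim 5 (pause for a specified time period when the population is below the threshold, then determine whether it has recovered) can be sketched as follows. The callback `count_ready`, the `max_checks` bound, and the injectable `sleep` parameter are hypothetical conveniences for this sketch and do not come from the cited references or the claims.

```python
import time


def wait_for_recovery(count_ready, threshold, pause_seconds, max_checks, sleep=time.sleep):
    """Pause between checks; return True once the population is above the threshold.

    count_ready   : callable returning the current number of ready nodes
    threshold     : predefined population threshold
    pause_seconds : the specified time period to pause between checks
    max_checks    : upper bound on the number of pauses (an assumption of this sketch)
    """
    for _ in range(max_checks):
        if count_ready() > threshold:
            return True
        sleep(pause_seconds)
    # One final determination after the last pause.
    return count_ready() > threshold


# Hypothetical example: the population recovers on the third check.
counts = iter([2, 2, 4])
print(wait_for_recovery(lambda: next(counts), threshold=3, pause_seconds=0, max_checks=5))
```

Bounding the number of checks is a design choice of the sketch; an unbounded wait would match the claim language equally well but would not terminate if the population never recovers.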

Regarding Claim 6, Baird, Reisizadeh, and Feng teach the system of claim 5 (above). Reisizadeh further teaches: 
a decentralized network (Reisizadeh, pg. 2: “In this work we consider the general data-parallel setting where the data is distributed across different computing nodes, and develop decentralized optimization methods that do not rely on a central coordinator but instead only require local computation and communication among neighboring nodes”)
Reisizadeh does not explicitly disclose:
the population recovers from a fault condition of at least one of the plurality of computer nodes on the decentralized network, the fault condition comprising one of: network connectivity outage, power outage, or computer node crash
However, Feng teaches:
the population recovers from a fault condition of at least one of the plurality of computer nodes on the decentralized network, the fault condition comprising one of: network connectivity outage, power outage, or computer node crash (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”. Examiner notes that the broadest reasonable interpretation of “recovered” is restored to a normal state of functioning, and that the broadest reasonable interpretation of a “computer node crash” is a computer failure due to hardware or software failure).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 7, Baird, Reisizadeh, and Feng teach the system of claim 6 (above). Feng further teaches: 
the plurality of computer nodes are further programmed to: automatically perform one or more corrective actions to recover from the fault condition (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”. Examiner notes that the broadest reasonable interpretation of “a corrective action” includes any action that will allow continuation of the intended process such as allocation of training data to other nodes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 8, Baird, Reisizadeh, and Feng teach the system of claim 5 (above). Baird further teaches: 
waiting for the population to recover enables training of the machine-learned model to tolerate the fault condition (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of “tolerate” means to allow the existence of something without interference such as where the model continues to exist while waiting for new connections).   

Regarding Claim 9, Baird, Reisizadeh, and Feng teach the system of claim 5 (above). Baird further teaches: 
cause the transaction to be written as a block on the distributed ledger (Baird, Pg. 4-5, para. 0034 and Pg. 20, para. 0172: “the distributed database instance 114 stores a record of a synchronization event, a record of prior synchronization events with other compute devices, an order of synchronization events, an order of transactions within events, parameters and/or values associated with identifying an order of synchronization events and/or transactions (e.g., used in calculating an order using a consensus method as described herein), a value for a parameter” “any other suitable distributed database and/or distributed ledger technology can be used to implement the above-described methods to facilitate secure and anonymous transactions. For example, in other instances technologies such as blockchain, PAXOS, RAFT, Bitcoin, Ethereum and/or the like can be used to implement such methods”).
Baird does not explicitly disclose: 
generate merged training parameters based on the shared training parameters;
	However, Reisizadeh teaches:
generate merged training parameters based on the shared training parameters; (Reisizadeh, pg. 3, “Algorithm Update”: “Once the local variables are exchanged between neighboring nodes, each node $i$ uses its local stochastic gradient $\widetilde{\nabla} f_i(x_{i,t})$, its local decision variable $x_{i,t}$, and the information received from its neighbors $z_{j,t} = Q(x_{i,t});\ j \in N_i$ to update its local decision variable.” Examiner notes that the broadest reasonable interpretation of training parameters means parameters related to training a network such as its decision variable).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve upon the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
Neither Baird nor Reisizadeh explicitly discloses:
upon continuing to perform operations for the current iteration of training, obtain shared training parameters from the computer nodes in the population for participating in the current iteration of training  
generate a transaction that includes an indication that the master node has generated the merged training parameters 
make the merged training parameters available to each of the computer nodes in the population for participating in the current iteration of training.
However, Feng teaches:
upon continuing to perform operations for the current iteration of training, obtain shared training parameters from the computer nodes in the population for participating in the current iteration of training (Feng, Pg. 1, para. 0011 Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets” “receiving, from each of the plurality of nodes, estimated values of the one or more parameters obtained based on a corresponding sub-set of data allocated to the node; and estimating the one or more parameters of the machine learning model based on the estimated values of the one or more parameters generated by at least some of the plurality of nodes”; Examiner notes that the broadest reasonable interpretation of “shared training parameters” means training parameters which have been shared i.e. communicated to another node) 
generate a transaction that includes an indication that the master node has generated the merged training parameters (Feng, Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets. Then, the coordination node 308 has to send the integrated state back to each operation node for next iteration of distributed training”. Examiner notes that the broadest reasonable interpretation of “transaction” includes an interaction between two parties or things that reciprocally affect each other such here where the exchange of information between the coordination node and each operating node during second and subsequent iterations indicates that the coordination node has integrated and updated parameters).
make the merged training parameters available to each of the computer nodes in the population for participating in the current iteration of training (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0078: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets” “the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”. Examiner notes that Feng allows any remaining processing unit to retrieve a snapshot of the last iteration of training, which was sent to the coordination node at the end of the last iteration).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 10, Baird, Reisizadeh, and Feng teach the system of claim 9 (above). Feng further teaches: 
each of the plurality of computer nodes are further programmed to: upon the master node continuing to perform operations for the current iteration of training, obtain merged training parameters from the master node (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0078: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets” “the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”. Examiner notes that Feng allows any remaining processing unit to retrieve a snapshot of the last iteration of training, which was sent to the coordination node at the end of the last iteration and integrated).
apply the merged training parameters to the local model (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0078: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets” “the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”. Examiner notes that Feng allows any remaining processing unit to retrieve a snapshot of the last iteration of training, which was sent to the coordination node at the end of the last iteration, integrated, and retrieved by each node through its model snapshot retriever).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 13, Baird and Reisizadeh teach a method of claim 12 (above). Baird further teaches: 
prior to continuing to perform operations for the current iteration of training, receiving, by the master node, an out-of-sync indications from each of the plurality of computer…that are out-of-sync with the current iteration of training, wherein blockchain transactions comprise the out-of-sync indications (Baird, Pg. 4, para. 0034; pg. 5, para. 0042; pg. 20, para. 0172: “distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices, and/or an order of synchronization events, and/or an order of transactions within events, parameters associated with identifying an order of synchronization events and/or transactions, and/or a value for a parameter”. “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “any other suitable distributed database and/or distributed ledger technology can be used to implement the above-described methods to facilitate secure and anonymous transactions. For example, in other instances technologies such as blockchain, PAXOS, RAFT, Bitcoin, Ethereum and/or the like can be used to implement such methods”.
Examiner notes that the broadest reasonable interpretation of “indications” means any indication, including records of synchronization events with other devices, which would indicate whether or not a respective computer node is present, such as receiving a record from another node. Examiner additionally notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network).
determining, by the master node, a number of out-of-sync computer nodes that corresponds to the received out-of-sync indications, wherein the number of out-of-sync computer nodes represents a group of computer nodes that are present…and not ready for participating in the current iteration of training. (Baird, Pg. 4, para. 033: “In some embodiments, the memory 112 stores instructions to cause the processor to execute modules, processes and/or functions associated with sending to and/or receiving from another instance of a distributed database (e.g., distributed database instance 124 at compute device 120) a record of a synchronization event, and/or a record of prior synchronization events with other compute devices” “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” Examiner notes that the broadest reasonable interpretation of “indications” means any indication, including records of synchronization events with other devices, which would include indications that some computer nodes are out-of-sync. Examiner additionally notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network).
determining, by the master node, whether the updated number of computer nodes in the population is above the predefined population threshold (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network. Examiner additionally notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold).
upon determining that the updated number of computer nodes in the population is above the predefined population threshold, continuing, by the master node, to perform operations for the current iteration of training. (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network. Examiner additionally notes that the broadest reasonable interpretation of “perform operations” means to execute any process or step, such as randomly selecting a member for synchronization).
Baird does not explicitly disclose:
…on the decentralized network…
However, Reisizadeh teaches:
…on the decentralized network…
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve upon the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
Neither Baird nor Reisizadeh teach: 
Excluding…the number of out-of-sync computer nodes from the number of computer nodes in the population to further determine an updated number of computer nodes in the population, wherein the updated number of computer nodes represents a population of computer nodes that are present on the decentralized network and ready to share their respective shared training parameters for participating in the current iteration of training 
However, Feng teaches:
Excluding…the number of out-of-sync computer nodes from the number of computer nodes in the population to further determine an updated number of computer nodes in the population, wherein the updated number of computer nodes represents a population of computer nodes that are present on the decentralized network and ready to share their respective shared training parameters for participating in the current iteration of training (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”; Examiner notes that allocating training to the remaining processing units upon failure of one processing unit necessarily excludes the failed processing unit).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 14, Baird, Reisizadeh, and Feng teach a method of claim 13 (above). Baird further teaches: 
…by the master node... (Baird, pg. 5, para. 0042: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network)
Baird does not explicitly disclose:
Excluding…the out-of-sync computer nodes from participating in the current iteration of training based on the out-of-sync indications such that their respective shared training parameters are prevented from being applied to the machine-learned model
However, Feng teaches:
Excluding…the out-of-sync computer nodes from participating in the current iteration of training based on the out-of-sync indications such that their respective shared training parameters are prevented from being applied to the machine-learned model (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”; Examiner notes that the exclusion of a failed GPU and reallocation of its training data precludes integration of the failed GPU, including application of any of its training parameters to the machine-learned model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 15, Baird, Reisizadeh, and Feng teach a method of claim 13 (above). Baird further teaches: 
upon determining that the number of computer nodes in the population is below the predefined population threshold, waiting, by the master node, for the population to recover by pausing from performing operations for the current iteration of training for a specified time period (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network. Examiner additionally notes that the broadest reasonable interpretation of a “specified time period” includes the time period required to establish new connections).
after the specified time period, determining, by the master node, whether the population is recovered to include a number of computer nodes that is above the predefined population threshold (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network. Examiner additionally notes that in determining that the number of computer nodes is not below a predefined threshold, the corollary determination is also made that the number of computer nodes is above a predefined threshold. Examiner also notes that the broadest reasonable interpretation of “specified time period” includes the time period occurring after detection of a failure).
upon determining that the population is recovered, continuing, by the master node, to perform operations for the current iteration of training. (Baird, pg. 5, para. 0042 and pg. 21, para. 0178: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” “In some implementations, compute device 700 can be configured to limit or bound the size of a pool of connections according to a lower limit threshold value and an upper limit threshold value. In such a case, compute device 700 can randomly select a member or compute device for synchronization from the compute devices having an open connection with compute device 700... Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network. 
Examiner additionally notes that the broadest reasonable interpretation of “perform operations” means to execute any process or step, such as randomly selecting a member for synchronization).

Regarding Claim 16, Baird, Reisizadeh, and Feng teach a method of claim 15 (above). Feng further teaches: 
the population recovers from a fault condition of at least one of the plurality of computer nodes on the decentralized network, the fault condition comprising one of: network connectivity outage, power outage, or computer node crash (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”; Examiner notes that the broadest reasonable interpretation of “recovered” is restored to a normal state of functioning, and that of “computer node crash” is a computer failure due to hardware or software failure).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 17, Baird, Reisizadeh, and Feng teach a method of claim 16 (above). Feng further teaches: 
automatically performing, by each of the plurality of computer nodes, one or more corrective actions to recover from the fault condition (Feng, Pg. 7, para. 0077: “During the training execution at the operation node 104-1, the processing unit failure detector 714 may detect a failure of one or more processing units… the machine learning module 706 may reallocate the training data subset to the remaining two GPUs for resuming the machine learning process”; Examiner notes that the broadest reasonable interpretation of “a corrective action” includes allocation of training data to other nodes).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Feng into Baird and Reisizadeh. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized and gradient-based optimization algorithm in the context of distributed learning; Feng teaches distributed deep machine learning. One of ordinary skill would have been motivated to combine the teachings of Feng into Baird and Reisizadeh to reduce end-to-end latency and support synchronous learning (Feng, Pg. 1, para. 0004).

Regarding Claim 18, Baird, Reisizadeh, and Feng teach a method of claim 15 (above). Baird further teaches: 
waiting for the population to recover enables training of the machine-learned model to tolerate the fault condition  (Baird, pg. 21, para. 0178: “Likewise, when the connections pool of compute device 700 is below the lower limit threshold value and/or when a number of compute devices in the group of members and/ or compute devices reaches the lower limit threshold value, compute device executes one or more threads to establish new connections with other members or compute devices and adds these new connections to the pool of connections and/or the group of members or compute devices.” Examiner notes that the broadest reasonable interpretation of “tolerate” means to allow the existence of something without interference such as where the model continues to exist while waiting for new connections).   

Regarding Claim 19, Baird, Reisizadeh, and Feng teach a method of claim 15 (above). Baird further teaches: 
…by the master node… (Baird, pg. 5, para. 0042: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” Examiner notes the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis a vis the rest of the distributed network)
Causing…the transaction to be written as a block on the distributed ledger (Baird, Pg. 4-5, para. 0034 and Pg. 20, para. 0172: “the distributed database instance 114 stores a record of a synchronization event, a record of prior synchronization events with other compute devices, an order of synchronization events, an order of transactions within events, parameters and/or values associated with identifying an order of synchronization events and/or transactions (e.g., used in calculating an order using a consensus method as described herein), a value for a parameter”)
Baird does not explicitly disclose: 
generating…merged training parameters based on the shared training parameters 
However, Reisizadeh teaches:
generating…merged training parameters based on the shared training parameters (Reisizadeh, pg. 3, “Algorithm Update”: “Once the local variables are exchanged between neighboring nodes, each node $i$ uses its local stochastic gradient $\tilde{\nabla} f_i(x_{i,t})$ its local decision variable $x_{i,t}$ and the information received from its neighbors $z_{j,t} = Q(x_{i,t});\ j \in N_i$ to update its local decision variable.” Examiner notes that the broadest reasonable interpretation of training parameters means parameters related to training a network such as its decision variable).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).
Neither Baird nor Reisizadeh explicitly discloses:
However, Feng teaches:
upon continuing to perform operations for the current iteration of training, obtaining…shared training parameters from the computer nodes in the population for participating in the current iteration of training (Feng, Pg. 7, para. 0078: “A request from the machine learning module 706 may indicate a failure of a processing unit in the operation node 104-1, such that the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”) 
Generating…a transaction that includes an indication that the master node has generated the merged training parameters (Feng, Pg. 3, para. 0047: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets. Then, the coordination node 308 has to send the integrated state back to each operation node for next iteration of distributed training”. Examiner notes that the broadest reasonable interpretation of “transaction” includes an interaction between two parties or things that reciprocally affect each other, such as here, where the exchange of information between the coordination node and each operation node during the second and subsequent iterations indicates that the coordination node has integrated and updated the parameters.)
Making…the merged training parameters available to each of the computer nodes in the population for participating in the current iteration of training (Feng, Pg. 7, para. 0078: “A request from the machine learning module 706 may indicate a failure of a processing unit in the operation node 104-1, such that the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Reisizadeh into Baird. Baird teaches implementation of a distributed database system; Reisizadeh teaches a decentralized, gradient-based optimization algorithm in the context of distributed learning. One of ordinary skill would have been motivated to combine the teachings of Reisizadeh into Baird to improve the convergence time of synchronous decentralized optimization methods (Reisizadeh, Pg. 2, section 1).

Regarding Claim 20, Baird, Reisizadeh, and Feng teach the method of claim 19 (above). Baird further teaches:
…the master node…(Baird, pg. 5, para. 0042: “While shown in FIG. 1 as being within a single compute device, in some instances the processor configured to execute modules, functions and/or processes to update the distributed database can be within a compute device separate from its associated distributed database. In such an instance, for example, a processor can be operatively coupled to a distributed database instance via a network. For example, the processor can execute a consensus method to identify an order of events and/or transactions (e.g., as a result of synchronization with the other distributed database instances) and can send a signal including the order of events and/or transactions to the associated distributed database instance over the network.” Examiner notes that the broadest reasonable interpretation of a master node is a node in the network with additional capabilities vis-à-vis the rest of the distributed network)
Baird does not explicitly disclose:
Upon…continuing to perform operations for the current iteration of training, obtaining, by each of the plurality of computer nodes, merged training parameters from the master node 
applying, by each of the plurality of computer nodes, the merged training parameters to the local model 
However, Feng teaches:
Upon…continuing to perform operations for the current iteration of training, obtaining, by each of the plurality of computer nodes, merged training parameters from the master node (Feng, Pg. 7, para. 0077 and 0078: “The snapshot is a record of estimates of the machine learning parameters from last iteration round before the failure. As such, the GPUs can read the state of the snapshot and resume the machine learning process from that state.” “the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”; examiner notes that the broadest reasonable interpretation of “merged training parameters” includes the combination of multiple training parameters) 
applying, by each of the plurality of computer nodes, the merged training parameters to the local model (Feng, Pg. 3, para. 0047 and Pg. 7, para. 0078: “After each iteration during a distributed training of a model, each operation node sends the calculated state of the model to the coordination node 308 to integrate the states calculated by different nodes based on different training data subsets” “the model snapshot retriever 712 may be instructed to retrieve the snapshot of last iteration round from the model storage 110 for resuming the machine learning process, at the remaining processing units”. Examiner notes that Feng allows any remaining processing unit to retrieve a snapshot of the last iteration of training, which was sent to the coordination node at the end of the last iteration, integrated, and retrieved by each node through its model snapshot retriever)

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Chilimbi et al. (US 2015/0324690 A1) teaches training large neural network models by providing training input to model training machines organized as multiple replicas that asynchronously update a shared model via a global parameter server.
Reza et al. (US 2018/0217905 A1) teaches fault-tolerant methods, systems and architectures for data distribution.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SALLY T. NGUYEN whose telephone number is 571-272-3406. The examiner can normally be reached Monday - Friday 9:00am - 5:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amir Mehrmanesh, can be reached at 571-270-3351. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/STN/Examiner, Art Unit 4163                                                                                                                                                                                                        
/VIKER A LAMARDO/Primary Examiner, Art Unit 2126