DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-20 are pending in the application.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 2019-06-12, 2019-06-19, and 2022-01-16 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Interpretation
Regarding method claim 13, the claim recites contingent clauses:  “wherein in response to the comparison is greater than the predetermined threshold…” and “wherein in response to the difference is not greater than the predetermined threshold…”.  Examiner points out that, regarding application of prior art, one would only have to find art that reads on one of the conditional branches.  See MPEP 2111.04 (II):  “The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met.”  Nevertheless, Examiner has applied art that applies to both conditions.

Remarks
Claims 2, 6, 9, 13, 16, and 19 recite checking whether the comparison is greater than a predetermined threshold, with the transaction approved if the comparison is greater than the predetermined threshold and rejected if it is not.  However, Instant Specification [0062] indicates checking whether the comparison is less than the threshold (“< τ”).  Examiner notes that Applicant may have intended to recite the converse of this limitation.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Xie et al. (“Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance”; hereinafter “Xie”) in view of Zhu et al. (“Blockchain-Based Privacy Preserving Deep Learning”; hereinafter “Zhu”).
As per Claim 1, Xie teaches a system, comprising: 
a training participant client, comprising a training dataset and configured to:  (Xie, Page 1 below Figure 1, discloses:  “We focus on Stochastic Gradient Descent (SGD), and use the Parameter Server (PS) architecture (Li et al., 2014a;b) for distributed SGD. As illustrated in Figure 1, processes are composed of the server nodes and worker nodes. In each SGD iteration, the workers pull the latest model from the servers, estimate the gradients using the locally sampled training data, then push the gradient estimators to the servers. The servers aggregate the gradient estimators, and update the model by using the aggregated gradients.”  Here, the training participant client is one of the “workers” that “pull the latest model from the servers, estimate the gradients”, which comprises a training dataset (“using the locally sampled training data.”)).
generate a plurality of transaction proposals that each correspond to a training iteration for machine learning model training related to stochastic gradient descent, the machine learning model training comprising a plurality of training iterations, the transaction proposals comprising a gradient calculation performed by the training participant client, a batch from the private dataset, a loss function, and an original model parameter (Xie, Page 1 below Figure 1, discloses stochastic gradient descent:  “We focus on Stochastic Gradient Descent (SGD), and use the Parameter Server (PS) architecture (Li et al., 2014a;b) for distributed SGD.”  Xie, Page 2 Top Right, discloses a plurality of training iterations:  “In each iteration, each worker will sample n independent and identically distributed (i.i.d.) data points from the distribution D, and compute the gradient of the local empirical loss.”  Xie, Page 3 Section 4, discloses transaction proposals:  “In contrast to the existing majority-based methods, we compute a score for each candidate gradient estimator by using the stochastic zero-order oracle. We rank each candidate gradient estimator based on the estimated descent of the loss function, and the magnitudes. Then, the algorithm aggregates the candidates with highest scores. The score roughly indicates how trustworthy each candidate is.”  Here, the transaction proposals are the gradient calculations from each candidate.  Xie, Page 3 Section 4, further discloses:  
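[Image in original: Xie, Page 3, Section 4, defining the score $\mathrm{Score}(u, x) = f_r(x) - f_r(x - \gamma u) - \rho\|u\|^2$ of a gradient estimator $u$ based on the current parameter $x$, learning rate $\gamma$, and constant weight $\rho > 0$, where $f_r(x) = \frac{1}{n_r}\sum_{i=1}^{n_r} f(x; z_i)$ and the $z_i$ are i.i.d. samples drawn from $D$.]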

Here, Xie discloses that the transaction proposal includes a gradient calculation (“gradient estimator u”), a batch from the private dataset (“i.i.d. samples drawn from D”), a loss function (the “estimated descent of the loss function”, $f_r(x) - f_r(x - \gamma u)$), and an original model parameter (“current parameter x”)).
one or more endorser nodes or peers, each comprising a verify gradient [smart contract] configured to: (Xie, Page 1 below Figure 1, discloses one or more endorser nodes or peers:  “As illustrated in Figure 1, processes are composed of the server nodes and worker nodes.”  Xie Page 3 Section 4 discloses a verify gradient that measures trustworthiness:  “We rank each candidate gradient estimator based on the estimated descent of the loss function, and the magnitudes. Then, the algorithm aggregates the candidates with highest scores. The score roughly indicates how trustworthy each candidate is.”)
receive the plurality of transaction proposals (Xie Page 3 Section 4 discloses receiving each gradient update: “We rank each candidate gradient estimator”)
and evaluate each transaction proposal (Xie Page 3 Section 4 discloses “The score roughly indicates how trustworthy each candidate is”)
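For illustration only, the score that Xie uses to evaluate each candidate gradient estimator, as characterized above, can be sketched in Python as follows.  This is a minimal sketch under stated assumptions, not Xie's implementation: the per-sample loss `squared_error`, the batch values, the candidate vectors, and the function names are hypothetical and were chosen only so the arithmetic can be checked by hand.

```python
from typing import Callable, Sequence

Loss = Callable[[Sequence[float], float], float]

def mean_loss(loss: Loss, x: Sequence[float], batch: Sequence[float]) -> float:
    """f_r(x): per-sample losses f(x; z_i) averaged over the n_r samples in the batch."""
    return sum(loss(x, z) for z in batch) / len(batch)

def descendant_score(loss: Loss, x: Sequence[float], u: Sequence[float],
                     batch: Sequence[float], gamma: float, rho: float) -> float:
    """Score = f_r(x) - f_r(x - gamma*u) - rho*||u||^2 for one candidate estimator u."""
    x_step = [xi - gamma * ui for xi, ui in zip(x, u)]  # parameter after a trial step
    sq_norm = sum(ui * ui for ui in u)                   # ||u||^2
    return mean_loss(loss, x, batch) - mean_loss(loss, x_step, batch) - rho * sq_norm

def squared_error(x: Sequence[float], z: float) -> float:
    # Hypothetical per-sample loss f(x; z) = (x_0 - z)^2
    return (x[0] - z) ** 2

batch = [1.0, 3.0]
x = [0.0]
honest_u = [-4.0]    # true gradient of f_r at x: mean of 2*(x_0 - z_i) = -4
reversed_u = [4.0]   # a sign-flipped (faulty) candidate
honest = descendant_score(squared_error, x, honest_u, batch, gamma=0.1, rho=0.01)
faulty = descendant_score(squared_error, x, reversed_u, batch, gamma=0.1, rho=0.01)
```

With these assumed values, the true gradient scores about 1.28 and the sign-flipped candidate scores about -1.92, so ranking by score favors the honest estimator.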
However, Xie does not teach a blockchain network comprising a smart contract.
First, Zhu, like Xie, teaches generate a plurality of transaction proposals that each correspond to a training iteration for machine learning model training related to stochastic gradient descent, the machine learning model training comprising a plurality of training iterations, the transaction proposals comprising a gradient calculation performed by the training participant client (Zhu, Page 376 Section 3.3, discloses:  “First, the source participant (R(s)) ‘advertises’ the new candidate model (MsC ) by announcing the DLM updates to the entire network.”  This includes a gradient calculation, as shown in Zhu Page 376 Algorithm 2: “3. Run stochastic gradient descent (SGD) on the local dataset and update the local parameters w(i). 4: Compute gradient vector w(i) which is the vector of changes in all local parameters due to SGD.”)
Furthermore, Zhu teaches and a blockchain network, comprising: (Zhu, Page 376 Algorithm 2 Step 5, discloses:  “Store w(i) to the blockchain”)
one or more endorser nodes or peers, each comprising a verify gradient smart contract configured to: (Zhu, Page 375 Section 3.2, discloses endorser nodes or peers:  “To allow knowledge sharing – one of the key ingredients of the framework – this new MC is then evaluated by the participant peers operating at other sites, i.e., different hubs within a network”.  Zhu discloses a verify gradient smart contract at the top of Page 374:  “The data producers collect the massive data through the smart contract to store in the blockchain, for the use of data sharing. The smart contract code runs on the contract layer of blockchain, which provides the authority to control the system” and the bottom of Page 377: “Finally, note that the whole consensus reaching process could have been implemented directly on the blockchain via ‘smart contracts’ in order to prevent intruders from ‘attacking’ the network; yet, we assume here that the network access is protected.”)
receive the plurality of transaction proposals; (Zhu, Page 376 Top: “First, the source participant (R(s)) ‘advertises’ the new candidate model (MsC) by announcing the DLM updates to the entire network. Then, the destination participants (R(d,i)), where i = 1· · ·N denotes the target hubs, are notified by their local hub that there is an update available in the network. This can be achieved by the subscription pipeline they have with their local hub. In case the updates are available, the participants can retrieve and apply them to their working directory.”)
and evaluate each transaction proposal. (Zhu, Page 376 Top, discloses:  “Once Ms(C) is adopted from the source hub, R(d,i) starts its evaluation, quantified in terms of the scores”)
Zhu and Xie are analogous art because they are both in the field of endeavor of distributed machine learning.
It would have been obvious before the effective filing date of the claimed invention to combine the blockchain of Zhu with the distributed SGD of Xie.  One of ordinary skill in the art would be motivated to do so in order to have a secure and trustworthy record of data provenance (Zhu, Page 373 Last Paragraph: “The properties of blockchain make it a promising tool in many privacy informatics applications [33]: from building decentralized backbones for data exchange and interoperability, protocols enforced by immutable ledgers that keep track of data usage [35], and data provenance [36,37], to maintain user’s privacy and security through the persistence of consent statements in blockchain [38]” as well as Zhu Page 377:  “This transaction on the blockchain is necessary to allow participants within the network to prove/validate how models were created, who participated in their consensus process, and when did those transaction take place.”).  Examiner notes that this is the same motivation recited by Applicant in related paper Sarpatwar et al. (“Towards Enabling Trusted Artificial Intelligence via Blockchain”) Page 138:  “This chapter will describe how blockchain technology can be used to tackle various aspects of trust in the AI training process, including auditable provenance of data and models, data privacy and fairness. Blockchain data can include: the history of model creation, a hash of the data used to train the models, origin of the model and, potentially, the contribution of various participants in the creation of the model.”

As per Claim 8, this is a method claim corresponding to system Claim 1, and is rejected for the same reasons.

As per Claim 15, this is a non-transitory computer readable medium claim corresponding to system Claim 1.  The difference is that it recites a non-transitory computer readable medium and a processor.  Zhu, Top of Page 374, discloses:  “The data producers collect the massive data through the smart contract to store in the blockchain, for the use of data sharing. The smart contract code runs on the contract layer of blockchain, which provides the authority to control the system.”  Here, Zhu discloses “code” running on a “system”, which requires a non-transitory computer readable medium and a processor to run.  Claim 15 is rejected for the same reasons as Claim 1.

Claim(s) 2-7, 9-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xie in view of Zhu, further in view of Anderson et al. (US 2019/0050727 A1; hereinafter “Anderson”).
As per Claim 2, the combination of Xie and Zhu teaches the system of claim 1, wherein the verify gradient smart contract evaluates the transaction proposals comprises the verify gradient smart contract is configured to: 
For the following, see Xie Page 3 Section 4: “In contrast to the existing majority-based methods, we compute a score for each candidate gradient estimator by using the stochastic zero-order oracle. We rank each candidate gradient estimator based on the estimated descent of the loss function, and the magnitudes. Then, the algorithm aggregates the candidates with highest scores. The score roughly indicates how trustworthy each candidate is.”

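[Image in original: Xie, Page 3, Section 4, defining the score $\mathrm{Score}(u, x) = f_r(x) - f_r(x - \gamma u) - \rho\|u\|^2$ of a candidate gradient estimator $u$.]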

evaluate the loss function at two different points (Xie above:  $f_r(x) - f_r(x - \gamma u)$.  Note that $f_r(x) = \frac{1}{n_r}\sum_{i=1}^{n_r} f(x; z_i)$, and thus $f_r(x)$ is an averaged loss function:  a sum of the per-sample loss functions, each divided by the batch size $n_r$.)
calculate a difference between the loss functions at the two different points (Xie above:  $f_r(x) - f_r(x - \gamma u)$)
compare the difference to a product of a size of the batch, a random scalar, and a dot product between the gradient calculation [and a random direction] (Xie above compares the difference to a product by subtracting the product $\rho\|u\|^2$ from the difference.  This product is the product of a random scalar $\rho$ and the dot product of the gradient calculation with itself, as $\|u\|^2 = u \cdot u$.  Also note that Xie defines $\rho$ as a “constant weight”, and Xie Page 5 Section 6 states that “Zeno is robust to the choices of…the weight ρ”.  Xie therefore prescribes no single value of $\rho$, so $\rho$ can be interpreted as being “random”.  Examiner notes that the Instant Specification does not specify any particular randomization scheme or sampling from any particular distribution; [0063] merely states that the random step size is “chosen”, much like Xie discloses the “choices of…the weight ρ”.
Regarding multiplying by a size of the batch, Examiner notes that this is merely a matter of scale between Xie and the instant claims.  Xie, as shown above, divides each loss function by the batch size.  One of ordinary skill in the art will appreciate that, if Xie did not do this, Xie could instead compare the difference of the non-divided (summed) loss functions to the product of the batch size and the same term, $n_r \cdot \rho\|u\|^2$.  In other words, in terms of Xie, evaluating:
$\mathrm{Score} = \frac{1}{n_r}\sum_{i=1}^{n_r} f(x; z_i) - \frac{1}{n_r}\sum_{i=1}^{n_r} f(x - \gamma u; z_i) - \rho\|u\|^2$
is equivalent, up to multiplication by the positive batch size $n_r$ (which preserves both the sign of each score and the ranking among scores), to evaluating
$n_r \cdot \mathrm{Score} = \sum_{i=1}^{n_r} f(x; z_i) - \sum_{i=1}^{n_r} f(x - \gamma u; z_i) - n_r \cdot \rho\|u\|^2$
Thus, Xie suggests compare the difference to a product of a size of the batch, a random scalar, and a dot product between the gradient calculation.)
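The scale argument above can be checked numerically.  In the following minimal Python sketch (the loss, batch, and candidate values are hypothetical and are not taken from Xie or the Instant Specification), the summed form equals $n_r$ times the averaged score for every candidate, so the two forms rank candidates identically.

```python
from typing import Callable, Sequence

def both_forms(loss: Callable[[Sequence[float], float], float],
               x: Sequence[float], u: Sequence[float],
               batch: Sequence[float], gamma: float, rho: float) -> tuple[float, float]:
    """Return (Score, n_r * Score computed from summed losses) for one candidate u."""
    n_r = len(batch)
    x_step = [xi - gamma * ui for xi, ui in zip(x, u)]
    sq_norm = sum(ui * ui for ui in u)  # ||u||^2 is the dot product u . u
    sum_at_x = sum(loss(x, z) for z in batch)
    sum_at_step = sum(loss(x_step, z) for z in batch)
    averaged = sum_at_x / n_r - sum_at_step / n_r - rho * sq_norm  # Score
    summed = sum_at_x - sum_at_step - n_r * rho * sq_norm           # n_r * Score
    return averaged, summed

def squared_error(x: Sequence[float], z: float) -> float:
    # Hypothetical per-sample loss f(x; z) = (x_0 - z)^2
    return (x[0] - z) ** 2

batch = [1.0, 3.0, 2.0]
x = [0.0]
candidates = [[-4.0], [4.0], [-1.0], [0.5]]
results = [both_forms(squared_error, x, u, batch, gamma=0.1, rho=0.01) for u in candidates]

# Ranking (best first) is the same under either form because n_r > 0.
rank_averaged = sorted(range(len(candidates)), key=lambda i: results[i][0], reverse=True)
rank_summed = sorted(range(len(candidates)), key=lambda i: results[i][1], reverse=True)
```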
and determine if the comparison is greater than a predetermined threshold. (Xie, Page 3 Bottom, discloses comparing the scores against one another:

[Image in original: Xie, Page 3 (bottom), ranking the candidate scores and selecting the (m – b) highest scores.]

As shown above, Xie thereby determines whether a score is greater than a predetermined threshold, namely the cutoff set by the “(m – b) highest scores”.)
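Xie's selection of the “(m – b) highest scores” can be read as applying a data-dependent threshold, namely the (m – b)-th highest score.  A minimal Python sketch of that reading follows; the candidate vectors and scores are hypothetical, and breaking ties at the cutoff by candidate order is an assumption of this sketch rather than a teaching relied upon from Xie.

```python
from typing import Sequence

def aggregate_top_scores(candidates: Sequence[Sequence[float]],
                         scores: Sequence[float], b: int):
    """Average the (m - b) highest-scoring candidates; return (average, kept indices, cutoff)."""
    m = len(candidates)
    if not 0 <= b < m:
        raise ValueError("need 0 <= b < m")
    # Indices sorted by score, highest first; the sort is stable, so ties keep candidate order.
    order = sorted(range(m), key=lambda i: scores[i], reverse=True)
    kept = order[: m - b]
    cutoff = scores[kept[-1]]  # the (m - b)-th highest score acts as the threshold
    dim = len(candidates[0])
    average = [sum(candidates[i][d] for i in kept) / len(kept) for d in range(dim)]
    return average, sorted(kept), cutoff

candidates = [[1.0, 0.5], [1.2, 0.4], [50.0, -30.0], [0.8, 0.6]]
scores = [0.9, 0.8, -5.0, 0.7]
average, kept, cutoff = aggregate_top_scores(candidates, scores, b=1)
```

Here the outlying candidate (index 2) falls below the cutoff of 0.7 and is excluded from the average, which corresponds to the approve/reject reading applied to Claim 6 below.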
	However, the combination of Xie and Zhu does not explicitly teach and a random direction.
	Anderson teaches dot product between the gradient calculation and a random direction. (Anderson, Para [0025], discloses:  “To address the numerous problems noted above with respect to ANN training techniques, optimization may be accomplished without backpropagation and without excessive parameter storage or communication by modifying the stochastic gradient descent (SGD) technique by randomly generating a unit parameter vector and adjusting the magnitude of that parameter vector by its degree of coincidence to an estimated gradient.”  Here, Anderson discloses a random direction (the randomly generated unit parameter vector).  Anderson also suggests a dot product between the gradient calculation and the random direction, as Anderson recites “adjusting the magnitude of that parameter vector by its degree of coincidence to an estimated gradient”.  One of ordinary skill in the art will appreciate that the “degree of coincidence to an estimated gradient” describes the result of a dot product:  the dot product of a unit vector with another vector is the scalar projection of that other vector onto the unit vector's direction.  Thus, Anderson suggests replacing Xie’s $\|u\|^2 = u \cdot u$ with $r \cdot u$, where $r$ is Anderson’s random unit vector and $u$ is Xie’s gradient, i.e., the degree of coincidence of the random unit vector to the estimated gradient.)
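The dot-product reading of Anderson's “degree of coincidence” can be illustrated with a minimal Python sketch.  Generating the random unit vector by normalizing Gaussian samples is an assumption of this sketch (a common technique, not necessarily the scheme Anderson uses); the vector `u`, the seed, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    # Normalizing i.i.d. Gaussian components yields a direction uniformly distributed on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]

def coincidence(r: Sequence[float], u: Sequence[float]) -> tuple[float, list[float]]:
    """Scalar projection r . u of u onto unit vector r, and r rescaled by that amount."""
    s = dot(r, u)
    return s, [s * ri for ri in r]

u = [3.0, 4.0]                 # hypothetical gradient estimate
sq_norm = dot(u, u)            # Xie's ||u||^2, i.e. u . u
r = random_unit_vector(2, random.Random(42))
s, scaled = coincidence(r, u)  # r . u in place of u . u
```

By the Cauchy-Schwarz inequality, $|r \cdot u| \le \|u\|$ for a unit vector $r$, with equality only when $r$ is parallel to $u$.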
	Anderson and the combination of Xie and Zhu are analogous art because they are both in the field of endeavor of training machine learning models.
	It would have been obvious before the effective filing date of the claimed invention to combine the blockchain federated stochastic gradient descent training of Xie and Zhu with the random direction perturbation of Anderson.  One of ordinary skill in the art would be motivated to do so in order to achieve more efficient training and save time and resources compared to ordinary SGD (Anderson [0025]:  “To address the numerous problems noted above with respect to ANN training techniques, optimization may be accomplished without backpropagation and without excessive parameter storage or communication”).

As per Claim 3, the combination of Xie, Zhu, and Anderson teaches the system of claim 2. Xie teaches wherein the predetermined threshold is application dependent.  (Xie, Page 5 Section 6, discloses:  “Zeno is robust to the choices of…the number of trimmed elements b”.  Xie Page 4 Figure 3 discloses:  “Taking b = 3”.  Xie Page 5 Definition 5 discloses:  “Note that Krum requires 2b + 2 < m. Thus, b = 8 is the best we can take.”  Xie Page 5 Section 6.2 discloses:  “For both Krum and Zeno, we take b = 4”.  Thus, it is clear from Xie that the threshold varies by application.)

As per Claim 4, the combination of Xie, Zhu, and Anderson teaches the system of claim 2.  Xie teaches wherein the verify gradient smart contract evaluates the loss function at two different points comprises the verify gradient smart contract is further configured to: 
generate the random scalar [and the random direction];  (Xie, Page 3 Definition 2, discloses two random scalars:  “learning rate γ, and a constant weight ρ > 0”.  Examiner notes that Xie does not state that these must be different, and one of ordinary skill in the art will appreciate that two arbitrarily, or randomly, chosen weights may be the same.)
and obtain a new model parameter by perturbing an original model parameter [in the random direction] with a step size equal to a value of the random scalar. (Xie, Page 3 Section 4, discloses:  “$x - \gamma u$”, wherein the original model parameter x is perturbed with a step size equal to random scalar γ.)
However, the combination of Xie and Zhu does not teach generate the random direction; and obtain a new model parameter by perturbing an original model parameter in the random direction.
Anderson teaches generate the random direction (Anderson, Para [0067], discloses: 

[Image in original: Anderson, Para [0067].]
)
and obtain a new model parameter by perturbing an original model parameter in the random direction (Anderson, Para [0044], discloses: 

[Image in original: Anderson, Para [0044].]

Here, Anderson discloses obtain a new model parameter (“parameter update”) by perturbing an original model parameter in the random direction (“parameters are moved in the direction of u”)).
Thus, combining the teachings of Xie and Anderson suggests obtain a new model parameter by perturbing an original model parameter in the random direction with a step size equal to a value of the random scalar, as both Xie and Anderson teach perturbing the original parameters by a scalar (γ for Xie, ε for Anderson), and Anderson discloses that the perturbation is in the direction of a random unit vector.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Anderson with Xie and Zhu for at least the reasons recited in Claim 2.
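The combined reading of Xie and Anderson for Claim 4 (perturbing the original parameter along a random unit direction with a step size equal to the random scalar) can be sketched as follows.  This is a minimal illustration under assumptions: the sampling range for the scalar, the Gaussian-based direction, the subtraction sign (borrowed from Xie's $x - \gamma u$), and all names and values are hypothetical.

```python
import math
import random
from typing import Sequence

def perturb(x: Sequence[float], direction: Sequence[float], step: float) -> list[float]:
    """Return x - step * r, where r is `direction` normalized to unit length."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [xi - step * (di / norm) for xi, di in zip(x, direction)]

rng = random.Random(0)
gamma = rng.uniform(1e-3, 1e-1)                      # hypothetical range for the random scalar
direction = [rng.gauss(0.0, 1.0) for _ in range(3)]  # random direction, normalized inside perturb
x = [0.5, -1.0, 2.0]                                 # hypothetical original model parameter
x_new = perturb(x, direction, gamma)
step_length = math.dist(x, x_new)  # equals gamma because the direction has unit length
```

Normalizing the direction is what makes the step size equal to the value of the random scalar; with an unnormalized direction the step length would instead scale with the direction's norm.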

As per Claim 5, the combination of Xie, Zhu, and Anderson teaches the system of claim 4.  Xie teaches wherein verify gradient smart contract calculates the difference between the loss functions at the two different points comprises the verify gradient smart contract is further configured to: 
evaluate a difference in the loss function of all the samples in the batch between the new model parameter and the original model parameter (Xie, Page 3 Section 4, discloses:  
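[Image in original: Xie, Page 3, Section 4, showing $f_r(x) = \frac{1}{n_r}\sum_{i=1}^{n_r} f(x; z_i)$, a sum over the $n_r$ batch samples, and the loss difference $f_r(x) - f_r(x - \gamma u)$.]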

Here, Xie discloses a summation from 1 to $n_r$, which is the batch size, and evaluates a difference in the loss function.)
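For illustration, the check recited in Claims 2, 4, and 5, as mapped above, can be sketched in Python.  This sketch rests on several assumptions that are not drawn from Xie, Anderson, or the claims: the per-sample loss $f(x; z) = \tfrac{1}{2}\|x - z\|^2$, the use of the absolute residual as “the comparison”, the “< τ” acceptance test (the direction the Remarks above attribute to Instant Specification [0062]), and all names and values.  Because the difference of summed losses approximates $n_r \gamma\,(g \cdot r)$ when $g$ is the true averaged gradient, an honest gradient leaves only a small residual.

```python
from typing import Sequence

def sum_loss(x: Sequence[float], batch: Sequence[Sequence[float]]) -> float:
    # Sum over the batch of a hypothetical per-sample loss f(x; z) = 0.5 * ||x - z||^2
    return sum(0.5 * sum((xi - zi) ** 2 for xi, zi in zip(x, z)) for z in batch)

def verify_gradient(x, g, batch, gamma, r, tau):
    """Compare the loss difference at x and x - gamma*r against n_r * gamma * (g . r)."""
    n_r = len(batch)
    x_new = [xi - gamma * ri for xi, ri in zip(x, r)]         # perturbed parameter
    difference = sum_loss(x, batch) - sum_loss(x_new, batch)  # loss at two points
    product = n_r * gamma * sum(gi * ri for gi, ri in zip(g, r))
    residual = abs(difference - product)
    return residual, residual < tau

x = [0.0, 0.0]
batch = [[1.0, 2.0], [3.0, 0.0]]
honest_g = [-2.0, -1.0]   # averaged gradient: x minus the batch mean [2.0, 1.0]
forged_g = [2.0, 1.0]     # sign-flipped gradient
r = [1.0, 0.0]            # unit random direction, fixed here for reproducibility
honest = verify_gradient(x, honest_g, batch, gamma=0.01, r=r, tau=0.01)
forged = verify_gradient(x, forged_g, batch, gamma=0.01, r=r, tau=0.01)
```

The honest gradient leaves a residual of about 1e-4 (the second-order term $\tfrac{1}{2} n_r \gamma^2$ for this quadratic loss), while the sign-flipped gradient leaves about 0.08 and fails the assumed test.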

As per Claim 6, the combination of Xie, Zhu, and Anderson teaches the system of claim 5 as well as the smart contract (see Rejection to Claim 1).  Xie teaches wherein in response to the comparison is greater than the predetermined threshold, the verify gradient smart contract approves the corresponding transaction proposal, wherein in response to the difference is not greater than the predetermined threshold, the verify gradient smart contract rejects the corresponding transaction proposal. (Recall above that Zhu teaches a smart contract.  Xie, Page 3 Definition 3, discloses:

[Image in original: Xie, Page 3, Definition 3, aggregating the candidate gradient estimators with the (m – b) highest scores by taking their average.]

Here, Xie discloses that the estimators with the “(m-b) highest scores”, and thus with scores greater than the predetermined threshold, are approved, as they are “aggregated” by taking the average.  Conversely, estimators with lower scores are not incorporated into the result, and thus are rejected.)

	As per Claim 7, the combination of Xie, Zhu, and Anderson teaches the system of claim 6.  Zhu teaches wherein the verify gradient smart contract is further configured to: 
provide endorsements for approved transaction proposals to the training participation client (Zhu, Page 376 Section II, discloses:  “In this way, the new baseline model for future inferences is endorsed by the network”, and in Bottom of Page 377:  “Finally, note that the whole consensus reaching process could have been implemented directly on the blockchain via ‘smart contracts’”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Zhu with Xie for at least the reasons recited in Claim 1.

As per Claims 9-14, these are method claims corresponding to system Claims 2-7 respectively, and are rejected for the same reasons.

	As per Claim 16, Claim 16 is a non-transitory computer readable medium claim corresponding to system Claim 3, and is rejected for the same reasons.

	As per Claims 17-20, these are non-transitory computer readable medium claims corresponding to system Claims 4-7 respectively, and are rejected for the same reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  The following all make use of blockchain to perform distributed machine learning:
Li et al. (US 2020/0076884 A1)
Singla et al. (US 2021/0272017 A1)
Wang et al. (US 2019/0279107 A1)
Anglin et al. (US 2020/0218940 A1)
Zoldi et al. (US 2020/0082302 A1)
Gidney (US 2020/0143267 A1)
Beser et al. (US 2019/0012595 A1)
Manamohan et al. (US 2020/0272945 A1)
Lu et al. (WO 2020/210979 A1)
Kim et al. (“Blockchained On-Device Federated Learning”)
Chen et al. (“When Machine Learning Meets Blockchain: A Decentralized, Privacy-preserving and Secure Design”)
Yang et al. (“ACM Transactions on Intelligent Systems and Technology”)
Baldominos et al. (“Coin.AI: A Proof-of-Useful-Work Scheme for Blockchain-based Distributed Deep Learning”)
Also, the following introduces the idea of using a random unit vector in optimization methods:  Leventhal et al. (“Randomized Hessian estimation and directional search”)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/L.A.S./Examiner, Art Unit 2126                                                                                                                                                                                                        
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145