DETAILED ACTION
1.	This action is in response to the claims filed 12/04/2020 for application 15639557 filed 06/30/2017. Currently claims 1-20 are pending and claims 1, 8, and 12 have been amended.  
Amendments to the drawings are acknowledged. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Shokri, Reza, et al. "Privacy-preserving deep learning." Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. 2015 (“Reza”) in view of McMahan et al. (US 2017/0109322 A1, “McMahan”).
Regarding claim 1, Reza teaches the system comprising: at least one processor; and one or more computer-readable storage media including instructions stored thereon that, responsive to execution by the at least one processor, cause the system perform operations including (Reza, pg. 1315, sec. 6.4 Experimental setup, “We implemented two criteria for selecting which gradients to upload to the parameter server.” The description of the results of the experiment inherently includes a processor and memory as recited in the claim); calculating a gradient value based on a data set applied to a data model, the gradient value including a weight value calculated for the data model (Reza, pgs. 1313-1314, sec. 5.2 local training, fig. 3 (lines 2 and 3), “We assume that each participant i maintains a local vector of neural network parameters, $w^{(i)}$…Each participant then trains the neural network using the standard SGD algorithm, iterating over his local training data over many epochs… In the third step, the participant computes [gradient vector] $\Delta w^{(i)}$, the vector of changes in all [local] parameters.”); communicating the gradient value to an external service (Reza, pg. 1313, sec. 5.2 local training, see fig. 3 detailing in line 4 the following: “Upload $\Delta w^{(i)}_S$ to the parameter server, S is the set of indices of…gradients that are selected…”); and obtaining, based on ascertaining that a termination criterion occurs, a predictive model that represents a trained version of the data model (Reza, pg. 1313, sec. 5.1 overview, “Once the network is trained, each participant can independently and privately evaluate it on new data, without interacting with other participants.”).
Reza does not teach: receiving an average gradient value from the external service, the average gradient value representing an average of the communicated gradient value and one or more additional gradient values, each additional gradient value communicated to the external service from an independent source system configured to aggregate and maintain its own data set independently of all other data sets; applying the average gradient value to the data model. 
However, McMahan teaches: receiving an average gradient value from the external service, the average gradient value representing an average of the communicated gradient value and one or more additional gradient values (McMahan, para. 0017, fig. 1, fig. 2, fig. 3, “In implementations wherein the global gradient and the local gradients are not equal, each remote computing device can be configured to provide the determined gradient to the central computing device (e.g. server device, data center, etc.). The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”), each additional gradient value communicated to the external service from an independent source system configured to aggregate and maintain its own data set independently of all other data sets (McMahan, para. 0014, fig. 1, fig. 2, fig. 3, “Each computing device can then provide the local update(s) to the central computing device. For instance, the local update can be a gradient vector. In some implementations…the size of the local update can be independent of the training data used to determine the local update, thereby reducing bandwidth requirements and maintaining user privacy…By only providing the local update (and not the training data) to the server, the global model update can be determined using reduced bandwidth requirements, and without compromising the security of potentially privacy sensitive data stored on the user devices.”); applying the average gradient value to the data model (McMahan, para. 0017, fig. 1, fig. 2, fig. 3, “The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{w})$. Each remote computing device can then determine a local update based at least in part on the global gradient.”).
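For illustration only, the averaging and applying steps mapped to McMahan above can be sketched as follows. This is a minimal sketch and is not code from either reference: the function names `average_gradients` and `apply_gradient`, the plain-list vector representation, and the `step_size` parameter are all hypothetical. It also averages one gradient per source system, which follows the claim language rather than the per-example sum in the quoted formula.

```python
from typing import List

Vector = List[float]


def average_gradients(gradients: List[Vector]) -> Vector:
    """Element-wise mean of the gradient vectors reported by each source system.

    Hypothetical helper: one gradient vector per source system is assumed.
    """
    if not gradients:
        raise ValueError("at least one gradient vector is required")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradient vectors must have the same length")
    n = len(gradients)
    return [sum(g[j] for g in gradients) / n for j in range(dim)]


def apply_gradient(weights: Vector, gradient: Vector, step_size: float) -> Vector:
    """Plain gradient step: move each weight against the averaged gradient.

    The step size is an assumption; the quoted passages do not fix one.
    """
    return [w - step_size * g for w, g in zip(weights, gradient)]
```

In this sketch the external service would call `average_gradients` once per round, and each source system would call `apply_gradient` on its own copy of the data model after receiving the result.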
Accordingly it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Reza’s method in view of McMahan to teach: receiving an average gradient value from the external service, the average gradient value representing an average of the communicated gradient value and one or more additional gradient values, each additional gradient value communicated to the external service from an independent source system configured to aggregate and maintain its own data set independently of all other data sets; applying the average gradient value to the data model. The motivation to do so would be to solve optimization problems in which the data used to solve the optimization problems reside on many different devices and each dataset held by each device forms a sample of the total 
Regarding claim 2, Reza as modified in view of McMahan teaches the system of claim 1, wherein said calculating comprises using a backpropagation procedure to the data model and using the data set (Reza, pg. 1312, sec. 3 stochastic gradient descent, “Let w be the flattened vector of all parameters in a neural network, composed of $W_k, \forall k$. Let E be the error function, i.e., the difference between the true value of the objective function and the computed output of the network. E can be based on $L_2$ norm or cross entropy…The back-propagation algorithm computes the partial derivative of E with respect to each parameter in w [i.e., the gradient]…”).
Regarding claim 3, Reza as modified in view of McMahan teaches the system of claim 1, wherein said calculating comprises: dividing the data set into a set of mini-batches (Reza, pg. 1313, sec. 5.2 local training, fig. 3 (line 2), “Figure 3 presents the pseudocode of the distributed selective SGD (DSSGD) algorithm. DSSGD is run independently by every participant and consists of five steps in each learning epoch…[In the second step] [h]e then runs one epoch of SGD training on his local dataset. This training can be done on a sequence of mini-batches… [where] a mini-batch is the set of randomly chosen training data points of size M.”); and calculating the gradient value using a particular mini-batch of the set of mini-batches (Reza, pg. 1314, sec. 5.2 local training, fig. 3 (line 3), “In the third step, the participant computes [gradient vector] $\Delta w^{(i)}$, the vector of changes in all [local] parameters.”).
Regarding claim 4, Reza as modified in view of McMahan teaches the system of claim 1, wherein said calculating comprises: dividing the data set into a set of mini-batches (Reza, pg. 1313, sec. 5.2 local training, fig. 3 (line 2), “Figure 3 presents the pseudocode of the distributed selective SGD (DSSGD) algorithm. DSSGD is run independently by every participant and consists of five steps in each learning epoch…[In the second step] [h]e then runs one epoch of SGD training on his local dataset. This training can be done on a sequence of mini-batches… [where] a mini-batch is the set of randomly chosen training data points of size M.”); and calculating the gradient value using a particular mini-batch of the set of mini-batches (Reza, pg. 1314, sec. 5.2 local training, fig. 3 (line 3), “In the third step, the participant computes [gradient vector] $\Delta w^{(i)}$, the vector of changes in all [local] parameters.”), wherein the termination criterion comprises determining that each mini-batch of the set of mini-batches is evaluated to generate a respective gradient value (Reza, pg. 1314, sec. 5.2 local training, fig. 3 (line 3), “We refer to $w_j^{(i)}$ as the gradient of parameter j over one epoch of local SGD.” & footnote one: “Usually gradient refers to the change in a parameter after a single mini-batch training, but here we generalize it to one epoch of training over several mini-batches.”).
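The mini-batch passages quoted from Reza for claims 3 and 4 can be illustrated with a short sketch. It is not Reza's DSSGD pseudocode: `make_mini_batches`, `run_local_epoch`, the caller-supplied `grad_fn`, and the `learning_rate` parameter are hypothetical names chosen for this example. The epoch ends once every mini-batch has been evaluated, and the returned `delta` plays the role of the vector of parameter changes over that epoch.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_mini_batches(data: Sequence, batch_size: int, rng: random.Random) -> List[list]:
    """Shuffle the local data set and cut it into mini-batches of at most batch_size."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def run_local_epoch(
    weights: Vector,
    data: Sequence,
    batch_size: int,
    grad_fn: Callable[[Vector, list], Vector],  # hypothetical: gradient on one mini-batch
    learning_rate: float,
    rng: random.Random,
) -> Tuple[Vector, Vector]:
    """Run one epoch of mini-batch SGD and return (new_weights, delta).

    The loop terminates when each mini-batch has produced its gradient;
    delta is the change in every parameter over the whole epoch.
    """
    current = list(weights)
    for batch in make_mini_batches(data, batch_size, rng):
        grad = grad_fn(current, batch)
        current = [w - learning_rate * g for w, g in zip(current, grad)]
    delta = [new - old for new, old in zip(current, weights)]
    return current, delta
```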
Regarding claim 5, Reza as modified in view of McMahan teaches the system of claim 1, wherein said applying comprises applying the average gradient value to update the weight value of the data model (McMahan, para. 0021, “In particular, the local updates can be gradient vectors. The central computing device can then aggregate the local updates to determine a global update to the model. For instance, the aggregation can be an averaging aggregation defined as: $\tilde{w} = \tilde{w} + \frac{1}{K}\sum_{k=1}^{K}(w_k - \tilde{w})$. This can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges).”).
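As an illustration of the averaging aggregation quoted from McMahan para. 0021, the following minimal sketch updates a global parameter vector from K local parameter vectors. The name `aggregate_local_models` and the list-based vectors are assumptions made for this example and do not appear in the reference.

```python
from typing import List

Vector = List[float]


def aggregate_local_models(global_w: Vector, local_ws: List[Vector]) -> Vector:
    """Averaging aggregation: w~ <- w~ + (1/K) * sum_k (w_k - w~).

    global_w is the current global parameter vector and local_ws holds
    one locally updated parameter vector per computing device.
    """
    if not local_ws:
        raise ValueError("at least one local parameter vector is required")
    k_devices = len(local_ws)
    return [
        gw + sum(lw[j] - gw for lw in local_ws) / k_devices
        for j, gw in enumerate(global_w)
    ]
```

With equal weighting, the result equals the plain mean of the local vectors, which is why the update can be read as applying an averaged change to the weight values.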
Regarding claim 6, Reza as modified in view of McMahan teaches the system of claim 1, wherein the predictive model comprises a neural network trained using the average gradient value (McMahan, paras. 0015-0017, “More particularly... a set of input-output data can be used to describe a global objective via a loss function. Such functions can be, for instance, a convex or non-convex function, such as a...neural network function and/or various other suitable functions. A local objective ($F_k$) can also be defined using data stored on a computing device. For instance, the global objective can be defined as: $f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) = \sum_{k=1}^{K} \frac{n_k}{n} \cdot \frac{1}{n_k} \sum_{i \in P_k} f_i(w)$ wherein K describes the number of computing devices, n describes a total number of data examples, $n_k$ describes the number of data examples stored on computing device k, and $P_k$ describes a partition of data example indices {1, ..., n} stored on the computing device k… The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”).
Regarding claim 7, Reza as modified in view of McMahan teaches the system of claim 1, further comprising: applying a set of input data to the predictive model (Reza, pg. 1313, sec. 5.1 overview, “Once the network is trained, each participant can independently and privately evaluate it on new data.”); ascertaining an output of the predictive model; and performing an action based on the output of the predictive model (Reza, pg. 1312, sec. 3, second full paragraph, “Let E be the error function, i.e., the difference between the true value of the objective function and the computed output of the network. E can be based on $L_2$ norm or cross entropy” where computed output of the network corresponds to ascertaining an output and 
Claims 8 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (US 2017/0109322 A1, “McMahan”) in view of Abadi, Martin, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016 (“Abadi”).
Regarding claim 8, McMahan teaches the computer implemented method comprising:
receiving multiple gradient values from multiple different source system, each different source system configured to aggregate and maintain its own data set independently of all other data sets, and to generate a gradient value based on its own data set (McMahan, para. 0014, fig. 1, fig. 2, fig. 3, “Each computing device can then provide the local update(s) to the central computing device. For instance, the local update can be a gradient vector. In some implementations, the local update may be determined using one or more gradient descent techniques. For instance, the local update may be determined using batch gradient descent techniques, stochastic gradient descent techniques, or other gradient descent techniques. The local update does not include the training data used to determine the local update. In this manner, the size of the local update can be independent of the training data used to determine the local update, thereby reducing bandwidth requirements and maintaining user privacy…By only providing the local update (and not the training data) to the server, the global model update can be determined using reduced bandwidth requirements, and without compromising the security of potentially privacy sensitive data stored on the user devices.”); generating an average gradient value from the multiple gradient values (McMahan, para. 0017, fig. 1, fig. 2, fig. 3, “The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”); to each of the multiple different source systems (McMahan, para. 0018, “Each remote computing device can then determine a local update based at least in part on the global gradient.”); at each of the multiple different source systems, iteratively performing training (McMahan, paras. 0018-0020, “For instance, the local update can be determined using one or more gradient descent techniques (e.g. stochastic gradient descent). In this manner, each remote computing device can perform one or more stochastic updates or iterations to determine the local update. More particularly, each remote computing device can initialize one or more parameters associated with the local objective. Each remote computing device can then, for instance, uniformly, randomly sample $P_k$ for one or more stochastic iterations. In this manner, the local update can be determined based at least in part on the sampled data. In particular, the local update can be defined as: for t=1 to m do: Sample $i \in P_k$ uniformly at random; $w_k = w_k - h(\nabla f_i(w_k) - \nabla f_i(\tilde{w}^s) + \nabla f(\tilde{w}^s))$, wherein m is a number of stochastic steps per iteration, and h is the stepsize.”).
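The local update quoted from McMahan paras. 0018-0020 can be illustrated with a short sketch. It is a sketch under stated assumptions rather than the reference's implementation: the function name `local_update`, the caller-supplied per-example gradient function `grad_fn`, and starting the local parameters from the global ones are all hypothetical choices for this example.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def local_update(
    global_w: Vector,          # w~^s, the parameters received from the central device
    global_grad: Vector,       # grad f(w~^s), the averaged gradient received
    local_examples: Sequence,  # the examples indexed by P_k on this device
    grad_fn: Callable[[Vector, object], Vector],  # hypothetical: grad f_i(w) for one example
    step_size: float,          # h
    num_steps: int,            # m
    rng: random.Random,
) -> Vector:
    """Run m stochastic steps of w_k = w_k - h*(grad f_i(w_k) - grad f_i(w~^s) + grad f(w~^s))."""
    w_k = list(global_w)  # assumption: local parameters are initialised from the global ones
    for _ in range(num_steps):
        example = rng.choice(local_examples)   # sample i from P_k uniformly at random
        g_local = grad_fn(w_k, example)        # grad f_i(w_k)
        g_anchor = grad_fn(global_w, example)  # grad f_i(w~^s)
        w_k = [
            w - step_size * (gl - ga + gg)
            for w, gl, ga, gg in zip(w_k, g_local, g_anchor, global_grad)
        ]
    return w_k
```

The averaged gradient enters every local step as a correction term, which is how each source system trains on its own data while still using the value received from the central device.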
McMahan does not teach: adding a noise term to the average gradient value to generate a noisy gradient average; communicating the noisy gradient average; obtaining a predictive model trained using the noisy gradient average; using the noisy gradient average on an initial model to generate an updated model based on data previously unanalyzed by the initial model; and based on ascertaining that a termination criterion occurs.
However, Abadi teaches adding a noise term to the average gradient value to generate a noisy gradient average (Abadi, pg. 3, see Algorithm 1, in which the add noise portion of Algorithm 1 adds noise to the average gradient); communicating the noisy gradient average (Abadi, pg. 3, see Algorithm 1, in which the output portion of Algorithm 1 returns $\theta_T$, which represents the trained weights for a neural network model as computed by the noisy gradient average); obtaining a predictive model trained using the noisy gradient average (Abadi, pg. 3, see Algorithm 1, in which the output portion of Algorithm 1 returns $\theta_T$, which represents the trained weights for a neural network model as computed by the noisy gradient average); using the noisy gradient average on an initial model to generate an updated model based on data previously unanalyzed by the initial model (Abadi, pg. 3, see Algorithm 1’s heading Initialize, in which an initial model $\theta_0$ is randomly initialized, then in the section Add noise a noisy gradient average $\tilde{g}_t$ is computed, and in the Descent section an updated model $\theta_{t+1}$ is generated based on data previously unanalyzed by the initial model); and based on ascertaining that a termination criterion occurs (Abadi, pg. 3, see Algorithm 1’s first for loop, which continues to update the model so long as $t \in [T]$).
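For illustration only, the add noise and Descent portions of Abadi's Algorithm 1 relied upon above can be sketched as follows. This is a minimal sketch and not Abadi's implementation: the function names, the list-of-floats parameter representation, and the caller-supplied `grad_fn` are assumptions introduced here, and privacy accounting is omitted.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 / max(1.0, norm / clip_norm)
    return [g * scale for g in grad]


def noisy_gradient_average(per_example_grads, clip_norm, sigma, rng):
    """Clip each gradient, sum, add Gaussian noise, and divide by the batch size."""
    dim = len(per_example_grads[0])
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Noise standard deviation is sigma * clip_norm in each coordinate.
    noisy = [s + rng.gauss(0.0, sigma * clip_norm) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


def descent_step(theta, noisy_avg, learning_rate):
    """theta_{t+1} = theta_t - learning_rate * (noisy gradient average)."""
    return [t - learning_rate * g for t, g in zip(theta, noisy_avg)]


def train(theta0, grad_fn, batches, clip_norm, sigma, learning_rate, seed=0):
    """Run one noisy descent step per batch, i.e. for t in [T] with T = len(batches)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for batch in batches:
        # grad_fn is a hypothetical callable returning one example's gradient.
        grads = [grad_fn(theta, example) for example in batch]
        g_tilde = noisy_gradient_average(grads, clip_norm, sigma, rng)
        theta = descent_step(theta, g_tilde, learning_rate)
    return theta
```

In this sketch the loop over `batches` plays the role of the termination criterion: training stops once all T steps have run, and the returned list corresponds to $\theta_T$.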
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify McMahan’s method in view of Abadi to teach: adding a noise term to the average gradient value to generate a noisy gradient average; communicating the noisy gradient average; obtaining a predictive model trained using the noisy gradient average; using the noisy gradient average on an initial model to generate an updated model based on data previously unanalyzed by the initial model; and based on ascertaining that a termination criterion occurs. The motivation to do so would be to implement a differentially private stochastic gradient descent algorithm in which the overall privacy cost is a factor in training the network (Abadi, pg. 5, fig. 1 (accountant, PrivacyAccountant), “The main component in our implementation is PrivacyAccountant which keeps track of privacy spending over the course of training…[a]t any point during training, one can query the privacy loss…[and in] 
 Regarding claim 11, McMahan as modified in view of Abadi teaches the method of claim 8, wherein the predictive model comprises a neural network trained using the noisy gradient average (Abadi, pg. 3, see Algorithm 1, in which the output portion of Algorithm 1 returns $\theta_T$, which represents the trained weights for a neural network model as computed by the noisy gradient average).
10.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (US 2017/0109322 A1, “McMahan”) in view of Abadi, Martin, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016 (“Abadi”) and in view of Dwork, Cynthia, et al. "Calibrating noise to sensitivity in private data analysis." Theory of cryptography conference. Springer, Berlin, Heidelberg, 2006 (“Cynthia”).
	Regarding claim 9, McMahan as modified in view of Abadi teaches the method of claim 8, wherein said adding the noise term comprises adding noise to the average gradient value to generate the noisy gradient average (Abadi, pg. 3, see Algorithm 1, in which the add noise portion of Algorithm 1 adds noise to the average gradient).
 McMahan as modified in view of Abadi does not teach adding a Laplace-distributed random number. 
	However, Cynthia teaches adding a Laplace-distributed random number (Cynthia, pg. 281, Example 1, detailing the addition of a Laplace-distributed random number Y to the function $f(x) = \sum_{i} x_i$).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the
$S(f)/\epsilon$) ensures $\epsilon$-indistinguishability when the query function f [i.e., certain adversary that queries a data base to find information] has sensitivity S(f).”).
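For illustration only, the mechanism in Example 1 of Cynthia relied upon above (adding a Laplace-distributed random number Y, with scale S(f)/ε, to the sum query f(x)) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the Laplace sample is drawn as the difference of two exponential variates, which is one standard way to generate it.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    # random.expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum(values, sensitivity, epsilon, rng):
    """Return f(x) + Y, where f(x) = sum(x_i) and Y ~ Laplace(sensitivity / epsilon)."""
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)


# Usage: a sum over 0/1 entries has sensitivity 1, since changing one entry
# changes the sum by at most 1.
rng = random.Random(42)
released = noisy_sum([0, 1, 1, 0, 1], sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller ε yields a larger noise scale and therefore a noisier released value.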
11.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (US 2017/0109322 A1, “McMahan”) in view of Abadi, Martin, et al. "Deep learning with differential privacy." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016 (“Abadi”) and in view of Barni, Mauro, et al. "Privacy-preserving ECG classification with branching programs and neural networks." IEEE Transactions on Information Forensics and Security 6.2 (2011) (“Mauro”).
Regarding claim 10, McMahan as modified in view of Abadi teaches the method of claim 8, wherein said adding the noise term comprises using the average gradient value (McMahan, para. 0017, fig.1, fig.2, fig.3, “The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”).
McMahan as modified in view of Abadi does not teach performing a garbled circuits protocol.
	However, Mauro teaches performing a garbled circuits protocol (Mauro, pg. 461, sec. B Garbled Circuits (GCs) for Boolean Circuits, “Yao’s GC protocol works as follows: in the setup constructor generates an encrypted version of the function (represented as boolean circuit), called garbled circuit…”).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify McMahan’s method in view of Abadi and in view of Mauro to teach performing a garbled circuits protocol. The motivation to do so would be to have a two-party secure function evaluation protocol that correctly models the type of adversarial threats faced and thus incurs less computational overhead (Mauro, pg. 460, sec. IV Modular Design of Efficient SFE Protocols, “As the overhead for getting full-fledged security against both parties being malicious is too large for practical applications, we advocate the usage of hybrid security instead, where players are not equal in their capabilities, trustworthiness, and motivation… it is reasonable to assume that the service provider has strong incentives not to cheat in the protocol (act semihonestly) as his cheating attempts might be detected and ruin his reputation and business model, whereas [a client] may be much more willing to cheat (act maliciously)… such protocols with asymmetric assumptions on the two players can be constructed efficiently, where the overhead is very moderate… Thus, protocols which are based only on garbled circuits are good candidates for settings with corresponding trust relationships.”).
12.	Claims 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (US 2017/0109322 A1, “McMahan”) in view of Chen, Tingting, et al. "Privacy-preserving backpropagation neural network learning." IEEE Transactions on Neural Networks 20.10 (2009) (“Chen”) and in view of Yuan et al. "Privacy preserving back-propagation neural network learning made practical with cloud computing." IEEE Transactions on Parallel and Distributed Systems 25.1 (2013) (“Yuan”).
Regarding claim 12, McMahan teaches a computer-implemented method comprising: 
calculating a gradient value based on a data set applied to a data model (McMahan, paras. 0015-0017, “More particularly... a set of input-output data can be used to describe a global objective via a loss function. Such functions can be, for instance, a convex or non-convex function, such as a...neural network function and/or various other suitable functions. A local objective ($F_k$) can also be defined using data stored on a computing device. For instance, the global objective can be defined as: $f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) = \sum_{k=1}^{K} \frac{n_k}{n} \cdot \frac{1}{n_k} \sum_{i \in P_k} f_i(w)$ wherein K describes the number of computing devices, n describes a total number of data examples, $n_k$ describes the number of data examples stored on computing device k, and $P_k$
 describes a partition of data example indices {1, ..., n} stored on the computing device k… The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients.”); receiving an average gradient value from one or more of the first host system or the second host system (McMahan, para. 0017, fig.1, fig.2, fig.3, “The central device can then be configured to determine a gradient of the global objective…to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})$
.” Note: It is being interpreted that the central device represents one or more of the first host system or the second host system); each of the multiple different source systems configured to aggregate and maintain its own data set independently of all other data sets (McMahan, para. 0014, fig.1, fig.2, fig.3, “Each computing device can then provide the local update(s) to the central computing device. For instance, the local update can be a gradient vector. In some implementations, the local update may be determined using one or more gradient descent techniques. For instance, the local update may be determined using batch gradient descent techniques, stochastic gradient descent techniques, or other gradient descent techniques.  the training data used to determine the local update, thereby reducing bandwidth requirements and maintaining user privacy…By only providing the local update (and not the training data) to the server, the global model update can be determined using reduced bandwidth requirements, and without compromising the security of potentially privacy sensitive data stored on the user devices.”); applying the average gradient value to the data model (McMahan, para. 0017, fig.1, fig.2, fig.3, “The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})$
. Each remote computing device can then determine a local update based at least in part on the global gradient.”); and obtaining a predictive model that represents a trained version of the data model, the data model trained at least in part using the average gradient value (McMahan, paras. 0018-0021, “For instance, the local update can be determined using one or more gradient descent techniques (e.g. stochastic gradient descent). In this manner, each remote computing device can perform one or more stochastic updates or iterations to determine the local update. More particularly, each remote computing device can initialize one or more parameters associated with the local objective. Each remote computing device can then, for instance, uniformly, randomly sample $P_k$ for one or more stochastic iterations. In this manner, the local update can be determined based at least in part on the sampled data. In particular, the local update can be defined as: for t=1 to m do Sample $i \in P_k$ uniformly at random $w_k = w_k - h(\nabla f_i(w_k) - \nabla f_i(\tilde{w}^{s}) + \nabla f(\tilde{w}^{s}))$ wherein m is a number of stochastic steps per iteration, and h is the stepsize. The local updates can then be provided to the central computing device. In particular, the local updates can be gradient vectors. $\tilde{w} = \tilde{w} + \frac{1}{K} \sum_{k=1}^{K} (w_k - \tilde{w})$ This can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges). The threshold can be determined based at least in part on a desired accuracy of the global model.”).
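For illustration only, the passages of McMahan quoted above (global gradient from the local objective gradients, m stochastic local steps with stepsize h, and averaging of the local updates) can be sketched as follows. This is a minimal sketch and not McMahan's implementation: the function names, the scalar parameter w, and the caller-supplied per-example gradient `grad_fn` are assumptions introduced here.

```python
import random


def local_objective_gradient(w, partition, grad_fn):
    """Gradient of the local objective F_k: the mean per-example gradient on device k."""
    return sum(grad_fn(w, x) for x in partition) / len(partition)


def global_gradient(w, partitions, grad_fn):
    """Combine local objective gradients, each weighted by n_k / n."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * local_objective_gradient(w, p, grad_fn)
               for p in partitions)


def local_update(w_tilde, full_grad, partition, grad_fn, m, h, rng):
    """Run m stochastic steps on one device, starting from the global point w_tilde."""
    w_k = w_tilde
    for _ in range(m):
        x = rng.choice(partition)  # sample i from P_k uniformly at random
        w_k = w_k - h * (grad_fn(w_k, x) - grad_fn(w_tilde, x) + full_grad)
    return w_k


def federated_round(w_tilde, partitions, grad_fn, m, h, rng):
    """One outer iteration: w_tilde + (1/K) * sum_k (w_k - w_tilde)."""
    full_grad = global_gradient(w_tilde, partitions, grad_fn)
    local = [local_update(w_tilde, full_grad, p, grad_fn, m, h, rng)
             for p in partitions]
    return w_tilde + sum(w_k - w_tilde for w_k in local) / len(partitions)
```

Only gradients and updated parameters cross device boundaries in this sketch; the per-device examples in `partitions` are read solely inside the functions that stand in for each remote device.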
	McMahan does not teach generating a perturbed gradient value based on the gradient value and a perturbation value; communicating the perturbed gradient value to a first host system; communicating the perturbation value to a second host system; the average gradient value calculated based on the perturbed gradient value and the perturbation value. 
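For illustration only, the limitations recited above (a perturbed gradient value sent to a first host system, the perturbation value sent to a second host system, and an average gradient value calculated from both) can be sketched as simple additive masking. This is a minimal sketch under stated assumptions and does not reproduce the protocol of any cited reference: the function names and the uniform mask range are hypothetical, and real-valued uniform masks are used only to show the arithmetic, not to provide cryptographic hiding.

```python
import random


def perturb(gradient, rng, spread=1000.0):
    """Return (perturbed gradient, perturbation) for one source system."""
    r = rng.uniform(-spread, spread)
    return gradient + r, r


def average_gradient(perturbed_values, perturbation_values):
    """Recover the average gradient from the two hosts' aggregates.

    The first host holds only perturbed values and the second host holds only
    perturbations; subtracting the sums removes every mask.
    """
    n = len(perturbed_values)
    return (sum(perturbed_values) - sum(perturbation_values)) / n


# Usage: three source systems, each with one scalar gradient value.
rng = random.Random(7)
pairs = [perturb(g, rng) for g in [0.5, -1.5, 4.0]]
to_first_host = [p for p, _ in pairs]
to_second_host = [r for _, r in pairs]
avg = average_gradient(to_first_host, to_second_host)
```

Neither list alone reveals an individual gradient value in this sketch; the average is recoverable only when the two aggregates are combined.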
	However, Chen teaches generating a perturbed gradient value based on the gradient value and a perturbation value (Chen, pgs. 1557-1558, see step 2 of Algorithm 1 where host A’s perturbed gradient value for the output layer of the neural network is defined as                         
                            
                                
                                    ∆
                                
                                
                                    1
                                
                            
                            
                                
                                    w
                                
                                
                                    i
                                    ,
                                     
                                    j
                                
                                
                                    0
                                
                            
                            =
                            
                                
                                    
                                        
                                            o
                                        
                                        
                                            i
                                            1
                                        
                                    
                                    -
                                    
                                        
                                            t
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            
                                
                                    h
                                
                                
                                    j
                                    1
                                
                            
                            +
                            
                                
                                    r
                                
                                
                                    11
                                
                            
                            +
                            
                                
                                    r
                                
                                
                                    21
                                
                            
                        
                    , where                         
                            
                                
                                    
                                        
                                            o
                                        
                                        
                                            i
                                            1
                                        
                                    
                                    -
                                    
                                        
                                            t
                                        
                                        
                                            i
                                        
                                    
                                
                            
                            
                                
                                    h
                                
                                
                                    j
                                    1
                                
                            
                        
                     represents the gradient portion and                         
                            
                                
                                    r
                                
                                
                                    11
                                
                            
                            +
                            
                                
                                    r
                                
                                
                                    21
                                
                            
                        
                     represent the perturbation value); communicating the perturbed gradient value to a first host system ( Chen, pgs. 1557-1558, see step 3 of Algorithm 1 where host A sends its output layer’s perturbed gradient value                         
                            
                                
                                    ∆
                                
                                
                                    1
                                
                            
                            
                                
                                    w
                                
                                
                                    i
                                    ,
                                     
                                    j
                                
                                
                                    0
                                
                            
                        
                     to host B); communicating the perturbation value to a second host system(Chen, pg. 1559, see steps  2 of Algorithm 3 where host B sends the encrypted message                         
                            E
                            (
                            
                                
                                    m
                                
                                
                                    N
                                
                            
                            ,
                             
                            
                                
                                    r
                                
                                ´
                            
                            )
                        
                     to host A that partially decrypts the message to get the perturbation value R in step 4). 
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify McMahan’s method in view of Chen to teach generating a perturbed gradient value based on the gradient value and a perturbation value; communicating the perturbed gradient value to a first host system; and communicating the perturbation value to a second host system (Chen, sec. I Introduction, “We propose a lightweight two-party distributed algorithm for privacy-preserving backpropagation training with vertically partitioned data… Both analytical and experimental results show that our algorithms are lightweight in terms of computation and communication overheads”).
	McMahan also does not teach: the average gradient value calculated based on perturbed gradient values received at the first host system from multiple different source systems;  and perturbation values received at the second host system from multiple different source systems.
	However, Yuan teaches the average gradient value calculated based on perturbed gradient values received at the first host system from multiple different source systems (Yuan, pg. 216, see Algorithm 4, in which the cloud calculates $C(\hat{L})$ from each $C(L_s)$. Note: It is being interpreted that $C(\hat{L})$ represents the average gradient value, $C(L_s)$ represents the perturbed gradient values from multiple different source systems, and the cloud server that calculates $C(\hat{L})$ represents the first host system); and perturbation values received at the second host system from multiple different source systems (Yuan, pg. 217, sec. 4.4 Secure Sharing of Scalar Product and Sum, Algorithm 4, “All the parties work together to decrypt the difference between $\epsilon$ and sumL as $\hat{L} = (\epsilon - \mathrm{sumL})$… [a]s the cloud is able to efficiently decrypt numbers as large as u, it can decrypt $\sum_{s=1}^{Z} L_s - \epsilon$…[f]inally, each $P_s$ get[s] its secure share $\epsilon_s$ [from the cloud]….” Note: It is being interpreted that $\hat{L}$ represents the perturbation values, and the cloud server that decrypts numbers as large as u and distributes the secure share represents the second host system).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify McMahan’s method in view of Yuan to teach the average gradient value calculated based on perturbed gradient values received at the first host system from multiple different source systems; and perturbation values received at the second host system from multiple different source systems. The motivation to do so would be to have a multi-party privacy-preserving distributed algorithm rather than just a two-party scheme (Yuan, pg. 212, sec. Abstract, “To improve the accuracy of learning result, in practice multiple parties may collaborate through conducting joint Back-Propagation neural network learning on the union of their respective data sets. During this process no party wants to disclose her/his private data to others. Existing schemes supporting this kind of collaborative learning are either limited in the way of data partition or just consider two parties. There lacks a solution that allows two or more parties, each with an arbitrarily partitioned data set, to collaboratively conduct the learning. This paper solves this open problem by utilizing the power of cloud computing.”).
	Regarding claim 13, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein said calculating comprises applying backpropagation to the data model and using the data set to calculate the gradient value (Yuan, pg. 216, sec. 4.2 Privacy Preserving Multiparty Neural Network Learning, “After the Feed Forward Stage, all the parties work together to check whether the network has reached the error threshold. If not, they proceed to the Back-Propagation Stage, which aims at modifying the weights so as to achieve correct weights in the neural network.” & see Algorithm 2’s input, which represents the parties’ data samples used to calculate the network’s final weights as output).
	Regarding claim 14, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein said calculating comprises: dividing the data set into a set of mini-batches; and calculating the gradient value using a particular mini-batch of the set of mini-batches (McMahan, para. 0029, “[S]tochastic gradient descent techniques can be naively applied to the optimization problem, wherein one or more "minibatch" gradient calculations (e.g. using one or more randomly selected use devices) are performed per round of communication. For instance, the minibatch can include at least a subset of the training data stored locally on the user devices. In such implementations, one or more user devices can be configured to determine the average gradient associated with the local training data respectively stored on the user devices for a current version of a model.”).
Regarding claim 15, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein said generating the perturbed gradient value comprises generating the perturbation value as a random vector (Chen, pgs. 1557-1559, in step 2.1 of Algorithm 1, the generation of host A’s perturbation values $r_{11}, r_{21}$ is done by Algorithm 3, in step 1 of which host A generates a series of new random numbers $r_i$), and adding the random vector to the gradient value to generate the perturbed gradient value (Chen, pgs. 1557-1558, see step 2 of Algorithm 1, where host A’s perturbed gradient value for the output layer of the neural network is defined as $\Delta_1 w_{i,j}^{0} = (o_{i1} - t_i)\, h_{j1} + r_{11} + r_{21}$, where $(o_{i1} - t_i)\, h_{j1}$ represents the gradient portion and $r_{11} + r_{21}$ represents the perturbation value).
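Note: The additive perturbation relied upon above can be illustrated with a minimal sketch. The sketch is for illustration only and is not taken from Chen or Yuan; the function names `perturb_gradient` and `recover_average`, the uniform distribution used for the random vector, and the plain (unencrypted) recovery step are all assumptions made for the example.

```python
import random


def perturb_gradient(gradient, rng, scale=1.0):
    """Return (perturbed_gradient, perturbation) for one source system.

    The perturbation is a random vector with the same length as the gradient.
    The uniform distribution and `scale` are illustrative assumptions.
    """
    perturbation = [rng.uniform(-scale, scale) for _ in gradient]
    perturbed = [g + r for g, r in zip(gradient, perturbation)]
    return perturbed, perturbation


def recover_average(perturbed_list, perturbation_list):
    """Element-wise: mean(perturbed gradients) - mean(perturbations)."""
    n = len(perturbed_list)
    dim = len(perturbed_list[0])
    avg_perturbed = [sum(p[i] for p in perturbed_list) / n for i in range(dim)]
    avg_noise = [sum(r[i] for r in perturbation_list) / n for i in range(dim)]
    return [a - b for a, b in zip(avg_perturbed, avg_noise)]


rng = random.Random(0)
gradients = [[0.5, -1.0], [1.5, 3.0]]          # two hypothetical source systems
pairs = [perturb_gradient(g, rng) for g in gradients]
average = recover_average([p for p, _ in pairs], [r for _, r in pairs])
```

In this sketch the perturbed gradients and the perturbations are kept in separate lists, mirroring the claimed split between a first and a second host system; the average gradient is obtained only when the two averages are combined.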
Regarding claim 16, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein said applying comprises applying a weight value from the average gradient value to the data model (McMahan, para. 0021, “In particular, the local updates can be gradient vectors. The central computing device can then aggregate the local updates to determine a global update to the model. For instance, the aggregation can be an averaging aggregation defined as: $\tilde{w} = \tilde{w} + \frac{1}{k} \sum_{k=1}^{k} (w_k - \tilde{w})$. This can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges).”).
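Note: The averaging aggregation quoted from McMahan para. 0021 can be sketched as follows. This is an illustrative reading of the quoted formula only, not McMahan’s implementation; the function name `aggregate` and the list-of-floats representation of the weight vectors are assumptions made for the example.

```python
def aggregate(global_w, local_ws):
    """Apply w~ = w~ + (1/K) * sum_k (w_k - w~), element-wise.

    global_w: current global weight vector (list of floats).
    local_ws: list of K local weight vectors, one per computing device.
    """
    k = len(local_ws)
    return [
        w + sum(local[i] - w for local in local_ws) / k
        for i, w in enumerate(global_w)
    ]


# Two hypothetical devices reporting local weights for a two-parameter model.
updated = aggregate([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]])
```

Under this reading, one aggregation step moves the global weights to the element-wise mean of the local weights, which is why the step can be repeated each round until the loss function reaches the threshold.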
wherein said obtaining is performed in response to ascertaining that a termination criterion occurs (McMahan, para. 0021, “In particular, the local updates can be gradient vectors. The central computing device can then aggregate the local updates to determine a global update to the model. For instance, the aggregation can be an averaging aggregation defined as: $\tilde{w} = \tilde{w} + \frac{1}{k} \sum_{k=1}^{k} (w_k - \tilde{w})$. This can be repeated for one or more iterations, for instance, until the loss function reaches a threshold (e.g. converges).”).
Regarding claim 19, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein the predictive model comprises a neural network trained using the average gradient value (McMahan, paras. 0015-0017, “More particularly... a set of input-output data can be used to describe a global objective via a loss function. Such functions can be, for instance, a convex or non-convex function, such as a...neural network function and/or various other suitable functions. A local objective (                        
                            
                                
                                    F
                                
                                
                                    k
                                
                            
                        
                    ) can also be defined using data stored on a computing device. For instance, the global objective can be defined as:                         
                            f
                            
                                
                                    w
                                
                            
                            =
                            
                                
                                    ∑
                                    
                                        k
                                        =
                                        1
                                    
                                    
                                        K
                                    
                                
                                
                                    
                                        
                                            
                                                
                                                    n
                                                
                                                
                                                    k
                                                
                                            
                                        
                                        
                                            n
                                        
                                    
                                
                            
                            
                                
                                    F
                                
                                
                                    k
                                
                            
$w = \sum_{k=1}^{K} \frac{n_k}{n} \cdot \frac{1}{n_k} \sum_{i \in P_k} f_i(w)$
 wherein K describes the number of computing devices, n describes a total number of data examples, $n_k$ describes the number of data examples stored on computing device k, and $P_k$ describes a partition of data example indices {1, ..., n} stored on the computing device k… The central device can then be configured to determine a gradient of the global objective based at least in part on the local objective gradients, and then to provide the gradient to the remote computing devices. For instance, the gradient can be defined $\nabla f(\tilde{w}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”).
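For illustration only, the aggregation described in the McMahan passage quoted above can be sketched as follows. The sketch is not taken from McMahan: the scalar model `w`, the quadratic per-example loss, and all function names are hypothetical assumptions chosen to keep the example small. It shows that weighting each device's local average gradient by n_k/n yields the same value as averaging the per-example gradients over all n examples.

```python
def grad_fi(w, x):
    # Gradient of an assumed per-example loss f_i(w) = 0.5 * (w - x)**2.
    return w - x


def local_average_gradient(w, partition):
    # (1 / n_k) * sum of grad f_i(w) over the examples held by one device.
    return sum(grad_fi(w, x) for x in partition) / len(partition)


def aggregated_gradient(w, partitions):
    # sum over k of (n_k / n) * local average gradient of device k.
    n = sum(len(p) for p in partitions)
    return sum((len(p) / n) * local_average_gradient(w, p) for p in partitions)


def pooled_gradient(w, partitions):
    # (1 / n) * sum of grad f_i(w) over every example, ignoring the partition.
    pooled = [x for p in partitions for x in p]
    return sum(grad_fi(w, x) for x in pooled) / len(pooled)


# Three hypothetical devices holding 2, 1, and 3 examples respectively.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
difference = aggregated_gradient(0.0, partitions) - pooled_gradient(0.0, partitions)
print(abs(difference) < 1e-12)
```

The two routes agree up to floating-point rounding because the factor n_k in the weight cancels the 1/n_k in each local average.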
Regarding claim 20, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, further comprising: applying a set of input data to the predictive model (Chen, pg. 1555, sec. A Notations for Backpropagation Learning, “Given a neural network with $a$-$b$-$c$ configuration, one input vector is denoted as $(x_1, x_2, \ldots, x_a)$. The values of hidden layer nodes are denoted as $\{h_1, h_2, \ldots, h_b\}$, and the values of output-layer nodes are $\{o_1, o_2, \ldots, o_c\}$. $w_{jk}^{h}$ denotes the weight connecting the input-layer node k and the hidden-layer node j. $w_{ij}^{o}$ denotes the weight connecting j and the output-layer node i.”); ascertaining an output of the predictive model (Chen, pg. 1556, sec. A Notations for Backpropagation Learning, “We use mean square error (MSE) as the error function in the backpropagation algorithm $e = \frac{1}{2} \sum_{i} (t_i - o_i)^2$.” Note: It is being interpreted that $o_i$ represents ascertaining an output of the predictive model); and performing an action based on the output of the predictive model (Chen, pg. 1556, sec. A Notations for Backpropagation Learning, “For the neural networks described above, the partial derivatives are listed as (1) and (2), for future reference.” Note: It is being interpreted that performing the backpropagation algorithm represents performing an action based on the output of the predictive model).
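For illustration only, the notation quoted from Chen above can be sketched as a forward pass through an a-b-c network followed by the quoted error function. The sigmoid activation, the absence of bias terms, and all function and variable names are assumptions made for this sketch and are not drawn from the cited passage.

```python
import math


def sigmoid(z):
    # Assumed activation function; the quoted passage does not specify one.
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_h, w_o):
    # w_h[j][k]: weight connecting input-layer node k to hidden-layer node j.
    # w_o[i][j]: weight connecting hidden-layer node j to output-layer node i.
    h = [sigmoid(sum(w_h[j][k] * x[k] for k in range(len(x))))
         for j in range(len(w_h))]
    o = [sigmoid(sum(w_o[i][j] * h[j] for j in range(len(h))))
         for i in range(len(w_o))]
    return h, o


def mse(t, o):
    # e = 1/2 * sum_i (t_i - o_i)^2, as in the quoted error function.
    return 0.5 * sum((t_i - o_i) ** 2 for t_i, o_i in zip(t, o))


# A hypothetical 3-2-1 network with all weights set to zero.
x = [1.0, 2.0, 3.0]
w_h = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_o = [[0.0, 0.0]]
h, o = forward(x, w_h, w_o)
print(h, o, mse([1.0], o))
```

With zero weights every node outputs sigmoid(0) = 0.5, so the error against a target of 1.0 is 0.5 * (1.0 - 0.5)^2 = 0.125.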
13.	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over McMahan et al. (US 2017/0109322 A1, “McMahan”) in view of Chen, Tingting, et al. "Privacy-preserving backpropagation neural network learning." IEEE Transactions on Neural Networks 20.10 (2009) (“Chen”) and in view of Yuan et al. "Privacy preserving back-propagation neural network learning made practical with cloud computing." IEEE Transactions on Parallel and Distributed Systems 25.1 (2013) (“Yuan”) and further in view of Barni, Mauro, et al. "Privacy-preserving ECG classification with branching programs and neural networks." IEEE Transactions on Information Forensics and Security 6.2 (2011) (“Mauro”).
	Regarding claim 18, McMahan as modified in view of Chen and in view of Yuan teaches the method of claim 12, wherein the average gradient value is calculated (McMahan, paras. $\nabla f(\tilde{w}) = \frac{1}{n} \sum_{i=1}^{n} \nabla f_i(\tilde{w})$.”).
	McMahan as modified in view of Chen and in view of Yuan does not teach using a garbled circuits protocol.
However, Mauro teaches performing a garbled circuits protocol (Mauro, pg. 461, sec. B Garbled Circuits (GCs) for Boolean Circuits, “Yao’s GC protocol works as follows: in the setup phase, the constructor generates an encrypted version of the function (represented as boolean circuit), called garbled circuit…”).
	Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify McMahan’s method in view of Chen and in view of Yuan and further in view of Mauro to teach performing a garbled circuits protocol. The motivation to do so would be to have a two-party secure function evaluation protocol that correctly models the type of adversarial threats faced and thus incurs less computational overhead (Mauro, pg. 460, sec. IV Modular Design of Efficient SFE Protocols, “As the overhead for getting full-fledged security against both parties being malicious is too large for practical applications, we advocate the usage of hybrid security instead, where players are not equal in their capabilities, trustworthiness, and motivation… it is reasonable to assume that the service provider has strong incentives not to cheat in the protocol (act semihonestly) as his cheating attempts might be detected and ruin his reputation and business model, whereas [a client] may be much more willing to cheat (act maliciously)… such protocols with asymmetric assumptions on the two players can be constructed efficiently, where the overhead is very moderate… Thus, .
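For illustration only, the setup phase of Yao’s GC protocol quoted from Mauro above can be sketched for a single AND gate. This is a toy under stated assumptions, not Mauro’s construction: the SHA-256 based row encryption, the zero-byte tag used to recognise the correct row, and all names are hypothetical, and the oblivious transfer by which the evaluator would obtain its input label is omitted. It is not suitable for any security-sensitive use.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = b"\x00" * 16  # assumed marker that identifies a correctly decrypted row


def _pad(label_a, label_b):
    # 32-byte one-time pad derived from one label per input wire.
    return hashlib.sha256(label_a + label_b).digest()


def _xor(left, right):
    return bytes(p ^ q for p, q in zip(left, right))


def garble_and_gate():
    # Constructor: draw one random label per wire for each bit value (0, 1).
    labels = {wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
              for wire in ("a", "b", "out")}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    # Shuffle so that row position does not reveal the input bits.
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate(table, label_a, label_b):
    # Evaluator: holds exactly one label per input wire and learns one output label.
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted with the supplied labels")


labels, table = garble_and_gate()
result = evaluate(table, labels["a"][1], labels["b"][1])
print(result == labels["out"][1])
```

The evaluator sees only random-looking labels; only the constructor, who holds the label table, can map the returned output label back to a bit.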
Response to Arguments
Applicant's arguments filed 03/29/2021 have been fully considered, but they are not persuasive as they pertain to the 103 rejection.
Applicant argues that claim 1 should not be rejected under 35 U.S.C. § 103. In support of this argument, Applicant states that the prior art of Abadi fails to teach that multiple source systems are contributing average gradient values that are used to generate a noisy gradient average. Further, Abadi does not teach that such a noisy gradient average is applied, by multiple different source systems, for use in applying previously unanalyzed data to generate updated models (see pg. 9, para. 2 of Applicant’s 03/29/2021 submitted arguments). Initially, Examiner must respectfully point out that claims 1-7 do not recite noisy gradient averages. Claims 1-7 deal with average gradient values (see pgs. 2-3 of Applicant’s 03/29/2021 submitted claims). With that being said, the new prior art of McMahan renders Applicant’s argument moot since McMahan teaches that multiple source systems are contributing average gradient values and the prior art of Abadi is not relied upon for this teaching.
	In regard to claim 8, Applicant argues that neither Abadi nor Shokri teaches or suggests receiving multiple gradient values from multiple different source systems, each configured to independently aggregate and maintain its own data set, and to generate an average gradient value from these multiple gradient values (see pg. 10, para. 2, of Applicant’s 03/29/2021 submitted arguments). However, Applicant’s arguments with respect to claims 8 and 11 are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
In regard to claim 12, Applicant argues that the prior art system taught by Yuan is not equivalent to the system taught by independent claim 12. Again, Examiner must respectfully point out that what is stated in page 12, paragraph 1 of Applicant’s 03/29/2021 arguments is not what is actually claimed.1 The limitations of claim 12 that pertain to Applicant’s argument state: “receiving an average gradient value from one or more of the first host system or the second host system, the average gradient value calculated based on perturbed gradient values received at the first host system from multiple different source systems and perturbation values received at the second host system from multiple different source systems, each of the multiple different source systems configured to aggregate and maintain its own data set independently of all other data sets.” See pg. 5 of Applicant’s 03/29/2021 submitted claims. McMahan as modified in view of Chen and in view of Yuan teaches these limitations (see pgs. 15-19 of the current Office Action for further clarification).
Accordingly, since independent claims 1, 8, and 12 are rejected under 103, the dependent claims are also rejected under 103 since they depend on the rejected independent claims above. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20150324686 A1
US 9984337 B2

		

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADAM CLARK STANDKE whose telephone number is (571)270-1806.  The examiner can normally be reached on 7:00-5:00 M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ADAM C STANDKE/Examiner, Art Unit 2122                                                                                                                                                                                                        
/ERIC NILSSON/Primary Examiner, Art Unit 2122                                                                                                                                                                                                        


1 “This system [i.e., the system taught by Yuan] is not the equivalent of a system where individual source systems operate on their own data sets and merely share perturbed gradient values and perturbed values for the purpose of generating average gradient values that can be used by the source systems, individually, in training their own data models. As such, whether taken alone or together, the combined teachings of Shokri, Chen, and Yuan fail to teach or suggest each and every element of claim 12, and it would not be obvious to combine their teachings to derive these limitations.” (Emphasis added). See pg. 12, paras. 1-2 of Applicant’s 03/29/2021 submitted arguments.