Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
The instant application having Application No. 16/206,274 filed on 11/30/2018 is presented for examination by the examiner.

Examiner Notes
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Drawings
The applicant’s drawings submitted are acceptable for examination purposes.

Information Disclosure Statement
As required by M.P.E.P. 609, the applicant’s submissions of the Information Disclosure Statements dated 11/30/2018 and 02/25/2020 are acknowledged by the examiner, and the cited references have been considered in the examination of the claims now pending.

Allowable Subject Matter
Claims 4, 6, 11, 13, and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Prior art:
US 2017/0255621 to Kenthapadi
[0044] In some example embodiments, the graph generating system determines a weighted combination of the above-described weight functions based on a machine-learning model that uses linear regression or logistic regression techniques. The model is “taught” (e.g., trained) with respect to a ground truth dataset, wherein each item in the ground truth dataset corresponds to a pair of sample concepts (u,v) that are related. For each pair (u,v), the graph generating system computes one or more weight values (e.g., intermediate weight values) using different weight functions. The graph generating system also receives a ground truth weight value that could be provided by a judge. The judge may be a person whose role is to perform an analysis of the relationship between concepts u and v of the pair of concepts (u,v), and to determine a ground truth weight value that reflects the degree of relatedness of concepts u and v. Based on the ground truth weight value provided by the judge (e.g., via a user interface of a client device associated with the judge), the graph generating system associates the ground truth weight with the pair of concepts (u,v) as the current weight value of the edge between the nodes that represent concepts u and v in the universal concept graph. Based on the ground truth weight values provided for all the items in the ground truth dataset, the graph generating system uses the machine-learning model to determine the logic behind the allocation, by the human judge, of certain ground truth weight values to the sample concept pairs ground truth dataset, and to determine, using the logic, what the current edge weight values associated with the remainder of the edges in the universal concept graph should be considering all the intermediate weight values computed for a respective edge.
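For illustration only, the weighted combination described in the passage above can be sketched as a least-squares fit. This is a minimal sketch under stated assumptions, not Kenthapadi's implementation: the function names, the restriction to two weight functions, the absence of an intercept term, and the sample values are all hypothetical.

```python
def fit_combination(samples):
    """Fit y ~ a*f1 + b*f2 by least squares using the 2x2 normal equations.

    Each sample is (f1, f2, y): two intermediate weight values for a concept
    pair and the ground truth weight assigned by a judge (hypothetical data).
    """
    s11 = sum(f1 * f1 for f1, _, _ in samples)
    s12 = sum(f1 * f2 for f1, f2, _ in samples)
    s22 = sum(f2 * f2 for _, f2, _ in samples)
    t1 = sum(f1 * y for f1, _, y in samples)
    t2 = sum(f2 * y for _, f2, y in samples)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("weight functions are collinear on this dataset")
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


def combined_weight(a, b, f1, f2):
    """Edge weight for a pair that has no ground truth value."""
    return a * f1 + b * f2


# Hypothetical ground truth dataset: (intermediate weight 1, intermediate weight 2, judged weight).
ground_truth = [(0.2, 0.9, 0.41), (0.8, 0.1, 0.59), (0.5, 0.5, 0.50)]
a, b = fit_combination(ground_truth)
```

The fitted coefficients can then be passed to `combined_weight` for the remaining edges of the graph.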

US 2018/0349382 to Kumaran
[0029] FIG. 1 is one example of a system 100 of interconnected servers or computers using linear expressions for search engines. Network 102 interconnects training server (or computer) 104 and search servers (or computers) 108-1 through 108-N. Network 102 can be a wide area network (WAN) including the Internet or a local area network (LAN), which can provide IEEE 802.11 wired or wireless communication or any combination of thereof. Training server 104 can be located at any data center (e.g., a core data center) or a training facility to build machine learning (ML) models 106 generating decision trees used for ranking search results. For example, ML models 106 can be based on any type of ML model including Gradient Boosting Decision Tree (GBDT) models, linear regression models or logistic regression models in generating decision trees that can predict a target result or provide a target value based on any number of input features at nodes of the tree providing classification labels. ML models can be generated on any type of computing device or computer such as a server, battery powered mobile device or client computer. Linear expressions can be received and used by any server, battery powered mobile device or client computer.

US 2013/0031252 to Chang
[0059] Additionally, or alternatively, if a second pre-selected time interval expires before network node 140 receives a response to the second request for subscriber data, network node 140 may execute a fail-open function (event 7), which may include bypassing subscriber data storage 130 in a call processing procedure. In other words, similar to network device 120, network node 140 may continue processing the request for the network service by communicating the request for the network service to network device 120 without receiving subscriber data from subscriber data storage 130. At some point after receiving a positive response from network node 140, network device 120 may grant network access to user device 110. In some implementations, the network access granted may be limited to one or more network services (e.g., voice calling services, text message services, e-mail services, etc.).
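For illustration only, the fail-open behavior described in the passage above can be sketched as a bounded wait followed by a bypass. This is a minimal sketch under stated assumptions, not Chang's implementation: the function name `process_request`, the use of a queue to stand in for the subscriber data storage response, and the dictionary result are hypothetical.

```python
import queue


def process_request(request, responses, timeout_s):
    """Wait up to timeout_s for subscriber data; if none arrives, fail open.

    `responses` is a hypothetical stand-in for the channel on which the
    subscriber data storage would answer.
    """
    try:
        subscriber_data = responses.get(timeout=timeout_s)
    except queue.Empty:
        # Time interval expired: continue processing without subscriber data.
        return {"request": request, "subscriber_data": None, "fail_open": True}
    return {"request": request, "subscriber_data": subscriber_data, "fail_open": False}


# No response arrives within the interval, so the request proceeds anyway.
silent_storage = queue.Queue()
bypassed = process_request("attach", silent_storage, timeout_s=0.01)

# A response is already available, so it is used.
ready_storage = queue.Queue()
ready_storage.put({"plan": "basic"})
served = process_request("attach", ready_storage, timeout_s=0.01)
```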

The prior art of record (Steele in view of Kumar, Kenthapadi, Kumaran, and Chang) does not disclose or fairly suggest at least the limitations as recited in dependent claims 4, 6, 11, 13 and 18-19.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 8-9 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over US 2015/0379426 to Steele et al. (hereafter “Steele”) in view of US 2017/0098171 to Kumar et al. (hereafter “Kumar”).

As per claim 1, Steele discloses a method for decentralized distributed deep learning in a computing environment by one or more processors comprising:
performing asynchronous distributed training of one or more machine learning models (FIGs. 5 and 9; paragraphs 0051, 0057, 0073 and 0091: “FIG. 5 illustrates an example of asynchronous scheduling of jobs at a machine learning service, according to at least some embodiments. In the depicted example, a client has invoked four MLS APIs, API1 through API4, and four corresponding job objects J1 through J4 are created and placed in job queue 142. Timelines TL1, TL2, and TL3 show the sequence of events from the perspective of the client that invokes the APIs, the request handler that creates and inserts the jobs in queue 142, and a job scheduler that removes the jobs from the queue and schedules the jobs at selected resources.”) by generating a list of neighbor nodes for each node in a plurality of nodes (paragraphs 0052, 0057, 0073 and 0187: “In at least some implementations, job queue 142 may be managed as a first-in-first-out (FIFO) queue, with the further constraint that the dependency requirements of a given job must have been met in order for that job to be removed from the queue. In some embodiments, jobs created on behalf of several different clients may be placed in a single queue, while in other embodiments multiple queues may be maintained (e.g., one queue in each data center of the provider network being used, or one queue per MLS customer). Asynchronously with respect to the submission of the requests 111, the next job whose dependency requirements have been met may be removed from job queue 142 in the depicted embodiment, as indicated by arrow 113, and a processing plan comprising a workload distribution strategy may be identified for it.”).
Steele does not explicitly disclose creating a first thread for continuous communication according to a weight management operation and a second thread for continuous computation of a gradient for each node, wherein one or more variables are shared between the first thread and the second thread.
Kumar further discloses creating a first thread for continuous communication according to a weight management operation (paragraphs 0023-0027: “The generate thread then checks to see if the toLearner buffer was filled since it was last checked, and, if so, swaps its content with weight buffer using a constant-time, atomic, non-blocking operation. It updates its weights and changes I to the index of the received weight. The received weight also has an associated mass M′. If M′ crosses an epoch multiple then a test run is triggered (with the current weights) by one of learner processing systems 210, 220, 230, 240.”) and a second thread for continuous computation of a gradient for each node (paragraphs 0025-0027: “The reconciler thread at each learner processing system 210, 220, 230, 240 receives gradients from the respective generate thread of that learner processing system, communicates with the reconciler threads at the other learner processing systems 210, 220, 230, 240, and generates new weights.”), wherein one or more variables are shared between the first thread and the second thread (paragraphs 0027-0028: “The reconciler thread has its own copy of the model and uses it to update the weights, using the incoming gradient. This updated weight is available for pickup by the generate thread in the toLearner buffer. In examples, if the weight is not picked up before a new weight is available, the new weight overwrites the existing weight in the buffer.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kumar with the teaching of Steele because doing so would provide for computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners (Kumar, paragraph 0005).
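For illustration only, the two-thread arrangement recited in claim 1 (one thread for communication under a weight management operation, one thread for gradient computation, with variables shared between them) can be sketched as follows. This is a minimal sketch under stated assumptions and is neither Kumar's implementation nor the applicant's: the toy loss, the constants, the bounded hand-off queue, and all names are hypothetical.

```python
import queue
import threading

TARGET = 3.0          # minimizer of the toy loss (w - TARGET) ** 2 (assumption)
LEARNING_RATE = 0.05  # hypothetical step size
STEPS = 300           # hypothetical number of gradient computations


def train(steps=STEPS):
    shared = {"weight": 0.0}            # variable shared between the two threads
    lock = threading.Lock()             # guards reads and writes of the shared weight
    gradients = queue.Queue(maxsize=1)  # hand-off buffer from compute to communication

    def compute_thread():
        # Second thread: repeatedly compute a gradient from the current weight.
        for _ in range(steps):
            with lock:
                w = shared["weight"]
            gradients.put(2.0 * (w - TARGET))  # blocks while the buffer is full
        gradients.put(None)  # sentinel: no more gradients

    def communication_thread():
        # First thread: apply each received gradient to the shared weight.
        while True:
            g = gradients.get()
            if g is None:
                return
            with lock:
                shared["weight"] -= LEARNING_RATE * g

    threads = [
        threading.Thread(target=compute_thread),
        threading.Thread(target=communication_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["weight"]


final_weight = train()
```

The bounded queue is a design choice of this sketch: it limits how stale an applied gradient can be, so the result stays close to `TARGET` regardless of how the two threads interleave.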

As per claim 2, Steele discloses selecting a neighbor node from the list of neighbor nodes (paragraphs 0182, 0183, and 0190).
Kumar further discloses defining the weight management operation (paragraphs 0025-0027) to include:
applying a gradient of a selected node from the list of neighbor nodes obtained from the second thread to obtain an updated weight of the selected node (paragraphs 0023, 0025, 0030, 0034, 0041 and 0043; see, e.g., paragraphs 0025-0027: “The reconciler thread at each learner processing system 210, 220, 230, 240 receives gradients from the respective generate thread of that learner processing system, communicates with the reconciler threads at the other learner processing systems 210, 220, 230, 240, and generates new weights.”; paragraphs 0027-0028: “The reconciler thread has its own copy of the model and uses it to update the weights, using the incoming gradient. This updated weight is available for pickup by the generate thread in the toLearner buffer. In examples, if the weight is not picked up before a new weight is available, the new weight overwrites the existing weight in the buffer.”);
setting the gradient equal to a zero value;
selecting a neighbor node from the list of neighbor nodes; or
exchanging weights with the selected neighbor node and averaging exchanged weights to generate a weighted vector.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kumar with the teaching of Steele because doing so would provide for computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners (Kumar, paragraph 0005).
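For illustration only, the elements of the weight management operation listed for claim 2 can be sketched as one step on a single node. This is a minimal sketch under stated assumptions, not the method of any cited reference or of the applicant: the claim lists the elements as alternatives, whereas this sketch simply runs all four in sequence, and the dictionary layout, the random neighbor choice, and all names are hypothetical.

```python
import random


def weight_management_step(node, nodes, learning_rate, rng):
    """Run one hypothetical weight management step for `node`.

    `nodes` maps node names to dicts holding "weight", "gradient", and
    "neighbors" (the node's list of neighbor names).
    """
    # Apply the gradient to obtain an updated weight.
    node["weight"] = [
        w - learning_rate * g for w, g in zip(node["weight"], node["gradient"])
    ]
    # Set the gradient equal to zero.
    node["gradient"] = [0.0] * len(node["gradient"])
    # Select a neighbor node from the list of neighbor nodes.
    neighbor = nodes[rng.choice(node["neighbors"])]
    # Exchange weights with the neighbor and average them into one vector.
    averaged = [(a + b) / 2.0 for a, b in zip(node["weight"], neighbor["weight"])]
    node["weight"] = list(averaged)
    neighbor["weight"] = list(averaged)
    return averaged


# Hypothetical two-node example.
nodes = {
    "a": {"weight": [1.0, 2.0], "gradient": [0.5, -0.5], "neighbors": ["b"]},
    "b": {"weight": [3.0, 4.0], "gradient": [0.0, 0.0], "neighbors": ["a"]},
}
averaged = weight_management_step(nodes["a"], nodes, learning_rate=1.0, rng=random.Random(0))
```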

As per claim 8, Steele discloses a system for decentralized distributed deep learning in a computing environment, comprising: 
one or more computers with executable instructions that when executed cause the system to: 
perform asynchronous distributed training of one or more machine learning models (FIGs. 5 and 9; paragraphs 0051, 0057, 0073 and 0091: “FIG. 5 illustrates an example of asynchronous scheduling of jobs at a machine learning service, according to at least some embodiments. In the depicted example, a client has invoked four MLS APIs, API1 through API4, and four corresponding job objects J1 through J4 are created and placed in job queue 142. Timelines TL1, TL2, and TL3 show the sequence of events from the perspective of the client that invokes the APIs, the request handler that creates and inserts the jobs in queue 142, and a job scheduler that removes the jobs from the queue and schedules the jobs at selected resources.”) by generating a list of neighbor nodes for each node in a plurality of nodes (paragraphs 0052, 0057, 0073 and 0187: “In at least some implementations, job queue 142 may be managed as a first-in-first-out (FIFO) queue, with the further constraint that the dependency requirements of a given job must have been met in order for that job to be removed from the queue. In some embodiments, jobs created on behalf of several different clients may be placed in a single queue, while in other embodiments multiple queues may be maintained (e.g., one queue in each data center of the provider network being used, or one queue per MLS customer). Asynchronously with respect to the submission of the requests 111, the next job whose dependency requirements have been met may be removed from job queue 142 in the depicted embodiment, as indicated by arrow 113, and a processing plan comprising a workload distribution strategy may be identified for it.”).
Steele does not explicitly disclose creating a first thread for continuous communication according to a weight management operation and a second thread for continuous computation of a gradient for each node, wherein one or more variables are shared between the first thread and the second thread.
Kumar further discloses creating a first thread for continuous communication according to a weight management operation (paragraphs 0023-0027: “The generate thread then checks to see if the toLearner buffer was filled since it was last checked, and, if so, swaps its content with weight buffer using a constant-time, atomic, non-blocking operation. It updates its weights and changes I to the index of the received weight. The received weight also has an associated mass M′. If M′ crosses an epoch multiple then a test run is triggered (with the current weights) by one of learner processing systems 210, 220, 230, 240.”) and a second thread for continuous computation of a gradient for each node (paragraphs 0025-0027: “The reconciler thread at each learner processing system 210, 220, 230, 240 receives gradients from the respective generate thread of that learner processing system, communicates with the reconciler threads at the other learner processing systems 210, 220, 230, 240, and generates new weights.”), wherein one or more variables are shared between the first thread and the second thread (paragraphs 0027-0028: “The reconciler thread has its own copy of the model and uses it to update the weights, using the incoming gradient. This updated weight is available for pickup by the generate thread in the toLearner buffer. In examples, if the weight is not picked up before a new weight is available, the new weight overwrites the existing weight in the buffer.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kumar with the teaching of Steele because doing so would provide for computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners (Kumar, paragraph 0005).

As per claim 9, it is a system claim that recites the same limitations as claim 2. Accordingly, claim 9 is rejected for the same reasons as set forth in the rejection of claim 2.

As per claim 15, Steele discloses a computer program product for, by a processor, decentralized distributed deep learning in a computing environment, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
an executable portion that performs asynchronous distributed training of one or more machine learning models (FIGs. 5 and 9; paragraphs 0051, 0057, 0073 and 0091: “FIG. 5 illustrates an example of asynchronous scheduling of jobs at a machine learning service, according to at least some embodiments. In the depicted example, a client has invoked four MLS APIs, API1 through API4, and four corresponding job objects J1 through J4 are created and placed in job queue 142. Timelines TL1, TL2, and TL3 show the sequence of events from the perspective of the client that invokes the APIs, the request handler that creates and inserts the jobs in queue 142, and a job scheduler that removes the jobs from the queue and schedules the jobs at selected resources.”) by generating a list of neighbor nodes for each node in a plurality of nodes (paragraphs 0052, 0057, 0073 and 0187: “In at least some implementations, job queue 142 may be managed as a first-in-first-out (FIFO) queue, with the further constraint that the dependency requirements of a given job must have been met in order for that job to be removed from the queue. In some embodiments, jobs created on behalf of several different clients may be placed in a single queue, while in other embodiments multiple queues may be maintained (e.g., one queue in each data center of the provider network being used, or one queue per MLS customer). Asynchronously with respect to the submission of the requests 111, the next job whose dependency requirements have been met may be removed from job queue 142 in the depicted embodiment, as indicated by arrow 113, and a processing plan comprising a workload distribution strategy may be identified for it.”).
Steele does not explicitly disclose creating a first thread for continuous communication according to a weight management operation and a second thread for continuous computation of a gradient for each node, wherein one or more variables are shared between the first thread and the second thread.
Kumar further discloses creating a first thread for continuous communication according to a weight management operation (paragraphs 0023-0027: “The generate thread then checks to see if the toLearner buffer was filled since it was last checked, and, if so, swaps its content with weight buffer using a constant-time, atomic, non-blocking operation. It updates its weights and changes I to the index of the received weight. The received weight also has an associated mass M′. If M′ crosses an epoch multiple then a test run is triggered (with the current weights) by one of learner processing systems 210, 220, 230, 240.”) and a second thread for continuous computation of a gradient for each node (paragraphs 0025-0027: “The reconciler thread at each learner processing system 210, 220, 230, 240 receives gradients from the respective generate thread of that learner processing system, communicates with the reconciler threads at the other learner processing systems 210, 220, 230, 240, and generates new weights.”), wherein one or more variables are shared between the first thread and the second thread (paragraphs 0027-0028: “The reconciler thread has its own copy of the model and uses it to update the weights, using the incoming gradient. This updated weight is available for pickup by the generate thread in the toLearner buffer. In examples, if the weight is not picked up before a new weight is available, the new weight overwrites the existing weight in the buffer.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kumar with the teaching of Steele because doing so would provide for computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners (Kumar, paragraph 0005).

As per claim 16, it is a computer program product claim that recites the same limitations as claim 2. Accordingly, claim 16 is rejected for the same reasons as set forth in the rejection of claim 2.

Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Steele in view of Kumar, as applied to claims 1, 8 and 15, and further in view of US 2012/0284067 to Labat et al. (hereafter “Labat”).

As per claim 3, Steele does not explicitly disclose defining the continuous computation of the gradient for the second thread for each of the nodes to include: continuously determining a gradient for a selected node from the list of neighbor nodes based on input data of the selected node; sending the determined gradient of the selected node to the first thread; assigning weighted vector received from the first thread to a current weight of the selected node.
Kumar further discloses defining the continuous computation of the gradient for the second thread for each of the nodes (paragraphs 0023-0027) to include:
continuously determining a gradient for a selected node from the list of neighbor nodes based on input data of the selected node (paragraphs 0025-0027: “The reconciler thread at each learner processing system 210, 220, 230, 240 receives gradients from the respective generate thread of that learner processing system, communicates with the reconciler threads at the other learner processing systems 210, 220, 230, 240, and generates new weights.”); 
sending the determined gradient of the selected node to the first thread (paragraphs 0027-0028: “The reconciler thread has its own copy of the model and uses it to update the weights, using the incoming gradient. This updated weight is available for pickup by the generate thread in the toLearner buffer. In examples, if the weight is not picked up before a new weight is available, the new weight overwrites the existing weight in the buffer.”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Kumar with the teaching of Steele because doing so would provide for computing, by a generator processor on each of a plurality of learners, a gradient for a mini-batch using a current weight at each of the plurality of learners, the current weight being uniquely identified by a weight index of each of the plurality of learners (Kumar, paragraph 0005).
Labat further discloses assigning weighted vector received from the first thread (paragraphs 0007-0008) to a current weight of the selected node (paragraphs 0008, 0038 and 0063).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Labat with the teachings of Steele and Kumar because doing so would provide for calculating the set of component revenues associated with the service components and the resources used by the software offering by using the total revenue as a component revenue for a root node of the multidimensional model, and calculating a set of child component revenues for each set of child nodes in the multidimensional model by applying a weight vector associated with the set of child nodes to a parent component revenue for a parent node of the child nodes (Labat, paragraph 0008).

As per claim 10, it is a system claim that recites the same limitations as claim 3. Accordingly, claim 10 is rejected for the same reasons as set forth in the rejection of claim 3.
As per claim 17, it is a computer program product claim that recites the same limitations as claim 3. Accordingly, claim 17 is rejected for the same reasons as set forth in the rejection of claim 3.

Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Steele in view of Kumar, as applied to claims 1 and 8, and further in view of US 2016/0088006 to Gupta et al. (hereafter “Gupta”).

As per claim 5, Steele does not explicitly disclose detecting one or more failures of one or more of the plurality of nodes or links in the list of neighbor nodes.
Gupta further discloses detecting one or more failures of one or more of the plurality of nodes or links in the list of neighbor nodes (FIG. 8; paragraph 0081).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gupta with the teachings of Steele and Kumar because doing so would improve the total execution time of the reducer tasks (Gupta, paragraph 0081).

As per claim 12, it is a system claim that recites the same limitations as claim 5. Accordingly, claim 12 is rejected for the same reasons as set forth in the rejection of claim 5.

Claims 7, 14 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Steele in view of Kumar, as applied to claims 1, 8 and 15, and further in view of US 2007/0094060 to Apps et al. (hereafter “Apps”).
As per claim 7, Steele does not explicitly disclose restricting a weight from being updated at a selected node in the list of neighbor nodes when one or more weights are transmitted and averaged to maintain data consistency.
Apps further discloses restricting a weight from being updated at a selected node in the list of neighbor nodes when one or more weights are transmitted and averaged to maintain data consistency (FIG. 8; paragraph 0081).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Apps with the teachings of Steele and Kumar because doing so would provide for applying a strategy to a dataset in a data mining system to address a business problem (Apps, paragraph 0006).

As per claim 14, it is a system claim that recites the same limitations as claim 7. Accordingly, claim 14 is rejected for the same reasons as set forth in the rejection of claim 7.

As per claim 20, it is a computer program product claim that recites the same limitations as claim 7. Accordingly, claim 20 is rejected for the same reasons as set forth in the rejection of claim 7.

Conclusion
The following prior art made of record and not relied upon is cited to establish the level of skill in the applicant’s art and those arts considered reasonably pertinent to applicant’s disclosure. See MPEP 707.05(c).
Any inquiry concerning this communication should be directed to examiner Tuan Dao, whose telephone and fax numbers are (571) 270 3387 and (571) 270 4387, respectively. The examiner can normally be reached Monday through Thursday, and on the second Friday of each biweekly period, from 7:30 AM to 5:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached at (571) 272 3721.
The fax phone number for the organization where this application or proceeding is assigned is (571) 273 8300.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the TC 2100 Group receptionist, whose telephone number is (571) 272 2100.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/TUAN C DAO/            Primary Examiner, Art Unit 2193