Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Detailed Action
The office action is in response to communication filed on 07/25/2018. Claims 1-12 are presented for examination and are pending.
Oath/Declaration
For the record, the Examiner acknowledges that the Oath/Declaration submitted on
07/25/2018 has been received.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/25/2018 has been considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, an initialed and dated copy of the Applicant’s IDS forms 1449 filed 07/25/2018 is attached to the instant Office action.
Claim Objections
Claims 2-5 (and similarly 8 and 10-12) are objected to because of the following informalities: Each of these dependent claims uses the language “A computer implemented method as claimed in claim 1”. When writing a claim in dependent form, the language should be “The system/method of claim 1…” Claims 8 and 10-12 are phrased similarly and are therefore objected to for the same reasoning. Appropriate correction is required. Examiner note: Replace “A” with “The”.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 6 is rejected under 35 U.S.C. 101 because the claim is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the claim has the language “A non-transient data carrier” which does not exclude transitory signals. Examiner note: Replace the phrase “non-transient” with “non-transitory.”

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 1 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
	Claim 1, lines 8-9, recites “parameters being suitable for improving the global parameter set…” The word “suitable” is indefinite. Neither the claim nor the specification explains the criteria for a parameter being suitable for improving the global parameter set.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3 and 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Pub. No. US 2015/0324690 A1 to Chilimbi et al., (hereinafter, “Chilimbi”) in view of Patent No. US 10,152,676 B1 to Strom.
As per claim 1, Chilimbi teaches a computer implemented method of distributed learning in a system comprising a parameter server configured to maintain a global parameter set of a model to be trained and a plurality of workers, (Chilimbi, Par. [0008]; “Other known embodiments, describe large-scale distributed systems comprised of tens of thousands of CPU cores for training large deep neural networks…The model replicas share a common set of parameters that is stored on a global parameter server.” Par. [0009];  “…methods to train large neural network models by providing training input to model training machines organized as multiple replicas that asynchronously update a shared model via a global parameter server are described herein.” Examiner note: a plurality of workers is interpreted as “tens of thousands of CPU cores”)
	the method comprising: transmitting a current global parameter set to a worker (Chilimbi Par. [0067]; “FIG. 8 is a diagram 800 of the global parameter server(s) 706. As described above, the global parameter server(s) 706 may be in constant communication with the model training machines (e.g., Machine 1, Machine 2, etc.), asynchronously receiving updates to model parameters and sending the current weight values. These communications are illustrated by arrows 712 and 714.”);
	the worker performing a training step based on training data available to the worker, thereby generating a local set of parameters of the model (Chilimbi, Par. [0009] “…model replicas that asynchronously update a shared model via a global parameter server…” Par. [0076 - 0077]; “Block 1102 illustrates receiving a batch of data items, as described above. The deep learning training module 616 may receive the batch of data items from the data server(s) 702. The batch of data items may have been pre-processed in the data server(s) 702 as described in FIG. 10 below. Block 1104 illustrates processing individual data items to calculate updates. The deep learning training module 616 may input the batch of data items into a model to calculate activation values, error terms, and/or weight updates.”  Examiner note: Each of the models is a worker);
Chilimbi fails to explicitly teach the worker determining a likelihood of the local set of parameters being suitable for improving the global parameter set and omitting transmission of the local set of parameters to the parameter server if it is determined that the local set of parameters is not likely suitable for improving the global parameter set.
However, Strom teaches the worker determining a likelihood of the local set of parameters being suitable for improving the global parameter set and omitting transmission of the local set of parameters to the parameter server if it is determined that the local set of parameters is not likely suitable for improving the global parameter set (Strom, Col. 3 line 66 – Col. 4 line 6;  “In this way, updates which may be substantial in aggregate may be retained, while updates which are too small to make a substantial difference to the model, or which may be cancelled by other updates calculated in a subsequent iteration, are not applied. In addition, each computing device can maintain its own residual gradient, and the updates that do not meet or exceed the threshold are not transmitted to the other computing devices.”).   
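Chilimbi and Strom are analogous because they are both directed to distributed model training systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference, into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19).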
	As per claim 6, Strom teaches a non-transient data carrier comprising a computer program instructions suitable for execution by a processor, the computer program instructions, when executed by the processor causing the processor to perform the method as claimed in Claim 1 (Strom, Col. 11, lines 15-22; “The process 600 may be embodied in a set of executable program instructions stored on non-transitory computer-readable media, such as short-term or long-term memory of one or more computing devices associated with a model training node 102A. When the process 600 is initiated, the executable program instructions can be loaded and executed by the one or more computing devices.”).
As per claim 7, the claim is analogous to claim 1 and is therefore rejected with the same rationale applied against claim 1.
	As per claim 9, Strom teaches a device comprising a processor, memory storing instructions executable by the processor (Strom, Col. 13 lines 10-15; “The elements of a method, process, routine, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor device, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory…”) and a communication interface (Strom, Col. 6 lines 34-37; “In some embodiments, the features and services provided by the model training nodes 102A, 102B and/or the data sources 104, 106 may be implemented as services consumable via a communication network.”). The remaining claim limitations are similar to those in claim 1 and are therefore rejected with the same rationale applied against claim 1.
As per claim 2, the combination of Chilimbi and Strom teaches the computer implemented method as claimed in Claim 1. Strom further teaches the worker using the local set of parameters as starting parameters in a further training step if it is determined that the local set of parameters is not likely suitable for improving the global parameter set (Strom, Col. 4 lines 6-9; “In addition, each computing device can maintain its own residual gradient, and the updates that do not meet or exceed the threshold are not transmitted to the other computing devices. In this way, the bandwidth savings described above can be maintained, while all updates to the parameters calculated at a given computing device can be preserved for future use.” Examiner note: Examiner interprets future use as further training steps).
Chilimbi and Strom are analogous because they are both directed to distributed model training systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference, into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19).
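For illustration only, the thresholding behavior quoted from Strom above (updates below a threshold are withheld and kept in a locally maintained residual) can be sketched as follows. This is a simplified sketch under stated assumptions, not Strom's disclosed implementation: the function name threshold_update, the threshold symbol tau, and the choice to transmit the whole accumulated value are all hypothetical.

```python
from typing import List, Tuple


def threshold_update(
    residual: List[float], gradient: List[float], tau: float
) -> Tuple[List[float], List[float]]:
    """Fold a new gradient into the local residual and split the result.

    Hypothetical simplification: an element whose accumulated magnitude
    meets or exceeds tau is transmitted in full and cleared locally;
    every other element is withheld and kept in the residual.
    """
    to_send: List[float] = []
    new_residual: List[float] = []
    for r, g in zip(residual, gradient):
        acc = r + g  # accumulate the withheld part with the new update
        if abs(acc) >= tau:
            to_send.append(acc)       # large enough: transmit
            new_residual.append(0.0)  # nothing left to carry forward
        else:
            to_send.append(0.0)       # too small: omit transmission
            new_residual.append(acc)  # preserve for future use
    return to_send, new_residual


# Two consecutive training steps on a two-parameter model.
sent1, res1 = threshold_update([0.0, 0.0], [0.75, 0.25], tau=0.5)
sent2, res2 = threshold_update(res1, [0.125, 0.25], tau=0.5)
```

In the second step the small update withheld in the first step (0.25) combines with the new update and crosses the threshold, which mirrors the quoted statement that updates "which may be substantial in aggregate may be retained."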
As per claim 3, the combination of Chilimbi and Strom teaches the computer implemented method as claimed in Claim 1. Strom further teaches, if it is determined that the local set of parameters is likely suitable for improving the global parameter set, the worker transmitting the local set of parameters to the parameter server (Strom, Col. 3, lines 17-19; “…only those updates which are expected to provide a substantive change to the model may be applied and exchanged.”).
Strom does not explicitly teach receiving an updated global parameter set from the parameter server and using the updated global parameter set as starting parameters in a further training step.
However, Chilimbi teaches receiving an updated global parameter set from the parameter server and using the updated global parameter set as starting parameters in a further training step (Chilimbi, Par. [0079-0080]; “The global parameter server(s) 706 may provide updated weight values based on receiving updates from one or more replicas 704A-704N. The updated weight values take into account activation values, error terms, and/or weight updates from each of the individual replicas 704A-704N running asynchronously. Block 1110 illustrates modifying the model to reflect the updated weight values, as described above. As described above, the deep learning training module 616 may calculate a model prediction error based at least in part on the updated individual weight values and the new updated weight values. The deep learning training module 616 may process subsequent batches of data items by repeating process 1100…”).
 Chilimbi and Strom are analogous because they are both directed to distributed model training systems. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference, into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19).
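For illustration only, the worker-side flow addressed in claims 1-3 (perform a training step, decide whether the local parameters are worth transmitting, then continue either from the updated global set or from the local set) can be sketched as follows. Everything here is a hypothetical stand-in: the ParameterServer class, its averaging rule, the squared-error toy model, and the min_gain test used as the "likelihood" determination are assumptions and are not taken from the claims, Chilimbi, or Strom.

```python
from typing import List


class ParameterServer:
    """Toy stand-in for a server that maintains a global parameter set."""

    def __init__(self, params: List[float]) -> None:
        self.params = list(params)
        self.updates_received = 0

    def current(self) -> List[float]:
        return list(self.params)

    def push(self, local: List[float]) -> List[float]:
        # Assumption: the server averages the global and local parameters.
        self.params = [(g + l) / 2.0 for g, l in zip(self.params, local)]
        self.updates_received += 1
        return list(self.params)


def loss(params: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(params, target))


def train_step(params: List[float], target: List[float], lr: float) -> List[float]:
    # One gradient-descent step on the squared error; gradient is 2 * (p - t).
    return [p - lr * 2.0 * (p - t) for p, t in zip(params, target)]


def run_worker(server: ParameterServer, target: List[float], steps: int,
               lr: float = 0.25, min_gain: float = 0.5) -> List[float]:
    start = server.current()  # receive the current global parameter set
    for _ in range(steps):
        local = train_step(start, target, lr)
        gain = loss(start, target) - loss(local, target)
        if gain >= min_gain:
            # Deemed likely to help: transmit, then continue from the
            # updated global parameter set returned by the server.
            start = server.push(local)
        else:
            # Deemed unlikely to help: omit transmission and continue
            # training from the local parameters.
            start = local
    return start


server = ParameterServer([1.0])
final_params = run_worker(server, target=[0.0], steps=3)
```

With these toy values only the first of the three training steps produces a loss improvement large enough to be transmitted; the remaining two steps continue locally without contacting the server.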
	As per claim 8, the claim is analogous to claim 3 and is therefore rejected with the same rationale applied against claim 3.
	As per claim 10, the claim is analogous to claim 3 and is therefore rejected with the same rationale applied against claim 3.
Claims 4-5 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi in view of Strom, and further in view of Patent No. US 8874440 B2 to Park et al. (hereinafter, “Park”).
As per claim 4, the combination of Chilimbi and Strom as shown above teaches the computer implemented method as claimed in Claim 1. Chilimbi further teaches wherein a loss function or learning curve of the training step is used as observation (Chilimbi, Par. [0036]; “…typically, training continues for multiple epochs, reprocessing the training data set each time, until the validation set error converges to a desired value below a predetermined threshold.” Examiner note: the loss function is interpreted as the “validation set error”).
	The combination of Chilimbi and Strom does not explicitly teach wherein the determining is based on a Partially Observable Markov Decision Process.
	However, Park teaches wherein the determining is based on a Partially Observable Markov Decision Process (Park Col. 10, lines 56-59; “For example, the action determining unit 130 may use a learning model designed using a reinforcement learning model such as a partially observable Markov decision process (POMDP)”).
Chilimbi, Strom, and Park are analogous because they are all directed to learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Park’s determining method into Chilimbi’s system of deep learning on distributed systems as modified by Strom in order to determine an optimal action that maximizes a reward (Park Col. 11, lines 45-46).
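For illustration only, the general idea of basing a determination on a Partially Observable Markov Decision Process with a loss-derived observation can be sketched with the standard POMDP belief update, in which the new belief in a state is proportional to the observation probability times the predicted state probability. The two hidden states, the single "loss decreased" observation, and all probabilities below are hypothetical values chosen for the example; they are not taken from Park, Chilimbi, Strom, or the instant application.

```python
from typing import Dict

Belief = Dict[str, float]


def belief_update(belief: Belief,
                  transition: Dict[str, Dict[str, float]],
                  obs_prob: Dict[str, float]) -> Belief:
    """Standard POMDP belief update for one observed outcome.

    transition[s1][s2] is the probability of moving from state s1 to s2;
    obs_prob[s2] is the probability of the received observation in s2.
    """
    # Prediction: push the current belief through the transition model.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
        for s2 in belief
    }
    # Correction: weight each state by how well it explains the observation.
    unnormalized = {s: obs_prob[s] * predicted[s] for s in predicted}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical hidden states: is the local parameter set useful or not?
prior = {"useful": 0.5, "not_useful": 0.5}
transition = {
    "useful": {"useful": 0.8, "not_useful": 0.2},
    "not_useful": {"useful": 0.2, "not_useful": 0.8},
}
# Hypothetical observation model for the event "training loss decreased".
loss_decreased = {"useful": 0.9, "not_useful": 0.3}

posterior = belief_update(prior, transition, loss_decreased)
```

A worker in such a scheme could compare the posterior belief in the "useful" state against a cutoff to decide whether to transmit; that cutoff, like the rest of the sketch, would be a design assumption.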
As per claim 11, the claim is analogous to claim 4 and is therefore rejected with the same rationale applied against claim 4. 
As per claim 5, the combination of Chilimbi and Strom teaches the computer implemented method as claimed in Claim 1.
The combination of Chilimbi and Strom fails to explicitly teach wherein the worker uses reinforcement learning for said determining.
 However, Park teaches wherein the worker uses reinforcement learning for said determining (Park Col. 10, lines 56-59; “For example, the action determining unit 130 may use a learning model designed using a reinforcement learning model such as a partially observable Markov decision process (POMDP)”).
Chilimbi, Strom, and Park are analogous because they are all directed to learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Park’s determining method into Chilimbi’s system of deep learning on distributed systems as modified by Strom in order to determine an optimal action that maximizes a reward (Park Col. 11, lines 45-46).
As per claim 12, the claim is analogous to claim 5 and is therefore rejected with the same rationale applied against claim 5. 
The prior art made of record and not relied upon, which is considered pertinent to applicant’s disclosure, is listed below:
Chen et al. (NPL: “Adaptive Residual Gradient Compression for Data-Parallel Distributed Training”): discloses a distributed training system involving a parameter server. 
Li et al. (NPL: “Scaling Distributed Machine Learning with the Parameter Server”): discloses a distributed training system involving a parameter server.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHMUEL Y. WEINFELD whose telephone number is (571)272-9893.  The examiner can normally be reached on Mon-Fri 08:00-17:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/SHMUEL Y WEINFELD/ Examiner, Art Unit 2126
/ANN J LO/ Supervisory Patent Examiner, Art Unit 2126