Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 1, 6-7, and 18 are objected to because of the following informalities:
In Claim 1, line 2, “a model” was probably meant to be “the model”. The same objection is made for Claim 18 at line 7.
In Claim 6, lines 4-5, “a number of simultaneous threads” was probably meant to be “the number of simultaneous threads”.
In Claim 7, line 3, “the subset of training data” was probably meant to be “a subset of the training data” (or the claim should perhaps depend on Claim 4).
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 8, line 2, recites the limitation “progress of pre-processing”. It is unclear from the claim what pre-processing this progress refers to (for example, pre-processing of a subset of the training data; alternatively, the claim may have been intended to depend on Claim 7, with appropriate amendments to both claims to avoid any 35 USC 112 issues).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Looking at similar independent Claims 1 and 18, we see limitations directed towards training models that are machine learning models with associated identifiers, in a distributed system of processors with attributes that include the processing thread capability of some of the processors, and aggregating the trained models in the distributed system of processors based on the memory capacities of some of the processors. The training of the models is based on training data and definitions or parameters for the models. These limitations, under their broadest reasonable interpretation, are directed towards mathematical relationships and calculations and fall under the “Mathematical Concepts” grouping of abstract ideas. That is, the training and aggregating of machine learning models are based on mathematical relationships and calculations. Dependent Claims 2-17 and 19-20 discuss user inputs for pre-processing the training data, metrics for evaluating the performance of the model, an input pipeline to receive the training data, generating processing threads based on the number of copies of the model trained or the size of the training data, changing the number of threads for data intake or pre-processing, moving portions of the training data to different sets of processors for training different copies of the model, scaling the processing threads based on the amount of training data received or an amount of pre-processed training data, aggregating the models based on performance data or weights, selecting processors based on evaluating their number of cores and/or memory, the identifiers being unique, and the selection of the processors being based on size or speed. These limitations are all considered insignificant extra-solution activity to the judicial exception; see MPEP 2106.05(g).
This judicial exception is not integrated into a practical application. The additional elements of “processors” or “computing devices” or “computing system” as recited in the claims for implementing the limitations of the claims are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims are therefore directed to an abstract idea.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements for implementing the limitations of the claims amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are therefore not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Strom, US 10,152,676 B1, in view of Wesolowski, US 2019/0114537 A1.

Regarding Claim 1, Strom teaches:
A method for training a model on a distributed system, comprising: receiving, by the distributed system, user inputs including definitions for a model and training data for training the model (Abstract; C1, L28-34: distributed training of models using training data and parameters/definitions for the model); 
identifying, by the distributed system, a plurality of available processors having one or more attributes, the plurality of available processors being located on a plurality of computing devices in the distributed system (Abstract; C6, L3-6: training models over multiple computing nodes/processors that are individual computing devices that would have their own attributes/features); 
generating, by the distributed system, a copy of the model on each of the first subset of processors, each copy of the model having an identifier associated therewith (Abstract; C4, L49-51; C9, L18-27: the model training nodes obtain a copy of the model to be trained, and an identifier is used to identify model parameters used for a specific model trained at a particular model training node which would therefore also identify the copy of the model); 
training, by the distributed system, the copies of the model on the first subset of processors (C7, L7-12: training the copies of the models on each of the model training nodes);
and aggregating, by the distributed system based on the identifiers of each copy of the model, the trained copies of the model on the second subset of processors (C7, L64 to C8, L2; C9, L35-37: aggregating partial gradients of each copy of the trained model to get a current trained model. Examiner’s note: Wu, WO 2019/042571 A1, also teaches this, see for example Abstract).
Strom may not have explicitly taught:
automatically selecting, by the distributed system based on the one or more attributes, a first subset of processors among the available processors to train the model, the processors in the first subset each being configured to handle a threshold amount of simultaneous processing threads; 
automatically selecting, by the distributed system, a second subset of processors among the available processors to aggregate training results, the processors in the second subset each having a threshold amount of memory for aggregation.
However, Wesolowski, in a similar field of endeavor, shows these limitations (paragraphs 5, 60: discussing a master machine learning control system or scheduler that determines access to different computing systems depending on task requirements, which include memory and parallel processing thread requirements. Examiner’s note: see also Sridharan, US 2018/0322387 A1, paragraphs 53, 271, 294).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the teachings of Wesolowski with those of Strom for using different sets of processors to train the models and aggregate training results based on their processing thread and memory requirements, respectively.
The ordinary artisan would have been motivated to modify Strom in the manner set forth above for the purposes of establishing access to different types of computing systems configured for different types of primary tasks used for training machine learning models with their own respective memory and parallel processing threads capabilities [Wesolowski: paragraph 5].
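
For illustration only, the combined flow discussed for Claim 1 (threshold-based selection of two processor subsets, training identified copies, and aggregating by identifier) can be sketched in Python. This is a minimal sketch under stated assumptions and is not code from Strom, Wesolowski, or the application: the `Processor` record, the threshold parameters, the toy `train_copy` update, and the plain averaging in `aggregate` are all hypothetical stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    # Hypothetical attribute record; field names are assumptions.
    name: str
    max_threads: int  # simultaneous threads the processor can handle
    memory_gb: int    # memory available for aggregation


def select_processors(available, min_threads, min_memory_gb):
    """Pick a training subset by thread capacity and an aggregation subset by memory."""
    trainers = [p for p in available if p.max_threads >= min_threads]
    aggregators = [p for p in available if p.memory_gb >= min_memory_gb]
    return trainers, aggregators


def train_copy(weights, shard, lr=0.1):
    """Toy stand-in for training: move each weight toward the shard mean."""
    target = sum(shard) / len(shard)
    return [w + lr * (target - w) for w in weights]


def aggregate(trained_by_id):
    """Average the trained copies, visiting them in identifier order."""
    ids = sorted(trained_by_id)
    width = len(trained_by_id[ids[0]])
    return [sum(trained_by_id[i][k] for i in ids) / len(ids) for k in range(width)]


def run(available, initial_weights, shards, min_threads, min_memory_gb):
    trainers, aggregators = select_processors(available, min_threads, min_memory_gb)
    if not trainers or not aggregators:
        raise ValueError("no processor meets the thresholds")
    # One model copy per selected training processor, keyed by an integer identifier.
    trained = {}
    for copy_id, (_proc, shard) in enumerate(zip(trainers, shards)):
        trained[copy_id] = train_copy(list(initial_weights), shard)
    return trainers, aggregators, aggregate(trained)
```

In this sketch a processor may appear in both subsets if it meets both thresholds; whether the subsets must be disjoint is not addressed here.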

Regarding Claim 2, Wesolowski further teaches:
The method of claim 1, wherein the user inputs further include instructions for pre-processing the training data (Fig. 3; paragraph 44: preprocessing the training data. Examiner’s note: see also Strom, C7, L39-42).

Regarding Claim 3, Wesolowski further teaches:
The method of claim 1, wherein the user inputs further include metrics for evaluating performance of the model (paragraph 68: adjusting hyper-parameters in accordance with performance characteristics).

Regarding Claim 4, Strom teaches:
The method of claim 1, further comprising: generating, by the distributed system, an input pipeline to receive a subset of the training data for each of the plurality of computing devices on which one or more copies of the model are to be trained (C7, L7-12: data flow pipeline for training the models based on a subset of the training data).

Regarding Claim 5, Wesolowski further teaches:
The method of claim 4, further comprising: generating, by the distributed system, a number of simultaneous processing threads to perform data intake at each of the input pipelines, wherein the number of simultaneous processing threads for data intake is scaled by a number of copies of the model to be trained on a respective computing device (paragraph 5: providing many processing cores for parallel training of the multiple neural network models).

Regarding Claim 6, Wesolowski further teaches:
The method of claim 5, further comprising: monitoring, by the distributed system, progress of data intake on the first subset of processors; and changing, by the distributed system based on the progress, a number of simultaneous threads for the data intake for one or more of the first subset of processors (paragraphs 26, 63: monitoring progress/performance of each computing machine and, if necessary, transferring execution to another machine).

Regarding Claim 7, Wesolowski further teaches:
The method of claim 1, further comprising: generating, by the distributed system, a number of simultaneous processing threads to pre-process the subset of training data received at each of the plurality of computing devices, wherein the number of simultaneous processing threads for pre-processing is scaled by a size of the subset of training data received at a respective computing device (paragraph 71: wherein the batch size of the training data determines the appropriate machine used for training).

Regarding Claim 8, Wesolowski further teaches:
The method of claim 1, further comprising: monitoring, by the distributed system, progress of pre-processing on the first subset of processors; and changing, by the distributed system based on the progress, a number of simultaneous threads for pre-processing for one or more of the first subset of processors (paragraphs 26, 63: monitoring progress/performance of each computing machine and, if necessary, transferring execution to another machine).

Regarding Claim 9, Wesolowski further teaches:
The method of claim 1, further comprising: monitoring, by the distributed system, progress of training on the first subset of processors; and either: changing, by the distributed system based on the progress, a number of simultaneous threads for training for one or more of the first subset of processors, or moving, by the distributed system based on the progress, a portion of training data on a first processor of the first subset of processors for training a first copy of the model to a second processor of the first subset of processors for training a second copy of the model (paragraphs 26, 63: monitoring progress/performance of each computing machine and, if necessary, transferring execution of a portion of the training of the machine learning model to another machine).
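
For illustration only, the "moving a portion of training data based on progress" alternative recited above might look like the following Python sketch. It is a hedged example under assumptions, not a description of Wesolowski or of the application: the `rebalance` function, the progress fractions, the `lag` threshold, and the choice to move half of the pending data are all hypothetical.

```python
def rebalance(pending, progress, lag=0.25):
    """Move part of the slowest processor's pending data to the fastest one.

    pending:  {processor name: list of examples not yet trained on}
    progress: {processor name: fraction of its work completed, 0.0 to 1.0}
    Returns (slow, fast, number moved), or None if nothing was moved.
    """
    slow = min(progress, key=progress.get)
    fast = max(progress, key=progress.get)
    # Only act when the gap in progress exceeds the (assumed) lag threshold.
    if progress[fast] - progress[slow] < lag:
        return None
    half = len(pending[slow]) // 2
    if half == 0:
        return None
    moved = pending[slow][-half:]
    del pending[slow][-half:]
    pending[fast].extend(moved)
    return slow, fast, len(moved)
```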

Regarding Claim 10, Wesolowski further teaches:
The method of claim 1, further comprising: generating, by the distributed system, a number of simultaneous processing threads to train copies of the model on each of the plurality of computing devices, wherein the number of simultaneous processing threads for training is scaled by either (i) an amount of the subset of training data received at a respective computing device, or (ii) an amount of pre-processed training data on the respective computing device (paragraph 71: wherein the batch size or subset of the training data determines the appropriate machine used for training). 
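
For illustration only, scaling a number of simultaneous processing threads by an amount of data, as recited above, can be sketched in Python with the standard-library `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under assumptions and is not taken from the cited references or the application: the helper names and the `examples_per_thread` and `max_threads` parameters are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def scaled_thread_count(num_examples, examples_per_thread, max_threads):
    """Thread count proportional to data volume, capped by the device limit."""
    if examples_per_thread <= 0 or max_threads <= 0:
        raise ValueError("examples_per_thread and max_threads must be positive")
    wanted = math.ceil(num_examples / examples_per_thread)
    return max(1, min(wanted, max_threads))


def process_in_threads(examples, examples_per_thread, max_threads, fn):
    """Apply fn to every example using a pool sized by the amount of data."""
    n = scaled_thread_count(len(examples), examples_per_thread, max_threads)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in the same order as the inputs.
        return n, list(pool.map(fn, examples))
```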

Regarding Claim 11, with Strom teaching aggregating of the models as previously pointed out, Wesolowski further teaches:
The method of claim 1, further comprising: generating, by the distributed system, performance data for each of the trained copies of the model; and comparing, by the distributed system, the trained copies of the model based on the performance data, wherein aggregating the trained copies of the model is based on the comparison (paragraph 26: monitoring performance).

Regarding Claim 12, Strom teaches:
The method of claim 11, further comprising selecting, by the distributed system based on the comparison, a subset of the trained copies of the model, wherein only the selected subset of trained copies are aggregated (C7, L64 to C8, L2; C9, L35-37: aggregating partial gradients of each copy of the trained model to get a current trained model. Examiner’s note: Wu, WO 2019/042571 A1, also teaches this, see for example Abstract).

Regarding Claim 13, with Strom teaching aggregating of the models as previously pointed out, Wesolowski further teaches:
The method of claim 11, further comprising assigning, by the distributed system based on the comparison, weights to the trained copies of the model, wherein the trained copies of the model are aggregated based on the assigned weights (paragraph 22: checkpoints that can include an iteration number or weight value are used in the machine learning model training).
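
For illustration only, the limitations of Claims 11-13 (comparing trained copies by performance data, optionally selecting a subset, and aggregating with assigned weights) can be sketched in Python. This is a hedged sketch under assumptions, not the method of Strom, Wesolowski, or the application: the `weighted_aggregate` function, the use of a single score per copy where higher is better, and the score-proportional weighting are all hypothetical.

```python
def weighted_aggregate(copies, scores, keep=None):
    """Aggregate trained copies with weights proportional to their scores.

    copies: {identifier: list of parameter values}
    scores: {identifier: performance score, higher is better}
    keep:   if given, only the top `keep` copies by score are aggregated
    """
    ranked = sorted(copies, key=lambda i: scores[i], reverse=True)
    if keep is not None:
        ranked = ranked[:keep]
    total = sum(scores[i] for i in ranked)
    if total <= 0:
        raise ValueError("scores of the selected copies must sum to a positive value")
    width = len(copies[ranked[0]])
    return [
        sum((scores[i] / total) * copies[i][k] for i in ranked)
        for k in range(width)
    ]
```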

Regarding Claim 14, Wesolowski further teaches:
The method of claim 1, wherein selecting the first subset of processors and selecting the second subset of processors is based on evaluating a number of cores in each of the plurality of available processors (paragraphs 5, 62: evaluating the processing cores).

Regarding Claim 15, Wesolowski further teaches:
The method of claim 1, wherein: each of the first subset of processors has at least a first number of cores and a first amount of memory, each of the second subset of processors has no more than a second number of cores and a second amount of memory, the first number of cores being greater than the second number of cores, and the first amount of memory is smaller than the second amount of memory (paragraphs 19-20: the different computing machines having differing memory capacities).

Regarding Claim 16, Strom teaches:
The method of claim 1, wherein the identifiers are unique identifiers (C9, L25-27: identifier is an integer value which is a unique number).

Regarding Claim 17, Wesolowski further teaches:
The method of claim 1, wherein selecting the first subset of processors is based on at least one of a size of the model or a speed at which the model can be trained by the plurality of available processors (paragraph 5: processors used based on size and/or speed).

Claims 18-20 are similar to Claims 1, 9 and 11 respectively and are rejected under the same rationale as stated above for those claims.

Examiner's Note:
The Examiner cites particular pages, sections, columns, line numbers, and/or paragraphs in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner and the additional related prior art made of record that is considered pertinent to applicant's disclosure to further show the general state of the art. The Examiner's interpretations in parentheses are provided with the cited references to assist the applicant to better understand how the examiner interprets the prior art to read on the claims. Such comments are entirely consistent with the intent and spirit of compact prosecution.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892 for the pertinent prior art relating to this application; for example, Tan, US 2018/0240011 A1, teaches distributed training of machine learning models.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVE MISIR whose telephone number is (571) 272-5243. The examiner can normally be reached M-R 8-5 pm, F some hours.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVE MISIR/Primary Examiner, Art Unit 2127