DETAILED ACTION
This Non-Final Office Action is responsive to Applicant’s Amendment filed on 8 Dec 2020, in which the status of the claims is as follows: 
Amended Claims: 1, 7, and 13.
Canceled Claims: 2, 5-6, 8, 11-12, 14, and 17-18.
Claims 1, 3-4, 7, 9-10, 13, and 15-16 are currently pending and under examination, of which claims 1, 7, and 13 are independent claims. No claims are currently in condition for allowance.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
An additional request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/8/2020 has been entered.
 
Response to Remarks
Applicant’s remarks dated 12/08/2020 regarding the prior art have been considered, but they are moot in view of the new grounds of rejection necessitated by applicant’s amendments. Examiner has updated the search and consideration to reflect the present status of the claims. The newly identified art comprises more recent work by Kadav, as well as Lian and LeCun. Note that the dependent claims are also rejected over art in the alternative. Further, see the additional pertinent art in the conclusion. 

Claim Interpretation
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. In view of the foregoing, examiner notes that the independent claims recite “each different portion at each computing node is collectively used to train an untrained data analytics model”. The term “collectively” is afforded little weight because it is not used in the specification and does not impart special meaning to the training, whose temporal ordering could be interpreted as either sequential or parallel. The term “untrained model” is interpreted as a state or version of a model, e.g., an initial or current model.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 3-4, 7, 9-10, 13, and 15-16 (all pending claims) are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the Examiner applies the 2019 USPTO Patent Eligibility Guidelines (2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50, Jan. 7, 2019), which are now largely incorporated into MPEP 2106.
Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. The claims are all directed to one of the four statutory categories. Claims 1 and 3-4 are method/process claims, claims 7 and 9-10 are computer program product/manufacture claims, and claims 13 and 15-16 are system/machine claims.
Step 2A, prong one: Does the claim recite an abstract idea, law of nature or natural phenomenon? Yes. The claims are directed to the abstract idea of distributed model training. The limitations comprise: 
dividing a dataset among a plurality of computing nodes, each computing node comprising at least one processor, memory, and communications circuitry, the memory of each computing node storing a different portion of the dataset, wherein each different portion at each computing node is collectively used to train an untrained data analytics model; 
performing a first training pass on the untrained data analytics model by: 
receiving at a first computing node the untrained data analytics model; 
training the untrained data analytics model by integrating the portion of the dataset stored on the first computing node into the data analytics model; 
repeating the training of the data analytics model by transmitting the data analytics model to a next computing node and training the data analytics model at the next node by integrating the portion of the dataset stored on the next node into the data analytics model and continue the training of the data analytics model until the data analytics model has been trained by the plurality of computing nodes; 
wherein the training comprises transmitting, in response to the untrained data analytics model completing a first pass through each computing node of the plurality of computing nodes, the untrained data analytics model back to the first computing node and the next computing node; and 
outputting the trained data analytics model, further comprising: 
[Application No. 15/227,101; Attorney Docket No. CH920160046US1]
training a plurality of data analytics models, wherein at least some of the plurality of data analytics models are independent of each other; and 
training the plurality of data analytics models simultaneously using the plurality of computing nodes, each data analytics model trained on the plurality of computing nodes using a different succession of computing nodes than the successions of computing nodes with which other data analytics models are trained.
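
For illustration only, the limitations listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not taken from the application: the toy "model" (a running mean), the helper names `divide_dataset`, `train_on_portion`, `training_pass`, and `rotated_successions`, and the choice of rotated node orderings as the "different successions" are all hypothetical. The sketch runs the passes one after another in a single process, whereas the claim recites simultaneous training across nodes.

```python
from typing import Dict, List

# Hypothetical toy model: a running mean with a sample count.
Model = Dict[str, float]


def divide_dataset(dataset: List[float], num_nodes: int) -> List[List[float]]:
    """Give each node a different, non-overlapping portion of the dataset."""
    return [dataset[i::num_nodes] for i in range(num_nodes)]


def train_on_portion(model: Model, portion: List[float]) -> Model:
    """Integrate one node's portion into the model (incremental mean update)."""
    count, mean = model["count"], model["mean"]
    for x in portion:
        count += 1
        mean += (x - mean) / count
    return {"count": count, "mean": mean}


def training_pass(model: Model, portions: List[List[float]],
                  succession: List[int]) -> Model:
    """Hand the model from node to node in the given order, training at each."""
    for node in succession:
        model = train_on_portion(model, portions[node])
    return model


def rotated_successions(num_nodes: int) -> List[List[int]]:
    """Assumed scheme: model m visits nodes m, m+1, ... (mod n).

    At any step k, every model sits on a different node, so the passes
    could in principle run at the same time.
    """
    return [[(start + k) % num_nodes for k in range(num_nodes)]
            for start in range(num_nodes)]


dataset = [float(v) for v in range(1, 13)]
portions = divide_dataset(dataset, 3)
successions = rotated_successions(3)
models = [training_pass({"count": 0, "mean": 0.0}, portions, s)
          for s in successions]
```

In this toy setting each of the three models sees every portion exactly once, in a different node order, and all arrive at the same dataset mean.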

The limitations, as drafted, amount to a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer functionality. That is, other than reciting “computing node comprising at least one processor, memory, and communications circuitry, the memory of each computing node storing a different portion of the dataset”, nothing in the claim element precludes the step from practically being performed in the mind. Therefore, the claim recites a mental process.
Step 2A, prong two: Does the claim recite additional elements that integrate the judicial exception into a practical application? No. The judicial exception is not integrated into a practical application. Although the claim recites that the recited functionality is performed by “computing node comprising at least one processor, memory, and communications circuitry, the memory of each computing node storing a different portion of the dataset”, the recited processor is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Examiner provides additional supporting evidence by way of publications US9,984,337B2 and/or Xie et al., “Distributed Machine Learning via Sufficient Factor Broadcasting”, each of which illustrates the technological environment as it relates to Figure 1 of the instant application.
[Figures: Xie (media_image1.png) and Kadav ‘337 (media_image2.png), reproduced side by side.]

	To this effect, examiner notes that the instant specification states at [0026] “one of ordinary skill in the art would recognize that these routines, along with the memory contents related to those routines, may not be included on one system or device, but rather may be distributed among a plurality of systems or devices, based on well-known engineering considerations. The present invention contemplates any and all such arrangements.” That is, the well-known nature of such distributed arrangements is established by applicant's admission.
	Finally, examiner gives consideration to the improvement of computer functionality. In this regard, the training may be alleged to be optimized with respect to some pattern among nodes. Neither the claim itself nor the specification provides clear evidence in support of an improvement to functionality. This is largely due to a lack of particularity in how a model is transformed through the training. In other words, one does not improve computer functionality by simply stating a betterment of learning, but by specifically providing a particular solution or at least by demonstrating some unexpected result. The entire field of machine learning, along with hundreds if not thousands of patent applications for distributed training, is directed to some form of improvement, but an improvement is established by enriching the public knowledge of the means by which one arrives at a result. When considered as a whole, the specification is notably thin, totaling 37 paragraphs, much of which is form language. 
It is not readily apparent that a clear derivation of the claimed model training is disclosed anywhere in the application. Instead, various functionalities are tied together with vague description in a way that would not reasonably apprise one of ordinary skill in the art of the claimed technique. For example, some steps are “simultaneous” while others are in “succession” as a training pass. This leads to a lack of clarity as to whether the training is synchronous or asynchronous, or somehow both at the same time. It is inscrutable how a node is considered a first node, next node, or final node when the training process is distributed, as the title of the application indicates, or “simultaneous”, in the language of the claim. Remarks 12/08/2020 even state [P.9 ¶1] “there is no designated final node”, which appears to be in direct conflict with drawing Fig 2 as filed. It seems that the claim intends to identify a process which becomes parallel responsive to sequential iteration, and does so without any clear guidance as to the handling of such a transition. Further issues are evidenced by the lack of detail as to how the training of one model becomes the training of a plurality of models, or how any model could be considered “untrained”: is a model still a model without trained parameters? The application’s background points to addressing a parameter space which is not always convex, but is completely silent as to gradient calculation, and provides only cursory mention of error/cost or processing via traditional batching (in sharp contrast to contemporary mini-batch). To understand the application’s key functionality for distributed training, a skilled artisan should be apprised of what calculation is performed where, when, and how, and of its relation to the process as a whole. 
Moreover, the consideration of functionality may actually present an intractable solution, since an arbitrarily large number of training nodes and training passes directly affects performance in yielding convergence of a model, and there is no precise technique for handling inter-nodal delays. Rather, applicant continues adding to the claim, equating narrow with novel. However, refining a claim does not atone for a specification which fails to shed light on the heart of a process. The generally described idea with a results-oriented solution does not reflect a level of quality sufficient to bestow patent eligibility by way of improvement to the functionality of a computer. Applicant may wish to consider filing a continuation-in-part (CIP) in order to strengthen the application’s disclosure.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. As noted above, the only limitation on the performance of the described method is that it must be performed by “computing node comprising at least one processor, memory, and communications circuitry”. The claim thus recites computing components only at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. The statement that the method is performed by a computer does not satisfy the test of inventive concept, and the courts have identified such elements as failing to add significantly more to the abstract idea. A recent example offers insight into findings with regard to improvement of computer functionality, with a finding of ineligibility; see Simio, LLC v. Flexsim Software Products, Inc. (Fed. Cir. 2020).
For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claims 7 and 13, which recite a computer program product and system, respectively, as well as to dependent claims 3-4, 9-10, and 15-16.
Dependent claims 3, 9, and 15 recite “wherein the output trained data analytics model has accuracy to that of a data analytics model trained by training the data analytics model with the dataset sequentially”. This is considered part of the abstract idea of distributed training. The sequential nature is equivalent to the “succession” of training passes in the independent claims. 
Dependent claims 4, 10, and 16 recite “training the plurality of data analytics models, the plurality of data analytics models resulting from varying and choosing different combinations of model structure, model meta-parameters that are not learned through training, and training algorithm parameters”. This is considered part of the abstract idea, as the different combinations amount to the pattern of training. The meta-parameters not learned through training amount to mere data-gathering, see MPEP 2106.05 (g) and (h).
Taken alone, their additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-4, 7, 9-10, 13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over: 
Xu et al., US Patent 8,027,938B1, hereinafter Xu, in view of 
Kadav et al., US Patent No 10,679,145B2, hereinafter Kadav, in view of 
Chen et al., US PG Pub No 20190050743A1, hereinafter Chen, in view of
Xing et al., “Petuum: A New Platform for Distributed Machine Learning on Big Data”, hereinafter Xing.
With respect to claim 1, Xu teaches: 
	A computer-implemented method for training of data analytics models {Xu Fig 3; [Abstract] “Systems, methods, and apparatuses” for [Col2 Lines23-44] “distributing portions of training data… Training the model includes distributing the model to each worker”} comprising: 
	dividing a dataset among a plurality of computing nodes {Xu Fig 3-302 “Separate training data among worker processes” worker is computing node; [Col6 Lines13-26] “training data is evenly divided among the workers”}, each computing node comprising at least one processor, memory, and communications circuitry {Xu Fig 4 illustrated computer hardware; [Col11 Lines59-63] “the system can be interconnected… communication network”}, the memory of each computing node storing a different portion of the dataset {Xu [Col4 Lines6-16] “Each worker carries out the computation (e.g., according to the received model) for a portion of training data assigned to the respective worker” assigned portions}, wherein each different portion at each computing node is collectively used to train an untrained data analytics model {Xu [Col3 Lines59-67] “the master 102 initializes the computation operation (e.g., the model) at a beginning of a discriminative learning” initialized model having [Col6 Lines27-43] “initial parameter values… If the predicted output y matches the correct output y*, then the model parameters are unchanged” and [Col8 Lines64-67] “The master initiates each subsequent iteration using an updated model after receiving updates from all workers”};
	performing a first training pass on the untrained data analytics model {Xu Fig 3 iterative loop to update model parameters, i.e., training} by: 
	receiving at a first computing node the untrained data analytics model {Xu Fig 3-306 broadcast model to worker and Fig3-308 worker “received model”. See also Fig 2-202/04 received model, training examples}; 
	training the untrained data analytics model by integrating the portion of the dataset stored on the first computing node into the data analytics model {Xu Fig 3-308 “Process data at each worker” by [Col2 Lines 14-17, 59-61] “Each worker can independently process the respective portion of the training data in an online manner to generate worker level updates to model parameters… generating parameter updates by applying the portion of the training data assigned to each worker to the updated model of the iteration” where a first iteration pass is current/untrained model}; 
	repeating the training of the data analytics model by transmitting the data analytics model to a next computing node and training the data analytics model at the next node by integrating the portion of the dataset stored on the next node into the data analytics model and continue the training of the data analytics model until the data analytics model has been trained by the plurality of computing nodes {Xu Fig 3-312 “Send updates up from each lower level worker process and combine updates received from lower level worker processes” as [Col8 Lines61-63] “each worker receives an updated model that includes parameter updates from all other workers” and/or [Col7 Lines17-18] “first level of workers, which in turn broadcasts the model to the next level of workers. The model is broadcast level-by-level until all workers include the model”. [Col6 Lines41-42] “The process is repeated for each training example assigned to the respective worker”}; and 
	outputting the trained data analytics model {Xu [Col1 Line60] “distributing the updated model” having learned parameters [Col5 Line63] “the system uses 208 the model including the learned parameter values”}, further comprising: 
	Xu discloses a methodology substantively akin to the claimed process. This is readily apparent from a simple side-by-side comparison of the flow charts illustrated below.
[Figures: Instant Application (media_image3.png) and Prior Art: Xu Fig 3 (media_image4.png), reproduced side by side.]

However, Xu does not teach the amended limitation, largely because the master in Xu is static. This deficiency is cured by Kadav.
Kadav teaches:
wherein the training comprises transmitting, in response to the untrained data analytics model completing a first pass through each computing node of the plurality of computing nodes, the untrained data analytics model back to the first computing node and the next computing node; and 
Kadav discloses distributed training of model replicas in parallel. The functionality is illustrated by Figure 3 below and aligns with remarks 12/08/2020 [P.9 ¶4], which describe the functionality in terms of an environment having nodes 1, 2, 3 and models X, Y, Z. Kadav Figs 3-4 illustrate a triad of nodes training model replicas (MR) in parallel, see [Col2 Lines59-62] “plurality of model replicas train in parallel using parameter updates. The model replicas train and compute new model weights. They send/receive parameters from everyone and apply them to their own model”. Further, Fig 4 illustrates “Send model computed after <N> iterations to all other replicas”, which strongly supports sending the model among the nodes of a distributed environment. Additionally, [Col1 Lines27-63] “different models process at different speeds… mismatch in processing abilities on different computer learning nodes” describes a distribution of modeling among different nodes.
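
For illustration only, the all-to-all exchange quoted from Kadav can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Kadav's implementation: the scalar model, the squared-error loss, the function names `local_step` and `train_replicas`, and in particular the choice to model "apply them to their own model" as simple averaging are all hypothetical.

```python
from typing import List


def local_step(weight: float, data: List[float], lr: float) -> float:
    """One gradient step on the mean of 0.5 * (weight - x)^2 over local data."""
    grad = sum(weight - x for x in data) / len(data)
    return weight - lr * grad


def train_replicas(shards: List[List[float]], rounds: int, n_local: int,
                   lr: float = 0.5) -> List[float]:
    """Train one replica per shard; exchange models after n_local iterations."""
    weights = [0.0 for _ in shards]
    for _ in range(rounds):
        # Each replica trains independently on its own shard.
        for _ in range(n_local):
            weights = [local_step(w, shard, lr)
                       for w, shard in zip(weights, shards)]
        # All-to-all exchange: every replica receives every other model.
        # Assumption: the received models are applied by plain averaging.
        average = sum(weights) / len(weights)
        weights = [average for _ in weights]
    return weights


shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
final_weights = train_replicas(shards, rounds=20, n_local=2)
```

With equally sized shards, the replicas in this toy example agree on a single weight close to the mean of the full dataset (3.5).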
[Figures: Prior Art: Kadav Fig 3 (media_image5.png) and Prior Art: Kadav Fig 4 (media_image6.png), reproduced side by side.]

	Kadav is directed to distributed model training thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to distribute model parallelization according to Kadav in combination with Xu in order to provide the benefit of “dynamically adjusting the communication batch size to balance computation and communication overhead and ensuring convergence even with a mismatch in processing abilities… More accurate models, with faster training time” (Kadav [Col1 Lines48-51], [Col2 Lines21-39]).
	However, the combination of Xu and Kadav does not expressly disclose wherein “models are independent of each other”.
	Chen teaches:
further comprising: training a plurality of data analytics models, wherein at least some of the plurality of data analytics models are independent of each other {Chen [0005] “a plurality of updated local models from the plurality of worker nodes, wherein each updated local model is generated by one of the plurality of worker nodes independently”. Models broadcast over training cycles for data splits [0064], [0080]}; and 
Chen is directed to distributed model training thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to specify the different models of Kadav as being independent as set forth by Chen in order to train a wide mixture of parameter sets, thereby “speed up training of the learning machine and improve accuracy” (Chen [0014], [0045]).
Finally, the combination of Xu, Kadav, and Chen does not teach “each data analytics model trained on the plurality of computing nodes using a different succession”.
Xing teaches:
training the plurality of data analytics models simultaneously using the plurality of computing nodes {Xing [P.52 Sect2.2-2.3] “model-parallel ML, the model A is partitioned and assigned to workers p=1..P and updated therein in parallel” is simultaneous}, each data analytics model trained on the plurality of computing nodes using a different succession of computing nodes than the successions of computing nodes with which other data analytics models are trained {Xing [P.54 Sect3.2.2 – 3.2.3] “allowing users to control which model parameters are updated by worker machines. This is performed through a user-defined scheduling function ()… scheduling control channel (Fig 5)… Several common patterns for schedule design are worth highlighting: fixed-scheduling (schedule_fix()) dispatches model parameters A in a predetermined order; static, round-robin schedules… Dependency-aware (schedule_dep()) scheduling… prioritized scheduling… Each worker p receives parameters to be updated from schedule()” whereby a user defined scheduling according to a plurality of patterns is differing nodal successions. Figs 6/7 scheduling}.
As cited above, Xing discloses distributed model training in which the partitioned model is updated in parallel across workers, i.e., simultaneously, and in which user-defined scheduling according to a plurality of patterns yields differing nodal successions (see Figs 6/7, scheduling).
Xing is directed to distributed model training thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to combine the scheduler of Xing with the models of Xu, Kadav, and Chen in order to give the user control over scheduling of training over sequential nodes. In resolving the level of ordinary skill in the art, examiner notes Xing [P.53 ¶1-3] “One could certainly modify Hadoop’s or Spark’s built-in schedulers to expose the required level of control, but we do not consider this reasonable for the average ML practitioner without strong systems expertise… A core goal of Petuum is to allow practitioners to easily implement data-parallel and model-parallel ML algorithms. Petuum provides APIs to key systems that make data- and model-parallel programming easier”. Thus, by implementing the Petuum scheduler for fine-grained control over nodal ordering, one would reduce the complexity burden for one of ordinary skill.

With respect to claim 3, the combination of Xu, Kadav, Chen, and Xing teaches the method of claim 1, wherein  
	the output trained data analytics model has accuracy to that of a data analytics model trained by training the data analytics model with the dataset sequentially.
Xu teaches evaluation of model accuracy by threshold scoring, see [Col2 Lines 17-22, 50-55] “Generating the updated model can further include comparing parameter updates for each feature to a threshold contribution score and when the parameter updates for a particular feature are below a threshold contribution score, updating the model without updating the parameter value for the particular feature” and [Col5 Lines11-15] “compute a best scoring output according to (Equation)”. See also loss function [Col7 Line5].

With respect to claim 4, the combination of Xu, Kadav, Chen, and Xing teaches the method of claim 1, further comprising: 
	training the plurality of data analytics models, the plurality of data analytics models resulting from varying and choosing different combinations of model structure, model meta-parameters that are not learned through training, and training algorithm parameters.
Xing discloses exploration of parameter space per [P.52 Sect2.2 ¶3] “scheduling function S() opens up a large design space, such as fixed, randomized, or even dynamically-changing scheduling on the whole space, or a subset of, the model parameters” and [P.55 Sect4.2 ¶2] “selects a random subset of parameters to be updated in parallel”. Further, meta-parameters that are not learned through training amount to design choice as is established by the claim language “not learned through training”.
One of ordinary skill in the art would have considered it obvious prior to the effective filing date to implement a randomized scheduling function as disclosed by Xing in combination with the plurality of models disclosed by Kadav because “the model state A can be easily synchronized… the scheduler may use the new model state to generate future scheduling functions” (Xing [P.54 Sect3.2.3 ¶1]) and/or because it “enables ML programs to analyze dependencies at run time (implemented via schedule()), and selects the subsets of independent (or nearly-independent) parameters for parallel updates” (Xing [P.59 Sect6.2 ¶2]).

With respect to claim 7, Xu teaches: 
	A computer program product for training of data analytics models, the computer program product comprising a non-transitory computer readable storage having program instructions therewith, the program instructions executable by a computer, to cause the computer to perform a method comprising: 
Xu teaches [Col10 Lines35-56] “computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution… storage device… software application”, as illustrated by the computers and hardware of Fig 4, and [Col2 Lines23-44] “method is provided… Training the model includes distributing the model to each worker”.
	The remainder of this claim is rejected for the same rationale as claim 1.

Claims 9-10 are rejected for the same rationale as claims 3-4, respectively.

With respect to claim 13, Xu teaches: 
	A system for training of data analytics models, the system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor to perform: 
Xu teaches [Col1 Line41] “Systems, methods, and apparatuses”, with Fig 4 illustrating computers and hardware with memory, processor, and bus for interconnect; [Col6 Lines7-13] “online training of model parameters… system”; [Col10 Lines35-56] “instructions encoded on a computer readable medium for execution”.

Claims 15-16 are rejected for the same rationale as claims 3-4, respectively.

Claims 3, 9, and 15 are rejected in the alternative under 35 U.S.C. 103 as being unpatentable over Xu, Kadav, Chen, and Xing in view of: 
Lian et al., “Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization”, hereinafter Lian. Examiner notes that the Lian cover page bears conflicting dates; accordingly, examiner has appended multiple bibliographic support documents at the end of the reference.
With respect to claim 3, the combination of Xu, Kadav, Chen, and Xing teaches the method of claim 1. Lian teaches wherein 
	the output trained data analytics model has accuracy to that of a data analytics model trained by training the data analytics model with the dataset sequentially.
Lian discloses [P.7-8 PgBrk] “an important observation is that as long as the number of workers (which is proportional to T) is bounded by O(√(K/M)), the iteration complexity to achieve the same accuracy level will be roughly the same. In other words, the average work load for each worker is reduced by the factor T comparing to the serial SGD” emphasis same accuracy level, serial=sequential.
	Lian is directed to distributed model training thus being analogous. A person having ordinary skill in the art would have considered it obvious prior to the effective filing date to implement accuracy constraint as detailed by Lian in combination with the other references as applying a known technique to a known method to yield predictable results and/or in order to achieve linear speedup with a convergence rate consistent with serial SG for convex, synchronous parallel (or mini-batch) SG for convex optimization, and nonconvex smooth optimization (Lian [P.7-8 PgBrk]). This method is enabled by way of algorithmic pseudocode and extensive statistical derivation.
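
For illustration only, the "same accuracy level" observation quoted from Lian can be sketched with a toy comparison of serial updates against updates computed from stale parameters. This is a minimal sketch under stated assumptions and does not reproduce Lian's algorithm or analysis: it uses a deterministic gradient on a convex quadratic rather than stochastic gradients on a nonconvex objective, and the names `gradient` and `stale_descent`, the target value, the step size, and the fixed delay are all hypothetical.

```python
from collections import deque


def gradient(w: float, target: float = 3.0) -> float:
    """Gradient of the toy objective 0.5 * (w - target)^2."""
    return w - target


def stale_descent(steps: int, lr: float, delay: int) -> float:
    """Gradient descent where each update uses parameters `delay` steps old.

    delay=0 is ordinary serial descent; delay>0 mimics asynchronous workers
    whose gradients were computed against an older copy of the model.
    """
    w = 0.0
    # Holds the last delay+1 parameter values; history[0] is the oldest.
    history = deque([w] * (delay + 1), maxlen=delay + 1)
    for _ in range(steps):
        stale_w = history[0]
        w = w - lr * gradient(stale_w)
        history.append(w)
    return w


serial_result = stale_descent(steps=500, lr=0.1, delay=0)
delayed_result = stale_descent(steps=500, lr=0.1, delay=3)
```

In this toy setting, with a small step size and a bounded delay, both runs settle at the same minimizer, which is the qualitative point of the quoted passage.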

Claims 9 and 15 are rejected for the same rationale as claim 3.

Claims 4, 10, and 16 are rejected in the alternative under 35 U.S.C. 103 as being unpatentable over Xu, Kadav, Chen, and Xing in view of: 
LeCun et al., “Deep learning with Elastic Averaging SGD”, hereinafter LeCun.
With respect to claim 4, the combination of Xu, Kadav, Chen, and Xing teaches the method of claim 1, further comprising: 
	training the plurality of data analytics models, the plurality of data analytics models resulting from varying and choosing different combinations of model structure, model meta-parameters that are not learned through training, and training algorithm parameters.
LeCun discloses elastic averaging SGD which [P.7 ¶1] “allows for more exploration of the parameter space”.
One of ordinary skill in the art would have considered it obvious prior to the effective filing date to implement elastic averaging in combination with the other references for the benefit of reducing test error and enhancing stability in a round-robin scheme (LeCun [P.7 Last¶], [P.3]).
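For illustration only, the elastic averaging referenced above can be sketched in Python as a synchronous update in which each worker takes a local gradient step while being pulled toward a shared center variable, and the center moves toward the workers. This is a minimal sketch, not the reference's implementation. The one-dimensional quadratic objectives, the values of `eta` and `rho`, and the names `easgd` and `local_minima` are assumptions chosen for the example.

```python
def easgd(local_minima, eta=0.1, rho=1.0, steps=1000):
    """Synchronous elastic-averaging sketch on toy 1-D quadratics.

    Worker i minimizes the assumed objective 0.5 * (x - local_minima[i])**2.
    eta is the learning rate, rho the strength of the elastic pull.
    """
    alpha = eta * rho                      # elastic coefficient
    workers = [0.0] * len(local_minima)    # local parameter copies
    center = 0.0                           # shared center variable
    for _ in range(steps):
        # Differences are taken from the values at the start of the step.
        diffs = [x - center for x in workers]
        # Local gradient step plus elastic pull toward the center.
        workers = [
            x - eta * (x - m) - alpha * d
            for x, m, d in zip(workers, local_minima, diffs)
        ]
        # The center moves toward the workers by the summed differences.
        center = center + alpha * sum(diffs)
    return workers, center


workers, center = easgd([1.0, 2.0, 3.0, 6.0])
print(workers, center)
```

In this toy setting the workers settle at distinct points between their own minimizers and the center, while the center settles at the mean of the local minimizers. A smaller `rho` loosens the pull and lets the workers spread further apart.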

Claims 10 and 16 are rejected for the same rationale as claim 4.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Meng et al., “Asynchronous Accelerated Stochastic Gradient Descent” published in IJCAI discloses AASGD with partial gradients.
Chen et al., “Revisiting Distributed Synchronous SGD” Google Brain discloses backup workers with sync/async and convergence analysis.
Odena, Augustus, “Faster Asynchronous SGD” discloses FASGD with timestamped updates and step-staleness of gradient.
Ma et al., “Theano-MPI: a Theano-based Distributed Training Framework” utilizes both sync and async with parallel loading and “collective” synchronization.
He et al., “Large Scale Distributed Hessian-Free Optimization for Deep Neural Network” discloses algorithms for non-convex optimization and model/data parallelism.
Yan et al., “Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems” discloses model replicas across defined parallel config, see Fig 7.
Xing et al., “Strategies and Principles of Distributed Machine Learning on Big Data” is the most recent work of the same author, Xing; see Fig 3.
Gupta et al., US Patent No 10,755,152B2 MIT discloses partial training.
Cruz Mota et al., US PG Pub No 20150193695 A1 “Distributed Model Training” Cisco discloses [0074] “collectively training a machine learning model” and [claims 7-8] “determining a sequence of training devices… sequentially train the ML model”.
Scardapane, Simone, “Distributed Supervised Learning using Neural Networks” is a thesis with extensive detail, including a “first node” and “final node” for distributed training.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chase P Hinckley whose telephone number is (571)272-7935.  The examiner can normally be reached on M-F 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda M. Huang, can be reached at 571-270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/CHASE P. HINCKLEY/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124