DETAILED ACTION
Status of Claims
This action is in response to the amendment filed on 10/5/2021. Claims 1 – 2, 5, 9 – 14, 16, 36 – 37, 40, 43 – 47, 50 and 68 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 8/10/2018 and 10/5/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:


(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1, 9 – 11, 16, 36, 43 – 45, 50 and 68 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Grau, Parallel Computing for Neural Networks, Rochester Institute of Technology RIT, EECC756 Course Slide, http://meseec.ce.rit.edu/756-projects/spring2013/1-4.pdf, 2013.

Regarding Claim 1, Grau discloses: a system, comprising: at least one processor, and at least one memory including program code which when executed by the at least one processor provides operations comprising (Grau, page 12, where each implementation uses servers, each of which has a processor, memory, and program code to perform tasks):
partitioning, based at least on a resource constraint of a platform, a global machine learning model into a plurality of local machine learning models (Grau, pages 14 – 15, where, in view of the communication bandwidth [resource constraint], the model is partitioned to be executed on multiple processing units);
each local machine learning model of the plurality of local machine learning models having a subset of a plurality of neurons and interconnections included in the global machine learning model (Grau, page 12, where, in neuron level parallelism, each of the 3 servers has a subset of the neurons and interconnections of the global model as its local machine learning model);
transforming training data to at least conform to the resource constraint of the platform (Grau, page 12, where the training data is split [transformed] based on the input layers of the 3 local models; pages 14 – 15, where the design of the local models is based on the communication constraint [resource constraint]);
and training the global machine learning model by at least processing, at the platform, the transformed training data with a first local machine learning model of the plurality of local machine learning models (Grau, page 16, where each processor performs full sequential training of the network for one portion of the training set; page 12, where the model [first local machine learning model] on the MARS server is trained with the training data).

Regarding Claim 9, depending on Claim 1, Grau further discloses: wherein the training of the global machine learning model further comprises processing the transformed training data with a second of the plurality of local machine learning models (Grau, page 12, where the model [second of the plurality of local machine learning models] on the VENUS server is trained with the training data).

Regarding Claim 10, depending on Claim 9, Grau further discloses: wherein the transformed training data is processed, at the platform, with the first local machine learning model and with the second local machine learning model in parallel, when the resource constraint of the platform enables the transformed training data to be processed in parallel at the platform with the first local machine learning model and with the second local machine learning model (Grau, page 12, where the model [first local machine learning model] on the MARS server and the model [second local machine learning model] on the VENUS server are run in parallel; pages 14 – 15, where the design of the local models is based on the communication constraint [resource constraint]).

Regarding Claim 11, depending on Claim 9, Grau further discloses: wherein the transformed training data is processed with the first local machine learning model at the platform, wherein the transformed training data is processed with the second local machine learning model at another platform, and wherein the transformed training data is processed at the platform and at the other platform in parallel (Grau, page 12, where the training data is processed by the model [first local machine learning model] on the MARS server [the platform] and by the model [second local machine learning model] on the VENUS server [another platform] in parallel).

Regarding Claim 16, depending on Claim 1, Grau further discloses:
wherein the global machine learning model comprises a first neural network having the plurality of neurons and interconnections, wherein each local machine learning model comprises a second neural network that is formed by a depth first partitioning of the first neural network, wherein the second neural network comprises a same number of layers as the first neural network, and wherein the depth first partitioning of the first neural network enables the transformed training data to be processed at least a second time with the first local machine learning model prior to updating the global machine learning model (Grau, page 12, where, in the network level parallelism setting, the global ensemble model [first neural network] is an ensemble of the models of ranks 1 – 4; the model [local machine learning model] on the MARS server comprises the rank 1 model [second neural network] and the rank 4 model; the rank 1 model has the same number of layers as the global ensemble model [first neural network]; pages 16 – 17, where each processor performs full sequential training of the network and can be processed independently [depth first partitioning]).

Regarding Claim 36, Claim 36 is the computer implemented method claim corresponding to Claim 1. Claim 36 is rejected for the same reason as Claim 1.

Regarding Claims 43 – 45, Claims 43 – 45 are the computer implemented method claims corresponding to Claims 9 – 11. Claims 43 – 45 are rejected for the same reasons as Claims 9 – 11.

Regarding Claim 50, Claim 50 is the computer implemented method claim corresponding to Claim 16. Claim 50 is rejected for the same reason as Claim 16.

Regarding Claim 68, Claim 68 is the non-transitory computer readable storage medium claim corresponding to Claim 1. Grau further discloses: a non-transitory computer-readable storage medium including program code which when executed by at least one data processor causes operations (Grau, page 12, where the neural networks are executed on computer servers, each of which has a storage medium and program code to be executed by a processor). Claim 68 is rejected for the same reason as Claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2, 12, 37 and 46 are rejected under 35 U.S.C. 103 as being unpatentable over Grau, Parallel Computing for Neural Networks, Rochester Institute of Technology RIT, EECC756 Course Slide, http://meseec.ce.rit.edu/756-projects/spring2013/1-4.pdf, 2013 in view of Dean, Large Scale Distributed Deep Networks, NIPS 12 Proceedings of the 25th International Conference on Neural Information Processing Systems, Dec. 2012.

Regarding Claim 2, depending on Claim 1, Grau discloses the system of Claim 1. Grau further discloses:
backward propagating an error in an output of the processing of the transformed training data with the first local machine learning model, the error being backward propagated through the first local machine learning model (Grau, page 16, where, during back-propagation training, each process [first local machine learning model] performs full sequential training of the network for one portion of the training set; i.e., the error is backward propagated through each of the local machine learning models);
minimizing the error by at least adjusting a parameter applied by the first local machine learning model (Grau, page 6, where the training cycle adjusts weights [parameter] to improve the performance [minimizing the error]);
updating, based at least on the adjusting of the parameter at the first local machine learning model, a corresponding parameter at the global machine learning model (Grau, page 16, where the parameters of each of the local models are merged into the global model).
Grau does not explicitly disclose:
adjusting a frequency of updating the corresponding parameter at the global machine learning model, the adjusting of the frequency being based at least on a communication cost of the updating and a computation cost of having a stale parameter at the global machine learning model.
Dean explicitly discloses:
adjusting a frequency of updating the corresponding parameter at the global machine learning model, the adjusting of the frequency being based at least on a communication cost of the updating and a computation cost of having a stale parameter at the global machine learning model (Dean, sec. 4.1, para. 4, where reduce the communication overhead [communication cost] … by limiting each model … send updated gradient values [updating corresponding parameter at global machine learning model] only every npush steps [frequency]; para. 5, where asynchronous learning is more robust to machine failures … due to … computing its gradients based on a set of parameters that are slightly out of date [stale parameter] … in practice we found relaxing consistency requirements to be remarkably effective; machine failures increase the computation cost, and adjusting the update frequency to relax consistency is an effective way to mitigate the risk of machine failure).
Grau and Dean both disclose distributed machine learning models and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Grau’s teaching of parallel processing of neural networks with Dean’s teaching of a large scale distributed learning framework to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to reduce communication overhead (Dean, sec. 4.1, para. 4, ln. 1), reduce the impact of machine failure in asynchronous learning (Dean, sec. 4.1, para. 5) and improve performance (Dean, abs., ln. 2).
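
For illustration only, the update-frequency limitation mapped to Dean above can be sketched as follows. This is a minimal sketch under stated assumptions, not Dean's implementation: the function name train_replica, the gradient callback grad_fn, the learning rate lr, and the use of NumPy arrays for parameters are all hypothetical, and n_push stands in for the "npush" interval quoted from Dean, sec. 4.1.

```python
import numpy as np

def train_replica(grad_fn, global_params, data_batches, n_push, lr=0.1):
    """Hypothetical single-replica loop: apply gradients locally every step,
    but push them to the shared (global) parameters only every n_push steps."""
    local = global_params.copy()      # local copy, may be stale w.r.t. global
    accum = np.zeros_like(local)      # gradients accumulated since last push
    pushes = 0                        # number of communication rounds
    for step, batch in enumerate(data_batches, start=1):
        g = grad_fn(local, batch)
        local -= lr * g               # local update on the replica
        accum += g
        if step % n_push == 0:
            global_params -= lr * accum   # one communication round (in place)
            accum[:] = 0.0
            pushes += 1
    # Gradients left in accum when the loop ends are not pushed in this sketch.
    return pushes
```

A larger n_push lowers the number of communication rounds at the price of a global parameter that lags further behind the replica, which is the trade-off the claim language describes in terms of communication cost and stale parameters.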

Regarding Claim 12, depending on Claim 9, Grau discloses the system of Claim 9. Grau further discloses:
wherein the transformed training data is processed, at the platform with the first local machine learning model and with the second local machine learning model … when the resource constraint of the platform enables the transformed training data to be processed sequentially at the platform (Grau, page 12, where the model [first local machine learning model] on the MARS server and the model [second local machine learning model] on the VENUS server are run in parallel; pages 14 – 15, where the design of the local models is based on the communication constraint [resource constraint]).
Grau does not explicitly disclose:
wherein the transformed training data is processed, at the platform, with the first local machine learning model and with the second local machine learning model in sequence, when the resource constraint of the platform enables the transformed training data to be processed sequentially at the platform with the first local machine learning model and with the second local machine learning model.
Dean explicitly discloses:
wherein the transformed training data is processed, at the platform, with the first local machine learning model and with the second local machine learning model in sequence, when the resource constraint of the platform enables the transformed training data to be processed sequentially at the platform with the first local machine learning model and with the second local machine learning model (Dean, fig. 1, where the local models on machine 3 and machine 1 are processed in sequence using all available CPU cores [resource constraint]).
The reason for combining the teachings of Grau and Dean is the same as for Claim 2.

Regarding Claim 37, Claim 37 is the computer implemented method claim corresponding to Claim 2. Claim 37 is rejected for the same reason as Claim 2.

Regarding Claim 46, Claim 46 is the computer implemented method claim corresponding to Claim 12. Claim 46 is rejected for the same reason as Claim 12.

Claims 5 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Grau, Parallel Computing for Neural Networks, Rochester Institute of Technology RIT, EECC756 Course Slide, http://meseec.ce.rit.edu/756-projects/spring2013/1-4.pdf, 2013 in view of Sorzano, A Survey of Dimensionality Reduction Techniques, arXiv, 2014.

Regarding Claim 5, depending on Claim 1, Grau discloses the system of Claim 1. Grau further discloses:
wherein the first input is processed in parallel with the second input, when the resource constraint of the platform enables the first input to be processed in parallel at the platform with the second input (Grau, page 12, where, in neuron level parallelism, a set of inputs [first input] is processed on the MARS server and another set of inputs [second input] is processed on the VENUS server in parallel; pages 14 – 15, where the design of the local models is based on the communication constraint [resource constraint]);
wherein the training of the global machine learning model further comprises processing a first of the plurality of inputs and a second of the plurality of inputs (Grau, page 12, where, to train the global machine learning model in neuron level parallelism, a set of inputs [first of the plurality of inputs] is processed on the MARS server and another set of inputs [second of the plurality of inputs] is processed on the VENUS server in parallel).
Grau does not explicitly disclose: 
transforming of the training data comprises reducing a dimensionality of the training data, 
wherein the reducing of the dimensionality of the training data comprises factorizing the training data into a corresponding dictionary and a plurality of encodings, the plurality of encodings having a lower dimension than the training data, 
Sorzano explicitly discloses: 
transforming of the training data comprises reducing a dimensionality of the training data (Sorzano, sec. 1, para. 2, ln. 1 – 3, where during machine learning, normally the number of input variables is reduced before a data mining algorithm can be successfully applied), 
wherein the reducing of the dimensionality of the training data comprises factorizing the training data into a corresponding dictionary and a plurality of encodings, the plurality of encodings having a lower dimension than the training data (Sorzano, sec. 1, para. 2, ln. 4 – 6 & sec. 3, sec. 3.1, fig. 14, where the redundancy of the input data is exploited by finding [factorizing] a smaller set of new variables [dictionary], each being a combination of the input variables and containing basically the same information as the input variables; this technique is called dimensionality reduction; in fig. 14, U is the encoding based on the dictionary W, and U has a smaller dimension than X [training data]).
Grau and Sorzano both disclose methods of performing machine learning and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Grau’s teaching of parallel computing of neural networks with Sorzano’s teaching of dimension reduction of the input data to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to overcome the challenges of dealing with high dimensional data (Sorzano, abs., ln. 3 – 4).
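
For illustration only, the factorization of training data X into a dictionary W and lower-dimensional encodings U, as mapped to Sorzano above, can be sketched with a truncated singular value decomposition. This is a minimal sketch under stated assumptions: truncated SVD is only one of the factorizations a survey of this kind covers, the function name factorize and the matrix orientation (one sample per row, X ≈ U W) are assumptions, and the code is not taken from either reference.

```python
import numpy as np

def factorize(X, k):
    """Return encodings U (n x k) and dictionary W (k x d) with X ~= U @ W,
    computed from a rank-k truncated SVD of X (n samples x d features)."""
    A, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = A[:, :k] * s[:k]   # low-dimensional encodings, one row per sample
    W = Vt[:k, :]          # dictionary atoms, one row per atom
    return U, W
```

Each row of U has k entries rather than d, so a model trained on U processes lower-dimensional inputs than one trained on X.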

Regarding Claim 40, Claim 40 is the computer implemented method claim corresponding to Claim 5. Claim 40 is rejected for the same reason as Claim 5.

Claims 13 – 14 and 47 are rejected under 35 U.S.C. 103 as being unpatentable over Grau, Parallel Computing for Neural Networks, Rochester Institute of Technology RIT, EECC756 Course Slide, http://meseec.ce.rit.edu/756-projects/spring2013/1-4.pdf, 2013 in view of Bhuiyan, Md. (2015), Re: How do I know when to stop training a neural network? Retrieved from: https://www.researchgate.net/post/How-do-I-know-when-to-stop-training-a-neural-network/55c4db086225ffb0148b45a2/citation/download

Regarding Claim 13, depending on Claim 1, Grau discloses the system of Claim 1. Grau does not explicitly disclose:
determining whether the global machine learning model has achieved convergence; and continuing to train the global machine learning model, when the global machine learning is determined to not have achieved convergence.
Bhuiyan explicitly discloses: 
determining whether the global machine learning model has achieved convergence; and continuing to train the global machine learning model, when the global machine learning is determined to not have achieved convergence (Bhuiyan, where training of a neural network is stopped when the error, i.e., the difference between the desired output and the expected output, is below some threshold [achieved convergence]; training continues if the model has not converged).
Grau and Bhuiyan both disclose methods of performing machine learning training and are analogous art. It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Grau’s teaching of a collaborative framework for machine learning with Bhuiyan’s teaching of the criteria to stop training to achieve the claimed invention. One of ordinary skill in the art would have been motivated to make this modification in order to implement a stopping mechanism for training a machine learning model (Bhuiyan, title question).

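Regarding Claim 14, depending on Claim 1, Grau discloses the system of Claim 1. Grau does not explicitly disclose:
determining whether the global machine learning model has been subject to a threshold number of training iterations; and continuing to train the global machine learning model, when the global machine learning model is determined to not have been subject to the threshold number of training iterations.
Bhuiyan explicitly discloses:
determining whether the global machine learning model has been subject to a threshold number of training iterations; and continuing to train the global machine learning model, when the global machine learning model is determined to not have been subject to the threshold number of training iterations (Bhuiyan, where training of a neural network is stopped when … the number of iterations or epochs is above some threshold value; training continues if the number of iterations has not reached the threshold).
The reason for combining the teachings of Grau and Bhuiyan is the same as for Claim 13.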
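
For illustration only, the two stopping criteria attributed to Bhuiyan in the rejections of Claims 13 and 14 (an error threshold and an iteration threshold) can be sketched as a single training loop. This is a minimal sketch under stated assumptions: the function name train_until_stopped and the callback step_fn, which is assumed to run one training iteration and return the current error, are hypothetical and do not appear in the reference.

```python
def train_until_stopped(step_fn, error_threshold, max_iterations):
    """Call step_fn() until the returned error drops below error_threshold
    or max_iterations iterations have run. Returns (iterations, last_error)."""
    iterations = 0
    error = float("inf")
    # Continue training while neither stopping criterion has been met.
    while error >= error_threshold and iterations < max_iterations:
        error = step_fn()
        iterations += 1
    return iterations, error
```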

Regarding Claim 47, Claim 47 is the computer implemented method claim corresponding to the combination of Claims 13 and 14. Claim 47 is rejected for the same reasons as Claims 13 and 14.

Response to Applicant’s Remarks
Applicant’s arguments with respect to the claim rejections under 35 U.S.C. 102 and 35 U.S.C. 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIEN MING CHOU, whose telephone number is (571) 272-9354. The examiner can normally be reached Monday - Friday, 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KAKALI CHAKI, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.C./Examiner, Art Unit 2122       

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122