DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 07/25/2018.
This action is in response to arguments and/or amendments filed on 03/23/2022. In the current amendments, claims 1, 7 and 9 have been amended. Claims 1-12 are currently pending and have been examined. 


Response to Arguments
Applicant’s arguments with respect to claim(s) 1-12 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3 and 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Pub. No. US 2015/0324690 A1 to Chilimbi et al. (hereinafter, “Chilimbi”) in view of Patent No. US 10,152,676 B1 to Strom (hereinafter, “Strom”) and further in view of Pub. No. US 2011/0004453 A1 to Engelsen et al. (hereinafter, “Engelsen”).
Regarding claim 1 (Currently Amended)
Chilimbi teaches a method of distributed learning, (para [0007] “Deep models may be trained on graphics processing units (GPUs). While this works well when the model fits within 2-4 GPU cards attached to a single server, it limits the size of models that can be trained. For example, known embodiments include a large-scale distributed system comprised of commodity servers to train extremely large models to high accuracy on a hard visual object recognition task— classifying images into one of twenty-two thousand distinct categories using raw pixel information.”)
the method comprising: receiving a global parameter set of a model to be trained from a parameter server; (para [0002] “As shown in FIG. 1, training data 102 is provided to humans 104 for labeling. The training data 102 and/or human labeled data (as output from humans 104) may also be processed to correspond to hand crafted features 106 associated with the training data set 102. Then a variety of machine learning algorithms can be applied to learn a classifier 108 that maps each data row to a prediction 110. The classifier 108 may process the training data 102 to calculate errors 112 and update the classifier 108.”)
performing a training step based on training data and the received global parameter set, (para [0002] “As shown in FIG. 1, training data 102 is provided to humans 104 for labeling. The training data 102 and/or human labeled data (as output from humans 104) may also be processed to correspond to hand crafted features 106 associated with the training data set 102. Then a variety of machine learning algorithms can be applied to learn a classifier 108 that maps each data row to a prediction 110. The classifier 108 may process the training data 102 to calculate errors 112 and update the classifier 108.”)
thereby generating a local set of parameters of the model; (Chilimbi, Par. [0009] “…model replicas that asynchronously update a shared model via a global parameter server…” Par. [0076 - 0077]; “Block 1102 illustrates receiving a batch of data items, as described above. The deep learning training module 616 may receive the batch of data items from the data server(s) 702. The batch of data items may have been pre-processed in the data server(s) 702 as described in FIG. 10 below. Block 1104 illustrates processing individual data items to calculate updates. The deep learning training module 616 may input the batch of data items into a model to calculate activation values, error terms, and/or weight updates.”)
Chilimbi does not teach determining whether to transmit the local set of parameters to the parameter server or continue a training process based on a comparison of a first value of a loss function or learning curve of the training step to a second value of a loss function or learning curve associated with the received global parameter set; 
and omitting transmission of the local set of parameters to the parameter server if it is determined by the comparison of the first value to the second value to continue the training process.
Strom teaches determining whether to transmit the local set of parameters to the parameter server (col 8 lines 32-42 “At decision block 412, the model synchronization module 124 or some other module or component can determine, for a given model parameter, whether a particular update value (the combined update value from the partial gradient and the residual gradient) meets or exceeds a threshold. If so, the process 400 may proceed to block 414; otherwise, the process 400 may proceed to decision block 416. In some embodiments, the threshold may be selected such that the total number of update values that exceeds the threshold is some predetermined number or range (e.g., 1/10,000 of the total number of update values in the gradient).”)
…
and omitting transmission of the local set of parameters to the parameter server if it is determined… to continue the training process. (Col 4 lines 3-9 “In addition, each computing device can maintain its own residual gradient, and the updates that do not meet or exceed the threshold are not transmitted to the other computing devices. In this way, the bandwidth savings described above can be maintained, while all updates to the parameters calculated at a given computing device can be preserved for future use.”)
Chilimbi and Strom are analogous because they are both directed to distributed model training systems. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19 “In order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices, only those updates which are expected to provide a substantive change to the model may be applied and exchanged.”).
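For illustration only, the thresholding behavior described in the cited Strom passages (col 8 lines 32-42 and Col 4 lines 3-9) may be sketched as follows. This is a minimal sketch under stated assumptions and is not Strom's implementation: the function name `select_updates`, the comparison on absolute value, the transmission of the full combined value, and the resetting of the residual to zero after transmission are all assumptions made for the example.

```python
def select_updates(partial_gradient, residual_gradient, threshold):
    """Split combined update values into transmitted and retained parts.

    Hypothetical helper: names and details are assumptions, not Strom's code.
    """
    transmitted = {}   # parameter index -> update value that would be sent
    new_residual = []  # values kept locally for future use
    for index, (g, r) in enumerate(zip(partial_gradient, residual_gradient)):
        combined = g + r  # combined update from partial and residual gradient
        if abs(combined) >= threshold:
            # Meets or exceeds the threshold: transmit it.
            transmitted[index] = combined
            new_residual.append(0.0)  # assumption: residual is cleared once sent
        else:
            # Below the threshold: not transmitted, preserved in the residual.
            new_residual.append(combined)
    return transmitted, new_residual


sent, kept = select_updates([0.5, 0.125, -0.25], [0.25, 0.125, -0.5], 0.5)
print(sent)
print(kept)
```

In this sketch only the first and third update values would be exchanged, while the second is retained locally, which corresponds to the bandwidth reduction relied upon in the motivation statement.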
Chilimbi in view of Strom does not teach …continue a training process based on a comparison of a first value of a loss function or learning curve of the training step to a second value of a loss function or learning curve associated with the received global parameter set.
Engelsen teaches …or continue a training process based on a comparison of a loss function or learning curve of the training step to a loss function or learning curve associated with the received global parameter set. (Examiner interprets error as loss values, and Engelsen teaches comparing local errors to a global error; see para [0016] “calculating error estimates for the global multivariate regression analysis and for each of the local multivariate regression analyses for the lipoprotein entity as quantified using the reference quantification method;” and also see para [0021] “For example, a NMR (H) spectrum with chemical shifts ranging from 0.0 to 5.8 ppm may be divided into 16 equidistant intervals. The NMR sub spectrum for each interval of chemical shifts is then subjected to a local multivariate regression analysis. This will allow error estimates to be calculated for each local multivariate regression analysis and compare this to the error estimate for the global multivariate regression analysis.”)
Chilimbi, Strom and Engelsen are analogous because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chilimbi and Strom to incorporate the teaching of Engelsen to include a method of preparing regression coefficients in a multivariate analysis for predicting the quantity of a component.
One of ordinary skill in the art would have been motivated to make this modification in order to provide a system for comparing errors between global and local models for the purpose of effectively measuring and monitoring a data modeling system, as disclosed by Engelsen (para [0002] “In the method a regression model is prepared from NMR spectra of biological samples from a group of non-fasting subjects, which samples have been analysed using a reference quantification method. By comparing error estimates for NMR sub spectra for local multivariate regression analyses with an error estimate for a global multivariate regression analysis, a local multivariate regression model may be selected for preparing the regression coefficients.”).
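For illustration only, the examiner's reading of "error" as a loss value, applied to the transmit-or-continue determination recited in claim 1, may be sketched as follows. This is a hedged sketch and not a statement of what the claims or any cited reference require: the function name `decide_action`, the direction of the comparison, and the `margin` parameter are assumptions introduced for the example.

```python
def decide_action(local_loss, global_loss, margin=0.0):
    """Return 'transmit' or 'continue' from a comparison of two loss values.

    Hypothetical helper. Assumption: the local parameters are transmitted
    only when the local loss improves on the loss associated with the
    received global parameter set by more than `margin`.
    """
    if global_loss - local_loss > margin:
        return "transmit"
    # Otherwise transmission is omitted and local training continues.
    return "continue"


print(decide_action(0.25, 0.5))
print(decide_action(0.5, 0.5))
```

Under these assumptions, a first value of 0.25 compared against a second value of 0.5 leads to transmission, while equal values lead to continued local training with no transmission.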


Regarding claim 2 (Previously Presented)
Chilimbi in view of Strom and Engelsen teaches the method of claim 1.
Strom further teaches the method further comprising using the local set of parameters as starting parameters in a further training step if it is determined to continue the training process. (Strom, Col. 4 lines 6-9; “In addition, each computing device can maintain its own residual gradient, and the updates that do not meet or exceed the threshold are not transmitted to the other computing devices. In this way, the bandwidth savings described above can be maintained, while all updates to the parameters calculated at a given computing device can be preserved for future use.” Examiner note: Examiner interprets future use as further training steps).
Chilimbi, Engelsen and Strom are analogous because they are all directed to distributed model training systems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom and Engelsen, including Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference, into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19).

Regarding claim 3 (Previously Presented)
Chilimbi in view of Strom and Engelsen teaches the method of claim 1.
Strom further teaches the method further comprising, if it is determined to transmit the local set of parameters to the parameter server, transmitting the local set of parameters to the parameter server; (Col. 3, lines 17-19; “…only those updates which are expected to provide a substantive change to the model may be applied and exchanged.” Also see col 8 lines 32-42 “At decision block 412, the model synchronization module 124 or some other module or component can determine, for a given model parameter, whether a particular update value (the combined update value from the partial gradient and the residual gradient) meets or exceeds a threshold. If so, the process 400 may proceed to block 414; otherwise, the process 400 may proceed to decision block 416. In some embodiments, the threshold may be selected such that the total number of update values that exceeds the threshold is some predetermined number or range (e.g., 1/10,000 of the total number of update values in the gradient).”);
Strom does not explicitly teach receiving an updated global parameter set from the parameter server and using the updated global parameter set as starting parameters in a further training step.
However, Chilimbi teaches receiving an updated global parameter set from the parameter server and using the updated global parameter set as starting parameters in a further training step. (Chilimbi, [0079-0080] “The global parameter server(s) 706 may provide updated weight values based on receiving updates from one or more replicas 704A-704N. The updated weight values take into account activation values, error terms, and/or weight updates from each of the individual replicas 704A-704N running asynchronously. Block 1110 illustrates modifying the model to reflect the updated weight values, as described above. As described above, the deep learning training module 616 may calculate a model prediction error based at least in part on the updated individual weight values and the new updated weight values. The deep learning training module 616 may process subsequent batches of data items by repeating process 1100…”)
Chilimbi, Engelsen and Strom are analogous because they are all directed to distributed model training systems.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference, into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19).
Regarding claim 8
As per claim 8, the claim is analogous to claim 3 and is therefore rejected with the same rationale applied against claim 3.
Regarding claim 10
As per claim 10, the claim is analogous to claim 3 and is therefore rejected with the same rationale applied against claim 3.

Regarding claim 6 
Chilimbi in view of Strom and Engelsen teaches the method of claim 1.
Strom further teaches a non-transitory data carrier comprising computer program instructions suitable for execution by a processor, the computer program instructions, when executed by the processor, causing the processor to perform the method as claimed in Claim 1. (Strom, Col. 11, lines 15-22; “The process 600 may be embodied in a set of executable program instructions stored on non-transitory computer-readable media, such as short-term or long-term memory of one or more computing devices associated with a model training node 102A. When the process 600 is initiated, the executable program instructions can be loaded and executed by the one or more computing devices.”)

Regarding claim 7 
Chilimbi teaches a distributed learning system comprising a parameter server, (para [0007] “Deep models may be trained on graphics processing units (GPUs). While this works well when the model fits within 2-4 GPU cards attached to a single server, it limits the size of models that can be trained. For example, known embodiments include a large-scale distributed system comprised of commodity servers to train extremely large models to high accuracy on a hard visual object recognition task— classifying images into one of twenty-two thousand distinct categories using raw pixel information.”) a first electronic apparatus and a second electronic apparatus, (FIG. 6 shows servers and other machines 610, which correspond to the first and second electronic apparatus; see para [0040] “As shown, the service provider 602 may include one or more server(s) and other machines 610, any of which may include one or more processing unit(s) 612 and computer readable media 614. In various embodiments, the service provider 602 may train large neural network models for speech and/or visual object recognition, text processing, and other tasks.”)
the parameter server configured to: transmit a global parameter set of a model to be trained to the first electronic apparatus and the second electronic apparatus, (para [0038] “FIG. 6 illustrates an example operating environment 600 that includes a variety of devices and components that may be implemented in a variety of environments for providing training input to model training machines organized as multiple replicas that asynchronously update a shared model via a global parameter server.”)
the first electronic apparatus configured to: receive the global parameter set from the parameter server (para [0002] “As shown in FIG. 1, training data 102 is provided to humans 104 for labeling. The training data 102 and/or human labeled data (as output from humans 104) may also be processed to correspond to hand crafted features 106 associated with the training data set 102. Then a variety of machine learning algorithms can be applied to learn a classifier 108 that maps each data row to a prediction 110. The classifier 108 may process the training data 102 to calculate errors 112 and update the classifier 108.”) perform a first training step based on first training data and on the received global parameter set, (para [0066] “Such local computation and asynchronous communication may offload computing from the deep learning training module 616 and minimizes communication between the deep learning training module 616 and the model module 618. The global parameter server(s) 706 combine the updates received from each of the replicas 704A-704N before the updates are applied to the stored shared parameters.”)
thereby generating a first local set of parameters of the model; (Chilimbi, Par. [0009] “…model replicas that asynchronously update a shared model via a global parameter server…” Par. [0076 - 0077]; “Block 1102 illustrates receiving a batch of data items, as described above. The deep learning training module 616 may receive the batch of data items from the data server(s) 702. The batch of data items may have been pre-processed in the data server(s) 702 as described in FIG. 10 below. Block 1104 illustrates processing individual data items to calculate updates. The deep learning training module 616 may input the batch of data items into a model to calculate activation values, error terms, and/or weight updates.”)
…
the second electronic apparatus configured to: receive the global parameter set from the parameter server; (Because the learning steps in FIG. 6 repeat, the second round, in which the global parameter set is received from the server, corresponds to the second training step based on the second training data; see para [0080] “The deep learning training module 616 may process subsequent batches of data items by repeating process 1100 until the model prediction error converges to a value below a predetermined threshold.”)
perform a second training step based on second training data and on the received global parameter set, (para [0002] “As shown in FIG. 1, training data 102 is provided to humans 104 for labeling. The training data 102 and/or human labeled data (as output from humans 104) may also be processed to correspond to hand crafted features 106 associated with the training data set 102. Then a variety of machine learning algorithms can be applied to learn a classifier 108 that maps each data row to a prediction 110. The classifier 108 may process the training data 102 to calculate errors 112 and update the classifier 108.”)
thereby generating a second local set of parameters of the model; (Chilimbi, Par. [0009] “…model replicas that asynchronously update a shared model via a global parameter server…” Par. [0076 - 0077]; “Block 1102 illustrates receiving a batch of data items, as described above. The deep learning training module 616 may receive the batch of data items from the data server(s) 702. The batch of data items may have been pre-processed in the data server(s) 702 as described in FIG. 10 below. Block 1104 illustrates processing individual data items to calculate updates. The deep learning training module 616 may input the batch of data items into a model to calculate activation values, error terms, and/or weight updates.”)
Chilimbi does not teach determining whether to transmit the local set of parameters to the parameter server or continue a training process based on a comparison of a first value of a loss function or learning curve of the training step to a second value of a loss function or learning curve associated with the received global parameter set; 
and omitting transmission of the local set of parameters to the parameter server if it is determined by the comparison of the first value to the second value to continue the training process.
Strom teaches determining whether to transmit the local set of parameters to the parameter server (col 8 lines 32-42 “At decision block 412, the model synchronization module 124 or some other module or component can determine, for a given model parameter, whether a particular update value (the combined update value from the partial gradient and the residual gradient) meets or exceeds a threshold. If so, the process 400 may proceed to block 414; otherwise, the process 400 may proceed to decision block 416. In some embodiments, the threshold may be selected such that the total number of update values that exceeds the threshold is some predetermined number or range (e.g., 1/10,000 of the total number of update values in the gradient).”)
…
and omitting transmission of the local set of parameters to the parameter server if it is determined… to continue the training process. (Col 4 lines 3-9 “In addition, each computing device can maintain its own residual gradient, and the updates that do not meet or exceed the threshold are not transmitted to the other computing devices. In this way, the bandwidth savings described above can be maintained, while all updates to the parameters calculated at a given computing device can be preserved for future use.”)
Chilimbi and Strom are analogous because they are both directed to distributed model training systems. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Strom’s method of only exchanging local worker updates with the global server if the updates will make a difference into Chilimbi’s system of deep learning on distributed systems in order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices (Strom, Col. 3, lines 15-19 “In order to reduce the bandwidth required to continuously or periodically exchange such update data among the multiple computing devices, only those updates which are expected to provide a substantive change to the model may be applied and exchanged.”).
Chilimbi in view of Strom does not teach …continue a training process based on a comparison of a first value of a loss function or learning curve of the training step to a second value of a loss function or learning curve associated with the received global parameter set.
Engelsen teaches …or continue a training process based on a comparison of a loss function or learning curve of the training step to a loss function or learning curve associated with the received global parameter set. (Examiner interprets error as loss values, and Engelsen teaches comparing local errors to a global error; see para [0016] “calculating error estimates for the global multivariate regression analysis and for each of the local multivariate regression analyses for the lipoprotein entity as quantified using the reference quantification method;” and also see para [0021] “For example, a NMR (H) spectrum with chemical shifts ranging from 0.0 to 5.8 ppm may be divided into 16 equidistant intervals. The NMR sub spectrum for each interval of chemical shifts is then subjected to a local multivariate regression analysis. This will allow error estimates to be calculated for each local multivariate regression analysis and compare this to the error estimate for the global multivariate regression analysis.”)
Chilimbi, Strom and Engelsen are analogous because they are all directed to data automation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Chilimbi and Strom to incorporate the teaching of Engelsen to include a method of preparing regression coefficients in a multivariate analysis for predicting the quantity of a component.
One of ordinary skill in the art would have been motivated to make this modification in order to provide a system for comparing errors between global and local models for the purpose of effectively measuring and monitoring a data modeling system, as disclosed by Engelsen (para [0002] “In the method a regression model is prepared from NMR spectra of biological samples from a group of non-fasting subjects, which samples have been analysed using a reference quantification method. By comparing error estimates for NMR sub spectra for local multivariate regression analyses with an error estimate for a global multivariate regression analysis, a local multivariate regression model may be selected for preparing the regression coefficients.”).

Regarding claim 9 
Strom teaches a device comprising a processor, memory storing instructions executable by the processor (Strom, Col. 13 lines 10-15; “The elements of a method, process, routine, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor device, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory…”) and a communication interface (Strom, Col. 6 lines 34-37; “In some embodiments, the features and services provided by the model training nodes 102A, 102B and/or the data sources 104, 106 may be implemented as services consumable via a communication network.”). The remaining claim limitations are similar to those in claim 1 and are therefore rejected with the same rationale applied against claim 1.


Claims 4-5 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Chilimbi in view of Strom, in view of Engelsen and further in view of Patent No. US 8,874,440 B2 to Park et al. (hereinafter, “Park”).
Regarding claim 4 
Chilimbi in view of Strom and Engelsen teaches the method of claim 1.
Chilimbi further teaches wherein the loss function or learning curve of the training step is used as observation (Chilimbi [0036]; “…typically, training continues for multiple epochs, reprocessing the training data set each time, until the validation set error converges to a desired value below a predetermined threshold.” Examiner Note: the loss function is interpreted as the “validation set error”).
The combination of Chilimbi, Strom and Engelsen does not explicitly teach wherein the determining is based on a partially observable Markov decision process.
However, Park teaches wherein the determining is based on a partially observable Markov decision process (Park Col. 10, lines 56-59; “For example, the action determining unit 130 may use a learning model designed using a reinforcement learning model such as a partially observable Markov decision process (POMDP)”).
Chilimbi, Engelsen, Strom, and Park are analogous because they are all directed to learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Park’s determining method into Chilimbi’s system of deep learning on distributed systems as modified by Strom in order to determine an optimal action that maximizes a reward (Park Col. 11, lines 45-46).
Regarding claim 11
As per claim 11, the claim is analogous to claim 4 and is therefore rejected with the same rationale applied against claim 4. 

Regarding claim 5 
Chilimbi in view of Strom and Engelsen teaches the method of claim 1.
The combination of Chilimbi, Strom and Engelsen fails to explicitly teach wherein the worker uses reinforcement learning for said determining.
However, Park teaches the method further comprising using reinforcement learning for the determining. (Park Col. 10, lines 56-59; “For example, the action determining unit 130 may use a learning model designed using a reinforcement learning model such as a partially observable Markov decision process (POMDP)”).
Chilimbi, Strom, Engelsen and Park are analogous because they are all directed to learning models.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Park’s determining method into Chilimbi’s system of deep learning on distributed systems as modified by Strom in order to determine an optimal action that maximizes a reward (Park Col. 11, lines 45-46).

Regarding claim 12
As per claim 12, the claim is analogous to claim 5 and is therefore rejected with the same rationale applied against claim 5. 


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure and is listed below:
Chen et al. (NPL: “Adaptive Residual Gradient Compression for Data-Parallel Distributed Training”): discloses a distributed training system involving a parameter server. 
Li et al. (NPL: “Scaling Distributed Machine Learning with the Parameter Server”): discloses a distributed training system involving a parameter server.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG, whose telephone number is (571)270-7598. The examiner can normally be reached Monday through Friday, 8:00 am to 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/V.M./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126