DETAILED CORRESPONDENCE
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.  This non-final Office action is in response to the patent application filed on 5 June 2019.  Claims 1-19 are pending and considered below.

Claim Rejections - 35 USC § 101
	Analysis of the claims and written description under the 2019 PEG results in the conclusion that the instant invention, while reciting an abstract idea or judicial exception, namely mathematical concepts (mathematical relationships, formulae, or calculations) and mental processes (concepts performed in the human mind), integrates the judicial exception into the practical application of iteratively calculating gradient information of a machine learning model, deleting each subset of sample data before the next subset is read, accumulating a plurality of sets of gradient information, and updating the machine learning model with the accumulated information.  Performance of the claimed invention results in optimized memory usage, as disclosed in at least paragraph [60] of the written description, wherein storage resources are released by the deletion of intermediate calculation results.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-19 are rejected under 35 U.S.C. 103 as being unpatentable over Narsky (US 9,652,722) in view of Brueckner et al. (US 2016/0078361).

Claims 1, 7, and 19:	Narsky discloses a data processing apparatus, method, and computing device comprising: 
a gradient calculation module (2:55-67, 3:1-36, Fig. 2) configured to: 
calculate gradient information of each of a plurality of parameters of the machine learning model (6:18-26), wherein after a set of gradient information of each parameter is calculated using one sample data subset, the sample data subset is deleted before a next sample data subset is read (3:32-36 “Observation remover 215 may remove observations from the training data after some number of iterations. For example, observation remover 215 may determine how many observations to remove based on the predictions generated by convergence predictor,” 5:9-23 “removing observations from the training data (step 330). At step 405, system 100 may estimate the number of additional iterations needed until a tolerance threshold ε is met,” 5:50-67 “system 100 may calculate the number M_t of observations to remove in the current iteration. For example, there may be total number of desired observations that system 100 is to remove by the actual final iteration t_final at which the criterion value meets the tolerance threshold ε. In one embodiment, this total number may be based on a parameter q, which represents a fractional number of the total number of observations in the training data that is to be removed by the final iteration t_final,” 6:1-17, Figs. 3, 4), and another set of gradient information of each parameter is calculated using the next sample data subset, and wherein the machine learning model has an initialized global parameter or was updated in a last iterative operation (5:50-67, 6:1-17); 
an accumulation module configured to, in the process of the one iterative operation, accumulate a plurality of sets of calculated gradient information of each parameter to obtain an update gradient of each parameter (3:1-31 “Model optimizer 205 may receive training data and iterate a model optimization process that improves a classification model for the training data. With successive iterations, model optimizer 205 may generate an updated classification model, with improved classification accuracy. In an embodiment, each iteration may be associated with an iteration index t, e.g., 1 to T iterations. In one example, the classification model may be a SVM model and the optimization process may be, for example, an iterative single data algorithm (ISDA), a sequential minimal optimization (SMO), quadratic programming optimization,”); and 
a sending module configured to send the update gradient of each parameter in the process of the one iterative operation, wherein the update gradient of each parameter is used to update the machine learning model (3:1-31, 3:41-50 “initial classification model is updated in the next iteration of the optimization process,” 7:26-31 “recalculate the gradients for all of the observations remaining in the training data, to reflect the fact that the removed observations are no longer in the training data,”). 
Narsky does not explicitly disclose, however Brueckner discloses:
in a process of one iterative operation, sequentially read a plurality of sample data subsets from a sample data set, wherein each sample data subset comprises at least one piece of sample data ([92, 132 “sub-division of the concatenated data set 1804 into contiguous chunks (rather than, for example, randomly selected sub-portions) may increase the fraction of the data set that can be read in via more efficient sequential reads than the fraction that has to be read via random reads,” 137]); 
enter each read sample data subset into a machine learning model ([55, 56, 92 “initial set of statistics 763 based on a sub-sample (e.g., a randomly-selected subset of the large data set) may be obtained in a first phase, while the generation of full-sample statistics 764 derived from the entire data set may be deferred to a second phase. Such a multi-phase approach towards statistics generation may be implemented, for example, to allow the client to get a rough or approximate summary of the data set values fairly rapidly in the first phase,” 93, Fig. 7]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Narsky to, in a process of one iterative operation, sequentially read a plurality of sample data subsets from a sample data set, wherein each sample data subset comprises at least one piece of sample data, and to enter each read sample data subset into a machine learning model, as taught by Brueckner, in order to optimize the sampling of data, improve the output of the machine learning model, and thereby more precisely update the gradient of each parameter of the model. 
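For illustration only, the sequence recited in claims 1 and 2 (sequentially reading sample data subsets, calculating one set of gradient information per subset, releasing each subset before the next is read, accumulating the sets into an update gradient, and updating the model before the next iterative operation) can be sketched as follows. The sketch assumes a linear model with squared-error loss; all function names and data are hypothetical and are not taken from the application, Narsky, or Brueckner, and returning the update gradient stands in for the claimed sending module.

```python
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # hypothetical layout: (feature vector, target value)


def read_subsets(data: Sequence[Sample], size: int) -> Iterator[List[Sample]]:
    """Sequentially yield contiguous sample data subsets of at most `size` samples."""
    for start in range(0, len(data), size):
        yield list(data[start:start + size])


def subset_gradient(w: List[float], subset: List[Sample]) -> List[float]:
    """One set of gradient information: gradient of 0.5 * (w . x - y)^2 summed over the subset."""
    grad = [0.0] * len(w)
    for x, y in subset:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return grad


def one_iterative_operation(w: List[float], data: Sequence[Sample], size: int) -> List[float]:
    """Accumulate the per-subset gradient sets into one update gradient per parameter."""
    update = [0.0] * len(w)
    for subset in read_subsets(data, size):
        grad = subset_gradient(w, subset)
        del subset  # drop the local reference to this subset's copy before the next read
        update = [u + g for u, g in zip(update, grad)]
    return update


def train(w: List[float], data: Sequence[Sample], size: int, lr: float,
          max_iters: int = 5000, tol: float = 1e-10) -> List[float]:
    """Repeat iterative operations until convergence or until max_iters operations complete."""
    for _ in range(max_iters):
        update = one_iterative_operation(w, data, size)
        w = [wi - lr * u for wi, u in zip(w, update)]  # model is updated before the next operation
        if sum(u * u for u in update) < tol:
            break
    return w


# Usage: the second feature is a constant 1.0 acting as a bias term.
data = [([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0), ([3.0, 1.0], 9.0), ([4.0, 1.0], 11.0)]
w = train([0.0, 0.0], data, size=2, lr=0.02)  # approaches [2.0, 3.0], since the data follow y = 2x + 3
```

Because each subset's gradient set is summed into the same running total, the update gradient does not depend on the subset size; only the peak amount of sample data held at one time changes.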

Claims 2 and 8:	Narsky in view of Brueckner discloses the data processing apparatus according to claims 1 and 7 above, and Narsky further discloses wherein the gradient calculation module, the accumulation module, and the sending module are further configured to participate in a plurality of iterative operations after the one iterative operation, until the machine learning model converges or a specified quantity of iterations are completed in calculation (3:15-31 “Convergence predictor 210 may predict the trend based on the convergence criterions recorded at previous iterations, and may further predict a number of additional iterations needed before the convergence criterion of the classification model matches or is below a desired tolerance threshold,” 4:63-67, 5:1-48); 
in each of the plurality of iterative operations after the one iterative operation, the gradient calculation module, the accumulation module, and the sending module repeat actions in the process of the one iterative operation; and in the one iterative operation and the plurality of iterative operations after the one iterative operation, after the machine learning model is updated using an update gradient obtained in an iterative operation, a next iterative operation is performed (3:15-31, 4:63-67, 5:1-48).  

Claims 3 and 10:	Narsky in view of Brueckner discloses the data processing apparatus according to claims 1 and 7 above, and Narsky further discloses wherein the accumulation module is configured to: for the plurality of sets of gradient information of each updated in the next iteration of the optimization process at t=2,” 6:43-56, 7:26-31 “recalculate the gradients for all of the observations remaining in the training data, to reflect the fact that the removed observations are no longer in the training data,”).  

Claims 4 and 11:	Narsky in view of Brueckner discloses the data processing apparatus according to claims 1 and 7 above, and Narsky further discloses wherein the accumulation module is configured to: 
for one set of gradient information of each parameter obtained based on each sample data subset, accumulate one set of gradient information of a same parameter to obtain an accumulation gradient of each parameter, so that a plurality of accumulation gradients of each parameter are obtained based on the plurality of read sample data subsets (6:18-61 “calculate the gradients of all of the observations currently in the training data set. In one embodiment, the gradient of each observation may be based on a summation of contributions from every other observation in the training data set…system 100 may remove the number M_t of observations with the largest magnitude gradients from the training data set In some embodiments, system 100 may remove both positive and negative gradients. In some other embodiments, system 100 may remove only the observations with the largest negative gradients (i.e. the badly misclassified observations). In some embodiments, system 100 may not remove any observations from the training data if M_t is zero or negative. In some embodiments, system 100 may add observations that were previously removed back into the training data if M_t is negative,”), and 
accumulate the plurality of accumulation gradients of each parameter to obtain the update gradient of each parameter (6:18-61).  

Claims 5 and 12:	Narsky in view of Brueckner discloses the data processing apparatus according to claims 1 and 7 above, and Narsky further discloses wherein the accumulation module is configured to: for the plurality of sets of gradient information of each parameter that are obtained based on the plurality of read sample data subsets, collect a plurality of sets of gradient information of a same parameter together, wherein the plurality of sets of gradient information of each parameter that are collected together are used as the update gradient of each parameter (3:1-14 “model optimizer 205 may generate an updated classification model, with improved classification accuracy. In an embodiment, each iteration may be associated with an iteration index t, e.g., 1 to T iterations. In one example, the classification model may be a SVM model and the optimization process may be, for example, an iterative single data algorithm (ISDA), a sequential minimal optimization (SMO), quadratic programming optimization,” 6:18-56 “calculate the gradients of all of the observations currently in the training data set. In one embodiment, the gradient of each observation may be based on a summation of contributions from every other observation in the training data set,”).  
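For illustration only, the two accumulation alternatives mapped above can be contrasted in a short sketch: claims 4 and 11 sum the gradient information of the same parameter within each subset and then sum those accumulation gradients across subsets, whereas claims 5 and 12 collect the sets of gradient information of the same parameter together without summing. The dictionary layout and function names are hypothetical assumptions, not drawn from the application or Narsky.

```python
from typing import Dict, List

# Hypothetical layout: per_subset[k][param] holds the pieces of gradient information
# for `param` calculated from the individual pieces of sample data in subset k.
GradientSets = List[Dict[str, List[float]]]


def accumulation_gradients(per_subset: GradientSets) -> List[Dict[str, float]]:
    """Claims 4/11, first step: sum the pieces of the same parameter within each subset."""
    return [{p: sum(pieces) for p, pieces in grads.items()} for grads in per_subset]


def update_gradient_by_sum(per_subset: GradientSets) -> Dict[str, float]:
    """Claims 4/11, second step: sum the accumulation gradients across all subsets."""
    total: Dict[str, float] = {}
    for acc in accumulation_gradients(per_subset):
        for p, g in acc.items():
            total[p] = total.get(p, 0.0) + g
    return total


def update_gradient_by_collection(per_subset: GradientSets) -> Dict[str, List[float]]:
    """Claims 5/12: collect all gradient information of the same parameter, unsummed."""
    collected: Dict[str, List[float]] = {}
    for grads in per_subset:
        for p, pieces in grads.items():
            collected.setdefault(p, []).extend(pieces)
    return collected
```

For two subsets `[{"w": [1.0, 2.0], "b": [0.5]}, {"w": [3.0], "b": [0.25, 0.25]}]`, the summing variant yields `{"w": 6.0, "b": 1.0}`, while the collecting variant yields `{"w": [1.0, 2.0, 3.0], "b": [0.5, 0.25, 0.25]}`.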

Claim 6:	Narsky in view of Brueckner discloses the data processing apparatus according to claim 1 above, and Narsky further discloses wherein the gradient calculation module is configured to: 
read and use an intermediate calculation result to calculate the gradient information of each of the plurality of parameters of the machine learning model, wherein the intermediate calculation result is used as input information to calculate the gradient information (3:1-31 “Model optimizer 205 may receive training data and iterate a model optimization process that improves a classification model for the training data. With successive iterations, model optimizer 205 may generate an updated classification model, with improved classification accuracy. In an embodiment, each iteration may be associated with an iteration index t, e.g., 1 to T iterations. In one example, the classification model may be a SVM model and the optimization process may be, for example, an iterative single data algorithm (ISDA), a sequential minimal optimization (SMO), quadratic programming optimization,”), and after a set of gradient information of each parameter is calculated using one sample data subset, the sample data subset is deleted before a next sample data subset is read, and another set of gradient information of each parameter is calculated using the next sample data subset (3:32-36 “Observation remover 215 may remove observations from the training data after some number of iterations. For example, observation remover 215 may determine how many observations to remove based on the predictions generated by convergence predictor,” 5:9-23 “removing observations from the training data (step 330). At step 405, system 100 may estimate the number of additional iterations needed until a tolerance threshold ε is met,” 5:50-67 “system 100 may calculate the number M_t of observations to remove in the current iteration. For example, there may be total number of desired observations that system 100 is to remove by the actual final iteration t_final at which the criterion value meets the tolerance threshold ε. In one embodiment, this total number may be based on a parameter q, which represents a fractional number of the total number of observations in the training data that is to be removed by the final iteration t_final,” 6:1-17, Figs. 3, 4);
Examiner Note: Under the broadest reasonable interpretation, the examiner interprets “intermediate calculation results” as results obtained during the calculation of data gradients and during the iterative refinement of results in machine learning operations performed to determine a desired result.
after the intermediate calculation result is used, delete the intermediate calculation result, wherein an operation of deleting the intermediate calculation result needs to be completed before the next sample data subset is read (3:32-36, 5:9-23, 5:50-67, 6:1-17, Figs. 3, 4). 
Narsky does not explicitly disclose, however Brueckner discloses:
in the process of the one iterative operation, sequentially read the plurality of sample data subsets from the sample data set ([92, 132 “sub-division of the concatenated data set 1804 into contiguous chunks (rather than, for example, randomly selected sub-portions) may increase the fraction of the data set that can be read in via more efficient sequential reads than the fraction that has to be read via random reads,” 137]); 
enter each read sample data subset into the machine learning model ([55, 56, 92 “initial set of statistics 763 based on a sub-sample (e.g., a randomly-selected subset of the large data set) may be obtained in a first phase, while the generation of full-sample statistics 764 derived from the entire data set may be deferred to a second phase. Such a multi-phase approach towards statistics generation may be implemented, for example, to allow the client to get a rough or approximate summary of the data set values fairly rapidly in the first phase,” 93, Fig. 7]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Narsky to, in the process of the one iterative operation, sequentially read the plurality of sample data subsets from the sample data set and to enter each read sample data subset into the machine learning model, as taught by Brueckner, in order to optimize the sampling of data, improve the output of the machine learning model, and thereby more precisely update the gradient of each parameter of the model.

Claim 9:	Narsky in view of Brueckner discloses the method according to claim 7 above, and Narsky further discloses updating the machine learning model using the update gradient of each parameter during each iterative operation (3:1-14).  

Claim 13:	Narsky in view of Brueckner discloses the method according to claim 8 above, and Narsky further discloses wherein in a process of entering each sample data subset into the machine learning model and calculating gradient information of each of the plurality of parameters of the machine learning model during each iterative operation, one piece of gradient information of each parameter is obtained correspondingly based on one piece of sample data in the sample data subset, wherein the sample data subset comprises at least one piece of sample data (3:1-36, 4:22-29 “Several algorithms can be used to solve the QP problem shown above on large datasets, such as ISDA and SMO. ISDA and SMO are based on an iterative process that inspects a subset of the data at each step,” 6:18-66); and 
correspondingly, one set of gradient information of each parameter is obtained based on one sample data subset, wherein the one set of gradient information comprises at least one piece of gradient information (3:1-36, 4:22-29, 6:18-66).  

Claim 14:	Narsky in view of Brueckner discloses the method according to claim 8 above, and Narsky does not explicitly disclose, however Brueckner discloses wherein updating the machine learning model using an update gradient of each parameter in each iterative operation comprises: updating the machine learning model according to a model update formula of stochastic gradient descent and using the update gradient of each parameter ([204 “parameters or weights may be updated if needed in one or more learning iterations, e.g., using a stochastic gradient descent technique or some similar optimization approach,” 212]). 
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Narsky to update the machine learning model according to a model update formula of stochastic gradient descent using the update gradient of each parameter, as taught by Brueckner, in order to optimize the sampling of data in accordance with stochastic gradient techniques, improve the output of the machine learning model, and thereby more precisely update the gradient of each parameter of the model.
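For reference, the conventional stochastic gradient descent update replaces each parameter θ with θ − η·g, where η is a learning rate and g is that parameter's update gradient. The minimal sketch below assumes this plain, momentum-free form; the function name and dictionary layout are hypothetical, and neither Brueckner's nor the application's specific update formula is reproduced here.

```python
from typing import Dict


def sgd_update(params: Dict[str, float], update_grad: Dict[str, float],
               lr: float) -> Dict[str, float]:
    """Plain SGD step: each parameter theta becomes theta - lr * g for its update gradient g."""
    return {name: value - lr * update_grad[name] for name, value in params.items()}
```

With parameters `{"w": 1.0, "b": 0.5}`, update gradients `{"w": 6.0, "b": 1.0}`, and a learning rate of 0.5, the step yields `{"w": -2.0, "b": 0.0}`.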

Claim 15:	Narsky in view of Brueckner discloses the method according to claim 8 above, and Narsky further discloses wherein the method runs on at least one computing node, and the computing node comprises at least one processor and a memory configured for the processor (2:55-67, 10:61-67, 11:1-13, Fig. 2).  

Claim 16:	Narsky in view of Brueckner discloses the method according to claim 8 above, and Narsky does not explicitly disclose, however Brueckner discloses wherein in each iterative operation, the plurality of sample data subsets sequentially read from the sample data set are stored in the memory, and after gradient information of each parameter is calculated using one sample data subset, the sample data subset is deleted from the memory before a next sample data subset is read into the memory ([188-190, 196 “subsequent tree-pruning pass of the training phase, to determine an order in which nodes can be pruned or removed from the tree without affecting the quality of the model predictions significantly,” 198 “tree generation may be paused, the created nodes may be examined for pruning (e.g., based on their PUM values and on the optimization goals) in a first tree-pruning period, and some nodes may be removed based on the analysis. More nodes may be generated for the resulting tree in the next tree-generation period, followed by removal of zero or more nodes during the next tree-pruning period, and so on. Such iterative generation and pruning may help eliminate nodes with low utility from the tree earlier than in an approach in which the entire tree is generated before any nodes are pruned,” 206, Figs. 36, 37]). 
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Narsky such that, in each iterative operation, the plurality of sample data subsets sequentially read from the sample data set are stored in the memory, and, after gradient information of each parameter is calculated using one sample data subset, the sample data subset is deleted from the memory before a next sample data subset is read into the memory, as taught by Brueckner, in order to optimize the sampling of data in accordance with system resource management techniques, improve system performance, improve the output of the machine learning model, and thereby more precisely update the gradient of each parameter of the model.

Claim 17:	Narsky in view of Brueckner discloses the method according to claim 16 above, and Narsky does not explicitly disclose, however Brueckner discloses wherein a storage space occupied by one sample data subset is less than or equal to a storage space reserved for the sample data subset in the memory, and a storage space occupied by two sample data subsets is greater than the storage space reserved for the sample data subset in the memory ([57 “identifying the appropriate set of resources (e.g., CPUs/cores, storage or memory) for the plan, scheduling the execution of the plan, gathering results, providing/saving the results in an appropriate destination, and at least in some cases for providing status updates or responses to the requesting clients,” 60, 62, 68, 183]). 
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Narsky such that a storage space occupied by one sample data subset is less than or equal to a storage space reserved for the sample data subset in the memory, and a storage space occupied by two sample data subsets is greater than the storage space reserved for the sample data subset in the memory, as taught by Brueckner, in order to optimize the sampling of data in accordance with system resource management techniques, improve system performance, improve the output of the machine learning model, and thereby more precisely update the gradient of each parameter of the model.
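Under one illustrative reading of claims 16 and 17, the memory reserved for sample data holds one subset but not two, so each subset must be deleted before the next subset is read. The sketch below models that constraint with byte counts only; the function, class, method names, and sizes are hypothetical and are not taken from the application or from Brueckner.

```python
def fits_claim_17(subset_bytes: int, reserved_bytes: int) -> bool:
    """True when one subset fits in the reserved space but two subsets do not."""
    return subset_bytes <= reserved_bytes < 2 * subset_bytes


class SubsetBuffer:
    """Hypothetical reserved region that tracks how many bytes of sample data it holds."""

    def __init__(self, reserved_bytes: int) -> None:
        self.reserved = reserved_bytes
        self.used = 0

    def load(self, nbytes: int) -> None:
        # Reading a subset fails if the previous subset has not been deleted first.
        if self.used + nbytes > self.reserved:
            raise MemoryError("previous subset must be deleted before the next read")
        self.used += nbytes

    def delete(self, nbytes: int) -> None:
        # Deleting a subset releases its share of the reserved space.
        self.used -= nbytes
```

With 100 bytes reserved, a 60-byte subset satisfies the condition (60 ≤ 100 < 120), and a second `load(60)` raises `MemoryError` until `delete(60)` is called.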

Claim 18:	Narsky in view of Brueckner discloses the method according to claim 16 above, and Narsky further discloses: 
in each iterative operation, in a process of entering each sample data subset into the machine learning model and calculating gradient information of each of the plurality of parameters of the machine learning model, reading and using an intermediate calculation result stored in the memory, wherein the intermediate calculation result is used as input information to calculate the gradient information (3:1-31 “Model optimizer 205 may receive training data and iterate a model optimization process that improves a classification model for the training data. With successive iterations, model optimizer 205 may generate an updated classification model, with improved classification accuracy. In an embodiment, each iteration may be associated with an iteration index t, e.g., 1 to T iterations. In one example, the classification model may be a SVM model and the optimization process may be, for example, an iterative single data algorithm (ISDA), a sequential minimal optimization (SMO), quadratic programming optimization,”); and 
after the intermediate calculation result is used, deleting the intermediate calculation result from the memory, wherein an operation of deleting the intermediate calculation result needs to be completed before a next sample data subset is read into the memory (3:32-36, 5:9-23, 5:50-67, 6:1-17, Figs. 3, 4).  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Please see the attached Notice of References Cited (PTO-892).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David Stoltenberg whose telephone number is (571) 270-3472. 
The examiner can normally be reached on Monday-Friday 8:30AM to 5:00PM EST.  If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Waseem Ashraf, can be reached on (571) 270-3948.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300, or the examiner’s direct fax phone number is (571) 270-4472.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center at (866) 217-9197 (toll free).  If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.

/DAVID J STOLTENBERG/Primary Examiner, Art Unit 3682