DETAILED ACTION
1.	This action is in response to the communications filed on 02/22/2021, in which claims 1, 8, and 15 are amended and claims 1-4, 8-11, 15-18, and 21-23 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 8-10, 15-17 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee) in view of Duchi (Adaptive Subgradient Methods for Online Learning and Stochastic Optimization) in view of Krishnan (Interactive Data Cleaning For Statistical Modeling) in view of Guilak (US 20030225763 A1) in view of Hall (MapReduce/Bigtable for Distributed Optimization) in further view of Sallinen (High Performance Parallel Stochastic Gradient Descent in Shared Memory).
In regard to claims 1, 8 and 15, Zhao teaches: A computer-implemented method for parallel stochastic gradient descent using linear and non-linear activation functions, the method comprising: 
receiving a set of input examples, the set of examples including a plurality of vectors of feature values and a corresponding label to learn; ("Assume we have a set of labeled instances {(xi, yi)|i = 1, . . . , n}, where xi ∈ Rd is the feature vector for instance i, d is the feature size and yi ∈ {1,−1} is the class label of xi." — pg. 2379 Introduction lines 1-4) 
receiving a global model, the global model used to compute a plurality of local models based on the set of input examples; and… ("where w is the parameter to learn, fi(w) is the loss function defined on instance i" — pg. 2379 Introduction lines 8-9; "Assume that we have p threads (processors) which can access a shared memory, and w is stored in the shared memory. Furthermore, we assume each thread has access to a shared data structure for the vector w" — pg. 2380 Approach lines 1-4; a global model corresponds to w in the shared memory; local models correspond to parameters from p threads; input examples correspond to instance i)
… learning a new global model based on the global model and the set of input examples by iteratively performing the following steps: ("Our AsySVRG algorithm is presented in Algorithm 1. We can find that in the tth iteration, each thread completes the following operations…" — pg. 2380 Approach lines 6-8; See Algorithm 1 last 3 lines, "Take w_{t+1} to be the current value of u in the shared memory"; a new global model corresponds to w_{t+1})
computing… a plurality of local models having a plurality of model parameters based on the global model and at least a portion of the set of input examples;… ("By using a temporary variable u0 to store wt (i.e., u0 = wt), all threads parallelly compute the full gradient ∇f(u0) = (1/n) Σ_{i=1}^{n} ∇fi(u0) = (1/n) Σ_{i=1}^{n} ∇fi(wt). Assume the gradients computed by thread a are denoted by φa which is a subset of {∇fi(wt)|i = 1, . . . , n}. We have φa ∩ φb = ∅ if a ≠ b, and ∪_{a=1}^{p} φa = {∇fi(wt)|i = 1, . . . , n}" — pg. 2380 Approach lines 9-15; local parameters correspond to u0)
computing, for each local model… a corresponding model combiner based on the global model and at least a portion of the set of input examples; and ("Then each thread parallelly runs an inner-loop in each iteration of which the thread reads the current value of u, denoted as û, from the shared memory and randomly chooses an instance indexed by i to compute the vector v̂ = ∇fi(û) − ∇fi(u0) + ∇f(u0). (2)…" — pg. 2380 Approach lines 16-20; a corresponding model combiner corresponds to v̂)
combining… the plurality of local models into the new global model based on the current global model and the plurality of corresponding model combiners. ("Then update the vector u ← u − ηv̂…" — pg. 2380 Approach lines 21-22; "In this paper, we find that there exists a synthetic process to generate the final value of u after all threads have completed their updates in the inner-loop of Algorithm 1. It means that we can generate a sequence of synthetic values of u with some order to get the final u, based on which we can prove the convergence of the lock-free AsySVRG in Algorithm 1." — pg. 2381 An Equivalent Synthetic Process lines 6-12; "… After all the threads have completed the inner-loops, we take w_{t+1} to be the current value of u in the shared memory." — pg. 2380 Approach lines 26-28; the new global model corresponds to w_{t+1})
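For orientation only, the update quoted from Zhao above can be sketched in a few lines of Python. This is a minimal single-threaded sketch under stated assumptions, not the lock-free multi-threaded Algorithm 1 itself: the function name svrg_outer_iteration, the callback grad_i, and the parameters eta and inner_steps are hypothetical names introduced here, and the threads are replaced by a sequential loop.

```python
import random


def svrg_outer_iteration(w, data, grad_i, eta, inner_steps, rng):
    """Run one outer iteration of an SVRG-style update sequentially.

    w           -- current global model w_t (list of floats)
    data        -- list of training instances
    grad_i      -- hypothetical callback: grad_i(u, instance) -> gradient list
    eta         -- step size
    inner_steps -- number of inner-loop updates
    rng         -- random.Random instance used to pick instances
    """
    n = len(data)
    u0 = list(w)  # snapshot u0 = w_t

    # Full gradient at the snapshot: (1/n) * sum_i grad f_i(u0).
    full = [0.0] * len(w)
    for inst in data:
        for j, gj in enumerate(grad_i(u0, inst)):
            full[j] += gj / n

    u = list(w)  # stands in for the shared vector u
    for _ in range(inner_steps):
        inst = rng.choice(data)   # randomly chosen instance i
        u_hat = list(u)           # value of u read by the "thread"
        g_hat = grad_i(u_hat, inst)
        g_0 = grad_i(u0, inst)
        # v_hat = grad f_i(u_hat) - grad f_i(u0) + grad f(u0)
        v_hat = [a - b + c for a, b, c in zip(g_hat, g_0, full)]
        # u <- u - eta * v_hat
        u = [uj - eta * vj for uj, vj in zip(u, v_hat)]
    return u  # taken as w_{t+1}
```

As an illustration, with the loss fi(w) = 0.5(w − yi)^2 the per-instance terms in v̂ cancel, so each inner step moves u toward the mean of the yi regardless of which instance is drawn.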
Zhao fails to teach, but Duchi teaches: … ("… our algorithms dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Informally, our procedures give frequently occurring features very low learning rates…" — Page 2122 lines 2-5)
… for each determined infrequent feature value … (Duchi, "… and infrequent features high learning rates, where the intuition is that each time an infrequent feature is seen, the learner should 'take notice…'" — Page 2122 lines 5-6) 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao to incorporate the teachings of Duchi by observing frequently occurring features. Doing so would allow the adaptation to facilitate finding and identifying very predictive but comparatively rare features. (Duchi — Page 2122 lines 6-7)
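The behavior Duchi describes (low learning rates for frequently occurring features, high learning rates for infrequent ones) can be illustrated with a per-feature adaptive step. The following is a minimal sketch assuming the commonly used diagonal form, in which each feature's rate is the base rate divided by the square root of that feature's accumulated squared gradients; the function name adagrad_step and the parameters eta and eps are hypothetical and are not taken from the cited pages.

```python
import math


def adagrad_step(w, grad, sq_sum, eta=0.1, eps=1e-8):
    """Apply one diagonal adaptive update in place; return the per-feature rates.

    w      -- model parameters (list of floats, modified in place)
    grad   -- gradient for the current example (list of floats)
    sq_sum -- running sum of squared gradients per feature (modified in place)
    """
    rates = []
    for j, g in enumerate(grad):
        sq_sum[j] += g * g
        # A feature seen often accumulates a large sq_sum and gets a small rate.
        rate = eta / (math.sqrt(sq_sum[j]) + eps)
        w[j] -= rate * g
        rates.append(rate)
    return rates
```

If feature 0 has a non-zero gradient on every step while feature 1 has one only on the latest step, the rate returned for feature 1 on that step is larger than the rate returned for feature 0.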
Zhao and Duchi fail to teach, but Krishnan teaches: 
… randomly sampling the set of input examples… (Krishnan, p.950 "In the next step, the Sampler selects a batch of data S from the data that has not been cleaned already. To ensure convergence, the Sampler has to do this in a randomized way...")
determining the frequent feature values based on the random sampling; and (Krishnan, p.951 "… the Sampler has to do this in a randomized way, but can assign higher probabilities [the feature values] to some data as long as no data has a zero sampling probability."; p. 954 "The sampling algorithm is designed to include records in each batch that are most valuable to the analyst’s model with a higher probability."; The Sampler in Krishnan selects the input data randomly, and assigns / determines a probability p(r) to every input data r.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao and Duchi to incorporate the teachings of Krishnan by including the sampling distribution. Doing so would prioritize data records with higher gradients, i.e., make a larger impact during optimization. (Krishnan, p. 954 "Intuitively, this sampling distribution prioritizes records with higher gradients, i.e., make a larger impact during optimization.")
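The sampling property relied on from Krishnan (randomized selection in which some records may receive higher probability so long as no record has zero probability) can be sketched as follows. This is an illustrative sketch only: the function name sample_batch, the priority callback, and the floor parameter are assumptions introduced here, and the sketch samples with replacement for simplicity, which the reference does not necessarily do.

```python
import random


def sample_batch(records, priority, batch_size, rng, floor=1e-6):
    """Draw a randomized batch in which every record has non-zero probability.

    records    -- candidate records (for example, those not yet cleaned)
    priority   -- hypothetical callback returning a non-negative score,
                  such as a gradient magnitude
    batch_size -- number of draws (with replacement in this sketch)
    rng        -- random.Random instance
    floor      -- small constant added so that no weight is zero
    """
    weights = [max(priority(r), 0.0) + floor for r in records]
    total = sum(weights)
    probs = [w / total for w in weights]  # sampling probability p(r) per record
    batch = rng.choices(records, weights=weights, k=batch_size)
    return batch, probs
```

Records with a higher priority score are drawn more often, while the floor keeps every record reachable.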
Zhao, Duchi and Krishnan fail to teach, but Guilak teaches: … wherein a frequent feature value is included in at least 10% of the set of input examples; and (Guilak, [0045] "... The collected data is statistically analyzed to identify a list of features [frequent features] useful in identifying the particular category (e.g., pornographic or not pornographic) of the pre-classified document. In one embodiment, the list of features is limited to a specified percentage (e.g., 30%) [e.g. 10 %] of the most frequent features [the set of input examples / non-zero features in input example] extracted from the documents belonging to the particular category."; Based on spec., [0105] "An average NFNZ ratio (shown in Table 2 below) may be an average number of frequent features in each input example divided by a number of non-zero features in that input example. Thus, a value of zero (0) may mean all features are infrequent, and a value of one (1) may mean all features are frequent. A frequent feature may be defined as whether a particular feature shows up in at least ten percent (10%) of the input examples," because NFNZ ratio = # of frequent features / # of non-zero features, the examiner interpreted that 10% of the input example in the claim is based on non-zero features of input example.; Additionally, based on spec. [0132], and also see MPEP §2144.05 II. A, differences in frequent feature values will not support the patentability of subject matter encompassed by the prior art unless there is evidence indicating such value is critical. See In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao, Duchi and Krishnan to incorporate the teachings of Guilak by including features extracted from the training data. Doing so would make the data reflect the frequency of occurrence of features in each of the documents. (Guilak, [0045] "Features such as described above in reference to FIG. 2 are extracted from the training set 310, and data (e.g., feature vectors) reflecting the frequency of occurrence of one or more features in each of the documents in the training set 310 is collected. ")
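To make the quoted definitions concrete, the following minimal sketch computes the set of frequent features and the average NFNZ ratio as described in the quoted passage of the specification ([0105]): a feature is treated as frequent when it is non-zero in at least ten percent of the examples considered, and the ratio for one example is its number of frequent features divided by its number of non-zero features. The function names, the sparse dict representation, and the optional sample_size parameter are assumptions introduced here for illustration.

```python
import random


def frequent_features(examples, threshold=0.10, sample_size=None, rng=None):
    """Return features that are non-zero in at least `threshold` of the examples.

    examples    -- list of dicts mapping feature name -> value (sparse form)
    sample_size -- if given, the count is taken over a random sample of examples
    """
    if sample_size is not None and sample_size < len(examples):
        rng = rng or random.Random()
        examples = rng.sample(examples, sample_size)
    counts = {}
    for ex in examples:
        for feat, val in ex.items():
            if val != 0:
                counts[feat] = counts.get(feat, 0) + 1
    cutoff = threshold * len(examples)
    return {f for f, c in counts.items() if c >= cutoff}


def average_nfnz(examples, frequent):
    """Average over examples of (# frequent features) / (# non-zero features)."""
    ratios = []
    for ex in examples:
        nonzero = [f for f, v in ex.items() if v != 0]
        if nonzero:  # skip examples with no non-zero features
            ratios.append(sum(f in frequent for f in nonzero) / len(nonzero))
    return sum(ratios) / len(ratios) if ratios else 0.0
```

A ratio of zero then means that every non-zero feature is infrequent, and a ratio of one means that every non-zero feature is frequent, matching the quoted description.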
Zhao, Duchi, Krishnan and Guilak fail to teach, but Hall teaches: … during a map phase …during a reduce phase… ("MapReduce [4] is a distributed computation model, consisting of two main phases, a map phase and a reduce phase. In the map phase, each worker processes each item in isolation and produces an intermediate output consisting of a key-value pair. The reduce phase then collects all outputs with matching keys, and performs an operation on that set of values, to yield a final combined output (or set of outputs) for that key" — Page 3, 3.1)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao, Duchi, Krishnan and Guilak to incorporate the teachings of Hall by including MapReduce. Doing so would allow the method to provide equally-predictive models and take far less time to train. (Hall — Page 4, Conclusion lines 1-3).
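The two phases described in the Hall quotation can be illustrated with a small in-memory sketch. It is not Hall's distributed implementation: the function name map_reduce and the mapper and reducer callbacks are hypothetical, and everything runs in a single process.

```python
from collections import defaultdict


def map_reduce(items, mapper, reducer):
    """Run a map phase and a reduce phase over `items` in memory.

    mapper  -- hypothetical callback: item -> iterable of (key, value) pairs
    reducer -- hypothetical callback: (key, list_of_values) -> combined output
    """
    grouped = defaultdict(list)
    # Map phase: each item is processed in isolation and emits key-value pairs.
    for item in items:
        for key, value in mapper(item):
            grouped[key].append(value)
    # Reduce phase: values with matching keys are collected and combined.
    return {key: reducer(key, values) for key, values in grouped.items()}
```

For example, a mapper could emit (parameter index, partial update) pairs from each worker's data, and a reducer could sum the partial updates for each index.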
Zhao, Duchi, Krishnan, Guilak and Hall fail to teach, but Sallinen teaches: … asynchronously updating… the plurality of model parameters ("… but other threads may asynchronously update the model vector in the middle of a thread’s batch processing. In that case, the thread would get a more current model in the middle of processing a batch…" — Page 876 left col. lines 2-6)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao, Duchi, Krishnan, Guilak and Hall to incorporate the teachings of Sallinen by including asynchronous updates. Doing so would allow the method to get a more current model in the middle of processing a batch (Sallinen — Page 876 left col. lines 2-6).
Note that claim 8 recites substantially the same subject matter as claim 1, differing only in embodiment. The embodiment difference of a data storage device that stores instructions; and a processor configured to execute the instructions is taught by Zhao ("The experiments are conducted on a server with 12 Intel cores and 64G memory." — pg. 2383 Experiments lines 5-6).
Note that claim 15 recites substantially the same subject matter as claim 1, differing only in embodiment. The embodiment difference of a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform the method is taught by Zhao ("The experiments are conducted on a server with 12 Intel cores and 64G memory." — pg. 2383 Experiments lines 5-6).
In regard to claims 2, 9 and 16, Zhao, Duchi, Krishnan, Guilak, Hall, and Sallinen teach: The method according to claim 1, wherein the plurality of model parameters are linearly updated based on a machine learning algorithm including one or more of linear regression, linear regression with L2 regularization, polynomial regression, ordinary least squares ("OLS"), perceptron, support vector machine ("SVM"), and lasso. (Zhao, "Please note that Assumptions 1 and 2 are often satisfied by most objective functions in machine learning models, such as the logistic regression (LR) and linear regression with L2-norm regularization." — pg. 2380 Preliminary lines 16-19)
In regard to claims 3, 10 and 17, Zhao, Duchi, Krishnan, Guilak, Hall and Sallinen teach: The method according to claim 1, wherein the plurality of model parameters is non-linearly updated based on a machine learning algorithm including logistic regression. (Zhao, "Please note that Assumptions 1 and 2 are often satisfied by most objective functions in machine learning models, such as the logistic regression (LR) and linear regression with L2-norm regularization." — pg. 2380 Preliminary lines 16-19)
In regard to claims 21, 22 and 23, Zhao, Duchi, Krishnan, Guilak, Hall and Sallinen teach: The method according to claim 1, wherein sampling the set of input examples to determine frequent feature values (Duchi, p. 2122 ln. 2-5 "… our algorithms dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Informally, our procedures give frequently occurring features very low learning rates…") includes randomly sampling a portion of the set of input examples to determine the frequent feature values. (Zhao, p. 2380 Approach; "… Assume the gradients computed by thread a are denoted by φa which is a subset of {∇fi(wt)|i = 1, . . . , n}."; p. 2380 Approach, "Then each thread parallelly runs an inner-loop in each iteration of which the thread reads the current value of u, denoted as ˆu, from the shared memory and randomly chooses [randomly sampling] an instance indexed by i [input examples]…")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao to incorporate the teachings of Duchi by observing frequently occurring features. Doing so would allow the adaptation to facilitate finding and identifying very predictive but comparatively rare features. (Duchi — Page 2122 lines 6-7)
Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Duchi in view of Krishnan in view of Guilak in view of Hall in view of Sallinen in further view of Drineas (RandNLA: Randomized Numerical Linear Algebra).
In regard to claims 4, 11 and 18, Zhao, Duchi, Krishnan, Guilak, Hall, and Sallinen teach: The method according to claim 1, wherein computing, for each local model, a corresponding model combiner includes: 
computing, for each local model, a corresponding projected model combiners based on the global model and at least a portion of the set of input examples. (Zhao, "Then each thread parallelly runs an inner-loop in each iteration of which the thread reads the current value of u, denoted as û, from the shared memory and randomly chooses an instance indexed by i to compute the vector v̂ = ∇fi(û) − ∇fi(u0) + ∇f(u0). (2)…" — pg. 2380 Approach lines 16-20)
Zhao, Duchi, Krishnan, Guilak, Hall, and Sallinen fail to teach, but Drineas teaches: a corresponding projected model ("RandNLA algorithms involve taking an input matrix; constructing a 'sketch' of that input matrix—where a sketch is a smaller or sparser matrix that represents the essential information in the original matrix—by random sampling…"—p.83 Col.1 lines 1-8)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Zhao, Duchi, Krishnan, Guilak, Hall, and Sallinen to incorporate the teachings of Drineas by including a 'sketch' of that input matrix. Doing so would allow the method to use that sketch as a surrogate for the full matrix to help compute quantities of interest (Drineas — p.83 Col.1 lines 8-9).
Response to Arguments
Applicant's arguments filed on 02/22/2021 with respect to the rejections under 35 USC § 103 have been fully considered but they are moot.
Applicant argues: (see p. 14 middle): “Thus, Zhao, Duchi, Guilak, Hall, and Sallinen, whether taken individually or in combination, fail to disclose or render obvious at least ‘randomly sampling the set of input examples to determine frequent feature values…; determining the frequent feature values based on the random sampling;…’ as recited in claim 1.”
Examiner answers: the arguments do not apply to the references (Krishnan) being used in the current rejection. The Sampler in Krishnan selects the input data randomly, and assigns / determines a probability p(r) to every input data r. Therefore, Krishnan teaches randomly sampling input data, and determines a frequency value based on the random sampling.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.C./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122