DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
Acknowledgement is made of Applicant's claim amendments filed on 8/25/2021 in response to the Non-Final Office Action by Examiner Cheung. The claim amendments are entered. Claims 1-7 and 9-12 are now pending. Claim 8 has been cancelled. Claims 1-4, 6, 7, and 9-12 have been amended.

Applicant has amended claims 2 and 9 to overcome the previous claim objections. Accordingly, the objections against those claims are withdrawn. An updated claim objection is presented for a new issue. 

Applicant has sufficiently amended Figs. 6, 7, 21, and 22 to include the requisite reference labels. Accordingly, the drawing objections are withdrawn.

Response to Arguments
Applicant's arguments filed on 8/25/2021 have been fully considered in response to the Non-Final Office Action by Examiner Cheung. 

Applicant argues that the claims have been sufficiently amended to include further details about the machine learning model and its training to be significantly more than the judicial 

Applicant argues that the combination of the cited references allegedly fails to cure the deficiencies because they do not teach the newly amended claim limitations (Applicant’s Reply pgs. 10-11). While the cited references do not explicitly teach the newly amended claim limitations, their combination does teach the amended claim limitations when considered in conjunction with Lin, which has been incorporated into the rejection of the independent claims as necessitated by Applicant’s amendments. 

Claim Objections
Claim 4, line 4 is objected to because of the following informalities: “by previous -the training the machine learning” is strangely worded. Applicant is advised to review the wording and fix accordingly so that it is clearer. Appropriate correction is required.


Claim Rejections - 35 USC § 103
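In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.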

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-7 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Drevo et al. (U.S. Pat. App. Pre-Grant Pub. No. 2016/0132787, hereinafter Drevo) in view of Lin et al. (U.S. Pat. App. Pre-Grant Pub. No. 2013/0346351, hereinafter Lin) and Baughman et al. (U.S. Pat. App. Pre-Grant Pub. No. 2017/0116372, hereinafter Baughman).

Regarding claim 1, Drevo teaches:
A non-transitory computer-readable storage medium storing therein a machine learning program that causes a computer to execute a process comprising ([0162]: describing that “[t]he non-volatile memory 806 stores computer instructions 812, an operating system 814, and data 816. In one example, the computer instructions 812 are executed by the processor 802 out of volatile memory 804. In one embodiment, an article 580 comprises non-transitory computer-readable instructions.” This is depicted in Fig. 8 elements 806 & 812: showing computer instructions 812 (i.e. learning program) and non-volatile memory 806 (i.e. computer-readable storage medium).): 
… a discontinuity point ([0080]-[0081]: describing that “When searching spaces of multiple modeling methodologies, a number of challenges to finding the best model arise either in the isolation of one methodology or from an aggregation. In particular, the following challenges can be expected. 
Discontinuity and non-differentiability….”)…; 
...
selecting a second value of the learning parameter each the calculated estimation value ([0139]-[0142]: describing a computation for selecting an optimal parametrization value associated with an optimal performance of the machine learning model, wherein the computation is performed for each proposed parametrization value that can be selected out of a plurality of such values. See also [0091] and [0093]: describing example parameters that can be selected as a second choice selection.); and 
training another machine learning model based on the second value of the ([0086]-[0091]: describing the traversing of a conditional parameter tree (CPT) to determine optimal machine learning model(s) from training/learning, wherein the CPT comprises various roots, leaves, and nodes denoting a plurality of parameter values that can be selected, with a particular parameter being available for a first choice and another parameter being available for an additional choice. The models can be a plurality of models as run on a system comprising a plurality of worker nodes ([0043] and [0064]-[0065]).).

While the cited reference teaches the limitations of claim 1, it does not explicitly teach: “specifying, for each of the specified ranges, accuracy of a machine learning model obtained by training based on a first value of the learning parameter included in the specified ranges” on lines 9-11 and “calculating, for each of the specified ranges, an estimation value of the machine learning model in accordance with both the specified accuracy and spent for the training of the machine learning model.” Lin discloses the claim limitations, teaching: 
“specifying, for each of the specified ranges, accuracy of a machine learning model obtained by training based on a first value of the learning parameter included in the specified ranges”: describing that the specified accuracy score of the machine learning model training can correspond to various selected parameter and feature configurations, wherein the selected configurations can be based on the accuracy score values related to a “predicted range” (Lin [0043]-[0046] and [0114]). See also Lin [0068], [0078]-[0079], [0082], and [0100]: describing the accuracy score computations. 
“calculating, for each of the specified ranges, an estimation value of the machine learning model in accordance with both the specified accuracy and spent for the training of the machine learning model”: describing that “a model selection module 210 is operable to estimate the accuracy of each trained predictive model to determine an initial accuracy score and subsequent new accuracy scores” (Lin [0045]). Wherein the estimated accuracy comprises accuracy score computations of the machine learning models (Lin [0068], [0078]-[0079], and [0082]) that can include a predicted range (Lin [0114]) and time related to training (Lin [0096] and [0100]). 
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the components in the cited reference to include the computation in Lin. Doing so would enable “[m]ethods and systems … that provide accuracy assessments of trained predictive models. The trained predictive models can be included in a dynamic repository of trained predictive models, at least some of which can be updated as new training data becomes available. As new training data is received and used to update the trained predictive models, the accuracy of the models can change. As such, accuracy assessments are also updated to reflect the current state of the trained predictive models included in the dynamic repository.” (Lin [0024]). 
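
For illustration only, the kind of per-range estimation value discussed above can be pictured as a score that rises with accuracy and falls with training time. The sketch below is an assumption made for readability and is not drawn from Drevo, Lin, or the application as filed: the accuracy-per-unit-time ratio and the names estimation_value and select_best_range are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]


def estimation_value(accuracy: float, learning_time: float) -> float:
    """Hypothetical score: accuracy gained per unit of training time.

    The ratio is an illustrative assumption, not a formula taken from
    the cited references or the claims.
    """
    if learning_time <= 0:
        raise ValueError("learning_time must be positive")
    return accuracy / learning_time


def select_best_range(results: Dict[Range, Tuple[float, float]]) -> Range:
    """Return the parameter range whose (accuracy, time) pair scores highest."""
    if not results:
        raise ValueError("results must not be empty")
    return max(results, key=lambda r: estimation_value(*results[r]))
```

Under this assumed ratio, a range that reaches slightly lower accuracy in far less time can outrank a more accurate but much slower range, which is one way an estimation value could depend on both quantities.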

While the cited references teach the limitations of claim 1, they do not explicitly teach: “determining whether or not there is … at which a variation in a learning time relative to a variation in a learning parameter is discontinuous; specifying, when the discontinuity point is present, ranges of the learning parameter in which the variation in the learning time relative to the variation in the learning parameter is continuous, based on the discontinuity point” on lines 4-8. Baughman discloses the claim limitations, teaching: 
“determining whether or not there is … at which a variation in a learning time relative to a variation in a learning parameter is discontinuous (Baughman [0116]-[0117]: describing that “[i]n step 201,… processors … using means known to those skilled in the art of statistical analysis, divide an input data set into contiguous segments bounded by a set of knots… As is known in the art, a knot may be a critical point, inflection point, or discontinuity in a data set.” Wherein “this document will refer to the data subset being regressed by the current iteration as comprising data points collected over a range of time” (Baughman [0120]). 
This is depicted in Baughman Fig. 2 element 201. Wherein knots can include a discontinuity or inflection point (i.e. discontinuity point) in a data set with data points (i.e. learning parameter) collected over a range of time (i.e. learning time). A discontinuity in a data set with data points collected over a range of time is interpreted as a point at which the variation in learning time relative to the variation in a learning parameter is discontinuous, and bounding contiguous segments by a set of knots is interpreted as determining whether discontinuity points are present.); 
specifying, when the discontinuity point is present, ranges of the learning parameter in which the variation in the learning time relative to the variation in the learning parameter is continuous, based on the discontinuity point (Baughman [0116]: describing that “[i]n step 201, … processors … divide an input data set into contiguous segments bounded by a set of knots. This allows the linear regression analysis to be performed on smaller, contiguous subsets of data that may be at least partially free of discontinuities that hamper efforts to efficiently and accurate fit a data set to a linear function….” And “[i]n step 205, … processors select initial ranges of values of an array of beta coefficients of the function Y to be regressed. These initial beta ranges may be estimated by mathematical methods known to those skilled in the art, such as transfer learning.” (Baughman [0121]). 
Wherein selecting initial ranges of values of an array of beta coefficients from the divided data set of contiguous segments bounded by discontinuities/knots can teach the claim limitations. See also Fig. 2 elements 201-205: showing continuous ranges of the learning parameter based on discontinuity points.)”
Thus, it would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date (EFD) to modify the components in the cited references to include the computation in Baughman. Doing so would enable a cost-effective, energy efficient, and massively parallel scheme that “would extend the benefits of machine-learning applications into areas where it would not otherwise be cost-effective or computationally practical” (Baughman [0011]). 
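
For illustration only, the determining and specifying steps mapped above can be pictured with a minimal sketch: flag a discontinuity point where learning time jumps sharply between neighboring learning-parameter values, then divide the parameter axis into continuous ranges at those points. The median-slope threshold, the jump_factor default, and the function names are hypothetical assumptions; they are not taken from Baughman, Drevo, or the application as filed.

```python
from typing import List, Sequence, Tuple


def find_discontinuities(params: Sequence[float],
                         times: Sequence[float],
                         jump_factor: float = 3.0) -> List[int]:
    """Return indices i where learning time jumps between params[i-1] and params[i].

    Assumes params is strictly increasing. A jump is flagged when the local
    slope exceeds jump_factor times the median slope (an assumed heuristic).
    """
    if len(params) != len(times) or len(params) < 2:
        raise ValueError("need two or more paired observations")
    slopes = [abs(times[i] - times[i - 1]) / (params[i] - params[i - 1])
              for i in range(1, len(params))]
    baseline = sorted(slopes)[len(slopes) // 2]  # median slope as the norm
    if baseline == 0:
        return []  # flat baseline: this simple heuristic flags nothing
    return [i for i, s in enumerate(slopes, start=1) if s > jump_factor * baseline]


def split_ranges(params: Sequence[float],
                 breaks: Sequence[int]) -> List[Tuple[float, float]]:
    """Divide the parameter values into continuous ranges at each break index."""
    ranges, start = [], 0
    for b in breaks:
        ranges.append((params[start], params[b - 1]))
        start = b
    ranges.append((params[start], params[-1]))
    return ranges
```

With parameter values 1 through 6 and learning times 10, 11, 12, 30, 31, 32, the sketch flags the jump between parameter values 3 and 4 and yields the two ranges (1, 3) and (4, 6).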

Regarding claim 2, the rejection of claim 1 is incorporated. Drevo teaches:
The non-transitory computer-readable storage medium according to claim 1, the process further comprising: 
receiving a plurality of pieces of performance information respectively including the learning parameter included in the machine learning model and the learning time spent for the training of the machine learning model based on the first value of the learning parameter ([0063]: describing that “many aspects of the model search process within the data hub 106, [can] includ[e] model training times, measures of predictive power, average performance for evaluation, training time, number of features, baselines, and comparative performance among models and modeling techniques”. Wherein “the worker nodes 100 coordinate using the hyperpartitions and performance tables 106c, 106d to recommend, optimize, and/or train a suitable model for the dataset” ([0068]). This is shown in Fig. 1 elements 106a-d and in Fig. 2. The training time can be computed for each proposed parametrization value that can be selected out of a plurality of such values ([0142]-[0145]).), 
wherein the determining includes determining whether or not the discontinuity point is present by referring to the learning parameter and the learning time included in each of the plurality of received pieces of performance information ([0080]-[0082]: describing the challenges in finding the best machine learning model, wherein such challenges can comprise discontinuities and varying dimensions of the search space related to the hyperparameters. See also Fig. 2: showing performance metrics as well as learning time in 208 and 206.). 

Regarding claim 3, the rejection of claim 2 is incorporated. Drevo teaches:
The non-transitory computer-readable storage medium according to claim 2, wherein in the determining, a determination that the discontinuity point is present is made when the training the machine learning model based on  the training the machine learning model capable of using a result of previous the training the machine learning model based on  ([0086]: describing that “the system 100 represents conditional parameter spaces as a tree-based data structure referred to herein as a Conditional Parameter Tree (CPT)... that compactly expresses every parameter, hyperparameter and design choice, in general, for a modeling methodology. This representation allow system 100 to both generate parameterizations and learn from previously attempted parameterizations by correlating their performance to suggest new parameterizations and find the best predictive model. [0097] … CPTs solves challenges of searching spaces of multiple modeling methodologies, including discontinuity and non-differentiability, varying dimensions of the search space, and non-transferability of methodology performance.”).

Regarding claim 4, the rejection of claim 3 is incorporated. Drevo teaches:
The non-transitory computer-readable storage medium according to claim 3, wherein the result is learning data generated from the learning parameter included in the machine learning model learned by previous the training the machine learning model based on  ([0094]-[0095]: “From the CPT 320, nine hyperpartitions can be derived by selecting (or “freezing”) values for the categorical parameters 330 and 339. An example hyperpartition for DBN is (Hidden Layers-1, Activation Function=linear, Epochs, Learn Rate, Pretrain Learn Rate, Learn Rate Decay, Layer 1 Size). Within this hyperpartition, the system 100 can optimize for the parameters “Epochs” (node 332), “Learn Rate” (node 326), “Pretrain Learn Rate” (node 328), “Learn Rate Decay” (node 324), and “Layer 1 Size” (node 334).” Wherein “[t]he CPT 340 includes four continuous parameters: intercept 344, Gamma 306, Eta 348, and Alpha 350; and three categorical parameters: Learning rate 352, Loss 354, and Penalty 356. Twenty-four hyperpartitions can be formed from the CPT 340.” ([0095]). 
Optimized parameters like epochs and learn rate in the above example (i.e. learning data) from the hyperpartitions by selecting categorical parameters such as loss and penalty (i.e. trial parameter) can denote learning parameters and training of the machine learning models at various current and previous training iterations.).

Regarding claim 5, the rejection of claim 1 is incorporated. Baughman further teaches:
wherein the specifying ranges of the learning parameter includes dividing the ranges of the learning parameter at each of the discontinuity points (Baughman [0116]: recites “[i]n step 201,… processors … using means known to those skilled in the art of statistical analysis, divide an input data set into contiguous segments bounded by a set of knots…” Wherein “a knot may be a critical point, inflection point, or discontinuity in a data set” (Baughman [0117]).).
A motivation to combine the cited references with Baughman was previously given.

Regarding claim 6, the rejection of claim 1 is incorporated. Drevo teaches:
The non-transitory computer-readable storage medium according to claim 1, wherein the selecting the second value of the learning parameter includes selecting the second value of the learning parameter which causes the estimation ([0139]-[0142]: describing a computation for selecting an optimal parametrization value associated with an optimal performance of the machine learning model, wherein the computation is performed for each proposed parametrization value that can be selected out of a plurality of such values. Wherein the prescribed condition can be denoted by an optimal performance as well as optimal parameters to achieve such performance. See also [0091] and [0093]: describing example parameters that can be selected as a second choice selection.).

Regarding claim 7, the rejection of claim 6 is incorporated. Drevo teaches:
The non-transitory computer-readable storage medium according to claim 6, wherein the selecting the second value of the learning parameter includes selecting the second value of the learning parameter which causes a largest estimation ([0139]-[0142]: describing a computation for selecting an optimal parametrization value associated with an optimal performance of the machine learning model, wherein the computation is performed for each proposed parametrization value that can be selected out of a plurality of such values. Wherein the prescribed condition can be denoted by an optimal performance as well as optimal parameters to achieve such performance. The largest estimation can comprise a parametrization with the “highest corresponding” parameter denoting optimal parameters and performance ([0136] and [0142]). See also [0091] and [0093]: describing example parameters that can be selected as a second choice selection.).

Regarding independent claim 9, claim 9 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 9 is an apparatus claim that corresponds to medium claim 1. 
A mapping is shown below for the limitations of claim 9 that differ from claim 1. Drevo teaches:
“A machine learning apparatus comprising: 
a processor that executes a process including: ([0165]: “[p]rocessing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system”)….”

Regarding claim 10, claim 10 is substantially similar to claim 2 and therefore is rejected on the same grounds as claim 2. Claim 10 is an apparatus claim that corresponds to medium claim 2.


Regarding independent claim 11, claim 11 is substantially similar to independent claim 1 and therefore is rejected on the same grounds as claim 1. Claim 11 is a method claim that corresponds to medium claim 1. 
A mapping is shown below for the limitation of claim 11 that differs from claim 1. Drevo teaches:
	“…, by a processor ([0165]: “[p]rocessing may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system”), ….” 

Regarding claim 12, claim 12 is substantially similar to claim 2 and therefore is rejected on the same grounds as claim 2. Claim 12 is a method claim that corresponds to medium claim 2.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:
Zhan et al. (U.S. Pat. App. Pre-Grant Pub. No. 2020/0090073): describing a process for generating a machine learning model that includes training and validation operations in parallel. Wherein the generated machine learning model can possess optimal model parameters that are determined by computing an average parameter value. Validation scores can be computed to determine an accuracy of the machine learning model, as well as a “ratio of consistency between data types corresponding to output vectors output by the machine learning models based on the validation data and types of the validation data”.
Simkoff et al. (U.S. Pat. No. 10,255,550): describing training of a machine learning model using a plurality of training data sets. Wherein the training of the model is an “iterative process that adjusts features and associated weights to some specified degree of accuracy relative to the known parameter values”. A generated prediction value for a parameter and its magnitude can be determined via a threshold comparison. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SELENE A HAEDI whose telephone number is (571)270-5762.  The examiner can normally be reached on M-F 11 AM - 7 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.H./Examiner, Art Unit 2121                                                                                                                                                                                                        




/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121