Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 2 is objected to because of the following informalities: the present claim ends with a semicolon rather than a period. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 8-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Regarding Claim 8, it recites “A system for generating a learning curve to aid in predicting a metric for a deep learning model, the method comprising . . .” (emphasis added). It is unclear if the present claim is directed towards a system or a method. After “the method comprising,” the present claim further recites “one or more processors; a non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one of the one or more processors . . .” These are not method steps, so 
Further regarding Claim 8, it recites “the set of model candidates” (line 8). This term lacks antecedent basis and is therefore indefinite; although line 7 recites “a set of models,” it makes no mention of model candidates. Lines 11-12 then recite “the set of trained model candidates,” which is similarly indefinite. In addition, line 9 recites “the shard sizes.” This too lacks antecedent basis, as there is no previous mention of shard sizes. The confusion is compounded by line 13, which recites “that shard, which has a shard size”: it is unclear how “a shard size” relates to the earlier, indefinite “the shard sizes.” For the purpose of examination under prior art, the examiner will interpret the set of model candidates to be any one or more models in the set of models, and will assume that each shard has a shard size.
Regarding Claim 15, it recites terms similar to those of claim 8, including “the set of model candidates” (line 5), “the shard sizes” (line 6), and “a shard size” (line 10), so it is indefinite for the same reasons.
Regarding Claims 9-14 and 16-20, they are rejected as being dependent on rejected base claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


system for generating a learning curve to aid in predicting a metric for a deep learning model, the method comprising . . .” The claim further mixes system elements (one or more processors and a non-transitory computer-readable medium or media) with method steps.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 7-13, and 15-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kobayashi et al. (U.S. 2018/0018587, hereinafter “Kobayashi”).
Regarding Claim 1, Kobayashi teaches a computer-implemented method for generating a learning curve to aid in predicting a metric for a deep learning model (fig. 1; ¶ [0043] – [0045]), the method comprising:
splitting a data set into a set of shards such that the shard sizes span multiple orders of magnitude (fig. 1; ¶ [0047] – [0048], [0064], and [0068]—the samples of various sample sizes are shards that are split from the data set. The shards may double in size from one to the next, thus spanning multiple orders of magnitude);

using a validation set to identify a best model for each shard from among the set of trained model candidates, in which each best model has a corresponding validation accuracy for that shard, which has a shard size (¶ [0062] – [0063] and [0069]);
fitting a power-law learning curve model using the validation accuracies and corresponding shard sizes of the best models selected for the shards (figs. 4 and 5; ¶ [0074] – [0075] and [0082] – [0083]—a curve is fitted to the graphs of shard size to prediction performance for each model to determine the improvement in performance for larger shard sizes); and
using the fitted power-law learning curve to predict a metric associated with a deep learning model (¶ [0084] – [0086]—a regression analysis is used with the curve to estimate a metric such as prediction performance for a larger shard or a probability of improvement).
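For illustration only, the selection, fitting, and prediction steps mapped above may be sketched as follows. This is a minimal sketch and is not taken from Kobayashi or from the applicant's disclosure: the candidate names, the accuracy values, and the functions best_per_shard, fit_power_law, and predict_error are hypothetical, and the sketch assumes the curve is fitted to validation error (one minus validation accuracy) by least squares in log-log space.

```python
import math

def best_per_shard(results):
    """Map each shard size to the highest validation accuracy among its candidates."""
    return {size: max(accs.values()) for size, accs in results.items()}

def fit_power_law(sizes, errors):
    """Fit errors ~ alpha * size**beta by ordinary least squares on log-log values."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta

def predict_error(alpha, beta, size):
    """Evaluate the fitted curve at a shard size that was not trained on."""
    return alpha * size ** beta

# Hypothetical validation accuracies: {shard size: {candidate name: accuracy}}.
results = {
    1000: {"small": 0.60, "large": 0.55},
    2000: {"small": 0.66, "large": 0.68},
    4000: {"small": 0.70, "large": 0.75},
    8000: {"small": 0.72, "large": 0.80},
}

best = best_per_shard(results)
sizes = sorted(best)
errors = [1.0 - best[s] for s in sizes]
alpha, beta = fit_power_law(sizes, errors)
predicted = predict_error(alpha, beta, 16000)
```

In this sketch a negative exponent beta indicates that validation error falls as the shard size grows, and predicted is the extrapolated error for a shard twice the size of the largest one trained.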
Regarding Claim 8, Kobayashi teaches a system for generating a learning curve to aid in predicting a metric for a deep learning model (fig. 1; ¶ [0043] – [0045]), the method comprising:
one or more processors (fig. 2, CPU 101; ¶ [0055]);
a non-transitory computer-readable medium or media comprising one or more sequences of instructions (fig. 2, RAM 102 and/or HDD 103; ¶ [0056], [0058], and claim 1) which, when executed by at least one of the one or more processors, causes steps to be performed comprising:
training a set of models on each shard from a set of shards in which the models from the set of model candidates vary in architecture, hyperparameters, or both, and the 
using a validation set to identify a best model for each shard from among the set of trained model candidates, in which each best model has a corresponding validation accuracy for that shard, which has a shard size (¶ [0062] – [0063] and [0069]);
fitting a power-law learning curve model using the validation accuracies and corresponding shard sizes of the best models selected for the shards (figs. 4 and 5; ¶ [0074] – [0075] and [0082] – [0083]—a curve is fitted to the graphs of shard size to prediction performance for each model to determine the improvement in performance for larger shard sizes); and
using the fitted power-law learning curve to predict a metric associated with a deep learning model (¶ [0084] – [0086]—a regression analysis is used with the curve to estimate a metric such as prediction performance for a larger shard or a probability of improvement).
Regarding Claim 15, Kobayashi teaches a non-transitory computer-readable medium or media comprising one or more sequences of instructions (fig. 2, RAM 102 and/or HDD 103; ¶ [0056], [0058], and claim 1) which, when executed by at least one of the one or more processors, causes steps to be performed comprising:
training a set of models on each shard from a set of shards in which the models from the set of model candidates vary in architecture, hyperparameters, or both, and the set of shards 
using a validation set to identify a best model for each shard from among the set of trained model candidates, in which each best model has a corresponding validation accuracy for that shard, which has a shard size (¶ [0062] – [0063] and [0069]);
fitting a power-law learning curve model using the validation accuracies and corresponding shard sizes of the best models selected for the shards (figs. 4 and 5; ¶ [0074] – [0075] and [0082] – [0083]—a curve is fitted to the graphs of shard size to prediction performance for each model to determine the improvement in performance for larger shard sizes); and
using the fitted power-law learning curve to predict a metric associated with a deep learning model (¶ [0084] – [0086]—a regression analysis is used with the curve to estimate a metric such as prediction performance for a larger shard or a probability of improvement).
Regarding Claim 2, Kobayashi teaches the step of randomly shuffling the data set to maximize likelihood that shards of the data set have similar data distribution to the data set (¶ [0070]—training datasets and testing datasets are randomly sampled {i.e. shuffled}, thus maximizing likelihood that shards of the data set have similar data distribution to the data set).
Regarding Claims 9 and 16, Kobayashi teaches wherein the set of shards are generated from a data set of training data (fig. 1; ¶ [0047] – [0048], [0064], and [0068]—the samples of various sample sizes are shards that are split from a set of training data) and the non-transitory 
randomly shuffling the data set to maximize likelihood that shards of the data set have similar data distribution to the data set (¶ [0070]—training datasets and testing datasets are randomly sampled {i.e. shuffled}, thus maximizing likelihood that shards of the data set have similar data distribution to the data set); and
splitting the data set into a set of shards such that the shard sizes span multiple orders of magnitude (¶ [0064]—the shards may double in size from one to the next, thus spanning multiple orders of magnitude).
Regarding Claims 3, 10, and 17, Kobayashi teaches wherein the step of splitting the data set into a set of shards such that the shard sizes span multiple orders of magnitude comprises splitting the data set into a set of shards such that the shard sizes span multiple orders of magnitude in steps of approximately twice a size of a prior shard's size (¶ [0064]—the shards may double in size from one to the next, thus spanning multiple orders of magnitude).
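For illustration only, shard sizes that grow in steps of approximately twice the prior shard's size may be generated as sketched below. The function names doubling_shard_sizes and split_into_shards are hypothetical, and the sketch assumes that each shard is a prefix of the (already shuffled) data set, which neither the claims nor the cited paragraphs require.

```python
def doubling_shard_sizes(smallest, total):
    """Return shard sizes smallest, 2*smallest, 4*smallest, ... not exceeding total."""
    sizes = []
    size = smallest
    while size <= total:
        sizes.append(size)
        size *= 2
    return sizes

def split_into_shards(data, smallest):
    """Build one shard per size; each shard is assumed to be a prefix of the data."""
    return [data[:n] for n in doubling_shard_sizes(smallest, len(data))]

shards = split_into_shards(list(range(1000)), smallest=100)
shard_lengths = [len(s) for s in shards]
```

Because each step doubles the size, ten steps separate the smallest and largest shards by a factor of 2**10 = 1024, or roughly three decimal orders of magnitude.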
Regarding Claims 4, 12, and 18, Kobayashi teaches wherein the predicted metric is improvement in accuracy for the deep learning model given increase in training data set size (figs. 4 and 5; ¶ [0074] – [0075] and [0082] – [0083]—a calculated metric is improvement in prediction performance {accuracy} for each model to determine the improvement in performance for larger shard sizes).
Regarding Claims 5, 13, and 19, Kobayashi teaches wherein the predicted metric is one or more compute requirements for the deep learning model (figs. 4 and 5; ¶ [0074] – [0075] and 
Regarding Claims 7 and 11, Kobayashi teaches the step of using at least some of the data in the data set to form the validation set, in which none of the data in the validation set is shared with any of the shards (¶ [0062] and [0069] – [0070]—some of the data are used for a testing {validation} set, and are sampled such that the validation set is made up of different samples from the shards of the training sets).
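For illustration only, random shuffling followed by a hold-out of validation data that is not shared with any shard may be sketched as follows. The function shuffle_and_hold_out, the 10% validation fraction, and the fixed seed are assumptions made for the sketch and do not appear in the claims or in Kobayashi.

```python
import random

def shuffle_and_hold_out(data, validation_fraction=0.1, seed=0):
    """Shuffle a copy of the data, then reserve the first portion for validation.

    Returns (training_pool, validation_set). Shards are meant to be drawn only
    from training_pool, so no validation example is shared with any shard.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

training_pool, validation_set = shuffle_and_hold_out(range(100))
```

Shuffling before the split is what makes each shard, and the validation set, a random sample of the whole data set rather than a contiguous slice of it.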

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kobayashi, as applied to claims 5, 13, and 19, above, in view of Li et al. (U.S. 2019/0325307, hereinafter “Li”).
Regarding Claims 6, 14, and 20, Kobayashi does not specifically teach wherein a compute requirement for the deep learning model comprises a predicted training data set size times a number of parameters of the deep learning model. However, Li teaches a compute requirement for a deep learning model comprises a predicted training data set size times a number of parameters of the deep learning model (¶ [0056]—a compute requirement is determined by a correspondence among dimensions including structures of deep learning models 
All of the claimed elements were known in Kobayashi and Li and could have been combined by known methods with no change in their respective functions. It therefore would have been obvious to a person of ordinary skill in the art at the time of filing of the applicant’s invention to combine the compute requirement determination using structures of a deep learning model and dataset size of Li with the parameters of the deep learning model and data set size of Kobayashi to yield the predictable result of a compute requirement for the deep learning model that comprises a predicted training data set size times a number of parameters of the deep learning model. One would be motivated to make this combination for the purpose of optimizing the scheduling and use of computing resources (Li, ¶ [0003]).
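For illustration only, the recited compute requirement (a predicted training data set size times a number of parameters) may be computed as sketched below. The functions required_data_size and compute_requirement are hypothetical, as are the numeric values; obtaining the predicted data set size by inverting a fitted power law of the form error = alpha * size**beta is an assumption of the sketch and is not attributed to Kobayashi or Li.

```python
def required_data_size(alpha, beta, target_error):
    """Invert error = alpha * size**beta to get the size that reaches target_error."""
    return (target_error / alpha) ** (1.0 / beta)

def compute_requirement(predicted_data_size, num_parameters):
    """Compute requirement as recited: data set size times parameter count."""
    return predicted_data_size * num_parameters

# Hypothetical fitted curve and model size.
size_needed = required_data_size(alpha=1.0, beta=-1.0, target_error=0.01)
requirement = compute_requirement(size_needed, num_parameters=1_000_000)
```

With these hypothetical values the curve calls for about 100 training examples, so the product for a one-million-parameter model is about 100 million.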

	
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. This art includes Figueroa, Rosa L., et al., “Predicting sample size required for classification performance,” BMC medical informatics and decision making 12.1 (2012): 1-10, which teaches training a model with multiple dataset sample sizes and fitting a power law curve to predict the dataset size needed for a desired neural network prediction accuracy.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAL W SCHNEE whose telephone number is (571)270-1918. The examiner can normally be reached M-F 7:30 a.m. - 6:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at 303-297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HAL SCHNEE/Primary Examiner, Art Unit 2129