DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The present application is being examined under the claims filed on 12/17/2020.
Claims 1-6 are amended.
Claims 1-6 are rejected.
Claims 1-6 are pending.

Drawings
The Drawings filed on 06/30/2017 are acceptable for examination purposes.

Specification
The Specification filed on 06/30/2017 is acceptable for examination purposes.

Response to Arguments
In reference to Priority Document
Examiner notes that Box 12 of PTOL-326 reflects the priority under 35 USC § 119.

In reference to Objection to the Title


In reference to Rejections under 35 USC § 101
Examiner notes that the Rejections under 35 USC § 101 have been withdrawn in view of the amendments, particularly because the newly amended limitations of “executing the […] machine learning algorithm using the […] training dataset size […]” are being interpreted as training the machine learning algorithm using the training dataset size. If applicant disagrees with Examiner’s interpretation of the newly amended limitation as training a machine learning algorithm, applicant should indicate this on the record.

In reference to Rejections under 35 USC § 103
Applicant asserts that the cited references do not disclose “executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size”.
Examiner respectfully disagrees. Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. That is, in Fig. 2 only a batch size of 5 is used and in Fig. 3 only a batch size of 10 is used. Examiner notes that the other batch sizes are skipped.
Applicant's arguments filed 12/17/2020 have been fully considered but they are not persuasive. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta et al. (hereinafter Gupta) “Model Accuracy and Runtime Tradeoff in Distributed Deep Learning” in view of Azimi et al. (hereinafter Azimi) “Hybrid Batch Bayesian Optimization”.
In reference to claim 1. Gupta teaches a non-transitory computer-readable storage medium storing a computer program (Gupta in at least § 4.1) that causes a computer to perform a procedure comprising:
“calculating, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, first estimated prediction performance scores and first estimated runtimes for a case of executing the […] machine learning algorithm using each of a plurality of second training dataset sizes larger than the first training dataset size” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating, based on execution results obtained by executing the first machine learning algorithm using the one or more training dataset sizes, first estimated prediction performance scores and first estimated runtimes for a case of executing the first machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“determining, […], the first estimated prediction performance scores, and the first estimated runtimes, a third training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the first estimated prediction performance scores, and the first estimated runtimes, a first training dataset size to be used when the first machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“calculating, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, second estimated prediction performance scores and second estimated runtimes for a case of executing the […] machine learning algorithm using each of the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating second estimated prediction performance scores and second estimated runtimes for a case of executing the second machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“determining, […], the second estimated prediction performance scores, and the second estimated runtimes, a fourth training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the second estimated prediction performance scores, and the second estimated runtimes, a second training dataset size to be used when the second machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);

Gupta does not explicitly disclose:
“identifying a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using a first training dataset size”;
“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms”;
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores”;
“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms”;
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size”; and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size”.
However, Azimi discloses:
“identifying a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using a first training dataset size” (Azimi in at least § 3 and § 4 discloses identifying a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using one or more training dataset sizes);
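“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms);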
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores” (Azimi in at least § 3 and § 4 discloses the maximum prediction performance score);
“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms);
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped); and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped).
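
For illustration only, the "skipping" limitation recited above can be sketched in Python as follows. This is a hypothetical sketch and not a characterization of Gupta, Azimi, or applicant's specification: the names skipped_sizes and run_with_skipping, the train callback, and the example size ladder are all assumptions introduced here. It shows one reading in which the algorithm is executed only at the determined size, so that the candidate sizes strictly between the first size and the determined size are never used for training.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def skipped_sizes(size_ladder: Sequence[int], first_size: int, chosen_size: int) -> List[int]:
    """Return the candidate sizes strictly between first_size and chosen_size."""
    return [s for s in sorted(size_ladder) if first_size < s < chosen_size]


def run_with_skipping(
    train: Callable[[int], Model],  # hypothetical callback: trains on n samples
    size_ladder: Sequence[int],
    first_size: int,
    chosen_size: int,
) -> Tuple[Model, List[int]]:
    """Train once at chosen_size; intermediate sizes are reported, never trained."""
    skipped = skipped_sizes(size_ladder, first_size, chosen_size)
    model = train(chosen_size)  # the only execution performed in this step
    return model, skipped
```

Under these assumptions, a ladder of 100, 200, 400 and 800 samples with a first size of 100 and a determined size of 800 results in a single training run at 800, with 200 and 400 skipped.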


In reference to claim 2. Gupta and Azimi teach the non-transitory computer-readable storage medium according to claim 1 (as mentioned above), wherein:
Gupta further discloses:
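“the determining the third training dataset size includes calculating, for each of the plurality of second training dataset sizes, […], the first estimated prediction performance scores, and the first estimated runtimes, a first increase rate indicating an increment in the maximum prediction performance score per unit time” (Gupta in at least § 3.1, and § 5 to § 6 discloses wherein the determining the first training dataset size includes calculating, for each of the two or more training dataset sizes, based on the first estimated prediction performance scores, and the first estimated runtimes, a first increase rate indicating an increment in the maximum prediction performance score per unit time), and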
“determining the third training dataset size based on calculated first increase rates” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining the first training dataset size based on calculated first increase rates), and 
“the determining the fourth training dataset size includes calculating, for each of the plurality of second training dataset sizes, […], the second estimated prediction performance scores, and the second estimated runtimes, a second increase rate indicating an increment in the maximum prediction performance score per unit time” (Gupta in at least § 3.1, and § 5 to § 6 discloses wherein the determining the second training dataset size includes calculating, for each of the two or more training dataset sizes, based on the second estimated prediction performance scores, and the second estimated runtimes, a second increase rate indicating an increment in the maximum prediction performance score per unit time), and
“determining the fourth training dataset size based on the calculated second increase rates” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining the second training dataset size based on the calculated second increase rates).

Azimi further discloses:
“the maximum prediction performance score” (Azimi in at least § 3 and § 4 discloses the maximum prediction performance score).
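
For illustration only, the "increase rate" limitation of claim 2 can be sketched in Python as follows. The formula used here, (estimated score minus current maximum score) divided by estimated runtime, is an assumption that is merely consistent with the claim language "an increment in the maximum prediction performance score per unit time"; it is not taken from the claims, Gupta, or Azimi. The function names and the example figures are likewise hypothetical.

```python
from typing import Dict


def increase_rates(
    max_score: float,
    est_scores: Dict[int, float],    # candidate size -> estimated score
    est_runtimes: Dict[int, float],  # candidate size -> estimated runtime
) -> Dict[int, float]:
    """Assumed rate: expected gain over the current maximum, per unit of runtime."""
    rates = {}
    for size, score in est_scores.items():
        runtime = est_runtimes[size]
        if runtime <= 0:
            raise ValueError("estimated runtime must be positive")
        gain = max(0.0, score - max_score)  # no credit for scores below the maximum
        rates[size] = gain / runtime
    return rates


def pick_size(rates: Dict[int, float]) -> int:
    """Choose the candidate size with the highest increase rate."""
    return max(rates, key=rates.get)


# Hypothetical figures: current maximum score 0.80, three larger candidate sizes.
rates = increase_rates(
    0.80,
    {200: 0.82, 400: 0.86, 800: 0.88},
    {200: 10.0, 400: 20.0, 800: 80.0},
)
chosen = pick_size(rates)  # selects 400 under these assumed figures
```

In this sketch the largest size offers the highest estimated score but the lowest rate, because its estimated runtime is disproportionately long; the middle size is selected instead.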

In reference to claim 3. Gupta and Azimi teach the non-transitory computer-readable storage medium according to claim 2 (as mentioned above), wherein:
Gupta further discloses:
“the determining the third training dataset size includes setting, when a maximum first increase rate amongst the calculated first increase rates is higher than a maximum second increase rate amongst the calculated second increase rates, the third training dataset size larger than a training dataset size associated with the maximum first increase rate” (Gupta in at least § 3.1, and § 5 to § 6 discloses wherein the determining the first training dataset size includes setting, when a maximum first increase rate amongst the calculated first increase rates is higher than a maximum second increase rate amongst the calculated second increase rates, the first training dataset size larger than a training dataset size associated with the maximum first increase rate. Examiner notes that Gupta discloses setting different training data sizes as shown by the many different configurations. Moreover, Azimi also discloses setting different training data sizes in at least § 4).

In reference to claim 4. Gupta and Azimi teach the non-transitory computer-readable storage medium according to claim 2 (as mentioned above), wherein:
Gupta further discloses:
“the determining the fourth training dataset size includes setting, when the second estimated prediction performance scores and the second estimated runtimes satisfy a predetermined condition, the fourth training dataset size smaller than a training dataset size associated with a maximum second increase rate amongst the calculated second increase rates” (Gupta in at least § 3.1, and § 5 to § 6 discloses wherein the determining the second training dataset size includes setting, when the second estimated prediction performance scores and the second estimated runtimes satisfy a predetermined condition, the second training dataset size smaller than a training dataset size associated with a maximum second increase rate amongst the calculated second increase rates).

In reference to claim 5. Gupta teaches a machine learning management apparatus comprising:
“a memory configured to store information on a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using a first training dataset size” (Gupta in at least § 4.1);
“a processor configured to perform a procedure” (Gupta in at least § 4.1) including:
“calculating, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, first estimated prediction performance scores and first estimated runtimes for a case of executing the […] machine learning algorithm using each of a plurality of second training dataset sizes larger than the first training dataset size” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating, based on execution results obtained by executing the first machine learning algorithm using the one or more training dataset sizes, first estimated prediction performance scores and first estimated runtimes for a case of executing the first machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“determining, […], the first estimated prediction performance scores, and the first estimated runtimes, a third training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the first estimated prediction performance scores, and the first estimated runtimes, a first training dataset size to be used when the first machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“calculating, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, second estimated prediction performance scores and second estimated runtimes for a case of executing the […] machine learning algorithm using each of the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating second estimated prediction performance scores and second estimated runtimes for a case of executing the second machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on), and
“determining, […], the second estimated prediction performance scores, and the second estimated runtimes, a fourth training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the second estimated prediction performance scores, and the second estimated runtimes, a second training dataset size to be used when the second machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on).

Gupta does not explicitly disclose:
“identifying a maximum prediction performance score amongst the prediction performance scores”;
“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms”;
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores”;
“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms”;
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size”; and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size”.
However, Azimi discloses:
“identifying a maximum prediction performance score amongst the prediction performance scores” (Azimi in at least § 3 and § 4 discloses identifying a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using one or more training dataset sizes);
“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms);
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores” (Azimi in at least § 3 and § 4 discloses the maximum prediction performance score);
“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms);
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped); and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Gupta and Azimi. Gupta teaches a new learning rate modulation strategy to counter the effect of stale gradients and proposes a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Azimi teaches a systematic way to analyze the performance and limits of simulation-based batch Bayesian Optimization methods, and an algorithm that at each step decides whether or not to pick another query to add to the current batch, and as such dynamically determines the appropriate batch size at each step. It would have been obvious to apply the disclosed “learning rate modulation strategy to counter the effect of stale gradients” and the disclosed “synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy” to the disclosure of Azimi. One of ordinary skill would have had motivation to combine Gupta and Azimi because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) "Obvious to try": choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.

In reference to claim 6. Gupta teaches a machine learning management method comprising:
“calculating, by the processor, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, first estimated prediction performance scores and first estimated runtimes for a case of executing the […] machine learning algorithm using each of a plurality of second training dataset sizes larger than the first training dataset size” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating, based on execution results obtained by executing the first machine learning algorithm using the one or more training dataset sizes, first estimated prediction performance scores and first estimated runtimes for a case of executing the first machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on), and
“determining, by the processor, […], the first estimated prediction performance scores, and the first estimated runtimes, a third training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the first estimated prediction performance scores, and the first estimated runtimes, a first training dataset size to be used when the first machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on);
“calculating, by the processor, for a […] machine learning algorithm […], based on an execution result obtained by executing the […] machine learning algorithm using the first training dataset size, second estimated prediction performance scores and second estimated runtimes for a case of executing the […] machine learning algorithm using each of the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses calculating second estimated prediction performance scores and second estimated runtimes for a case of executing the second machine learning algorithm using each of two or more training dataset sizes different from the one or more training dataset sizes. See at least table 2; Examiner notes that μ denotes the mini-batch size. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on), and
“determining, by the processor, […], the second estimated prediction performance scores, and the second estimated runtimes, a fourth training dataset size among the plurality of second training dataset sizes” (Gupta in at least § 3.1, and § 5 to § 6 discloses determining, based on the second estimated prediction performance scores, and the second estimated runtimes, a second training dataset size to be used when the second machine learning algorithm is executed next time. Examiner notes that in table 2, Gupta discloses multiple training datasets with different sizes, i.e. μ = 4, μ = 8, μ = 16, and so on).

Gupta does not explicitly disclose:
“identifying, […], a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using a first training dataset size”;
“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms”;
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores”;
“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms”;
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size”; and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size”.
However, Azimi discloses:
“identifying, […], a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using a first training dataset size” (Azimi in at least § 3 and § 4 discloses identifying a maximum prediction performance score amongst a plurality of prediction performance scores corresponding to a plurality of models generated by executing each of a plurality of machine learning algorithms using one or more training dataset sizes);
“a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a first machine learning algorithm having generated a model corresponding to the maximum prediction performance score amongst the plurality of machine learning algorithms);
“determining, based on the maximum prediction performance score, the […] estimated prediction performance scores” (Azimi in at least § 3 and § 4 discloses the maximum prediction performance score);
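“a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms” (Azimi in at least § 3 and § 4 discloses a second machine learning algorithm different from the first machine learning algorithm amongst the plurality of machine learning algorithms);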
“executing the first machine learning algorithm using the third training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the third training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped); and
“executing the second machine learning algorithm using the fourth training dataset size while skipping one or more second training dataset sizes between the first training dataset size and the fourth training dataset size” (Azimi in at least Fig. 2 discloses multiple algorithms executed using a batch size of 5, while Fig. 3 discloses multiple algorithms executed using a batch size of 10. Examiner notes that the skipping is being interpreted as only using a particular batch size for training. I.e. in Fig. 2 only a batch size 5 is used and in Fig. 3 only a batch size 10 is used. Examiner notes that the other batch sizes are skipped).
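It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Gupta and Azimi. Gupta teaches a new learning rate modulation strategy to counter the effect of stale gradients and proposes a new synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy. Azimi teaches a systematic way to analyze the performance and limits of simulation-based batch Bayesian Optimization methods, and an algorithm that at each step decides whether or not to pick another query to add to the current batch, and as such dynamically determines the appropriate batch size at each step. It would have been obvious to apply the disclosed “learning rate modulation strategy to counter the effect of stale gradients” and the disclosed “synchronization protocol that can effectively bound the staleness in gradients, improve runtime performance and achieve good model accuracy” to the disclosure of Azimi. One of ordinary skill would have had motivation to combine Gupta and Azimi because MPEP 2143 sets forth the Supreme Court rationales for obviousness, including: (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results; (E) "Obvious to try": choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success; (F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.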

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Viker A. Lamardo whose telephone number is (571)270-5871.  The examiner can normally be reached on Mon. - Fri. 9 AM - 5 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo can be reached on (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/VIKER A LAMARDO/Examiner, Art Unit 2126