DETAILED ACTION
This action is in response to claims filed 03/22/2021 for application 16/014503 filed 06/21/2018. Claims 1, 3, 8, 10, 15, and 17 are amended and claims 1, 3-8, 10-15, and 17-21 are pending. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tamano et al. ("Optimizing Multiple Machine Learning Jobs on MapReduce", hereinafter "Tamano") in view of Nykiel et al. ("MRShare: Sharing Across Multiple Queries in MapReduce", hereinafter "Nykiel").

Regarding claim 1, Tamano teaches A method for efficient machine and deep learning hyperparameter tuning in a distributed computing system (“Recently, MapReduce has been used to parallelize machine learning algorithms. To obtain the best performance for these algorithms, tuning the parameters of the algorithms is required.” [Abstract; pg. 59; col 1, lines 1-4]), by a processor, comprising:
collecting runtime metrics of each of a plurality of training iterations to identify candidate jobs (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 14-21]; see also “The better pattern depends on the job characteristics. To execute jobs efficiently, we need to choose the best assignment among the various patterns.” [pg. 59; col 2, lines 35-38]) to merge during an execution phase (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern.” [pg. 62; col 1, lines 24-26]), wherein the candidate jobs comprise hyperparameter search jobs based on a training dataset (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters. Fig. 1 shows two patterns.” [Fig. 1; pg. 59; col 2, lines 14-20]);
identifying the candidate jobs based on the collected runtime metrics (“To evaluate the proposed method, we implemented experimental MapReduce runtime based on the Message Passing Interface (MPI) and executed logistic regression in four cases. The results showed that the proposed method can correctly predict the optimal job assignment, which results in minimum execution time.” [pg. 60, left col, ¶2; note: Examiner is interpreting predicting to be equivalent to identifying. The prediction of optimal jobs is based on the MapReduce runtime.]);
grouping the candidate jobs into job groups (“Twenty learning jobs with different parameters are assigned to the group. MapReduce runs on twenty nodes in parallel. On the other hand, the right pattern shows that the cluster is partitioned into ten groups. Each group consists of two nodes. Two learning jobs with different parameters are assigned to each group. Since there are ten groups, twenty jobs are executed in total.” [pg. 59; col 2, lines 22-28]); and
merging the job groups containing the candidate jobs together prior to executing the candidate jobs during the execution phase (“Since our runtime supports job integration, the forty jobs are integrated and executed so as not to read the data set forty times. Pattern B partitions the cluster into two groups and assigns twenty MapReduce jobs to each group. Twenty jobs are integrated and executed in each group.” [pg. 62; col 1, lines 9-13]), wherein the merging of the job groups for execution is performed for each of a plurality of accelerator devices performing the execution (“we have twenty nodes [i.e., accelerator devices] in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 17-18]).
However, Tamano fails to explicitly teach wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations.
Nykiel teaches wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations (“We implemented our framework, MRShare, on top of Hadoop. However, it can be easily plugged in any MapReduce system. First, we get a batch of MapReduce jobs from queries collected in a short time interval T. The choice of T depends on the query characteristics and arrival times. Then, MultiSplitJobs is called to compute the optimal grouping of the jobs. Afterwards, the groups are rewritten, using a meta-map and a meta-reduce function. These are MRShare specific containers, for merged map and reduce functions of multiple jobs, which are implemented as regular map and reduce functions, and their functionality relies on tagging (explained below). The new jobs are then submitted for execution” [pg. 498, § 5. Implementing MRShare, ¶1; note: pg. 497, § 4.1.4 Improving SplitJobs, discloses the plurality of training iterations.]).
Tamano and Nykiel are both in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm to collect queries as taught by Nykiel in order to group candidate jobs into job groups. One would have been motivated to make this modification in order to avoid redundancy and improve the execution time of the jobs [Nykiel, § 1. Introduction].
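For illustration only, the sequence of limitations addressed above (collecting runtime metrics, grouping candidate jobs according to those metrics, and merging each group prior to execution) can be sketched as follows. The sketch is not taken from Tamano or Nykiel. The `Job` fields, the memory-budget grouping rule, and the names `group_jobs` and `merge_group` are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One hyperparameter search job (hypothetical fields)."""
    name: str
    dataset: str        # input dataset the job trains on
    footprint_mb: int   # memory footprint observed during trial iterations


def group_jobs(jobs, device_memory_mb):
    """Group jobs sharing a dataset, keeping each group within the device memory budget."""
    by_dataset = defaultdict(list)
    for job in jobs:
        by_dataset[job.dataset].append(job)

    groups = []
    for dataset_jobs in by_dataset.values():
        current, used = [], 0
        # Place the largest jobs first so the budget is less likely to fragment.
        for job in sorted(dataset_jobs, key=lambda j: j.footprint_mb, reverse=True):
            if current and used + job.footprint_mb > device_memory_mb:
                groups.append(current)
                current, used = [], 0
            current.append(job)
            used += job.footprint_mb
        if current:
            groups.append(current)
    return groups


def merge_group(group):
    """Merge a group into one unit of work that reads the shared dataset once."""
    return {"dataset": group[0].dataset, "jobs": [job.name for job in group]}


jobs = [
    Job("lr=0.1", "train.csv", 300),
    Job("lr=0.01", "train.csv", 300),
    Job("lr=0.001", "train.csv", 500),
    Job("depth=4", "other.csv", 200),
]
merged = [merge_group(g) for g in group_jobs(jobs, device_memory_mb=800)]
```

In this sketch the grouping decision depends only on the collected metrics (dataset identity and observed footprint), and merging happens before any merged unit is executed.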

Regarding claim 8, Tamano teaches A system for efficient machine and deep learning hyperparameter tuning in a distributed computing system (“Recently, MapReduce has been used to parallelize machine learning algorithms. To obtain the best performance for these algorithms, tuning the parameters of the algorithms is required.” [Abstract; pg. 59; col 1, lines 1-4]), by a processor, comprising:
a processor executing instructions stored in a memory device; wherein the processor (“Each node consisted of Intel Xeon 2.00-GHz 4 Core, 12-GB memory, 178-MB/s HDD bandwidth, and 64-bit Linux (v2.6.26-2) and connected it to a 1-Gbps network.” [pg. 64, § Experimental Settings, ¶1]):
collects runtime metrics of each of a plurality of training iterations to identify candidate jobs (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 14-21]; see also “The better pattern depends on the job characteristics. To execute jobs efficiently, we need to choose the best assignment among the various patterns.” [pg. 59; col 2, lines 35-38]) to merge during an execution phase (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern.” [pg. 62; col 1, lines 24-26]), wherein the candidate jobs comprise hyperparameter search jobs based on a training dataset (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters. Fig. 1 shows two patterns.” [Fig. 1; pg. 59; col 2, lines 14-20]);
identifies the candidate jobs based on the collected runtime metrics (“To evaluate the proposed method, we implemented experimental MapReduce runtime based on the Message Passing Interface (MPI) and executed logistic regression in four cases. The results showed that the proposed method can correctly predict the optimal job assignment, which results in minimum execution time.” [pg. 60, left col, ¶2; note: Examiner is interpreting predicting to be equivalent to identifying. The prediction of optimal jobs is based on the MapReduce runtime.]);
groups the candidate jobs into job groups (“Twenty learning jobs with different parameters are assigned to the group. MapReduce runs on twenty nodes in parallel. On the other hand, the right pattern shows that the cluster is partitioned into ten groups. Each group consists of two nodes. Two learning jobs with different parameters are assigned to each group. Since there are ten groups, twenty jobs are executed in total.” [pg. 59; col 2, lines 22-28]); and
merges the job groups containing the candidate jobs together prior to executing the candidate jobs during the execution phase (“Since our runtime supports job integration, the forty jobs are integrated and executed so as not to read the data set forty times. Pattern B partitions the cluster into two groups and assigns twenty MapReduce jobs to each group. Twenty jobs are integrated and executed in each group.” [pg. 62; col 1, lines 9-13]), wherein the merging of the job groups for execution is performed for each of a plurality of accelerator devices performing the execution (“we have twenty nodes [i.e., accelerator devices] in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 17-18]).
However, Tamano fails to explicitly teach wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations.
Nykiel teaches wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations (“We implemented our framework, MRShare, on top of Hadoop. However, it can be easily plugged in any MapReduce system. First, we get a batch of MapReduce jobs from queries collected in a short time interval T. The choice of T depends on the query characteristics and arrival times. Then, MultiSplitJobs is called to compute the optimal grouping of the jobs. Afterwards, the groups are rewritten, using a meta-map and a meta-reduce function. These are MRShare specific containers, for merged map and reduce functions of multiple jobs, which are implemented as regular map and reduce functions, and their functionality relies on tagging (explained below). The new jobs are then submitted for execution” [pg. 498, § 5. Implementing MRShare, ¶1; note: pg. 497, § 4.1.4 Improving SplitJobs, discloses the plurality of training iterations.]).
Tamano and Nykiel are both in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm to collect queries as taught by Nykiel in order to group candidate jobs into job groups. One would have been motivated to make this modification in order to avoid redundancy and improve the execution time of the jobs [Nykiel, § 1. Introduction].

Regarding claim 15, Tamano teaches A computer program product for efficient machine and deep learning hyperparameter tuning in a distributed computing system (“Recently, MapReduce has been used to parallelize machine learning algorithms. To obtain the best performance for these algorithms, tuning the parameters of the algorithms is required.” [Abstract; pg. 59; col 1, lines 1-4]), by a processor, the computer program product embodied on a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising (“Each node consisted of Intel Xeon 2.00-GHz 4 Core, 12-GB memory, 178-MB/s HDD bandwidth, and 64-bit Linux (v2.6.26-2) and connected it to a 1-Gbps network.” [pg. 64, § Experimental Settings, ¶1]):
an executable portion that collects runtime metrics of each of a plurality of training iterations to identify candidate jobs (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 14-21]; see also “The better pattern depends on the job characteristics. To execute jobs efficiently, we need to choose the best assignment among the various patterns.” [pg. 59; col 2, lines 35-38]) to merge during an execution phase (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern.” [pg. 62; col 1, lines 24-26]), wherein the candidate jobs comprise hyperparameter search jobs based on a training dataset (“When we execute a learning process using various parameters on MapReduce, there are various patterns for assigning multiple learning jobs to a cluster, and the total execution time varies depending on the assignment patterns. For example, we have twenty nodes in a cluster and execute a learning job twenty times using different parameters. Fig. 1 shows two patterns.” [Fig. 1; pg. 59; col 2, lines 14-20]);
an executable portion that identifies the candidate jobs based on the collected runtime metrics (“To evaluate the proposed method, we implemented experimental MapReduce runtime based on the Message Passing Interface (MPI) and executed logistic regression in four cases. The results showed that the proposed method can correctly predict the optimal job assignment, which results in minimum execution time.” [pg. 60, left col, ¶2; note: Examiner is interpreting predicting to be equivalent to identifying. The prediction of optimal jobs is based on the MapReduce runtime.]);
an executable portion that groups the candidate jobs into job groups (“Twenty learning jobs with different parameters are assigned to the group. MapReduce runs on twenty nodes in parallel. On the other hand, the right pattern shows that the cluster is partitioned into ten groups. Each group consists of two nodes. Two learning jobs with different parameters are assigned to each group. Since there are ten groups, twenty jobs are executed in total.” [pg. 59; col 2, lines 22-28]); and
an executable portion that merges the job groups containing the candidate jobs together prior to executing the candidate jobs during the execution phase (“Since our runtime supports job integration, the forty jobs are integrated and executed so as not to read the data set forty times. Pattern B partitions the cluster into two groups and assigns twenty MapReduce jobs to each group. Twenty jobs are integrated and executed in each group.” [pg. 62; col 1, lines 9-13]), wherein the merging of the job groups for execution is performed for each of a plurality of accelerator devices performing the execution (“we have twenty nodes [i.e., accelerator devices] in a cluster and execute a learning job twenty times using different parameters.” [pg. 59; col 2, lines 17-18]).
However, Tamano fails to explicitly teach wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations.
Nykiel teaches wherein the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations (“We implemented our framework, MRShare, on top of Hadoop. However, it can be easily plugged in any MapReduce system. First, we get a batch of MapReduce jobs from queries collected in a short time interval T. The choice of T depends on the query characteristics and arrival times. Then, MultiSplitJobs is called to compute the optimal grouping of the jobs. Afterwards, the groups are rewritten, using a meta-map and a meta-reduce function. These are MRShare specific containers, for merged map and reduce functions of multiple jobs, which are implemented as regular map and reduce functions, and their functionality relies on tagging (explained below). The new jobs are then submitted for execution” [pg. 498, § 5. Implementing MRShare, ¶1; note: pg. 497, § 4.1.4 Improving SplitJobs, discloses the plurality of training iterations.]).
Tamano and Nykiel are both in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm to collect queries as taught by Nykiel in order to group candidate jobs into job groups. One would have been motivated to make this modification in order to avoid redundancy and improve the execution time of the jobs [Nykiel, § 1. Introduction].

Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Tamano in view of Nykiel and further in view of Koch et al. (US 10,360,517 B2, hereinafter "Koch").

Regarding claim 3, Tamano and Nykiel teach the method of claim 1; however, the combination fails to explicitly teach further including caching the runtime metrics, the runtime metrics including at least a model size and an input dataset associated with the training dataset.
Koch teaches: further including caching the runtime metrics (“In an operation 622 [Fig. 6A], the results [i.e., runtime metrics] are stored in evaluation cache 316 and in model data 318 in association with the set of hyperparameter values.” [pg. 40; col 32, lines 1-3]), the runtime metrics including at least a model size and an input dataset associated with the training dataset (“Evaluation cache 316, model data 318, and selected model data 320 are created from results [i.e., runtime metrics] generated by worker system 106.” [pg. 27; col 5, lines 8-10]).
Tamano, Nykiel, and Koch are all in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. Koch discloses a distributed computing system that caches results, which include model data and selected model data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm and Nykiel’s query sharing algorithm to cache runtime metrics, which include model data and selected model data, as taught by Koch to further improve optimization in machine learning jobs.
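For illustration only, storing results "in association with the set of hyperparameter values" can be sketched as a cache keyed by the hyperparameter set. The sketch is not Koch's implementation. The class name `EvaluationCache`, the `evaluate` stand-in, and the metric values it returns are assumptions made up for the example.

```python
class EvaluationCache:
    """Stores results keyed by a set of hyperparameter values (hypothetical design)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(hyperparams):
        # Sort the items so that dictionary ordering does not change the key.
        return tuple(sorted(hyperparams.items()))

    def get_or_compute(self, hyperparams, evaluate):
        key = self._key(hyperparams)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = evaluate(hyperparams)
        self._store[key] = result
        return result


def evaluate(hyperparams):
    # Stand-in for a training run; the returned metrics are invented for the example.
    return {"model_size_mb": 10 * hyperparams["layers"], "dataset": "train.csv"}


cache = EvaluationCache()
first = cache.get_or_compute({"layers": 3, "lr": 0.1}, evaluate)
second = cache.get_or_compute({"lr": 0.1, "layers": 3}, evaluate)
```

The second lookup uses the same hyperparameter values in a different order and is served from the cache rather than re-evaluated.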

Regarding claim 4, the combination of Tamano, Nykiel and Koch teaches the method of claim 3, where Tamano further teaches: further including, pursuant to identifying the candidate jobs: collecting a physical memory size of each of the plurality of accelerator devices (“Assuming that we have a 40-GB data set, we observe the number of MapReduce jobs each node handles and the data size each node required to read for the jobs.” [pg. 62; col 1, lines 16-18]); grouping job requests according to at least one of a model parameter, the model size, and the input dataset (Fig. 2); and using the model size and input dataset to compute a memory footprint for each training iteration (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern. The amount of computation is the product of data size and number of jobs.” [pg. 62; col 1, lines 24-28]).
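For illustration only, the quoted relationship ("The amount of computation is the product of data size and number of jobs") can be worked through for the 40-GB data set and forty jobs mentioned in the cited passages. The function name `per_node_load`, the cluster size of twenty nodes, and the assumptions that each group reads the whole data set and splits it evenly across its nodes are not taken from the quoted text.

```python
def per_node_load(dataset_gb, total_nodes, num_groups, total_jobs):
    """Return (data read per node in GB, jobs per node, computation per node).

    Assumes nodes and jobs divide evenly among groups, every group reads the
    full data set, and each group splits the data evenly across its nodes.
    """
    nodes_per_group = total_nodes // num_groups
    jobs_per_node = total_jobs // num_groups      # integrated jobs run on every node of the group
    data_per_node_gb = dataset_gb / nodes_per_group
    computation = data_per_node_gb * jobs_per_node  # product of data size and number of jobs
    return data_per_node_gb, jobs_per_node, computation


# One group holding all forty jobs versus two groups of twenty jobs each.
pattern_a = per_node_load(dataset_gb=40, total_nodes=20, num_groups=1, total_jobs=40)
pattern_b = per_node_load(dataset_gb=40, total_nodes=20, num_groups=2, total_jobs=40)
```

Under these assumptions the two patterns differ in how much data each node reads and how many jobs it executes, while the per-node product of the two is the same.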
Regarding claim 10, Tamano and Nykiel teach the system of claim 8; however, the combination fails to explicitly teach wherein the processor caches the runtime metrics, the runtime metrics including at least a model size and an input dataset associated with the training dataset.
Koch teaches: wherein the processor caches the runtime metrics (“In an operation 622 [Fig. 6A], the results [i.e., runtime metrics] are stored in evaluation cache 316 and in model data 318 in association with the set of hyperparameter values.” [pg. 40; col 32, lines 1-3]), the runtime metrics including at least a model size and an input dataset associated with the training dataset (“Evaluation cache 316, model data 318, and selected model data 320 are created from results [i.e., runtime metrics] generated by worker system 106.” [pg. 27; col 5, lines 8-10]).
Tamano, Nykiel, and Koch are all in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. Koch discloses a distributed computing system that caches results, which include model data and selected model data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm and Nykiel’s query sharing algorithm to cache runtime metrics, which include model data and selected model data, as taught by Koch to further improve optimization in machine learning jobs.

Regarding claim 11, the combination of Tamano, Nykiel and Koch teaches the system of claim 10, where Tamano further teaches: wherein the processor, pursuant to identifying the candidate jobs: collects a physical memory size of each of the plurality of accelerator devices (“Assuming that we have a 40-GB data set, we observe the number of MapReduce jobs each node handles and the data size each node required to read for the jobs.” [pg. 62; col 1, lines 16-18]); groups job requests according to at least one of a model parameter, the model size, and the input dataset (Fig. 2); and uses the model size and input dataset to compute a memory footprint for each training iteration (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern. The amount of computation is the product of data size and number of jobs.” [pg. 62; col 1, lines 24-28]).

Regarding claim 17, Tamano and Nykiel teach the computer program product of claim 15; however, the combination fails to explicitly teach further including an executable portion that caches the runtime metrics, the runtime metrics including at least a model size and an input dataset associated with the training dataset.
Koch teaches: further including an executable portion that caches the runtime metrics (“In an operation 622 [Fig. 6A], the results [i.e., runtime metrics] are stored in evaluation cache 316 and in model data 318 in association with the set of hyperparameter values.” [pg. 40; col 32, lines 1-3]), the runtime metrics including at least a model size and an input dataset associated with the training dataset (“Evaluation cache 316, model data 318, and selected model data 320 are created from results [i.e., runtime metrics] generated by worker system 106.” [pg. 27; col 5, lines 8-10]).
Tamano, Nykiel, and Koch are all in the same field of endeavor of optimizing machine learning jobs and thus are analogous. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. Koch discloses a distributed computing system that caches results, which include model data and selected model data. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tamano’s machine learning algorithm and Nykiel’s query sharing algorithm to cache runtime metrics, which include model data and selected model data, as taught by Koch to further improve optimization in machine learning jobs.

Regarding claim 18, the combination of Tamano, Nykiel and Koch teaches the computer program product of claim 17, where Tamano further teaches: further including an executable portion that, pursuant to identifying the candidate jobs: collects a physical memory size of each of the plurality of accelerator devices (“Assuming that we have a 40-GB data set, we observe the number of MapReduce jobs each node handles and the data size each node required to read for the jobs.” [pg. 62; col 1, lines 16-18]); groups job requests according to at least one of a model parameter, the model size, and the input dataset (Fig. 2); and uses the model size and input dataset to compute a memory footprint for each training iteration (“Table I summarizes how much data each node reads, how many jobs each node executes, and how much computation each node requires for each partitioning pattern. The amount of computation is the product of data size and number of jobs.” [pg. 62; col 1, lines 24-28]).

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Tamano in view of Nykiel and further in view of Koch and further in view of Panda et al. ("PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce", hereinafter "Panda").

Regarding claim 5, the combination of Tamano, Nykiel and Koch teaches the method of claim 4; however, the combination fails to explicitly teach wherein grouping the job groups further includes grouping the candidate jobs in a tree structure, the tree structure organized based on the input dataset and the model size.
Panda teaches wherein grouping the job groups further includes grouping the candidate jobs in a tree structure (“The Controller constructs a tree using a set of MapReduce jobs, each of which builds different parts of the tree. At any point, the model file contains the entire tree constructed so far.” [pg. 4; col 1, lines 6-10]), the tree structure organized based on the input dataset and the model size (“Each MapReduce job takes as input a set of nodes (N), the training data set (D*), and the current state of the model (M). The Controller schedules two types of MapReduce jobs.” [Fig. 1; pg. 4; col 1, lines 27-29]).
Tamano, Nykiel, Koch, and Panda are all in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. Koch discloses a distributed computing system that caches results, which include model data and selected model data. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning algorithms of Tamano and Nykiel and the distributed computing system taught by Koch with the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system.

Regarding claim 12, the combination of Tamano, Nykiel, and Koch teaches the system of claim 11; however, the combination fails to explicitly teach wherein grouping the job groups further includes grouping the candidate jobs in a tree structure, the tree structure organized based on the input dataset and the model size.
Panda teaches wherein grouping the job groups further includes grouping the candidate jobs in a tree structure (“The Controller constructs a tree using a set of MapReduce jobs, each of which builds different parts of the tree. At any point, the model file contains the entire tree constructed so far.” [pg. 4; col 1, lines 6-10]), the tree structure organized based on the input dataset and the model size (“Each MapReduce job takes as input a set of nodes (N), the training data set (D*), and the current state of the model (M). The Controller schedules two types of MapReduce jobs.” [Fig. 1; pg. 4; col 1, lines 27-29]).
Tamano, Nykiel, Koch, and Panda are all in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on those queries. Koch discloses a distributed computing system that caches results, which include model data and selected model data. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning algorithms of Tamano and Nykiel and the distributed computing system taught by Koch with the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system.

Regarding claim 19, the combination of Tamano, Nykiel and Koch teaches the computer program product of claim 18; however, the combination fails to explicitly teach wherein grouping the job groups further includes grouping the candidate jobs in a tree structure, the tree structure organized based on the input dataset and the model size.
Panda teaches wherein grouping the job groups further includes grouping the candidate jobs in a tree structure (“The Controller constructs a tree using a set of MapReduce jobs, each of which builds different parts of the tree. At any point, the model file contains the entire tree constructed so far.” [pg. 4; col 1, lines 6-10]), the tree structure organized based on the input dataset and the model size (“Each MapReduce job takes as input a set of nodes (N), the training data set (D*), and the current state of the model (M). The Controller schedules two types of MapReduce jobs.” [Fig. 1; pg. 4; col 1, lines 27-29]).
Tamano, Nykiel, Koch, and Panda are all in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, group jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based off queries. Koch discloses a distributed computing system that caches results which includes model data and selected model data. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the machine learning algorithms of Tamano and Nykiel and distributed computing system taught by Koch with the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system.

Claims 6, 7, 13, 14, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Tamano in view of Nykiel and further in view of Panda.

Regarding claim 6, the combination of Tamano and Nykiel teaches the method of claim 1; however, the combination fails to explicitly teach further including performing the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler.
Panda teaches further including performing the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler (“At the heart of PLANET is the Controller, a single machine that initiates, schedules and controls the entire tree induction process. The Controller has access to a compute cluster on which it schedules MapReduce jobs.” [pg. 4; col 1, lines 1-4]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Regarding claim 7, the combination of Tamano, Nykiel, and Panda teaches the method of claim 6, where Panda further teaches wherein performing the merging, by the execution engine, further includes optimizing a model graph associated with the job groups including computing the merge request associated with the model graph to determine a cost of overall memory consumption (“In other words, we disabled the optimization to construct trees entirely in memory and limited forward scheduling to 1 MapReduce in order to evaluate the performance of the algorithm in a constrained (e.g. shared cluster) environment.” Figs. 3 and 4 further show results relating to running time and data size [pg. 9; col 2, lines 13-17]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Regarding claim 13, the combination of Tamano and Nykiel teaches the system of claim 8; however, the combination fails to explicitly teach wherein the processor performs the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler.
Panda teaches wherein the processor performs the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler (“At the heart of PLANET is the Controller, a single machine that initiates, schedules and controls the entire tree induction process. The Controller has access to a compute cluster on which it schedules MapReduce jobs.” [pg. 4; col 1, lines 1-4]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Regarding claim 14, the combination of Tamano, Nykiel, and Panda teaches the system of claim 13, where Panda further teaches wherein performing the merging, by the execution engine, further includes optimizing a model graph associated with the job groups including computing the merge request associated with the model graph to determine a cost of overall memory consumption (“In other words, we disabled the optimization to construct trees entirely in memory and limited forward scheduling to 1 MapReduce in order to evaluate the performance of the algorithm in a constrained (e.g. shared cluster) environment.” Figs. 3 and 4 further show results relating to running time and data size [pg. 9; col 2, lines 13-17]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Regarding claim 20, the combination of Tamano and Nykiel teaches the computer program product of claim 15; however, the combination fails to explicitly teach further including an executable portion that performs the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler.
Panda teaches further including an executable portion that performs the merging of the job groups within an execution engine upon receiving a merge request triggered by a scheduler (“At the heart of PLANET is the Controller, a single machine that initiates, schedules and controls the entire tree induction process. The Controller has access to a compute cluster on which it schedules MapReduce jobs.” [pg. 4; col 1, lines 1-4]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Regarding claim 21, the combination of Tamano, Nykiel, and Panda teaches the computer program product of claim 20, where Panda further teaches wherein performing the merging, by the execution engine, further includes optimizing a model graph associated with the job groups including computing the merge request associated with the model graph to determine a cost of overall memory consumption (“In other words, we disabled the optimization to construct trees entirely in memory and limited forward scheduling to 1 MapReduce in order to evaluate the performance of the algorithm in a constrained (e.g. shared cluster) environment.” Figs. 3 and 4 further show results relating to running time and data size [pg. 9; col 2, lines 13-17]).
Tamano, Nykiel, and Panda are in the same field of endeavor of optimizing machine learning jobs. Tamano discloses a machine learning algorithm that collects results, groups jobs into job groups, and merges the job groups. Nykiel discloses collecting batches of queries and grouping and merging jobs based on the queries. Panda discloses using job scheduling and regression tree models. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning algorithms of Tamano and Nykiel with the job scheduling and the classification and regression tree models of Panda to further improve the efficiency of the claimed distributed computing system. One would be motivated to use a scheduler to reduce repetitive jobs.

Response to Arguments

Applicant’s arguments filed 03/22/2021 with respect to claims 1-21 have been fully considered but they are not persuasive.

Regarding claims 1, 8, and 15
In response to applicant’s argument that Tamano fails to disclose “collecting runtime metrics of each of a plurality of training iterations to identify candidate jobs…” and “grouping the candidate jobs into job groups”, it is noted that Tamano does disclose “collecting runtime metrics of each of a plurality of training iterations to identify candidate jobs” (See pg. 60, left col, ¶2). Furthermore, Tamano discloses “grouping the candidate jobs into job groups” (See pg. 59; col 2, lines 22-28).

In response to applicant’s argument that Tamano fails to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (“determining which of the jobs to group based on runtime information during previous iterations”) are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Applicant’s arguments with respect to claims 1, 8, and 15 that Tamano fails to disclose “the candidate jobs are grouped into the job groups according to the collected runtime metrics determined during the plurality of training iterations” have been considered but are moot because the newly amended limitation is addressed by the new art presented by Nykiel.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 




Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122




/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122