DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 3, 10, and 17 are objected to because of the following informalities:
The terms PCIe and vCPU are used without being defined. Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-3, 8-12 and 15-17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 8, 9, 11, 12, 15 and 16 of U.S. Patent No. 11,113,093. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1, 2, 8, 9, 11, 12, 15 and 16 of the patent anticipate claims 1-3, 8-12 and 15-17 of the instant application.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by "Multilevel Interference-aware Scheduling on Modern GPUs" by Leiming Yu (April 2019) (hereafter Leiming).

As per claim 1, Leiming teaches:
A system comprising: 
at least one computing device comprising at least one processor; and
at least one data store comprising machine readable instructions, wherein the instructions, when executed by the at least one processor, cause the at least one computing device to at least: ([Page 3]  [Fig. 2.1], GPUs and CPUs present two different design points in terms of architectural philosophy.  CPUs are designed to execute a small number of complex tasks, whereas GPUs are designed to execute a larger number of simpler tasks. CPUs have a small number of registers per core for a given task. Context switching between tasks on a CPU is relatively expensive since the live registers need to be stored and restored, accessing main memory…)

input, by a scheduling service executed by the at least one computing device, parameters comprising at least one baseline parameter for at least one workload currently assigned to a particular graphics processing unit (GPU) into an interference model that predicts interference between a particular workload and the at least one workload of the particular GPU in a computing environment comprising a plurality of GPUs; ([Page 29], Workloads scheduled on the same node will compete for shared resources, frequently leading to performance degradation. By characterizing shared resource utilization of each workload, a cloud system can minimize the degree of interference. In this section, we present previously published work on GPU workloads scheduling.   [Page 57], We run a similarity-based scheduler to demonstrate the effectiveness of the chosen parameters.  [Page 8-9], Figure 1.4: Speedup of Rodinia benchmarks using 2 concurrent kernels, versus a non-concurrent implementation, on a NVIDIA GTX 950 GPU. Four different block sizes are considered.  Figure 1.4 shows the speedup achieved by using CKE for 6 applications from the Rodinia benchmark suite [15], as run on an NVIDIA GTX 950. Speedup is measured versus a non-CKE baseline. Applications are configured using four different block sizes. As observed in Figure 1.4, the applications exhibit variations in speedup due to changes in block size when processing the same dataset. In some cases, such as CFD with a block size 64, CKE can actually reduce performance. Figure 1.4 highlights the complex relationship between computational characteristics and the grid properties (block size), as they both impact application performance when using CKE. Hence, it is challenging to determine the best configuration of using concurrent kernel execution to maximize performance. Typically, a developer will have to compile and run their program many times to obtain the best parameters. 
We attempt to tackle this problem by developing a model-based concurrent kernel analysis in this thesis.  [Page 25], Instead of focusing on predicting kernel performance, Boyer et al. proposed a data transfer model to improve the overall GPU performance prediction [13]. The data transfer model is a simple linear regression model where the intercept (i.e., the transfer overhead) and the slope (the ratio of the elapsed time over the transfer size) are learned through benchmarking…  [Page 30], By evaluating the protocols on graphics rendering tasks, they observed that multiple GPU-accelerated graphics applications running concurrently can be correctly prioritized and isolated. Then, they developed Gdev at an operating system level to enhance GPU resource manangement [51]. They adopted the same concept of Memory-Copy Transaction scheduling to split transactions, such that the staging overhead of moving data from pageable memory to pinned memory can be hidden. To improve the bandwidth utilization for virtual GPUs, they proposed a bandwidth-aware non-preemptive device (BAND) scheduling algorithm. BAND does not reduce the priority when the resource budget is exhausted, and it adds ”time-buffering” for bursty workloads to achieve fairness. [Page 31], the scheduler implements slot sharing, which divides graphics memory into several slots and dedicates a single slot to each vGPU…  [Page 70], Figure 5.10: Relative performance comparing a GTX 760 and a GTX 950 for six Rodinia GPU benchmarks. The GTX 950 performance is used as the baseline, which is shown as the 1x speedup.  [Page 69], To train the neural network model, we use the metrics generated from the NVIDIA command line profiler [71]. Two metrics sets, FeatAll and Feat9, are included in our analysis so that the benefits of using the reduced feature sets for interference-aware scheduler can be evaluated. GPU workloads from six popular open source benchmark suites are studied (described in Chapter 5.6.3). 
For each workload, we run on real (versus simulated) GPU hardware to obtain a ground truth execution…  [Page 74], T_Dedicate_i is the runtime for i-th application when it is the only application executing on the GPU…  [Page 62], We also measure the dedicated execution runtime (T_Dedicate) for each application.)


identify, by the scheduling service, an output from the interference model, the output comprising a predicted interference corresponding to placement of the particular workload on the particular GPU; and ([Page X], It delivers an estimate of the performance ceiling by taking into account data transfers and GPU kernel execution behavior. Moka also provides guidance to find the best performing kernel-stream mapping, quickly identifying the best CKE configuration, resulting in improved performance and the highest utilization of the GPU. In addition, a machine-learning based interference-aware scheduler named Magic was developed to improve the system throughput for multitasking on GPUs. Magic framework implements offline short profiling analysis to study the important interference metrics and conducts interference sensitivity prediction for GPU workloads based on the selected machine learning models. Our scheduler outperforms a state-of-art similarity-based scheduler on a single GPU system and achieves a high system throughput compared to the least-loaded policy on a multi-GPU system.  [Page 55-56], Thus, it is key to characterize and accurately predict the interference impact before launching the concurrent execution. An efficient GPU workload scheduler should be aware of the characteristics of queued workloads and select the appropriate pair for co-execution with the goal of avoiding interference and achieving the best overall throughput.  In this chapter, we present Magic, a machine learning based interference-aware scheduler for GPU workloads. Magic utilizes profiling metrics, provides automated feature extraction to characterize GPU workloads, and predicts the sensitivity in terms of potential interference for concurrently scheduled applications. In addition, we also add support for clusters equipped with multiple GPUs from different GPU generations and configurations.  
[Page 69], To train the neural network model, we use the metrics generated from the NVIDIA command line profiler [71]. Two metrics sets, FeatAll and Feat9, are included in our analysis so that the benefits of using the reduced feature sets for interference-aware scheduler can be evaluated. GPU workloads from six popular open source benchmark suites are studied (described in Chapter 5.6.3). For each workload, we run on real (versus simulated) GPU hardware to obtain a ground truth execution…  [Page 53], Figure 4.13: Modeling concurrent kernel execution of vector addition, matrix multiplication and pathfinder using Moka. Prediction results on six different combinations of launch order are illustrated.)
assign, by the scheduling service, the particular workload to the particular GPU based on the predicted interference corresponding to a minimum predicted interference among a plurality of predicted interferences for at least a subset of the plurality of GPUs. ([Page 32 and Page 40-42], As more and more applications are streamed onto the GPU for concurrent execution, delivering guaranteed performance is a non-trial problem. A number of interference factors can lead to performance degradation. Jog et al. showed that competing for DRAM bandwidth can severely impact concurrent kernel execution performance [45]. They proposed a first-ready round-robin policy, versus a strict first-come-first-serve policy, to improve the fairness between collocated applications. Phull et al. observed that a major source of GPU interference is due the kernel runtime and the kernel launch frequency, where GPUs are time-shared among jobs [82]. Later, Chen et al. pointed out four major factors that could lead to long tail latency: 1) the duration and occupancy of GPU kernels, 2) the kernel scheduling order, 3) the number of kernels, and 4) contention on the PCIe bandwidth [17]. Based on these observations, they further developed a performance prediction model for interference-aware scheduling [16]. Mystic, proposed by Y. Ukidave et al. , utilizes a short profile of a GPU application and applies Collaborative Filtering to predict the full profile information [97]. After computing the similarity distance based on the predictive features, the least similar application will be dispatched to the target GPU node to minimize the concurrent running interference.  [Page 59], To compare the benefits of using different feature sets, we use 13 GPU applications from the CUDA SDK, as shown in Table 5.2. This subset includes a range of domains, including linear algebra, computational finance, image processing and popular GPU libraries. 
Here, we use a state-of-art similarity-based approach, the same as used in the Mystic framework, to dispatch these workloads [97]. Similarity-based scheduling analyzes the resource usage patterns (i.e., similarities) among GPU applications and co-locates workloads with the least similar usage pattern to minimize the potential interference.  [Page 61], Figure 5.4: Performance impact when co-executing two GPU applications on a single NVIDIA GTX 1080Ti GPU. For each test, App2 is selected as the least similar application to co-run with App1. The dashed line shows the QoS. The higher the speedup, the lower the interference, and vice versa.  [Page 71], To dispatch a GPU workload, we first identify the least-loaded GPU in the cluster using the GPU Status Table (see Figure 5.1). For the selected GPU, the InferBin scheduling (see Algorithm 4) is applied, where we first run our interference prediction model to analyze the interference sensitivity of the workload. The dispatching sequence in the queue is reordered by prioritizing the interference-insensitive workloads and sorting them according to job size.  [Page 27], Adriaens et al. proposed spatial multitasking for concurrent kernel execution, where each kernel uses a subset of the GPU resources.)
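
The assigning limitation of claim 1 can be pictured with a minimal sketch. It is offered only as an illustration of selecting the GPU with the minimum predicted interference. The function names, the dictionary layout, and the toy predictor are hypothetical and are taken neither from the claims nor from Leiming.

```python
from typing import Callable, Dict, List, Tuple

Baseline = Dict[str, float]  # hypothetical: one workload's baseline parameters


def toy_predictor(workload: Baseline, residents: List[Baseline]) -> float:
    """Made-up stand-in for an interference model (assumption, not a real model)."""
    return workload["gpu_util"] * sum(r["gpu_util"] for r in residents)


def assign_workload(
    workload: Baseline,
    gpu_residents: Dict[str, List[Baseline]],
    predict: Callable[[Baseline, List[Baseline]], float],
) -> Tuple[str, Dict[str, float]]:
    """Return the GPU id with the minimum predicted interference, plus all predictions."""
    if not gpu_residents:
        raise ValueError("no candidate GPUs supplied")
    # One predicted interference per candidate GPU.
    predictions = {gpu: predict(workload, res) for gpu, res in gpu_residents.items()}
    # Placement follows the smallest predicted value.
    best_gpu = min(predictions, key=predictions.get)
    return best_gpu, predictions


gpus = {
    "gpu0": [{"gpu_util": 0.9}],
    "gpu1": [{"gpu_util": 0.2}, {"gpu_util": 0.1}],
}
best, preds = assign_workload({"gpu_util": 0.5}, gpus, toy_predictor)
```

With these made-up inputs, the candidate workload is placed on `gpu1`, whose resident workloads yield the smaller predicted value.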

As per claim 2, rejection of claim 1 is incorporated:
Leiming teaches wherein the at least one baseline parameter comprises at least one of: at least one average kernel length, and a ratio between a first number of short kernels that execute within a time slice length of the particular GPU and a second number of long kernels that execute for longer than the time slice length. ([Page 33-34], An overview of Moka framework is presented in Figure 4.1. At first, GPU applications are profiled to collect application characteristics. Based on the profiled metrics, Moka classifies a GPU kernel as either compute-intensive and memory-intensive, and suggests the best block size accordingly. After tuning kernel performance, Moka starts modeling the execution of concurrent kernels encapsulated in the CUDA streams, taking into account data transfers and kernel execution behavior. Equipped with knowledge of the dynamic runtime characteristics, as well as the static properties of the kernels, Moka includes a data transfer model and a kernel execution model to estimate the CKE performance for an entire application. Since our work is based on the NVIDIA GPU architecture, CUDA terminology is used, but our work can easily be adapted to other GPU standards.  [Page 43], The AvgBlkExe assumes each block has an identical lifetime. In a round-robin fashion, batches of threads blocks are dispatched to each SM. When device resources are fully occupied, the rest of the blocks will be launched in future iterations. The AvgBlkExe metric measures the average block execution time from the kernel execution time, using Equation 4.12…  [Page 54], Our proposed CKE performance model Moka uses average block execution to model kernel execution and includes multiple performance factors to estimate the impact of resource competition on the device.)
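
The two baseline parameters recited in claim 2 can be illustrated with a short sketch that computes them from a list of kernel durations. The function name, the units, and the handling of the case with no long kernels are assumptions made for illustration and do not come from the claims or from Leiming.

```python
from typing import Dict, Sequence


def kernel_baseline(durations_ms: Sequence[float], time_slice_ms: float) -> Dict[str, float]:
    """Compute an average kernel length and a short-to-long kernel ratio.

    A kernel is counted as short when it finishes within one time slice and
    as long when it runs for longer than the time slice.
    """
    if not durations_ms:
        raise ValueError("at least one kernel duration is required")
    short = sum(1 for d in durations_ms if d <= time_slice_ms)
    long_ = len(durations_ms) - short
    return {
        "avg_kernel_length_ms": sum(durations_ms) / len(durations_ms),
        # Assumption: report infinity when there are no long kernels.
        "short_to_long_ratio": short / long_ if long_ else float("inf"),
    }


params = kernel_baseline([1.0, 2.0, 3.0, 10.0], time_slice_ms=4.0)
```

For the made-up durations above, three kernels fit within the 4 ms slice and one does not, so the ratio is 3.0 and the average kernel length is 4.0 ms.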

As per claim 3, rejection of claim 2 is incorporated:
Leiming teaches wherein the at least one baseline parameter further comprises at least one of: a GPU utilization, a PCIe read bandwidth, a PCIe write bandwidth, a vCPU utilization, and a workload memory utilization for the at least one workload. ([Page 32 and Page 40-42], Later, Chen et al. pointed out four major factors that could lead to long tail latency: 1) the duration and occupancy of GPU kernels, 2) the kernel scheduling order, 3) the number of kernels, and 4) contention on the PCIe bandwidth [17]. Based on these observations, they further developed a performance prediction model for interference-aware scheduling…  [Page 17], Concurrent kernel execution (CKE) was first supported on NVIDIA Fermi GPUs [27]. It allows additional kernels to run on the device when a single kernel does not fully utilize the available resources. Up to 16 concurrent kernels are supported on the Fermi architecture. CKE exposes opportunities for small kernels to run concurrently in order to maximize resource utilization and improve the overall throughput of a GPU application. For a large kernel, CKE can also improve the throughput by mapping a single large kernel into smaller kernels, overlapping data transfers with kernel execution. As shown in Figure 2.2, data transfers and kernel computation are divided into two parts, where each part is implemented as a stream for concurrent kernel execution. The overlapped execution between kernel execution and data transfer can been observed. Eventually, leveraging 2 CKEs, the previous elapsed time is reduced by 30%, achieving a 1.3x performance improvement.  [Page 44-46], Figure 4.5: The average block execution pattern for two kernels on a GPU with two streaming multiprocessors.  [Page 42], When a kernel is dispatched on a device, the GigaThread Engine on the NVIDIA GPU is in charge of scheduling thread blocks on each streaming multiprocessor (SM). The thread blocks are issued in a round-robin fashion based on the leftover policy [43][80]. 
Given the independent nature of thread blocks on a GPU, each block contains the same instructions, so we propose using the Average Block Execution (AvgBlkExe) to model kernel execution.  [Page 54], In this chapter, we proposed an empirical model for concurrent kernel execution on the NVIDIA Maxwell GPU architecture. The major contributions of our modeling scheme are to tune GPU kernels by predicting the best block size and estimate CKE performance of multi-kernel applications using profiled performance counters. Our proposed CKE performance model Moka uses average block execution to model kernel execution and includes multiple performance factors to estimate the impact of resource competition on the device.  [Page 65-66], Eight popular classification models are introduced to the model pool Decision Trees, KNearest Neighbor, Support Vector Machines, Random Forest, Neural Networks, Adaboost, Gaussian Naive Bayes and Quadratic Discriminant Analysis [1]. Greedy search is applied to find the best configuration for each model (see FindBestModelParam in Figure 5.7). To identify the candidate, we choose the model that generates the lowest average error over all k folds. Next, we run k-fold cross validation again for all of the models in the model pool, using their best configurations (see FindBestModel in Figure 5.7). Finally, the model with the lowest error is selected for our interferenceaware scheduler. The workflow for our interference analysis is summarized in Figure 5.7.  [Page 70], Figure 5.10: Relative performance comparing a GTX 760 and a GTX 950 for six Rodinia GPU benchmarks. The GTX 950 performance is used as the baseline, which is shown as the 1x speedup.)

As per claim 4, rejection of claim 1 is incorporated:
Leiming teaches wherein the machine readable instructions, when executed by the at least one processor, cause the at least one computing device to at least: 
identify a plurality of available GPUs, wherein the subset of the plurality of GPUs correspond to the plurality of available GPUs; and
 determine the plurality of predicted interferences comprising a respective predicted interference for a respective one of the plurality of available GPUs. ([Page 6], If a kernel has a small number of threads, occupying a fraction of the available hardware threads on the GPU, a second kernel can be dispatched to run concurrently and improve the occupancy on the GPU. The occupancy metric represents the ratio of active running threads to the maximum available hardware threads available on the device. High occupancy means a high degree of thread utilization.  [Page 31], As GPU applications in VMs compete for shared resources, not every application has enough parallelism to fully utilize the available GPU resources.  To optimize gaming applications in the cloud, Qi et al. proposed the VGRIS scheduler to address a range of performance requirements [84]. Three scheduling policies are included in the scheduler: 1) SLA-aware, 2) Proportional Share and 3) Hybrid. SLA-aware scheduling provides the minimum amount of GPU resources to each VM, while Proportional-Share scheduling distributes resources based on the priority. Hybrid scheduling applies SLA-aware first, and then switches to Proportional-Share scheduling if additional resources are available. [Page 55-56], Thus, it is key to characterize and accurately predict the interference impact before launching the concurrent execution. An efficient GPU workload scheduler should be aware of the characteristics of queued workloads and select the appropriate pair for co-execution with the goal of avoiding interference and achieving the best overall throughput.  In this chapter, we present Magic, a machine learning based interference-aware scheduler for GPU workloads. Magic utilizes profiling metrics, provides automated feature extraction to characterize GPU workloads, and predicts the sensitivity in terms of potential interference for concurrently scheduled applications. 
In addition, we also add support for clusters equipped with multiple GPUs from different GPU generations and configurations.  [Page 69], To train the neural network model, we use the metrics generated from the NVIDIA command line profiler [71]. Two metrics sets, FeatAll and Feat9, are included in our analysis so that the benefits of using the reduced feature sets for interference-aware scheduler can be evaluated. GPU workloads from six popular open source benchmark suites are studied (described in Chapter 5.6.3). For each workload, we run on real (versus simulated) GPU hardware to obtain a ground truth execution…  [Page 53], Figure 4.13: Modeling concurrent kernel execution of vector addition, matrix multiplication and pathfinder using Moka. Prediction results on six different combinations of launch order are illustrated.)

As per claim 5, rejection of claim 4 is incorporated:
Leiming teaches wherein the machine readable instructions, when executed by the at least one processor, cause the at least one computing device to at least: 
input at least one baseline parameter for the respective one of the plurality of available GPUs to determine the respective predicted interference. ([Page 70], Figure 5.10: Relative performance comparing a GTX 760 and a GTX 950 for six Rodinia GPU benchmarks. The GTX 950 performance is used as the baseline, which is shown as the 1x speedup.  [Page 74], T Dedicatei is the runtime for i th application when it is the only application executing on the GPU…  [Page 62], We also measure the dedicated execution runtime (T Dedicate) for each application. [Page 55-56], Thus, it is key to characterize and accurately predict the interference impact before launching the concurrent execution. An efficient GPU workload scheduler should be aware of the characteristics of queued workloads and select the appropriate pair for co-execution with the goal of avoiding interference and achieving the best overall throughput.  In this chapter, we present Magic, a machine learning based interference-aware scheduler for GPU workloads. Magic utilizes profiling metrics, provides automated feature extraction to characterize GPU workloads, and predicts the sensitivity in terms of potential interference for concurrently scheduled applications. In addition, we also add support for clusters equipped with multiple GPUs from different GPU generations and configurations.  [Page 69], To train the neural network model, we use the metrics generated from the NVIDIA command line profiler [71]. Two metrics sets, FeatAll and Feat9, are included in our analysis so that the benefits of using the reduced feature sets for interference-aware scheduler can be evaluated. GPU workloads from six popular open source benchmark suites are studied (described in Chapter 5.6.3). For each workload, we run on real (versus simulated) GPU hardware to obtain a ground truth execution..  [Page 53], Figure 4.13: Modeling concurrent kernel execution of vector addition, matrix multiplication and pathfinder using Moka. 
Prediction results on six different combinations of launch order are illustrated.)

As per claim 6, rejection of claim 5 is incorporated:
Leiming teaches wherein the parameters input into the interference model further comprise at least one baseline parameter for the particular workload. ([Page 56-57], Therefore, it is key to understand the resource requirements of each pending workload so the GPU workload scheduler can make better decisions when it comes to minimizing interference. In this section, we utilize Principle Feature Analysis to select features that can best characterize resource requirements of GPU workloads. We run a similarity-based scheduler to demonstrate the effectiveness of the chosen parameters.)

As per claim 7, rejection of claim 1 is incorporated:
Leiming teaches wherein the interference model identifies a one-to-one interference between two workloads, and the predicted interference identified as an output from the interference model corresponds to a worst-case predicted interference among at least one interference calculated for the at least one workload currently assigned to the particular GPU. ([Page 61], Figure 5.4: Performance impact when co-executing two GPU applications on a single NVIDIA GTX 1080Ti GPU. For each test, App2 is selected as the least similar application to co-run with App1. [Page 62], Here, we consider a 20% slowdown for co-located kernels as our Quality-of-Service (QoS) target, which is shown as the dashed line in Figure 5.4. FeatAll performs the best, where 7 out of 26 cases violate QoS, observing a 12% slowdown on average. FeatMystic produces 14 out of 26 cases below the QoS threshold, with an average slowdown of 22%. Surprisingly, Feat9 achieves an average slowdown of 16%, where 9 out of 26 cases missed the 20% QoS target. When using fewer than 10% of the metrics from FeatAll, Feat9 achieves a comparable performance, which ranks as the second best in terms of the average slowdown. In Table 5.3, we list the subset of 9 prominent features learned. These metrics provide coverage of several important device resources, including streaming multiprocessors, system memory, global memory, shared memory and cache.  [Page 9], Each violin plot shows the worst-case slowdown (top bar), the best-case slowdown (bottom bar) and the average slowdown (middle bar).)
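
The worst-case limitation of claim 7 can be sketched as taking the maximum over one-to-one predictions between the candidate workload and each workload already on the GPU. The pairwise model below is a made-up placeholder, and the convention that a larger value means more interference is an assumption. None of the names come from the claims or from Leiming.

```python
from typing import Callable, Dict, List

Baseline = Dict[str, float]  # hypothetical: one workload's baseline parameters


def toy_pairwise(a: Baseline, b: Baseline) -> float:
    """Made-up one-to-one interference score (assumption: higher is worse)."""
    return a["mem_bw"] * b["mem_bw"]


def worst_case_interference(
    candidate: Baseline,
    residents: List[Baseline],
    pairwise: Callable[[Baseline, Baseline], float],
) -> float:
    """Return the largest pairwise interference against any resident workload."""
    # An empty GPU is treated as interference-free (assumption).
    return max((pairwise(candidate, r) for r in residents), default=0.0)


worst = worst_case_interference(
    {"mem_bw": 0.5},
    [{"mem_bw": 0.2}, {"mem_bw": 0.8}],
    toy_pairwise,
)
```

Here the second resident workload dominates, so the value reported for the GPU is the pairwise score against that workload.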

As per claims 8-10 and 14, these are method claims corresponding to system claims 1-3 and 4. Therefore, they are rejected based on a similar rationale.

As per claim 11, rejection of claim 8 is incorporated:
Leiming teaches training a plurality of interference models to predict interference based on measured interferences and a respective set of the at least one baseline parameter corresponding to a respective one of a plurality of workloads comprising the workload. ([Page 65-66], Eight popular classification models are introduced to the model pool Decision Trees, KNearest Neighbor, Support Vector Machines, Random Forest, Neural Networks, Adaboost, Gaussian Naive Bayes and Quadratic Discriminant Analysis [1]. Greedy search is applied to find the best configuration for each model (see FindBestModelParam in Figure 5.7). To identify the candidate, we choose the model that generates the lowest average error over all k folds. Next, we run k-fold cross validation again for all of the models in the model pool, using their best configurations (see FindBestModel in Figure 5.7). Finally, the model with the lowest error is selected for our interference aware scheduler. The workflow for our interference analysis is summarized in Figure 5.7.  [Page 69], To train the neural network model, we use the metrics generated from the NVIDIA command line profiler [71]. Two metrics sets, FeatAll and Feat9, are included in our analysis so that the benefits of using the reduced feature sets for interference-aware scheduler can be evaluated. GPU workloads from six popular open source benchmark suites are studied (described in Chapter 5.6.3). For each workload, we run on real (versus simulated) GPU hardware to obtain a ground truth execution..  [Page 53], Figure 4.13: Modeling concurrent kernel execution of vector addition, matrix multiplication and pathfinder using Moka. Prediction results on six different combinations of launch order are illustrated.)

As per claim 12, rejection of claim 8 is incorporated:
Leiming teaches determining that the interference model comprises a minimum error of a plurality of errors for the plurality of interference models; and selecting the interference model to process the workload. ([Page 65-66], Eight popular classification models are introduced to the model pool Decision Trees, KNearest Neighbor, Support Vector Machines, Random Forest, Neural Networks, Adaboost, Gaussian Naive Bayes and Quadratic Discriminant Analysis [1]. Greedy search is applied to find the best configuration for each model (see FindBestModelParam in Figure 5.7). To identify the candidate, we choose the model that generates the lowest average error over all k folds. Next, we run k-fold cross validation again for all of the models in the model pool, using their best configurations (see FindBestModel in Figure 5.7). Finally, the model with the lowest error is selected for our interferenceaware scheduler. The workflow for our interference analysis is summarized in Figure 5.7.  [Page 70], Figure 5.10: Relative performance comparing a GTX 760 and a GTX 950 for six Rodinia GPU benchmarks. The GTX 950 performance is used as the baseline, which is shown as the 1x speedup.)

As per claim 13, rejection of claim 12 is incorporated:
Leiming teaches wherein the minimum error is a minimum average error, a minimum median error, or a minimum mode error. ([Page 65-66], Eight popular classification models are introduced to the model pool Decision Trees, KNearest Neighbor, Support Vector Machines, Random Forest, Neural Networks, Adaboost, Gaussian Naive Bayes and Quadratic Discriminant Analysis [1]. Greedy search is applied to find the best configuration for each model (see FindBestModelParam in Figure 5.7). To identify the candidate, we choose the model that generates the lowest average error over all k folds. Next, we run k-fold cross validation again for all of the models in the model pool, using their best configurations (see FindBestModel in Figure 5.7). Finally, the model with the lowest error is selected for our interferenceaware scheduler. The workflow for our interference analysis is summarized in Figure 5.7.  [Page 70], Figure 5.10: Relative performance comparing a GTX 760 and a GTX 950 for six Rodinia GPU benchmarks. The GTX 950 performance is used as the baseline, which is shown as the 1x speedup.)
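
The model-selection step mapped to claims 12 and 13 can be illustrated by choosing, from a pool of candidate models, the one with the lowest aggregated cross-validation error. This is a minimal sketch using only the Python standard library. The per-fold error values are made-up numbers, and the function and variable names are hypothetical rather than taken from Leiming.

```python
from statistics import mean, median
from typing import Callable, Dict, Sequence, Tuple


def select_model(
    fold_errors: Dict[str, Sequence[float]],
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> Tuple[str, Dict[str, float]]:
    """Pick the model whose aggregated per-fold error is smallest.

    `aggregate` defaults to the average; passing `median` selects by
    minimum median error instead.
    """
    if not fold_errors:
        raise ValueError("model pool is empty")
    summary = {name: aggregate(errs) for name, errs in fold_errors.items()}
    best = min(summary, key=summary.get)
    return best, summary


# Made-up per-fold errors for three candidate models (illustrative only).
errors = {
    "decision_tree": [0.20, 0.30, 0.25],
    "random_forest": [0.10, 0.20, 0.15],
    "k_nearest_neighbor": [0.05, 0.50, 0.30],
}
best_by_mean, mean_summary = select_model(errors)
best_by_median, _ = select_model(errors, aggregate=median)
```

With these illustrative numbers, both the average and the median criterion select `random_forest`.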

As per claims 15-20, these are non-transitory computer-readable medium claims corresponding to system claims 1-5 and 7. Therefore, they are rejected based on a similar rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DONG U KIM, whose telephone number is (571) 270-1313. The examiner can normally be reached 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente, can be reached at 571-272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DONG U KIM/
Primary Examiner, Art Unit 2196