Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 1/13/2021 has been entered.
Response to Amendment
Applicant's amendments and remarks submitted 1/13/2021 have been entered and considered, but are not found convincing. Claims 1, 7, 11 and 17 have been amended. Claim 21 has been added. Claim 8 has been canceled. In summary, claims 1-7 and 9-21 are pending in the application.
Response to Arguments
Rejections under 35 U.S.C. 103:

Applicant’s arguments with respect to independent claim 11 have been considered but are moot because the rejection has been modified to address the newly added limitations. The examiner now relies on new reference Pusukuri.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
1.	Claims 1 and 3-6 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”), further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”), further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”), further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”), further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”), further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”).
Regarding independent claim 1, Choi teaches a system, comprising: a plurality of central processing units (CPUs), each of the plurality of CPUs having a plurality of CPU cores; a plurality of graphical processing units (GPUs); one or more hardware processors; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors (see section Introduction, first paragraph, “Improving the performance of computing systems by increasing the throughput of the CPU (Central Processing Unit) is restricted by the limits such as transistor scaling and temperature constraints [1]. For this reason, solutions to reduce the workload of the CPU while improving the performance of computing systems have been explored. One solution to improve the performance of computing systems, when the CPU performance is saturated, is utilizing the GPU (Graphics Processing Unit) which is a highly specialized processor designed for graphics processing. In recent computing systems, the GPU reduces the workload of the CPU by processing complicated graphics-related computations instead of the CPU [2]. Moreover, recent GPUs can process general-purpose applications as well as graphics-related applications with the help of integrated development environments such as CUDA, OpenCL, Cg, HLSL, and OpenGL [3–5].”; see section 4.1 Experimental methods, page 895, first paragraph “All experiments were performed under Fedora v.10. Our simulation environment was composed of an Intel 2.66 GHz Core2Quad Q9400 CPU with 2 GB RAM including 3 KB cache per one core and an NVIDIA Geforce 8500GT GPU providing 43.2 GFlops throughput per one shader core with 16 shader cores.”), causes the system to perform operations comprising:
receiving a request to process one or more applications (see section 2.2 Related scheduling schemes, page 889, first paragraph “ Several scheduling schemes for heterogeneous computing systems have been proposed. The scheduling schemes can be divided into two steps: application and device selection [10]. The application selection is the process for choosing the application to be executed. The device selection is the process for the determination of the device between the CPU and the GPU to execute a selected application. In this work, we apply the first-come, first-served ;
for each application in the one or more applications: 
determining a respective central processing unit (CPU) cost corresponding to processing of the application by the CPU and a respective graphical processing unit (GPU) cost corresponding to processing of the application by the GPU; and based on comparing the respective CPU cost and the respective GPU cost, determining whether to store the application in a first processing queue associated with the CPU or in a second processing queue associated with the plurality of GPUs (see section 3 Proposed scheduling scheme, page 892, last paragraph “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the   Choi is understood to be silent on storing applications in a queue and on the remaining limitations of claim 1.
In the same field of endeavor, Gregg teaches a system, comprising: a plurality of central processing units (CPUs), each of the plurality of CPUs having a plurality of CPU cores; a plurality of graphical processing units (GPUs); one or more hardware processors (see section 1 Introduction, page 1, first paragraph “Heterogeneous computing consists of applications running on a platform that has more than one computational unit with different architectures, such as a multi-core CPU and a many-core GPU. Using language frameworks such as OpenCL and software platforms such as Twin Peaks [6], applications running on the CPU launch kernels that can run on either the CPU or the GPU. Generally these kernels perform better on the GPU as they are optimized for a GPU’s highly parallel architecture and GPUs typically provide higher peak throughput. Therefore, applications preferentially schedule kernels on GPUs, leading to device contention and limiting overall throughput. In some cases, a better scheduling decision runs some kernels on the CPU, and even though they take longer than they would if run on the GPU, they still finish faster than if they were to wait for the GPU to be free. Furthermore, by utilizing all available processors for computational work, the total throughput of the system is increased over a static schedule that runs each kernel on the fastest device.”; section 2. Problem Definition, page 3, fourth paragraph “One assumption that we make in order to implement our scheduling ; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: (section 5.1 Workload and test environment, page 7, second paragraph “Our test environment was comprised of a 6-core, 3.7GHz AMD Phenom II 1090T CPU with 4GB of main memory and an AMD 5870 GPU with 2GB of memory. All tests were run under Ubuntu 10.04.”)
receiving a request to process one or more applications (section 2. Problem Definition, page 3, second paragraph, “Typically, when using a language framework such as OpenCL or CUDA, an application that wishes to run a kernel on a heterogeneous platform queries the system to determine which devices are available, and it preferentially chooses the device that will run the kernel the fastest. In most cases, this is a GPU, and the kernel is optimized to run on this device. Applications therefore tend to all choose the same device, and if a number of applications attempt to launch kernels concurrently, this leads to contention on a device. Furthermore, this type of scheduling ignores devices on the system that can potentially run the kernels and finish them before they would be finished if they were launched on the faster device in queue-order. We propose that instead of letting applications determine where kernels should be launched, a scheduler instead determines the best device at a given time for each kernel by analyzing predicted runtimes of the applications. This scheduler has historical runtime information about the other applications in the queue, and knows which kernels, if any, are currently running”);  
for each application in the one or more applications: 
determining a respective central processing unit (CPU) cost corresponding to processing of the application by the CPU and a respective graphical processing unit (GPU) cost corresponding to processing of the application by the GPU (see section 5.1 Workload and test environment, page 7 as shown in table 2 “In order to test our algorithm, we used sixteen OpenCL benchmark applications with one kernel each, and ran the set of applications sequentially, for a total of 16 kernel launches. The applications we used in our experiments represent a number of computational algorithms that are commonly used in scientific computing. Table 2 shows the applications and the absolute and relative runtimes for the data sets that we tested with. As expected, most applications had kernels that ran faster on the GPU, and therefore the entire applications ran faster on the GPU. In order to demonstrate the scheduler when some application were faster assigned to the CPU, we set the data size small enough for three applications such that this was the case (Binary Search, FFT, and Prefix Sum)”); and 
store the application in a first processing queue associated with the CPU or in a second processing queue associated with the plurality of GPUs (see section 4 The Scheduling Algorithm, “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where a sub-queue for each device which is considered store application in first processing queue or second processing queue); 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme of Choi, which assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU execution times, with storing applications in either a GPU queue or a CPU queue as seen in Gregg, because this modification would schedule applications for a device in queue order (4.1 Overview of the Algorithm, last paragraph of Gregg). Both Choi and Gregg are understood to be silent on the remaining limitations of claim 1.
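For illustration only, the sub-queue placement quoted above from Gregg section 4.1 (one sub-queue per device, with each application placed according to a comparison of predicted finish times) might be sketched as follows. This is a minimal sketch and not Gregg's implementation: the function name `assign_to_subqueues`, the `predicted` runtime table, and the rule that a tie goes to the CPU are all assumptions made for the example.

```python
from collections import deque


def assign_to_subqueues(apps, predicted):
    """Greedily place each application in the sub-queue of the device
    on which it is predicted to finish earliest.

    apps      -- application names in main-queue order (hypothetical input)
    predicted -- {app: {"CPU": runtime, "GPU": runtime}} (hypothetical input)
    """
    subqueues = {"CPU": deque(), "GPU": deque()}
    # Time at which each device is predicted to become free.
    busy_until = {"CPU": 0.0, "GPU": 0.0}

    for app in apps:
        # Predicted finish time = wait for the device + predicted runtime.
        finish = {dev: busy_until[dev] + predicted[app][dev] for dev in subqueues}
        # min() returns the first minimal key, so a tie goes to the CPU
        # (an arbitrary choice for this sketch).
        best = min(finish, key=finish.get)
        subqueues[best].append(app)
        busy_until[best] = finish[best]
    return subqueues


if __name__ == "__main__":
    runtimes = {
        "a": {"CPU": 10.0, "GPU": 2.0},
        "b": {"CPU": 3.0, "GPU": 2.0},
        "c": {"CPU": 10.0, "GPU": 1.0},
    }
    queues = assign_to_subqueues(["a", "b", "c"], runtimes)
    print({dev: list(q) for dev, q in queues.items()})
```

In the usage example, application "b" runs faster on the GPU in isolation but is placed in the CPU sub-queue because the GPU is predicted to be busy, which is the behavior the Gregg introduction describes.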
In the same field of endeavor, Weber teaches grouping a first set of applications assigned to the first processing queue into one or more CPU task groups according to CPU grouping criteria; causing the CPU to process the one or more CPU task groups ( ¶0043 “….Continuing with the example of FIG. 2 specifically, the storage controller 108.a is illustrated in FIG. 3 with four cores 204.a through 204.d. Each core 204.a through 204.d has one or more task core groups assigned to them. The task core groups may include different tasks that have been classified together. For example, tasks of an application, such as application 214 of FIG. 2, may be deemed to be related to each other in the sense that the different tasks perform similar functionality to each other and/or can access common code or data structures”; ¶0047 “As the examples above demonstrate, more than one application may have tasks distributed to one or more cores. As a result of the grouping of potentially related tasks together (from either one application 214 or multiple applications 214) into task core groups that are assigned to specific cores, data structures (associated with tasks in a given task core group) receive additional protection by remaining accessible only by a specific core.”; ¶0050 as shown in Fig.4 “The application wrapper 404 may operate between the operating system 402 and the application tasks 406 of one or more applications. The application wrapper 404 may provide a layer between the operating system 402 and the application tasks 406 that operates to map task core group assignments to physical cores, such as to implement the assignments shown in FIG. 3 and discussed above. For example, one or more cores may execute an application that determines the assignments of task core groups to different cores maintained with the application tasks. The application wrapper 404 may keep track of the different task core groups. 
Whenever a task core group change may become desired for a given application task 406, e.g. to enable access to a data structure under the control of another task core group (either at the same core or a different core), the application wrapper 404 may receive the request to change task core groups, determine whether the request includes reassignment to a different core or not, and temporarily reassign to a different task core group (and core, where applicable) according to the core guard procedure that will be described with respect to the other figures below”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to GPU or CPU based on a comparison between estimated time of GPU and CPU of Choi with storing applications to either GPU queue or CPU queue of Gregg with grouping of potentially related tasks together (from either one application or multiple applications) into task core ¶0047 of Weber). Choi, Gregg and Weber are understood to be silent on the remaining limitations of claim 1.
In the same field of endeavor, Deng teaches based on comparing the respective CPU cost and the respective GPU cost, determining whether to store the application in a first processing queue associated with the CPU or in a second processing queue associated with the plurality of GPUs (¶0083 “Optionally, in another embodiment, if a ratio of the first duration to the second duration is less than a first preset threshold, the first subtask is allocated to the CPU task group; if a ratio of the first duration to the second duration is greater than a second preset threshold, the first subtask is allocated to the GPU task group”), wherein the application is associated with an elapsed time indication that indicates an amount of time the application is stored in the second processing queue (¶0095 “205. Record execution log information of the first subtask into a performance database, where the execution log information includes a data volume of the first subtask, required waiting duration before the first subtask is executed, and a running platform and running duration of the first subtask, and the running platform is the CPU or the GPU”); grouping a second set of applications assigned to the second processing queue into one or more GPU batches according to GPU batching criteria (¶0080-0081 “Specifically, the process of classifying the first subtask may be determined based on two factors: a static characteristic and a cost of executing the first subtask, and the former has a higher priority. The indication information is used to indicate whether the subtask is executed by the CPU or is executed by the GPU. Optionally, in another If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. 
For example, for a filter operation subtask, the user specifies that this type of operation is scheduled to the CPU for execution, and the specification may be implemented using @CPU in code. In this case, the subtask is classified into the CPU task group.” Where grouping is considered as a batch.  Thus, subtasks that have the same type of operation that are scheduled to the GPU for execution ); and causing the plurality of GPUs to process the one or more GPU batches (¶0036 “ It should be understood that only one CPU 11 and one GPU 12 are shown in FIG. 1. For the distributed heterogeneous system, there may be multiple CPUs 11 and GPUs 12. The at least one working node 13 shares a CPU resource and a GPU resource included in the distributed heterogeneous system. The subtask may be allocated to the working node 13, and then the subtask is scheduled to the CPU 11 or the GPU 12 corresponding to the working node 13 for execution. Different working nodes may be corresponding to a same CPU 11 or GPU 12, or may be corresponding to different CPUs 11 or GPUs 12, and this embodiment of the present disclosure is not limited thereto.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to GPU or CPU based on a comparison between estimated time of GPU and CPU of Choi and storing applications to either GPU queue or CPU queue of Gregg 
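For illustration only, the classification relied upon from Deng ¶0080-0083 (a static characteristic takes priority, otherwise a ratio of two durations is compared against two preset thresholds) might be sketched as follows. This is a minimal sketch under stated assumptions and not Deng's implementation: the quoted passage does not say which duration is which, so treating the first duration as the CPU execution time and the second as the GPU execution time is an assumption, as are the function name `classify_subtask`, the `static_hint` parameter, and the "UNDECIDED" result for ratios between the two thresholds.

```python
def classify_subtask(first_duration, second_duration, low, high, static_hint=None):
    """Return the task group ("CPU", "GPU", or "UNDECIDED") for a subtask.

    first_duration  -- assumed to be the estimated CPU execution time
    second_duration -- assumed to be the estimated GPU execution time
    low, high       -- the first and second preset thresholds
    static_hint     -- hypothetical stand-in for a user specification
                       such as the @CPU marker mentioned in the quote
    """
    # The static characteristic has the higher priority.
    if static_hint in ("CPU", "GPU"):
        return static_hint

    ratio = first_duration / second_duration
    if ratio < low:
        return "CPU"
    if ratio > high:
        return "GPU"
    # The quoted passage does not cover this case; the label is hypothetical.
    return "UNDECIDED"


if __name__ == "__main__":
    print(classify_subtask(1.0, 4.0, low=0.5, high=2.0))
    print(classify_subtask(8.0, 2.0, low=0.5, high=2.0))
    print(classify_subtask(8.0, 2.0, low=0.5, high=2.0, static_hint="CPU"))
```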
In the same field of endeavor, Pohl teaches wherein the application is associated with an elapsed time indication that indicates an amount of time the application is stored in the second processing queue(col.6, lines 7-10 “ For example, although techniques of the present disclosure are described using threads, the techniques of this disclosure are similarly applicable to processes or individual tasks.”; col.6, lines 54-63 “For instance, in some examples, all threads may record insertion times into and removal times from a queue. A monitored thread may be a thread that is sensitive to response times such adverse system performance may occur if the execution of the monitored thread is delayed. When a thread is classified as a monitored thread, scheduler 24 may, for example, precisely measure the amount of time that the monitored thread is stored on a run queue before being selected for execution by the processor” where tasks/processes/threads is considered as application); determining whether the elapsed time indication of the application exceeds a predetermined time threshold (col.6, lines 63-67-col.7, lines 1-2 “In such examples, the kernel may initially record an insertion time when the monitored thread is inserted into the run queue. When the kernel later removes the monitored thread from the run queue to execute the thread, the kernel may determine a removal time. If the ; grouping a second set of applications assigned to the second processing queue into one or more batches when the elapsed time indication of the application is determined to exceed the predetermined time threshold (col.7, lines 5-17 “In other examples, the event may cause operating system 22 to re-prioritize a monitored thread that has been stored on the run queue if the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold. For instance, a group of four threads may each be assigned a priority level. 
One of the threads in the group of four threads may be a monitored thread such that the monitored thread's time stored on the run queue is measured by the kernel. If the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold, scheduler 24 may re-schedule the monitored thread such that monitored thread is prioritized ahead of all other threads of the same priority level” where reprioritize a monitored thread in a group of four threads when the amount of time the monitored thread is stored on the queue is greater than a specific threshold is considered as grouping a second set of applications assigned to the second processing queue into one or more batches when the elapsed time indication of the application is determined to exceed the predetermined time threshold)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to GPU or CPU based on a comparison between estimated time of GPU and CPU of Choi and storing applications to either GPU queue or CPU queue of Gregg and grouping of potentially related tasks together (from either one application or multiple applications) into task core 
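For illustration only, the mechanism relied upon from Pohl (recording an insertion time, measuring how long a monitored thread has been stored on the run queue, and prioritizing it ahead of other threads of the same priority level once a threshold is reached) might be sketched as follows. This is a minimal sketch and not Pohl's kernel code: the `QueuedThread` type, the explicit `now` argument used in place of a real clock, and the convention that a lower `priority` number runs first are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class QueuedThread:
    name: str
    priority: int          # assumption: lower number runs first
    inserted_at: float     # insertion time recorded when the thread is queued
    monitored: bool = False


def time_in_queue(thread, now):
    """Amount of time the thread has been stored on the run queue."""
    return now - thread.inserted_at


def reprioritize(run_queue, now, threshold):
    """Return a new queue order in which any monitored thread whose queue
    time is greater than or equal to the threshold is moved ahead of the
    other threads of the same priority level."""
    def sort_key(thread):
        overdue = thread.monitored and time_in_queue(thread, now) >= threshold
        return (thread.priority, 0 if overdue else 1)

    # sorted() is stable, so threads with equal keys keep their queue order.
    return sorted(run_queue, key=sort_key)


if __name__ == "__main__":
    queue = [
        QueuedThread("a", priority=1, inserted_at=0.0),
        QueuedThread("b", priority=1, inserted_at=1.0, monitored=True),
        QueuedThread("c", priority=1, inserted_at=2.0),
    ]
    print([t.name for t in reprioritize(queue, now=10.0, threshold=5.0)])
```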
BERGSMA teaches wherein a first total processing cost of a first task group and a second total processing cost of a second task group,  a predetermined threshold range (¶0106 “For example, the increment may be one (1), the first (or current best) number of groups may be set to one (1), and the second (or proposed increased) number of groups may be set to two (2) (step 702). The total allocation area obtained using one task duration group and the total allocation area obtained using two task duration groups are then computed and the difference between the two total allocation areas is calculated to determine the relative improvement in the cost function (steps 704 and 706). The relative improvement in the cost function is then compared to a predetermined threshold (step 708). If it is determined at step 710 that the relative improvement in the cost function is within the threshold, it can be concluded that the optimum number of groups is one and this number is used to determine the partition.” where total allocation area obtained using one task duration group is considered as total processing cost of task group)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to GPU or CPU based on a comparison 
In the same field of endeavor, Tsafrir teaches wherein the CPU grouping criteria includes a difference between a first total processing cost of a first task and a second total processing cost of a second task being within a predetermined threshold range (¶0101 “Job Similarity: Two (or more) jobs can be characterized as "similar", if one or more of their attributes are similar. For example, two jobs submitted by the same user may be judged similar. The similarity criterion may be more elaborated: e.g. two jobs with the same user, and same number of required processors (this is denoted as the "size" of the job), and the same runtime estimate, are judged similar. Various transformations on the job's attributes can be applied when determining if jobs are similar, e.g. if the difference between the size of the two jobs is smaller than, say, 10, they are judged similar. In general, any method that (at least partly) uses the jobs' attributes to determine similarity falls under this definition.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to GPU or CPU based on a comparison between estimated time of GPU and CPU of Choi, and storing applications to either GPU queue or CPU queue of Gregg, and grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, and grouping the same type of operations that are scheduled to the GPU for execution of Deng, and measuring the amount of time tasks, processes or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, and calculating the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference with a threshold of BERGSMA, with determining whether the difference between the number of required processors (size) of jobs is within a predetermined value as seen in Tsafrir, because this modification would judge jobs similar (¶0101 of Tsafrir).
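For illustration only, the similarity test quoted from Tsafrir ¶0101 (two jobs are judged similar if the difference between their sizes is smaller than, say, 10) might be sketched as follows. This is a minimal sketch and not Tsafrir's method: the dictionary layout of a job, the function names, and the greedy rule of comparing each job against the first member of an existing group are assumptions made for the example.

```python
def are_similar(job_a, job_b, max_size_diff=10):
    """Judge two jobs similar when their sizes (number of required
    processors) differ by less than max_size_diff."""
    return abs(job_a["size"] - job_b["size"]) < max_size_diff


def group_similar(jobs, max_size_diff=10):
    """Greedy grouping (hypothetical): a job joins the first group whose
    first member it is similar to; otherwise it starts a new group."""
    groups = []
    for job in jobs:
        for group in groups:
            if are_similar(group[0], job, max_size_diff):
                group.append(job)
                break
        else:
            groups.append([job])
    return groups


if __name__ == "__main__":
    jobs = [
        {"name": "j1", "size": 4},
        {"name": "j2", "size": 8},
        {"name": "j3", "size": 30},
        {"name": "j4", "size": 35},
        {"name": "j5", "size": 13},
    ]
    print([[j["name"] for j in g] for g in group_similar(jobs)])
```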
Thus, the combination of Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir teaches a system, comprising: a plurality of central processing units (CPUs), each of the plurality of CPUs having a plurality of CPU cores; a plurality of graphical processing units (GPUs); one or more hardware processors; and a memory storing computer-executable instructions, that in response to execution by the one or more hardware processors, causes the system to perform operations comprising: receiving a request to process one or more applications; for each application in the one or more applications: determining a respective central processing unit (CPU) cost corresponding to processing of an application by the CPU and a respective graphical processing unit (GPU) cost corresponding to processing of the application by the GPU; and based on comparing the respective CPU cost and the respective GPU cost, determining whether to store the application in a first processing queue associated with the CPU or in a second processing queue associated with the plurality of GPUs, wherein the application is associated with an elapsed time indication that indicates an amount of time the application is stored in the second processing queue; grouping a first set of applications assigned to the first processing queue into one or more CPU task groups according to CPU grouping criteria, wherein the CPU grouping criteria includes a difference between a first total processing cost of a first task group and a second total processing cost of a second task group being within a predetermined threshold range; causing the CPU to process the one or more CPU task groups; determining whether the elapsed time indication of the application exceeds a predetermined time threshold; grouping a second set of applications assigned to the second processing queue into one or more GPU batches according to GPU batching criteria when the elapsed time indication of the application is determined to exceed the predetermined time threshold; and causing the plurality of GPUs to process the one or more GPU batches.
Regarding claim 3, Choi, Gregg, Weber, Deng, Pohl BERGSMA and Tsafrir teach the system of claim 1, wherein the determining whether to store the application in the first processing queue or in the second processing queue further comprises: determining a first CPU cost and a first GPU cost associated with a first application of the one or more applications; and in response to determining that the first CPU cost is less than the first GPU cost, storing the first application in the first processing queue (see section 3 Proposed scheduling scheme, page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). 
In this case, the application is assigned to the CPU in order to reduce the completion time.”; see section 4 The Scheduling Algorithm of Gregg, “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where a sub-queue for each device which is considered store application in first processing queue or second processing queue) In addition, the same motivation is used as the rejection for claim 1.
Regarding claim 4, Choi, Gregg, Weber, Deng, Pohl BERGSMA and Tsafrir teach the system of claim 1, wherein the determining whether to store the application in the first processing queue or in the second processing queue further comprises: determining a first CPU cost and a first GPU cost associated with a first application of the one or more applications; and in response to determining that the first GPU cost is less than the first CPU cost, storing the first application in the second processing queue (see section 3 Proposed scheduling scheme, page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). 
In this case, the application is assigned to the CPU in order to reduce the completion time.”; see section 4 The Scheduling Algorithm, “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm of Gregg, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where the sub-queue for each device is considered as storing the application in the first processing queue or in the second processing queue). In addition, the same motivation is used as in the rejection of claim 1.
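For illustration only, the three-case comparison quoted from Choi above can be sketched as follows. This is a minimal sketch, not Choi's pseudo code: it assumes the "predefined portion" equals the threshold value tv (taken here to lie between 0 and 1) multiplied by the estimated time, and the function name, the idle flags and the tie-breaking in the second case are hypothetical.

```python
from enum import Enum


class Device(Enum):
    CPU = "CPU"
    GPU = "GPU"


def select_device(estimated_cpu_time, estimated_gpu_time, tv, cpu_idle, gpu_idle):
    """Pick a device from estimated execution times (hypothetical helper).

    Assumption: the "predefined portion" is tv * estimated time, 0 < tv < 1.
    """
    # First case: the GPU estimate is below the predefined portion of the CPU estimate.
    if estimated_gpu_time < tv * estimated_cpu_time:
        return Device.GPU
    # Third case: the CPU estimate is below the predefined portion of the GPU estimate.
    if estimated_cpu_time < tv * estimated_gpu_time:
        return Device.CPU
    # Second case: little difference, so fall back to a First-Free style choice
    # based on idle status. The tie-breaking below is an assumption.
    if gpu_idle:
        return Device.GPU
    if cpu_idle:
        return Device.CPU
    return Device.GPU if estimated_gpu_time <= estimated_cpu_time else Device.CPU
```

Under these assumptions, an application with estimates of 10 (CPU) and 2 (GPU) and tv = 0.5 falls in the first case and is assigned to the GPU.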
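Regarding claim 5, Choi, Gregg, Weber, Deng and Pohl teach the system of claim 1, wherein the first processing queue includes applications that were previously assigned to the first processing queue based on one or more previous requests (see section 3 Proposed scheduling scheme, pages 891, 892 of Choi “As mentioned above, in heterogeneous computing systems, the selection between the CPU and the GPU for an incoming application is a very important factor in determining the system performance [14]. The objective of the proposed scheduling, called EET (Estimated-Execution-Time) scheduling, is to select the device which can complete the incoming application more quickly by considering both the execution history for incoming applications and the remaining time for currently executed applications. To enable the efficient selection between the CPU and the GPU for an incoming…..”; page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’” where the application is assigned to the GPU or the CPU based on a comparison between the CPU execution time and the GPU execution time of the application, according to information about previously executed applications, which is considered as applications that were previously assigned to the first processing queue based on one or more previous requests; see section 4 The Scheduling Algorithm of Gregg “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where the sub-queue for each device is considered as storing the application in the first processing queue or in the second processing queue). In addition, the same motivation is used as in the rejection of claim 1.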
Regarding claim 6, Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir teach the system of claim 1, wherein the second processing queue includes applications that were previously assigned to the second processing queue based on one or more previous requests (see section 3 Proposed scheduling scheme, pages 891, 892 of Choi “As mentioned above, in heterogeneous computing systems, the selection between the CPU and the GPU for an incoming application is a very important factor in determining the system performance [14]. The objective of the proposed scheduling, called EET (Estimated-Execution-Time) scheduling, is to select the device which can complete the incoming application more quickly by considering both the execution history for incoming applications and the remaining time for currently executed applications. To enable the efficient selection between the CPU and the GPU for an incoming…..”; page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’” where the application is assigned to the GPU or the CPU based on a comparison between the CPU execution time and the GPU execution time of the application, according to information about previously executed applications, which is considered as applications that were previously assigned to the second processing queue based on one or more previous requests; see section 4 The Scheduling Algorithm of Gregg “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. 
We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where the sub-queue for each device is considered as storing the application in the first processing queue or in the second processing queue). In addition, the same motivation is used as in the rejection of claim 1.
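For illustration only, the per-device sub-queues quoted from Gregg above can be sketched as follows. This is a minimal sketch under stated assumptions, not Gregg's implementation: the predicted finish time on a device is assumed to be the backlog already placed in that device's sub-queue plus the predicted runtime of the application on that device, and the function name and the `predicted_runtime` table are hypothetical.

```python
from collections import deque


def assign_to_subqueues(main_queue, predicted_runtime):
    """Greedily move applications from the main queue into per-device sub-queues.

    predicted_runtime[app][device] is a hypothetical table of predicted runtimes.
    """
    devices = ("CPU", "GPU")
    sub_queues = {d: deque() for d in devices}
    backlog = {d: 0.0 for d in devices}  # work already queued on each device

    for app in main_queue:
        # Assumed finish time: existing backlog plus this application's predicted runtime.
        best = min(devices, key=lambda d: backlog[d] + predicted_runtime[app][d])
        sub_queues[best].append(app)
        backlog[best] += predicted_runtime[app][best]
    return sub_queues
```

With this sketch, an application that runs faster on the GPU in isolation may still be placed in the CPU sub-queue when the GPU sub-queue already holds enough work.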
2.	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”) further in view of Aguilar et al., U.S. Patent Application Publication No. 20070300231 (“Aguilar”).
Regarding claim 2, Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir teach the system of claim 1, wherein the respective CPU cost corresponds to a first amount of time to process the application by the plurality of CPUs, and the respective GPU cost corresponds to a second amount of time to process the application by the plurality of GPUs (see section 3 Proposed scheduling scheme, page 892, last paragraph of Choi “By using the Estimated-Execution-Time information,….. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). In this case, the application is assigned to the CPU in order to reduce the completion time.”; section 2. Problem Definition, page 3, fourth paragraph “One assumption that we make in order to implement our scheduling algorithm is that kernels can be run on more than one device in a system. Our implementation utilizes OpenCL, which supports running the same kernel across both CPUs and GPUs. Kernels can be compiled for available devices prior to or at runtime, and when a kernel is launched onto a specific device, the OpenCL runtime uses the correct binary for that device. Not all frameworks support running kernels on different devices, however our scheme allows for implementations that have separate versions of a kernel for each available device (e.g., one written in CUDA for….. although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.” where the algorithm can be extended to multiple CPUs and GPUs), wherein the operations further comprise: receiving information related to processing costs of the processing of the application (see section 4.2 as shown in Figure 1 of Gregg “Figure 1 shows the actual runtimes versus the predicted runtimes. The height of the left-side bars denote the average runtimes for the applications over 150 trials, and the error bars represent the minimum and maximum times. 
The right-side bars denote the predicted time after the scheduler has been trained. In our experiments, the predicted runtimes fell within the actual minimum and maximum runtimes, which was sufficient to use for the scheduling decision.”; ¶0095 of Deng “205. Record execution log information of the first subtask into a performance database, where the execution log information includes a data volume of the first subtask, required waiting duration before the first subtask…..”); and adjusting one or more of the respective CPU cost or the respective GPU cost based on the information related to the processing costs (¶0097 of Deng “207. Adjust, according to the first average duration and the second average duration, a quantity of subtasks in the CPU task group on the first working node, or a quantity of subtasks in the GPU task group on the first working node.”). In addition, the same motivation is used as in the rejection of claim 1. Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir are understood to be silent on the remaining limitations of claim 2.
In same field of endeavor, Aguilar teaches receiving information related to actual processing costs of the processing of the application; and adjusting one or more of the respective CPU cost or the respective GPU cost based on the information related to the actual processing costs (¶0007 “It has been discovered that the aforementioned challenges are resolved using a system and method that gathers thread performance data using a performance monitor. The threads may be running on either a first processor that is based on a first instruction set architecture (ISA), or a second processor that is based on a second ISA. Multiple first processors and multiple second processors may be included in a single computer system. The first processors and second processors can each access data stored in a common shared memory. The gathered thread performance data is analyzed to determine whether the…..”; ¶0009 “In another embodiment, a common scheduler is used to schedule threads to both the first processors and the second processors. In this embodiment, the thread performance data is stored in the shared memory. The scheduler determines whether a particular processor is running below a predefined CPU utilization. If the processor is running below the predefined utilization, then the CPU time that the threads receive for the processor are adjusted as described above”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU of Choi and the storing of applications to either the GPU queue or the CPU queue of Gregg and the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber and the grouping of the same type of operations that are scheduled to the GPU for execution of Deng and the measuring of the amount of time tasks/processes or threads are stored in a queue and comparing the amount of time with a specific threshold of Pohl and the calculating of the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference value with a threshold of BERGSMA with determining whether the difference between the number of required processors (size) of jobs is within a predetermined value as seen in Tsafrir….. (¶0007 of Aguilar).
Thus, the combination of Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir and Aguilar teaches wherein the respective CPU cost corresponds to a first amount of time to process the application by the plurality of CPUs, and the respective GPU cost corresponds to a second amount of time to process the application by the plurality of GPUs, wherein the operations further comprise: receiving information related to actual processing costs of the processing of the application; and adjusting one or more of the respective CPU cost or the respective GPU cost based on the information related to the actual processing costs.
3.	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”) further in view of NAKANISHI, U.S. Patent Application Publication No. 2010/0241683 (“NAKANISHI”).
Regarding claim 7, Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir teach the system of claim 1, wherein the grouping the first set of applications into the one or more CPU task groups according to the CPU grouping criteria further comprises: a number of threads included in the first processing queue (¶0041 of Weber “Each core 204.a, 204.b, 204.c, and 204.d may be configured to execute one or more application and/or system tasks (e.g., from the application 214 and the operating system 216, respectively). A task may include any unit of execution. Tasks may include, for example, threads, processes and/or applications executed by the storage controller 108.a.”) wherein a number of CPU tasks corresponding to the one or more CPU task groups (¶0123 of BERGSMA, Referring now to FIG. 10, in another embodiment, instead of selecting a different duration (i.e. width) for each of the K compute slots (as illustrated in FIG. 9), the smallest-possible allocation shape is determined at step 804 by selecting a duration for a group of slots, in accordance with the number of the task duration groups. For this purpose, step 804 comprises setting to one (1) the total number (m) of tasks so far assigned in the allocation plan and setting the group index (i) to one (1) (step 1002)). Choi, Gregg, Weber, Deng, Pohl, BERGSMA and Tsafrir are understood to be silent on the remaining limitations of claim 7.
In same field of endeavor, NAKANISHI teaches determining a number of threads included in the first processing queue, wherein a number of CPU tasks corresponding to the one or more CPU task groups is equal to the number of threads (¶0064  “ the parallel process, a task chain is generated in the system similar to the allocation above. Assuming that the number of threads performing the parallel process is #P, and there are #P or more subtrees, the nodes as the roots of the subtrees are connected to a task chain (subtree chain).”; ¶0243 “In FIG. 5, first in step S501, the number of tasks to be generated is equal to the number of #thread. The processes in steps S502 through S512 are independently performed for each thread.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU of Choi and the storing of applications to either the GPU queue or the CPU queue of Gregg and the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber and the grouping of the same type of operations that are scheduled to the GPU for execution of Deng and the measuring of the amount of time tasks/processes or threads are stored in a queue and comparing the amount of time with a specific threshold of Pohl and the calculating of the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference value with a threshold of BERGSMA with determining whether the difference between the number of required processors (size) of jobs is within a predetermined value as seen in Tsafrir with generating a number of tasks equal to the number of threads as seen in NAKANISHI because this modification would achieve the expected benefits of providing a desired or suitable number of threads.
Thus, the combination of Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir and NAKANISHI teaches wherein the grouping the first set of applications into the one or more CPU task groups according to the CPU grouping criteria further comprises: determining a number of threads included in the first processing queue, wherein a number of CPU tasks corresponding to the one or more CPU task groups is equal to the number of threads.
4.	Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”) further in view of Glover, U.S. Patent Application Publication No. 20160171589 (“Glover”).
Regarding claim 9, Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir teach the system of claim 1, wherein the grouping the second set of applications according to the GPU batching criteria further comprises: grouping the first applications into a first GPU batch of the one or more GPU batches; and grouping the second applications into a second GPU batch of the one or more GPU batches (¶0080-0081 “Specifically, the process of classifying the first subtask may be determined based on two factors: a static characteristic and a cost of executing the first subtask, and the former has a higher priority. The indication information is used to indicate whether the subtask is executed by the CPU or is executed by the GPU. Optionally, in another embodiment, the indication information may include the static characteristic. [0081] If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. For example, for a filter operation subtask, the user specifies that this type of operation is scheduled to the CPU for execution, and the specification may be implemented using @CPU in code. In this case, the subtask is classified into the CPU task group.”) Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir are understood to be silent on the remaining limitations of claim 9.
In same field of endeavor, Glover teaches determining that first applications from the second set of applications are of a first application type; grouping the first applications; determining that second applications from the second set of applications are of a second application type that is different from the first application type; and grouping the second applications (¶0069 “The application group data 302 may include any number of application member identifiers in the list of applications 310. Application member identifiers identify applications that are members of a particular application group. The list of applications 310 may include names of applications or any identifying information that identify or describe applications that are members of a group. In some embodiments, the list of applications 310 may each be associated or assigned at least one common tag or category.”; ¶0070 “The application group data 302 may include group generation rules 312 that are instructions of criteria for inclusion of applications (e.g., application identifiers) in the application group (e.g. criteria to be met before an application may be identified in the list of applications 310). For example, the group generation rules 312 may dictate the types or categories of applications that may be included in the application group, such as games, social networking, or the like. In some embodiments, the group generation rules may provide instructions for performance conditions that are to be met before an application may be assigned to the application group.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU of Choi and the storing of applications to either the GPU queue or the CPU queue of Gregg and the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber and the grouping of the same type of operations that are scheduled to the GPU for execution of Deng and the measuring of the amount of time tasks/processes or threads are stored in a queue and comparing the amount of time with a specific threshold of Pohl and the calculating of the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference value with a threshold of BERGSMA with determining whether the difference between the number of required processors (size) of jobs is within a predetermined value as seen in Tsafrir with grouping applications of the same type as seen in Glover because this modification would group applications that include similar characteristics (¶0105 of Glover).
Thus, the combination of Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir and Glover teaches wherein the grouping the second set of applications according to the GPU batching criteria further comprises: determining that first applications from the second set of applications are of a first application type; grouping the first multiple applications into a first GPU batch of the one or more GPU batches; determining that second applications from the second set of applications are of a second application type that is different from the first application type; and grouping the second multiple applications into a second GPU batch of the one or more GPU batches.
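For illustration only, the claimed grouping of applications of the same application type into separate GPU batches can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the function name, the representation of a queued application as a (name, type) pair, and the example type labels are hypothetical.

```python
from collections import defaultdict


def batch_by_type(gpu_queue):
    """Group queued applications into one batch per application type.

    gpu_queue is a hypothetical list of (application name, application type) pairs.
    """
    batches = defaultdict(list)
    for app_name, app_type in gpu_queue:
        # Applications sharing a type land in the same batch.
        batches[app_type].append(app_name)
    # One batch per distinct type, in order of first appearance.
    return list(batches.values())
```

Under these assumptions, a queue holding two applications of one type and one application of a different type yields two batches.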
5.	Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”) further in view of Glover, U.S. Patent Application Publication No. 20160171589 (“Glover”) further in view of Verner, Uri, Assaf Schuster, and Avi Mendelson. Processing Real-time Data Streams on GPU-based Systems. Diss. Computer Science Department, Technion, 2015. (“Verner”).
Regarding claim 10, Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir and Glover teach the system of claim 9, wherein the causing the plurality of GPUs to process the one or more GPU batches further comprises: causing the first GPU batch to be processed by a first GPU of the plurality of GPUs and the second GPU batch to be processed by a second GPU of the plurality of GPUs, the first GPU being different from the second GPU (¶0036 “ It should be understood that only one CPU 11 and one GPU 12 are shown in FIG. 1. For the distributed heterogeneous system, there may be multiple CPUs 11 and GPUs 12. The at least one working node 13 shares a CPU resource and a GPU resource included in the distributed heterogeneous system. The subtask may be allocated to the working node 13, and then the subtask is scheduled to the CPU 11 or the GPU 12 corresponding to the working node 13 for execution. Different working nodes may be corresponding to a same CPU 11 or GPU 12, or may be corresponding to different CPUs 11 or GPUs 12, and this embodiment of the present disclosure is not limited thereto.”) Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir and Glover are understood to be silent on the remaining limitations of claim 10
Verner teaches causing the first GPU batch to be processed by a first GPU of the plurality of GPUs and the second GPU batch to be processed by a second GPU of the plurality of GPUs, the first GPU being different from the second GPU (see section 5.2 Split Rectangle of Verner, “When mapping streams to SMs, the VIRTUAL GPU method does not attempt to prioritize the SMs of one GPU over those of another. As a result, the streams with deadlines that are equal to or close to dmin are distributed among all GPUs. Effectively, on every GPU, the minimum stream deadline is approximately dmin. Since the batch collection period is 1/4 dmin, batch processing on the GPUs is synchronous…. In fact, it might be beneficial to use different batch collection periods for different GPUs. For example, suppose that a workload S with a minimum stream deadline dmin = 8ms is partitioned between GPU1 and GPU2. Using the VIRTUAL…..”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU of Choi and the storing of applications to either the GPU queue or the CPU queue of Gregg and the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber and the grouping of the same type of operations that are scheduled to the GPU for execution of Deng and the measuring of the amount of time tasks/processes or threads are stored in a queue and comparing the amount of time with a specific threshold of Pohl and the calculating of the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference value with a threshold of BERGSMA with determining whether the difference between the number of required processors (size) of jobs is within a predetermined value as seen in Tsafrir with grouping applications of the same type of Glover with processing batches of jobs with different GPUs as seen in Verner because this modification would minimize their overall execution time while guaranteeing that all batches complete on time (4.1 GPU Batch Scheduling, last paragraph of Verner).
Thus, the combination of Choi, Gregg, Weber, Deng, Pohl, BERGSMA, Tsafrir, Glover and Verner teaches wherein the causing the plurality of GPUs to process the one or more GPU batches further comprises: causing the first GPU batch to be processed by a first GPU of the plurality of GPUs and the second GPU batch to be processed by a second GPU of the plurality of GPUs, the first GPU being different from the second GPU.
6.	Claims 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of ABIEZZI et al., U.S. Patent Application Publication No. 20140176583 (“Abiezzi”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of Pusukuri et al., U.S. Patent Application Publication No. 20140208330 (“Pusukuri”).
Regarding independent claim 11, Choi teaches a method, comprising:
first data of a first application in central processing unit (CPU) processing queue based on a first CPU processing cost corresponding to a first application identifier of the first application being less than a first graphical processing unit (GPU) processing cost corresponding to the first application identifier; second data of a second application in a GPU processing queue based on a second GPU processing cost corresponding to a second application identifier of the second application being less than a second CPU processing cost corresponding to the second application identifier (see section 3 Proposed scheduling scheme, pages 891-892 “The proposed scheduling requires two history tables: one for the CPU and the other for the GPU. The history table is composed of six entries: TaskName, Size, Count, Sum, Average, and Lifetime. TaskName denotes the application name, and Size represents the size of input data…… The history table is indexed by using application name (TaskName) and the size of input data (Size)” where TaskName is considered as application identifier;  see section 3 Proposed scheduling scheme, page 892, last paragraph “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. 
Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). In this case, the application is assigned to the CPU in order to reduce the completion time.”). Choi is understood to be silent on the remaining limitations of claim 11.
In same field of endeavor, Gregg teaches storing, by a system comprising one or more hardware processors, first data of a first application in central processing unit (CPU) processing queue based on a first CPU processing cost; storing second data of a second application in a GPU processing queue based on a second GPU processing cost (see section 4 The Scheduling Algorithm of Gregg “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where the sub-queue for each device is considered as storing the application in the first processing queue or in the second processing queue);
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU of Choi with storing applications to either the GPU queue or the CPU queue as seen in Gregg…..
In the same field of endeavor, Deng teaches generating a batch that includes the second data and third data of a third application stored in the processing queue based on determining that the second application identifier matches a third application identifier associated with the third application (¶0080 “Specifically, the process of classifying the first subtask may be determined based on two factors: a static characteristic and a cost of executing the first subtask, and the former has a higher priority. The indication information is used to indicate whether the subtask is executed by the CPU or is executed by the GPU. Optionally, in another embodiment, the indication information may include the static characteristic.”; ¶0081 “If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. For example, for a filter operation subtask, the user specifies that this type of operation is scheduled to the CPU for execution, and the specification may be implemented using @CPU in code. In this case, the subtask is classified into the CPU task group.” where the static characteristic is considered to be an application identifier and the grouping is considered to be a batch. Thus, subtasks that have the same type of operation and are scheduled to the GPU for execution would include indications for the GPU, and thus if the second application identifier matches the third application identifier, then the second application and the third application are executed on the GPU); causing a first GPU of a plurality of GPUs to process the batch (¶0036 “It should be understood that only one CPU 11 and one GPU 12 are shown in FIG. 1. For the distributed heterogeneous system, there may be multiple CPUs 11 and GPUs 12. 
The at least one working node 13 shares a CPU resource and a GPU resource included in the distributed heterogeneous system. The subtask may be allocated to the working node 13, and then the subtask is scheduled to the CPU 11 or the GPU 12 corresponding to the working node 13 for execution. Different working nodes may be corresponding to a same CPU 11 or GPU 12, or may be corresponding to different CPUs 11 or GPUs 12, and this embodiment of the present disclosure is not limited thereto.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, and the storing of applications in either a GPU queue or a CPU queue of Gregg, with scheduling operations of the same type to the GPU for execution as seen in Deng, because this modification would indicate whether the subtask is executed by the CPU or is executed by the GPU (¶0080 of Deng). Choi, Gregg and Deng are understood to be silent on the remaining limitations of claim 11.
In the same field of endeavor, Abiezzi teaches in response to determining an amount of time for the first GPU to process the batch, determining whether to adjust the second GPU processing cost corresponding to the second application identifier (¶0031 “In one example embodiment, a GBF may designate six different values that can be assigned to a virtual machine based upon different categorizations (e.g., computational profiles, classifications, and the 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, and the scheduling of operations of the same type to the GPU for execution of Deng, with determining an amount of time for GPU processing as seen in Abiezzi, because this modification would determine whether to adjust the GPU benefit factor (GBF) of a virtual machine and adjust the ranked ordering of the virtual machine accordingly (¶0014 of Abiezzi). Choi, Gregg, Deng and Abiezzi are understood to be silent on the remaining limitations of claim 11.
In the same field of endeavor, Pohl teaches wherein the second data of the second application is associated with an elapsed time indication that indicates an amount of time the second data is stored in the processing queue (col.6, lines 7-10 “ For example, although techniques of the present disclosure are described using threads, the techniques of this disclosure are similarly applicable to processes or individual tasks.”; col.6, lines 54-63 “For instance, in some examples, all threads may record insertion times into and removal times from a queue. A monitored thread may be a thread that is sensitive to response times such adverse system performance may occur if the execution of the monitored thread is delayed. When a thread is classified as a monitored thread, scheduler 24 may, for example, precisely measure ; 
determining the elapsed time indication of the second data exceeds a predetermined time threshold (col.6, lines 63-67-col.7, lines 1-2 “In such examples, the kernel may initially record an insertion time when the monitored thread is inserted into the run queue. When the kernel later removes the monitored thread from the run queue to execute the thread, the kernel may determine a removal time. If the time that the monitored thread has been stored on the run queue is greater than or equal to a threshold value, operating system 22 may generate an event.”); 
wherein the batch that includes the second data in response to the determining the elapsed time indication of the second data exceeds the predetermined time threshold (col.7, lines 5-17 “In other examples, the event may cause operating system 22 to re-prioritize a monitored thread that has been stored on the run queue if the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold. For instance, a group of four threads may each be assigned a priority level. One of the threads in the group of four threads may be a monitored thread such that the monitored thread's time stored on the run queue is measured by the kernel. If the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold, scheduler 24 may re-schedule the monitored thread such that monitored thread is prioritized ahead of all other threads of the same priority level” where reprioritize a monitored thread in a group of four threads in response to the amount of time the monitored thread is stored 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the scheduling of operations of the same type to the GPU for execution of Deng, and the determining of an amount of time for GPU processing of Abiezzi, with measuring the amount of time that tasks, processes, or threads are stored in a queue and comparing that amount of time with a specified threshold as seen in Pohl, because this modification would re-prioritize a monitored thread that has been stored on the run queue (col.7, lines 5-9 of Pohl). Choi, Gregg, Deng, Abiezzi and Pohl are understood to be silent on the remaining limitations of claim 11.
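For illustration only, the queue-time mechanism quoted from Pohl above (record an insertion time, compare the time spent on the run queue against a threshold, and move the monitored thread ahead of its same-priority peers) might be sketched as follows. This is a minimal sketch and not Pohl's implementation: the names `QueuedThread` and `reprioritize`, the numeric priority convention, and the use of a stable sort are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedThread:
    name: str
    priority: int        # assumption: a lower number means a higher priority
    inserted_at: float   # time at which the thread was placed on the run queue
    monitored: bool = False


def reprioritize(run_queue: List[QueuedThread], now: float,
                 threshold: float) -> List[QueuedThread]:
    """Return a new ordering in which any monitored thread that has waited
    at least `threshold` is placed ahead of other threads of the same
    priority level. All other relative positions are preserved."""
    def overdue(t: QueuedThread) -> bool:
        return t.monitored and (now - t.inserted_at) >= threshold

    # sorted() is stable, so threads with equal keys keep their queue order.
    return sorted(run_queue, key=lambda t: (t.priority, 0 if overdue(t) else 1))


group = [
    QueuedThread("t1", 1, 0.0),
    QueuedThread("t2", 1, 0.0),
    QueuedThread("t3", 1, 0.0, monitored=True),
    QueuedThread("t4", 1, 0.0),
]
print([t.name for t in reprioritize(group, now=10.0, threshold=5.0)])
```

The example mirrors the group of four same-priority threads in the quoted passage, with one monitored thread that has exceeded the threshold.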
In the same field of endeavor, Pusukuri teaches determining the elapsed time indication of the second data exceeds a predetermined time threshold (¶0026 “In one or more embodiments of the invention, the lock overhead time value (206) is a value indicating the percent of execution time a thread spends waiting for locks on resources”; ¶0030 “If in Step 312, the scheduler determines that the lock overhead time values do not exceed the threshold, then in Step 314, the scheduler waits while the lock overhead time values are refreshed by the thread monitor. If in Step 312, the scheduler determines that the lock overhead time values do exceed the threshold, then in Step 316, the scheduler creates a number of thread groups” where the lock overhead time value is considered to be the elapsed time indication); 
generating a batch that includes the second data and third data, wherein the batch that includes the second data is generated in response to the determining the elapsed time indication of the second data exceeds the predetermined time threshold (¶0030 “If in Step 312, the scheduler determines that the lock overhead time values do not exceed the threshold, then in Step 314, the scheduler waits while the lock overhead time values are refreshed by the thread monitor. If in Step 312, the scheduler determines that the lock overhead time values do exceed the threshold, then in Step 316, the scheduler creates a number of thread groups. In one or more embodiments of the invention, the thread groups are created using the lock overhead time values of each thread as stored in the corresponding thread data item. In one or more embodiments of the invention, threads with similar lock overhead time values are placed together in thread groups”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the scheduling of operations of the same type to the GPU for execution of Deng, the determining of an amount of time for GPU processing of Abiezzi, and the measuring of the amount of time that tasks, processes, or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, with generating a number of thread groups when the elapsed time indication exceeds the threshold as seen in Pusukuri, because this modification would create thread groups using the lock overhead time values of each thread as stored in the corresponding thread data item (¶0030 of Pusukuri).
Thus, the combination of Choi, Gregg, Deng, Abiezzi, Pohl and Pusukuri teaches a method, comprising: storing, by a system comprising one or more hardware processors, first data of a first application in a central processing unit (CPU) processing queue based on a first CPU processing cost corresponding to a first application identifier of the first application being less than a first graphical processing unit (GPU) processing cost corresponding to the first application identifier; storing second data of a second application in a GPU processing queue based on a second GPU processing cost corresponding to a second application identifier of the second application being less than a second CPU processing cost corresponding to the second application identifier, wherein the second data of the second application is associated with an elapsed time indication that indicates an amount of time the second data is stored in the GPU processing queue; determining the elapsed time indication of the second data exceeds a predetermined time threshold; generating a batch that includes the second data and third data of a third application stored in the GPU processing queue based on determining that the second application identifier matches a third application identifier associated with the third application, wherein the batch that includes the second data is generated in response to the determining the elapsed time indication of the second data exceeds the predetermined time threshold; causing a first GPU of a plurality of GPUs to process the batch; and in response to determining an amount of time for the first GPU to process the batch, determining whether to adjust the second GPU processing cost corresponding to the second application identifier.
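To make the sequence of claimed steps easier to follow, the method recited above could be sketched roughly as below. This is a hypothetical illustration and is not taken from the application or from any cited reference: the names `Scheduler`, `submit`, `next_gpu_batch` and `record_gpu_time`, the handling of equal costs (sent to the GPU queue here), and the rule of overwriting the stored cost with the measured time are all assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Item:
    app_id: str
    payload: object
    enqueued_at: float  # used to derive the elapsed time in the queue


class Scheduler:
    def __init__(self, cpu_cost, gpu_cost, wait_threshold):
        self.cpu_cost = dict(cpu_cost)   # per-application-identifier CPU cost
        self.gpu_cost = dict(gpu_cost)   # per-application-identifier GPU cost
        self.wait_threshold = wait_threshold
        self.cpu_queue = deque()
        self.gpu_queue = deque()

    def submit(self, app_id, payload, now):
        """Store data in the CPU queue when the CPU cost is lower, else in the
        GPU queue (assumption: ties go to the GPU queue)."""
        item = Item(app_id, payload, now)
        if self.cpu_cost[app_id] < self.gpu_cost[app_id]:
            self.cpu_queue.append(item)
            return "cpu"
        self.gpu_queue.append(item)
        return "gpu"

    def next_gpu_batch(self, now):
        """When some item has waited longer than the threshold, batch it with
        every queued item whose application identifier matches."""
        for item in self.gpu_queue:
            if now - item.enqueued_at > self.wait_threshold:
                batch = [i for i in self.gpu_queue if i.app_id == item.app_id]
                self.gpu_queue = deque(
                    i for i in self.gpu_queue if i.app_id != item.app_id)
                return batch
        return []

    def record_gpu_time(self, app_id, measured, tolerance=0.0):
        """Decide whether to adjust the stored GPU cost after a batch runs
        (assumption: replace the cost when it differs beyond `tolerance`)."""
        if abs(measured - self.gpu_cost[app_id]) > tolerance:
            self.gpu_cost[app_id] = measured
            return True
        return False
```

A caller would invoke `submit` as data arrives, poll `next_gpu_batch` with the current time, and pass the measured batch duration to `record_gpu_time`.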
Regarding claim 12, Choi, Gregg, Deng, Abiezzi, Pohl and Pusukuri teach the method of claim 11, wherein the second GPU processing cost includes a previously determined amount of time taken for at least one of the plurality of GPUs to process data of an application associated with the second application identifier (see section 3 Proposed scheduling scheme, pages 891-892 of Choi “The proposed scheduling requires two history tables: one for the CPU and the other for the GPU. The history table is composed of six entries: TaskName, Size, Count, Sum, Average, and Lifetime. TaskName denotes the application name, and Size represents the size of input data…… The history table is indexed by using application name (TaskName) and the size of input data (Size)” where TaskName is considered to be the application identifier; see section 3 Proposed scheduling scheme, pages 891-892 of Choi “As mentioned above, in heterogeneous computing systems, the selection between the CPU and the GPU for an incoming application is a very important factor in determining the system performance [14]. The objective of the proposed scheduling, called EET (Estimated-Execution-Time) scheduling, is to select the device which can complete the incoming application more quickly by considering both the execution history for incoming applications and the remaining time for currently executed applications. To enable the efficient selection between the CPU and the GPU for an incoming application, the proposed EET scheduling requires history table containing execution time history and remaining time table containing estimated remaining time. The execution time history is the information of previously executed applications, and the estimated remaining time is the information for currently executed applications…. The remaining time table has two entries: one entry for the CPU and the other entry for the GPU. 
When an application is assigned to the CPU or the GPU, the  page 892, last paragraph of Choi “ By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’” where based on compare between time execution of CPU and time execution of GPU of application according information of previous execute applications to assign application to GPU or CPU which is considered wherein the second GPU processing cost includes a previously determined amount of time taken for at least one of the plurality of GPUs to process data of an application associated with the second application identifier; see section 4 The Scheduling Algorithm of Gregg“…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices) In addition, the same motivation is used as the rejection for claim 11.
Regarding claim 13, Choi, Gregg, Deng ,Abiezz , Pohl and Pusukuri teach the method of claim 12, wherein the determining whether to adjust the second GPU processing cost further comprises: determining whether the amount of time for the first GPU to process the batch is different from the previously determined amount of time (see section 3 Proposed scheduling scheme, pages 891-892 of Choi “The proposed scheduling requires two history tables: one for the CPU and the other for the GPU. The history table is composed of six entries: TaskName, Size, Count, Sum, Average, and Lifetime. TaskName denotes the application name, and Size represents the size of input data…… The history table is indexed by using application name (TaskName) and the size of input data (Size)” where TaskName is considered as application identifier; see section  3 Proposed scheduling scheme , pages 891, 892 of Choi“ As mentioned above, in heterogeneous computing systems, the selection between the CPU and the GPU for an incoming application is a very important factor in determining the system performance [14]. The objective of the proposed scheduling, called EET (Estimated-Execution-Time) scheduling, is to select the device which can complete the incoming application more quickly by considering both the execution history for incoming applications and the remaining time for currently executed applications. To enable the efficient selection between the CPU and the GPU for an incoming application, the proposed EET scheduling requires history table containing execution time history and remaining time table containing estimated remaining time. The execution time history is the information of previously executed applications, and the estimated remaining time is the information for currently executed applications…. The remaining time table has two entries: one entry for the CPU and the other entry for the GPU. 
When an application is assigned to the CPU or the GPU, the corresponding entry of the remaining time table is updated by using the information from the history table. After that, the remaining time value in each entry is decreased by one every second to keep track of the remaining execution time  page 892, last paragraph of Choi “ By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’” where based on compare between time execution of CPU and time execution of GPU of application according information of previous execute applications to assign application to GPU or CPU which is considered wherein the second GPU processing cost includes a previously determined amount of time taken for at least one of the plurality of GPUs to process data of an application associated with the second application identifier;  ¶0081 of Deng “ If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. For example, for a filter operation subtask, the user specifies that this type of operation is scheduled to the CPU for execution, and the specification may be implemented using @CPU in code. 
In this case, the subtask is classified into the CPU task group.” Where grouping is considered as a batch ¶0041 of Abiezz “In block 307, the DGAS determines the runtime profiles of the VMs on the GPU allocation list, re-ranks them, and marks any VMs that can be potentially unseated (when a better contender becomes available) because their runtime profiles indicate that they are not really benefiting from GPU acceleration and not benefiting in an amount that is beyond a tolerable threshold (below or above depending upon how the measurement is performed). In general, . 
7.	Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Abiezzi et al., U.S. Patent Application Publication No. 20140176583 (“Abiezzi”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of Pusukuri et al., U.S. Patent Application Publication No. 20140208330 (“Pusukuri”) further in view of Aguilar et al., U.S. Patent Application Publication No. 20070300231 (“Aguilar”).
Regarding claim 14, Choi, Gregg, Deng ,Abiezz , Pohl and Pusukuri teach the method of claim 11, further comprising: causing a CPU to process the first data; and in response to determining a second amount of time for the CPU to process the first data, adjusting the first CPU processing cost corresponding to the first application identifier (see section 3 Proposed scheduling scheme, pages 891-892 of Choi “The proposed scheduling requires two The history table is indexed by using application name (TaskName) and the size of input data (Size)” where TaskName is considered as application identifier; see section  3 Proposed scheduling scheme , pages 891, 892 of Choi“ As mentioned above, in heterogeneous computing systems, the selection between the CPU and the GPU for an incoming application is a very important factor in determining the system performance [14]. The objective of the proposed scheduling, called EET (Estimated-Execution-Time) scheduling, is to select the device which can complete the incoming application more quickly by considering both the execution history for incoming applications and the remaining time for currently executed applications. To enable the efficient selection between the CPU and the GPU for an incoming application, the proposed EET scheduling requires history table containing execution time history and remaining time table containing estimated remaining time. The execution time history is the information of previously executed applications, and the estimated remaining time is the information for currently executed applications…. The remaining time table has two entries: one entry for the CPU and the other entry for the GPU. When an application is assigned to the CPU or the GPU, the corresponding entry of the remaining time table is updated by using the information from the history table. 
After that, the remaining time value in each entry is decreased by one every second to keep track of the remaining execution time information of the CPU and the GPU.” where history table contains the information of previous execute applications; see section 3 Proposed scheduling scheme, page 892, third paragraph of Choi “ The pseudo code for the 
In same field of endeavor, Aguilar teaches causing a CPU to process the first data; and in response to determining a second amount of time for the CPU to process the first data, adjusting the first CPU processing cost (¶0007] It has been discovered that the aforementioned challenges are resolved using a system and method that gathers thread performance data using a performance monitor. The threads may be running on either a first processor that is based on a first instruction set architecture (ISA), or a second processor that is based on a second ISA. Multiple first processors and multiple second processors may be included in a single computer system. The first processors and second processors can each access data stored in a common shared memory. The gathered thread performance data is analyzed to determine whether the corresponding thread needs additional CPU time in order to optimize system performance. If additional CPU time is needed, the amount of CPU time that the thread receives is altered (increased) so that the thread receives the additional time when it is scheduled by the scheduler. In one embodiment, the increased CPU  ¶0009 In another embodiment, a common scheduler is used to schedule threads to both the first processors and the second processors. In this embodiment, the thread performance data is stored in the shared memory. The scheduler determines whether a particular processor is running below a predefined CPU utilization. If the processor is running below the predefined utilization, then the CPU time that the threads receive for the processor are adjusted as described above”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the scheduling of operations of the same type to the GPU for execution of Deng, the determining of an amount of time for GPU processing of Abiezzi, the measuring of the amount of time that tasks, processes, or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, and the generating of a number of thread groups when the elapsed time indication exceeds the threshold of Pusukuri, with adjusting CPU time as seen in Aguilar, because this modification would optimize system performance (¶0007 of Aguilar).
Thus, the combination of Choi, Gregg, Deng, Abiezzi, Pohl, Pusukuri and Aguilar teaches the method of claim 11, further comprising: causing a CPU to process the first data; and in response to determining a second amount of time for the CPU to process the first data, adjusting the first CPU processing cost corresponding to the first application identifier.
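As an illustration of how a measured execution time can feed back into a stored per-application cost, the history table that Choi is quoted as describing (entries indexed by TaskName and Size, with Count, Sum and Average fields) might be sketched as follows. This is a simplified sketch under stated assumptions, not Choi's code: the class and method names are hypothetical, the Lifetime field is omitted, and the running average stands in for the "processing cost".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class HistoryEntry:
    count: int = 0     # number of recorded executions
    total: float = 0.0  # sum of recorded execution times

    @property
    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None


class HistoryTable:
    """One table per device, indexed by (task name, input size)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], HistoryEntry] = {}

    def record(self, task_name: str, size: int, elapsed: float) -> float:
        """Fold a newly measured execution time into the entry and return
        the updated average, i.e. the adjusted cost estimate."""
        entry = self._entries.setdefault((task_name, size), HistoryEntry())
        entry.count += 1
        entry.total += elapsed
        return entry.average

    def estimate(self, task_name: str, size: int) -> Optional[float]:
        """Return the stored estimate, or None when there is no history."""
        entry = self._entries.get((task_name, size))
        return entry.average if entry else None


cpu_history = HistoryTable()
cpu_history.record("matmul", 1024, 4.0)
print(cpu_history.record("matmul", 1024, 6.0))  # average of 4.0 and 6.0
```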
Regarding claim 15, Choi, Gregg, Deng, Abiezzi, Pohl, Pusukuri and Aguilar teach the method of claim 14, wherein the causing the CPU to process the first data further comprises: grouping the first data with other data associated with other applications included in the CPU processing queue based at least in part on the first CPU processing cost (see section 4 The Scheduling Algorithm of Gregg “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm of Gregg “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue” where placing applications in a sub-queue for each device is considered to be storing an application in the first processing queue or the second processing queue; ¶0080 of Deng “Specifically, the process of classifying the first subtask may be determined based on two factors: a static characteristic and a cost of executing the first subtask, and the former has a higher priority. The indication information is used to indicate whether the subtask is executed by the CPU or is executed by the GPU. Optionally, in another embodiment, the indication information may include the static characteristic.”; ¶0081 of Deng “If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. For example, for a filter .
8.	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Abiezzi et al., U.S. Patent Application Publication No. 20140176583 (“Abiezzi”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of Pusukuri et al., U.S. Patent Application Publication No. 20140208330 (“Pusukuri”) further in view of Song et al., U.S. Patent Application Publication No. 20190318245 (“Song”).
Regarding claim 16, Choi, Gregg, Deng, Abiezzi, Pohl and Pusukuri teach the method of claim 11. Choi, Gregg, Deng, Abiezzi, Pohl and Pusukuri are understood to be silent on the remaining limitations of claim 16.
In the same field of endeavor, Song teaches wherein the first application corresponds to a first neural network model, and the second application corresponds to a second neural network model (¶0129 “In a first In other words, the plurality of neural network models with different degrees of cognitive accuracy are pre-stored on the terminal-side device. For example, when the terminal-side device needs to process the cognitive computing task in an application scenario A, the terminal-side device selects a neural network model with cognitive accuracy corresponding to the application scenario A to process the cognitive computing task; when the terminal-side device needs to process the cognitive computing task in an application scenario B, the terminal-side device selects a neural network model with cognitive accuracy corresponding to the application scenario B to process the cognitive computing task.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to the GPU or the CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the scheduling of operations of the same type to the GPU for execution of Deng, the determining of an amount of time for GPU processing of Abiezzi, the measuring of the amount of time that tasks, processes, or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, and the generating of a number of thread groups when the elapsed time indication exceeds the threshold of Pusukuri, with selecting a neural network model with cognitive accuracy corresponding 
9.	Claims 17, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Zhao et al., U.S. Patent Application Publication No. 20190121664 (“Zhao”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of Bergsma et al., U.S. Patent Application Publication No. 20190087232 (“Bergsma”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”).
Regarding independent claim 17, Choi teaches a non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a payment provider system to perform operations comprising: (see section 4.1 Experimental methods, first paragraph “All experiments were performed under Fedora v.10. Our simulation environment was composed of an Intel 2.66 GHz Core2Quad Q9400 CPU with 2 GB RAM including 3 KB cache per one core and an NVIDIA Geforce 8500GT GPU providing 43.2 GFlops throughput per one shader core with 16 shader cores.”)
receiving instructions to process a first application (see section Related scheduling schemes, page 889, first paragraph “Several scheduling schemes for heterogeneous computing systems have been proposed. The scheduling schemes can be divided into two steps: application and device selection [10]. The application selection is the process for choosing the application to be executed”); 
determining whether to store the first application in a first processing queue or a second processing queue based on a comparison between a central processing unit (CPU) processing cost associated with the first application and a graphical processing unit (GPU) processing cost associated with the first application (see section 3 Proposed scheduling scheme, page 892, last paragraph “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third 
In the same field of endeavor, Gregg teaches receiving instructions to process a first application (section 2. Problem Definition, page 3, second paragraph, “Typically, when using a language framework such as OpenCL or CUDA, an application that wishes to run a kernel on a heterogeneous platform queries the system to determine which devices are available, and it preferentially chooses the device that will run the kernel the fastest. In most cases, this is a GPU, and the kernel is optimized to run on this device. Applications therefore tend to all choose the same device, and if a number of applications attempt to launch kernels concurrently, this leads to contention on a device. Furthermore, this type of scheduling ignores devices on the system that can potentially run the kernels and finish them before they would be finished if they were launched on the faster device in queue-order. We propose that instead of letting applications determine where kernels should be launched, a scheduler instead determines the best device at a given time for each kernel by analyzing predicted runtimes of the applications. This scheduler has historical runtime information about the other applications in the queue, and knows which kernels, if any, are currently running”); 
determining whether to store the first application in a first processing queue or a second processing queue based on a comparison between a central processing unit (CPU) processing cost associated with the first application and a graphical processing unit (GPU) processing cost associated with the first application (see section 4.1 Overview of the Algorithm, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue”, where placing applications in a sub-queue for each device is considered storing the application in the first processing queue or the second processing queue). In addition, the same motivation is used as in the rejection of claim 1. Choi and Gregg are understood to be silent on the remaining limitations of claim 17.
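For illustration only, the greedy sub-queue assignment quoted from Gregg above might be sketched in Python as follows. The function name, the `predicted_runtime` table, and the rule that a device's predicted finish time is the sum of the predicted runtimes already placed in its sub-queue are assumptions made for this sketch and are not taken from Gregg.

```python
from collections import deque


def assign_to_subqueues(main_queue, predicted_runtime, devices=("CPU", "GPU")):
    """Greedily place each application in the sub-queue of the device on
    which it is predicted to finish first (hypothetical sketch)."""
    subqueues = {device: deque() for device in devices}
    # Assumption: a device becomes free once everything already placed in
    # its sub-queue has run for its predicted runtime.
    busy_until = {device: 0.0 for device in devices}

    for app in main_queue:
        # Predicted finish time of this application on every device.
        finish = {d: busy_until[d] + predicted_runtime[app][d] for d in devices}
        # Ties go to the device listed first in `devices`.
        best = min(devices, key=lambda d: finish[d])
        subqueues[best].append(app)
        busy_until[best] = finish[best]

    return subqueues
```

In this sketch each sub-queue plays the role of a per-device processing queue; the comparison of predicted finish times decides which queue receives the application.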

In the same field of endeavor, Weber teaches grouping a first set of applications stored in the first processing queue according to CPU grouping criteria; and causing a CPU to process the grouped first set of applications (¶0047 “As the examples above demonstrate, more than one application may have tasks distributed to one or more cores. As a result of the grouping of potentially related tasks together (from either one application 214 or multiple applications 214) into task core groups that are assigned to specific cores, data structures (associated with tasks in a given task core group) receive additional protection by remaining accessible only by a specific core.”; ¶0050, as shown in Fig. 4, “The application wrapper 404 may operate between the operating system 402 and the application tasks 406 of one or more applications. The application wrapper 404 may provide a layer between the operating system 402 and the application tasks 406 that operates to map task core group assignments to physical cores, such as to implement the assignments shown in FIG. 3 and discussed above. For example, one or more cores may execute an application that determines the assignments of task core groups to different cores maintained with the application tasks. The application wrapper 404 may keep track of the different task core groups. Whenever a task core group change may become desired for a given application task 406, e.g. to enable access to a data structure under the control of another task core group (either at the same core or a different core), the application wrapper 404 may receive the request to change task core groups, determine whether the request includes reassignment to a different core or not, and temporarily reassign to a different task core group (and core, where applicable) according to the core guard procedure that will be described with respect to 

In the same field of endeavor, Deng teaches determining whether to store the first application in a first processing queue or a second processing queue based on a comparison between a central processing unit (CPU) processing cost associated with the first application and a graphical processing unit (GPU) processing cost associated with the first application (¶0083 “Optionally, in another embodiment, if a ratio of the first duration to the second duration is less than a first preset threshold, the first subtask is allocated to the CPU task group; if a ratio of the first duration to the second duration is greater than a second preset threshold, the first subtask is allocated to the GPU task group”), wherein the first application is associated with an elapsed time indication that indicates an amount of time the first application is in the second processing queue (¶0095 “205. Record execution log information of the first subtask into a performance database, where the execution log information includes a data volume of the first subtask, required waiting duration before the first subtask is executed, and a running platform and running duration of the first subtask, and the running platform is the CPU or the GPU”); grouping a second set of applications assigned to the second processing queue into one or more GPU batches according to GPU batching criteria (¶0080-0081 “Specifically, the process of classifying the first subtask may be determined based on two factors: a static characteristic and a cost of executing the first subtask, and the former has a higher priority. The indication information is used to indicate If the subtask includes the static characteristic, grouping is preferentially performed according to an indication of the static characteristic. The static characteristic may include: a rigid specification on each subtask by a user in a program or file configuration, or through another means, a user code semantic characteristic, and the like. 
For example, for a filter operation subtask, the user specifies that this type of operation is scheduled to the CPU for execution, and the specification may be implemented using @CPU in code. In this case, the subtask is classified into the CPU task group.”, where a task group is considered a batch; thus, subtasks that have the same type of operation and are scheduled to the GPU for execution are considered a GPU batch); and causing a CPU to process the grouped first set of applications and a plurality of GPUs to process the grouped second set of applications (¶0036 “It should be understood that only one CPU 11 and one GPU 12 are shown in FIG. 1. For the distributed heterogeneous system, there may be multiple CPUs 11 and GPUs 12. The at least one working node 13 shares a CPU resource and a GPU resource included in the distributed heterogeneous system. The subtask may be allocated to the working node 13, and then the subtask is scheduled to the CPU 11 or the GPU 12 corresponding to the working node 13 for execution. Different working nodes may be corresponding to a same CPU 11 or GPU 12, or may be corresponding to different CPUs 11 or GPUs 12, and this embodiment of the present disclosure is not limited thereto.”)
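For illustration only, the ratio-based classification quoted from ¶0083 of Deng, together with the priority given to a static characteristic in ¶0080-0081, might be sketched in Python as follows. The function and parameter names, the "UNDECIDED" result for ratios that fall between the two thresholds, and the requirement that the second duration be positive are assumptions made for this sketch and are not taken from Deng.

```python
def classify_subtask(first_duration, second_duration,
                     first_threshold, second_threshold, static_hint=None):
    """Return the task group for a subtask (hypothetical sketch).

    A static indication such as a user-specified "CPU" or "GPU" takes
    priority over the cost-based comparison.
    """
    if static_hint in ("CPU", "GPU"):
        return static_hint

    # Cost-based comparison: ratio of the first duration to the second.
    ratio = first_duration / second_duration  # assumes second_duration > 0
    if ratio < first_threshold:
        return "CPU"
    if ratio > second_threshold:
        return "GPU"
    # The quoted passage does not state what happens between the two
    # thresholds; this label is a placeholder assumption.
    return "UNDECIDED"
```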
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison 
In the same field of endeavor, Zhao teaches a non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a payment provider system to perform operations comprising:(¶0007 “In a second aspect of the present disclosure, there is provided an apparatus for scheduling applications. The apparatus includes a processor and a memory coupled to the processor having instructions stored therein, the instructions, when executed by the processor, causing the apparatus to perform acts”)
receiving instructions to process a first application in response to a user request (¶0026 “If there is a plurality of processing units available for running a plurality of applications, it needs to determine to which processing unit(s) each application is scheduled for running. In a conventional mechanism, the applications are scheduled only at the initial request phase. Specifically, when the clients requests to run the applications, a polling mechanism may be employed to schedule the applications sequentially to different processing units according to an incoming sequence of the requests of the clients”); 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, and the grouping of same-type operations that are scheduled to the GPU for execution of Deng with processing an application in response to a user request as seen in Zhao, because this modification would schedule the applications sequentially to different processing units according to an incoming sequence of the requests of the clients (¶0026 of Zhao). Choi, Gregg, Weber, Deng and Zhao are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Pohl teaches wherein the first application is associated with an elapsed time indication that indicates an amount of time the first application is stored in the second processing queue (col.6, lines 7-10 “ For example, although techniques of the present disclosure are described using threads, the techniques of this disclosure are similarly applicable to processes or individual tasks.”; col.6, lines 54-63 “For instance, in some examples, all threads may record insertion times into and removal times from a queue. A monitored thread may be a thread that is sensitive to response times such adverse system performance may occur if the execution of the monitored ; 
determining whether the elapsed time indication of the first application exceeds a predetermined time threshold (col. 6, line 63 to col. 7, line 2, “In such examples, the kernel may initially record an insertion time when the monitored thread is inserted into the run queue. When the kernel later removes the monitored thread from the run queue to execute the thread, the kernel may determine a removal time. If the time that the monitored thread has been stored on the run queue is greater than or equal to a threshold value, operating system 22 may generate an event.”);
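For illustration only, the insertion-time and removal-time comparison quoted from Pohl above might be sketched in Python as follows. The class name, the method names, and the use of caller-supplied timestamps instead of a kernel clock are assumptions made for this sketch and are not taken from Pohl.

```python
from collections import deque


class MonitoredQueue:
    """Queue that records insertion times and reports, on removal, whether
    an item waited at least `threshold` time units (hypothetical sketch)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._items = deque()

    def insert(self, item, now):
        # Record the insertion time together with the item.
        self._items.append((item, now))

    def remove(self, now):
        # Time stored on the queue = removal time - insertion time.
        item, inserted_at = self._items.popleft()
        waited = now - inserted_at
        # "Greater than or equal to a threshold value" triggers the event.
        return item, waited, waited >= self.threshold
```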
grouping a second set of applications stored in the second processing queue when the elapsed time indication of the first application is determined to exceed the predetermined time threshold (col.7, lines 5-17 “In other examples, the event may cause operating system 22 to re-prioritize a monitored thread that has been stored on the run queue if the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold. For instance, a group of four threads may each be assigned a priority level. One of the threads in the group of four threads may be a monitored thread such that the monitored thread's time stored on the run queue is measured by the kernel. If the amount of time the monitored thread is stored on the queue is greater than or equal to a specified threshold, scheduler 24 may re-schedule the monitored thread such that monitored thread is prioritized ahead of all other threads of the same priority level” where reprioritize a monitored thread in a group of four threads when the amount of time the 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, the grouping of same-type operations that are scheduled to the GPU for execution of Deng, and the processing of an application in response to a user request of Zhao with measuring the amount of time tasks, processes or threads are stored in a queue and comparing that amount of time with a specified threshold as seen in Pohl, because this modification would re-prioritize a monitored thread that has been stored on the run queue (col. 7, lines 5-9 of Pohl). Choi, Gregg, Weber, Deng, Zhao and Pohl are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, BERGSMA teaches a first total processing cost of a first task group, a second total processing cost of a second task group, and a predetermined threshold range (¶0106 “For example, the increment may be one (1), the first (or current best) number of groups may be set to one (1), and the second (or proposed increased) number of groups may be set to two (2) (step 702). The total allocation area obtained using one task duration group and the total allocation area obtained using two task duration groups are then computed and the difference between the two total allocation areas is calculated to determine the relative improvement in the cost function (steps 704 and 706). The relative improvement in the cost function is then compared to a predetermined threshold (step 708). If it is determined at step 710 that the relative improvement in the cost function is within the threshold, it can be concluded that the optimum number of groups is one and this number is used to determine the partition.”, where the total allocation area obtained using one task duration group is considered the total processing cost of a task group).
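For illustration only, the comparison of total allocation areas quoted from ¶0106 of BERGSMA might be sketched in Python as follows. The function name, the caller-supplied `total_area` cost function, the normalization of the difference by the current area, the repetition of the comparison for larger group counts, and the `max_groups` bound are assumptions made for this sketch; the quoted passage describes only the comparison between one and two groups.

```python
def choose_group_count(total_area, threshold, increment=1, max_groups=64):
    """Increase the number of task duration groups until the relative
    improvement in the cost function is within `threshold`
    (hypothetical sketch)."""
    current = 1
    while current + increment <= max_groups:
        proposed = current + increment
        base = total_area(current)
        if base == 0:
            # Nothing left to improve; avoids division by zero.
            return current
        # Relative improvement obtained by moving to the proposed count.
        improvement = (base - total_area(proposed)) / base
        if improvement <= threshold:
            return current
        current = proposed
    return current
```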
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison between the estimated GPU and CPU times of Choi, the storing of applications in either a GPU queue or a CPU queue of Gregg, the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, the grouping of same-type operations that are scheduled to the GPU for execution of Deng, the processing of an application in response to a user request of Zhao, and the measuring of the amount of time tasks, processes or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl with calculating the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference with a threshold as seen in BERGSMA, because this modification would determine the relative improvement in the cost function and the optimum number of groups (¶0106 of BERGSMA). Choi, Gregg, Weber, Deng, Zhao, Pohl and BERGSMA are understood to be silent on the remaining limitations of claim 17.
In the same field of endeavor, Tsafrir teaches wherein the CPU grouping criteria includes a difference between a first total processing cost of a first task and a second total processing cost of a second task being within a predetermined threshold range (¶0101 “Job Similarity: Two (or more) jobs can be characterized as "similar", if one or more of their attributes are similar. For example, two jobs submitted by the same user may be judged similar. The similarity criterion may be more elaborated: e.g. two jobs with the same user, and same number of required processors (this is denoted as the "size" of the job), and the same runtime estimate, are judged similar. Various transformations on the job's attributes can be applied when determining if jobs are similar, e.g. if the difference between the size of the two jobs is smaller than, say, 10, they are judged similar. In general, any method that (at least partly) uses the jobs' attributes to determine similarity falls under this definition.”)
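For illustration only, the job similarity test quoted from ¶0101 of Tsafrir might be sketched in Python as follows. The `Job` fields, the function name, and the choice to combine the same-user, same-runtime-estimate and size-difference criteria in a single test are assumptions made for this sketch; the quoted passage presents them as alternative examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    user: str
    size: int                # number of required processors
    runtime_estimate: float


def are_similar(a, b, size_tolerance=10):
    """Judge two jobs similar when they share a user and runtime estimate
    and their sizes differ by less than `size_tolerance`
    (hypothetical sketch)."""
    return (
        a.user == b.user
        and a.runtime_estimate == b.runtime_estimate
        and abs(a.size - b.size) < size_tolerance
    )
```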
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison between the estimated GPU and CPU times of Choi with the storing of applications in either a GPU queue or a CPU queue of Gregg, the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, the grouping of same-type operations that are scheduled to the GPU for execution of Deng, the processing of an application in response to a user request of Zhao, the measuring of the amount of time tasks, processes or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, and calculating the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, 
Thus, the combination of Choi, Gregg, Weber, Deng,  Zhao, Pohl,  BERGSMA and Tsafrir teaches a non-transitory computer readable medium storing computer-executable instructions that in response to execution by one or more hardware processors, causes a payment provider system to perform operations comprising: receiving instructions to process a first application in response to a user request; determining whether to store the first application in a first processing queue or a second processing queue based on a comparison between a central processing unit (CPU) processing cost associated with the first application and a graphical processing unit (GPU) processing cost associated with the first application, wherein the first application is associated with an elapsed time indication that indicates an amount of time the first application is stored in the second processing queue; grouping a first set of applications stored in the first processing queue according to CPU grouping criteria, wherein the CPU grouping criteria includes a difference between a first total processing cost of a first task group and a second total processing cost of a second task group being within a predetermined threshold range; determining whether the elapsed time indication of the first application exceeds a predetermined time threshold; grouping a second set of applications stored in the second processing queue according to GPU batching criteria when the elapsed time indication of the first application is determined to exceed the predetermined time threshold; and causing a CPU to process the grouped first set of applications and a plurality of GPUs to process the grouped second set of applications.
Regarding claim 19, Choi, Gregg, Weber, Deng,  Zhao, Pohl,  BERGSMA and Tsafrir teach the non-transitory computer readable medium of claim 17, wherein the operations further comprise: in response to determining, based on the comparison, that the CPU processing cost is less than or equal to the GPU processing cost, storing the first application in the first processing queue, wherein the first set of applications includes the first application (see section 3 Proposed scheduling scheme, page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). 
In this case, the application is assigned to the CPU in order to reduce the completion time.”; see section 4 The Scheduling Algorithm of Gregg, “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm of Gregg, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue”, where placing applications in a sub-queue for each device is considered storing the application in the first processing queue or the second processing queue). In addition, the same motivation is used as in the rejection of claim 17.
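For illustration only, the three-case device selection quoted from Choi above might be sketched in Python as follows. The function name, the reading of "the predefined portion" as the threshold value tv multiplied by the other device's estimated time, and the tie-breaking used when both devices are idle or both are busy are assumptions made for this sketch; Choi's pseudo code is not reproduced in this Office action.

```python
def select_device(estimated_cpu_time, estimated_gpu_time, tv,
                  cpu_idle=True, gpu_idle=True):
    """Pick "CPU" or "GPU" for an application (hypothetical sketch).

    `tv` is assumed to be a fraction between 0 and 1.
    """
    # Case 1: the GPU estimate is smaller than the predefined portion of
    # the CPU estimate, so the GPU is much faster.
    if estimated_gpu_time < tv * estimated_cpu_time:
        return "GPU"
    # Case 3: the CPU estimate is smaller than the predefined portion of
    # the GPU estimate, so the CPU is much faster.
    if estimated_cpu_time < tv * estimated_gpu_time:
        return "CPU"
    # Case 2: little difference, so fall back to a First-Free choice based
    # on which device is idle (GPU checked first by assumption).
    if gpu_idle:
        return "GPU"
    if cpu_idle:
        return "CPU"
    # Both busy: choose the smaller estimate (assumption).
    return "GPU" if estimated_gpu_time <= estimated_cpu_time else "CPU"
```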
Regarding claim 20, Choi, Gregg, Weber, Deng, Zhao, Pohl, BERGSMA and Tsafrir teach the non-transitory computer readable medium of claim 17, wherein the operations further comprise: in response to determining, based on the comparison, that the GPU processing cost is less than the CPU processing cost, storing the first application in the second processing queue, wherein the second set of applications includes the first application (see section 3 Proposed scheduling scheme, page 892, last paragraph of Choi “By using the Estimated-Execution-Time information, the scheduler selects the device that is suitable for the execution of the application. In other words, to select the device between the CPU and the GPU, the ‘estimatedCPUTime’ is compared with the ‘estimatedGPUTime’. The comparison of the ‘estimatedCPUTime’ with the ‘estimatedGPUTime’ is classified into three cases. The first case is when the ‘estimatedGPUTime’ is smaller than the predefined portion of the ‘estimatedCPUTime’, as shown in Fig. 6(a). The predefined portion is determined by the predefined threshold value (tv), as described in the pseudo code. It implies that the GPU can execute the application much faster than the CPU. Therefore, the application is assigned to the GPU in this case. Figure 6(b) describes the second case when the ‘estimatedGPUTime’ and the ‘estimatedCPUTime’ have little difference. In this case, the device to execute the application is selected according to the First-Free scheduling scheme. Therefore, in this case, the application is assigned to the device by considering the idle status of the device. The third case is when the ‘estimatedCPUTime’ is smaller than the predefined portion of the ‘estimatedGPUTime’, as shown in Fig. 6(c). 
In this case, the application is assigned to the CPU in order to reduce the completion time.”; see section 4 The Scheduling Algorithm of Gregg, “…..For clarity, we also assume that there are two devices available, a CPU and a GPU, although the algorithm could easily be extended to include an arbitrary number of devices. We also assume that most applications will run faster when assigned to the GPU.”; see section 4.1 Overview of the Algorithm of Gregg, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main queue”, where placing applications in a sub-queue for each device is considered storing the application in the first processing queue or the second processing queue). In addition, the same motivation is used as in the rejection of claim 17.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Weber et al., U.S. Patent Application Publication No. 20170090999 (“Weber”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of Zhao et al., U.S. Patent Application Publication No. 20190121664 (“Zhao”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of BERGSMA et al., U.S. Patent Application Publication No. 20190087232 (“BERGSMA”) further in view of Tsafrir et al., U.S. Patent Application Publication No. 20080155550 (“Tsafrir”) further in view of Verner, Uri, Assaf Schuster, and Avi Mendelson. Processing Real-time Data Streams on GPU-based Systems. Diss. Computer Science Department, Technion, 2015. (“Verner”)
Regarding claim 18, Choi, Gregg, Weber, Deng,  Zhao, Pohl,  BERGSMA and Tsafrir teach the non-transitory computer readable medium of claim 17, wherein each group of the grouped second set of applications is processed by a GPU of the plurality of GPUs(¶0036 of Deng “ It should be understood that only one CPU 11 and one GPU 12 are shown in FIG. 1. For the distributed heterogeneous system, there may be multiple CPUs 11 and GPUs 12. The at least one working node 13 shares a CPU resource and a GPU resource included in the distributed heterogeneous system. The subtask may be allocated to the working node 13, and then the subtask is scheduled to the CPU 11 or the  Choi, Gregg, Weber, Deng,  Zhao, Pohl,  BERGSMA and Tsafrir are understood to be silent on the remaining limitations of claim 18.
Verner teaches wherein each group of the grouped second set of applications is processed by a different GPU of the plurality of GPUs (4.1 GPU Batch Scheduling of Verner , “ The streams on the GPU are processed in batches-of-jobs, or batches. A batch is a collection of jobs that are sent together for execution on the GPU in the same kernel instance. Every batch goes through a four-stage synchronous pipeline: data aggregation in main memory, data transfer to local GPU memory, kernel execution, and transfer of results to main memory. The duration of a pipeline period, which is defined by the system, is denoted by T. The pipeline stages are illustrated in Figure 4.1.” where job is considered as application see 5.2 Split Rectangle of Verner, “ When mapping streams to SMs, the VIRTUAL GPU method does not attempt to prioritize the SMs of one GPU over those of another. As a result, the streams with deadlines that are equal to or close to dmin are distributed among all GPUs. Effectively, on every GPU, the minimum stream deadline is approximately dmin. Since the batch collection period is 1/ 4dmin, batch processing on the GPUs is synchronous…. For example, suppose that a workload S with a minimum stream deadline dmin = 8ms is partitioned between GPU1 and GPU2. Using the VIRTUAL GPU method, the collection period for both GPUs is 8/4 = 2ms. However, if all streams with deadlines in [8 ms; 20ms] are assigned to GPU1 and all streams with deadlines higher than 20ms to GPU2, then the 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme that assigns applications to a GPU or CPU based on a comparison between the estimated GPU and CPU times of Choi with the storing of applications in either a GPU queue or a CPU queue of Gregg, the grouping of potentially related tasks together (from either one application or multiple applications) into task core groups that are assigned to specific cores of Weber, the grouping of same-type operations that are scheduled to the GPU for execution of Deng, the processing of an application in response to a user request of Zhao, the measuring of the amount of time tasks, processes or threads are stored in a queue and comparing that amount of time with a specified threshold of Pohl, the calculating of the total allocation area obtained using one task duration group, computing the difference between the two total allocation areas, and comparing the computed difference with a threshold of BERGSMA, and the determining of whether the difference between the number of required processors (size) of jobs is within a predetermined value of Tsafrir with processing batches of jobs on different GPUs as seen in Verner, because this modification would minimize their overall execution time while guaranteeing that all batches complete on time (4.1 GPU Batch Scheduling, last paragraph of Verner).
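Thus, the combination of Choi, Gregg, Weber, Deng, Zhao, Pohl, BERGSMA, Tsafrir and Verner teaches the non-transitory computer readable medium of claim 17, wherein each group of the grouped second set of applications is processed by a different GPU of the plurality of GPUs.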
11.	Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Choi, Hong Jun, et al. "An efficient scheduling scheme using estimated execution time for heterogeneous computing systems." The Journal of Supercomputing 65.2 (2013): 886-902 (“Choi”) in view of Gregg, Chris, et al. "Dynamic heterogeneous scheduling decisions using historical runtime data." Workshop on Applications for Multi-and Many-Core Processors (A4MMC). 2011. (“Gregg”) further in view of Deng, U.S. Patent Application Publication No. 20170255496 (“Deng”) further in view of ABIEZZI et al., U.S. Patent Application Publication No. 20140176583 (“Abiezzi”) further in view of Pohl et al., U.S. Patent No. 8954968 (“Pohl”) further in view of Pusukuri et al., U.S. Patent Application Publication No. 20140208330 (“Pusukuri”) further in view of Kucera, U.S. Patent Application Publication No. 20120096046 (“Kucera”)
Regarding claim 21, Choi, Gregg, Deng, Abiezzi, Pohl and Pusukuri teach the method of claim 11, wherein the batch that includes the second data is generated further in response to determining a number of applications in the GPU processing (col. 7, lines 5-17 of Pohl, ¶0030 of Pusukuri, see section 4.1 Overview of the Algorithm of Gregg, “In essence, the scheduler we describe implements a greedy algorithm that assigns applications to devices based on a comparison between the predicted times for the application to finish on all available devices;….. Our scheduling algorithm is laid out as follows. We create a sub-queue for each device, and place applications in those sub-queues from the main 
Kucera teaches wherein the batch that includes the second data is generated further in response to determining a number of applications in the processing queue exceeds a maximum number of applications associated with the processing queue (¶0372 “ At 1704g, a number of batch jobs can be placed in a "run" batch queue to be run immediately. This number of batch jobs, at 1704g, is generally defined by the size of the run batch queue; that is, the run batch queue will have a maximum number of slots available to receive and execute batch jobs. In one implementation, the first jobs inserted in the run batch queue are those having no associated delay, that is, delay=0. When the maximum number of batch jobs are put into the run batch queue, that is, when the run batch queue is at capacity, remaining batch jobs defined in block 1704 are added to an "unfollow" batch queue, at 1704h. The unfollow batch queue holds jobs until slots are available to in the run batch queue. In the unfollow batch queue, a definition of the job is saved, including the query criteria, the object name, the number of rules, when the job was first attempted, and other data of interest. The scheduler then stops at 1704i.”; ¶0373 “ In some implementations, the size of the run batch queue is limited to restrain the computational burden on one or more servers running the batch jobs in the on-demand service environment. For instance, when 100 batch jobs are to be run, the run batch queue could be limited to holding a maximum of 5 batch jobs. After the run batch queue is filled, the remaining 95 jobs are placed in the unfollow batch queue. After the first scheduler executes the 5 batch jobs in the run batch queue, subsequent schedulers can be configured to check if additional jobs remain in the unfollow batch queue, pull jobs from the unfollow batch queue to replace completed jobs in the run batch queue, run those jobs, and repeat this process, e.g., at hourly intervals. 
where, once the run batch queue is at capacity with the maximum number of batch jobs, the remaining batch jobs are added to an unfollow batch queue, which is considered to be in response to determining that a number of applications in the processing queue exceeds a maximum number of applications associated with the processing queue).
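For illustration only, the queue behavior described in ¶0372-0373 of Kucera can be sketched as follows. This is a minimal sketch and not code from Kucera: the function names `enqueue_jobs` and `refill_run_queue` and the parameter `run_capacity` are hypothetical, jobs are represented as plain integers, and the ordering of jobs by delay mentioned in ¶0372 is omitted.

```python
from collections import deque


def enqueue_jobs(jobs, run_capacity):
    """Fill a bounded run queue; jobs beyond capacity go to a holding queue."""
    run_queue = deque()
    unfollow_queue = deque()
    for job in jobs:
        if len(run_queue) < run_capacity:
            # A slot is still available in the run queue.
            run_queue.append(job)
        else:
            # Run queue is at capacity: hold the job for later.
            unfollow_queue.append(job)
    return run_queue, unfollow_queue


def refill_run_queue(run_queue, unfollow_queue, run_capacity):
    """Move held jobs into the run queue as slots become free."""
    while unfollow_queue and len(run_queue) < run_capacity:
        run_queue.append(unfollow_queue.popleft())


# Example figures from ¶0373: 100 jobs, run queue limited to 5.
run_queue, unfollow_queue = enqueue_jobs(range(100), run_capacity=5)
print(len(run_queue), len(unfollow_queue))  # prints "5 95"
```

In this sketch, the overflow into the holding queue occurs only when the number of queued jobs would exceed the maximum associated with the run queue, which is the condition relied upon above.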
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the scheduling scheme of Choi, which assigns applications to the GPU or CPU based on a comparison between the estimated times of the GPU and CPU; the storing of applications in either a GPU queue or a CPU queue of Gregg; the scheduling of same-type operations to the GPU for execution of Deng; the determining of an amount of time for GPU processing of Abiezzi; the measuring of the amount of time that tasks/processes or threads are stored in a queue and the comparing of that amount of time with a specific threshold as seen in Pohl; and the generating of a number of thread groups when the elapsed time indication exceeds the threshold of Pusukuri, with the adding of remaining batch jobs to an unfollow batch queue when the run batch queue is at capacity as seen in Kucera, because this modification would restrain the computational burden on one or more servers running the batch jobs in the on-demand service environment (¶0373 of Kucera).


Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SARAH LE whose telephone number is (571) 270-7842. The examiner can normally be reached on Monday: 8 AM-4:30 PM EST, Tuesday: 8 AM-3:30 PM EST, Wednesday: 8 AM-2:30 PM EST, Thursday and Friday off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Zimmerman, can be reached on 571-272-7653. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like 






/SARAH LE/Primary Examiner, Art Unit 2619