DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant’s submission filed on 3/3/2022 has been entered.

Claims 1-20 are presented for examination. Claims 1, 8 and 15 have been amended.

Examiner Notes
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.

Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 9/15/2021 and 3/3/2022. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a)  IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same,  and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement.  The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA  the inventor(s), at the time the application was filed, had possession of the claimed invention.

Regarding Claim 1, the claim limitation “in response to determining, prior to the first scheduling group beginning execution, that the first scheduling group does not have one or more wavefronts ready for execution” at lines 16-18 lacks support in the specification. In the Remarks submitted on 3/3/2022, Applicant stated at least “figures 5-6, and accompanying st paragraph of page 9 from the Remarks). However, the examiner did not find any support in the specification for the amended limitation, in particular in Figs. 5-6 and [0035]-[0036] of the specification. The descriptions of Figs. 5-6 and [0035]-[0036] at most support an action of determining that the first scheduling group does not have one or more wavefronts ready for execution; they are silent about whether that action is performed before or after the first scheduling group begins to execute. The examiner further found, in Fig. 4 and [0030]-[0031] of Applicant’s specification, descriptions of selecting a lower priority scheduling group in response to determining that a higher priority scheduling group does not have any wavefronts ready for execution after the higher priority scheduling group began to execute (based on Fig. 4 and [0030]-[0031], at time slot t1, the group of kernels B and D having higher priority is scheduled and executed; later, at time slot t2, “Wavefronts from kernel D”, having lower priority than the group of kernels B and D, “is now able to be scheduled in time slot t2 since there are no higher priority kernels available in the same cycle”. In this example, at the time of determining that the higher priority group containing kernels B and D does not have any wavefront ready for execution, so as to select the lower priority group containing kernel D, the higher priority group containing kernels B and D had already begun to execute). Therefore, the limitation mentioned above fails to comply with the written description requirement.

Claims 2-7 are rejected for failing to cure the deficiency from their respective parent claim by dependency.

Regarding Claim 8, Claim 8 is rejected for the same reason set forth in the rejection of Claim 1 above.
Claims 9-14 are rejected for failing to cure the deficiency from their respective parent claim by dependency.

Regarding Claim 15, Claim 15 is rejected for the same reason set forth in the rejection of Claim 1 above.
Claims 16-20 are rejected for failing to cure the deficiency from their respective parent claim by dependency.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block .
Boudier, Hsu and Yudanov were cited in the previous Office action.

Regarding Claim 1, Lee discloses: a system comprising:
a compute unit configured to:
receive a plurality of wavefronts of a plurality of kernels (see lines 15-19, 21-23 of abstract; “the optimal number of thread blocks”, “the ‘block’ of CTAs allocated to a core” and “multiple kernels to be allocated to the same core”. In order to schedule or execute multiple thread blocks/CTAs and kernels on a same core, the method inherently requires receiving a plurality of wavefronts of a plurality of kernels. Note: see lines 7-11 of 1st paragraph of 1. Introduction for the relationship between wavefronts/warps and thread blocks/CTAs);
create a plurality of scheduling groups including one or more scheduling groups that each comprises wavefronts from at least one kernel of the plurality of kernels, wherein wavefronts selected for inclusion in a scheduling group of the one or more scheduling groups are selected based on an identified criteria of a corresponding kernel (see lines 7-11 of 1st paragraph of 1. Introduction, Fig. 9(a), lines 1-9 of 1st paragraph of 4.2 Block CTA Scheduling (BCS), “A collection of threads are grouped to form a warp or a wavefront and the warps are combined to create a CTA (cooperative thread array) or a thread block”, “a kernel with 16X16 CTA dimension”. Wavefronts of a same/common kernel are grouped into certain groups as CTAs);
select, for scheduling, a first scheduling group from the plurality of scheduling groups; and select for scheduling a second scheduling group from the plurality of scheduling groups (see Fig. 2 at page 2, lines 7-15 of 1st paragraph of 1. Introduction; “All threads within a CTA are executed on the same core and the threads within a warp are often executed together”, “a warp (or a wavefront) scheduler to determine which warp is executed” and “a thread block or CTA scheduler to assign CTAs to cores”. The CTAs, including at least a first CTA and a second CTA, i.e., the claimed first scheduling group and the claimed second scheduling group, scheduled by the CTA scheduler will be scheduled for execution).

Lee does not disclose: 
a plurality of compute units; and
a command processor coupled to the plurality of compute units, wherein the command processor is configured to dispatch kernels to the plurality of compute units;
wherein each compute unit of the plurality of compute units is configured to: 
receive, from the command processor, a plurality of wavefronts of a plurality of kernels;
each scheduling group comprises wavefronts from at least two kernels of the plurality of kernels, the identified criteria of a corresponding kernel for creating the plurality of scheduling groups is an identified priority of a corresponding kernel;
the second scheduling group is selected in response to determining, prior to the first scheduling group beginning execution, that the first scheduling group does not have one or more wavefronts ready for execution.


However, Boudier discloses:
a plurality of compute units (see Figs. 1, 2 and [0022]; “The CPs 134 each may include many processing elements 212 (see FIG. 2) that perform as single instruction multiple data (SIMD) processing elements 212”); and
a command processor coupled to the plurality of compute units (see Figs. 1, 2 and [0022]; “A command processor 140 may control a group of CUs 134”), wherein the command processor is configured to dispatch kernels to the plurality of compute units (see [0008], [0029]; “The GPU may determine a number of processing elements for the consumer kernels to execute on” and “The command processor 140 may control the processing elements 212 by determining a kernel 220 that should be executed on each of the processing elements 212”);
wherein each compute unit of the plurality of compute units is configured to: 
receive, from the command processor, a plurality of wavefronts of a plurality of kernels (see [0008], [0029]; “The GPU may determine a number of processing elements for the consumer kernels to execute on” and “The command processor 140 may control the processing elements 212 by determining a kernel 220 that should be executed on each of the processing elements 212”. Note: [0008] and [0029] from Boudier may only describe a compute unit receiving a plurality of kernels instead of receiving “a plurality of wavefronts of a plurality of kernels”. It is understood in the GPU technology field that a kernel includes at least one wavefront or warp, see lines 1-11 of 1st paragraph of 1. Introduction from Lee, [0002] from Applicant’s specification).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify a system architecture of a GPU-type processing element having two levels of schedulers for scheduling received kernels/workloads from Lee by they are prior arts related to different parts of a same computing system respectively. Combining the two prior art references provides a complete system architecture of a same component).

The combination of Lee and Boudier does not disclose:
each scheduling group comprises wavefronts from at least two kernels of the plurality of kernels, the identified criteria of a corresponding kernel for creating the plurality of scheduling groups is an identified priority of a corresponding kernel;
the second scheduling group is selected in response to determining, prior to the first scheduling group beginning execution, that the first scheduling group does not have one or more wavefronts ready for execution.
However, Hsu discloses: create a plurality of scheduling groups including one or more scheduling groups that each comprises wavefronts from at least two kernels of the plurality of kernels, wherein wavefronts selected for inclusion in a scheduling group of one or more scheduling groups are selected based on an identified criteria of a corresponding kernel (see Figs. 2-3, [0030]-[0032], [0043]-[0046]; “In this example, the application 201 launches kernels 201.1 and 201.2”, “an application may launch any number of kernels”, “the wavefront assign wavefronts of the workgroups 201.1.2 and 201.2.1 to the active subset 316, while all other wavefronts are assigned to the pending subset 314”, “assign wavefronts of the kernels 201.2 and 202.1 to the active subset 316, while all other wavefronts are assigned to the pending subset 314”, “a wavefront is classified based on its application identifier. For example, the wavefront classifier 310 may assign wavefronts of the application 201 of FIG. 2 to the active subset 316, while all other wavefronts are assigned to the pending subset 314” and “wavefronts of the same application, kernel, or workgroup may be grouped together for processing”, emphasis added. In the particular example of Fig. 2, Application 202 launches only a single kernel 202.1; however, it is understood that, in a well-known example, Application 202 also launches more than one kernel, as Application 201 does in the particular example of Fig. 2. In such an example, assigning wavefronts into the active subset and the pending subset based on a same application identifier associated with the kernels/wavefronts would result in each of the active subset and the pending subset comprising wavefronts from at least two kernels, wherein wavefronts selected for inclusion in one of the subsets are selected based on an identified criteria of the corresponding kernel. Also see [0039]-[0040] for the detailed explanations of active subset 316 and pending subset 314 as scheduling groups);
select, for scheduling, a first scheduling group from the plurality of scheduling groups (see Figs. 2-3, [0007], [0038]-[0040] and [0044]; “the wavefront scheduler may be further configured to schedule the active subset for processing before scheduling the pending subset for processing”);
in response to determining that the first scheduling group does not have one or more wavefronts ready for execution, select for scheduling a second scheduling group from the plurality of scheduling groups (see Figs. 2-3, [0007], [0038]-[0040] and [0044]; “one or more wavefronts assigned to the pending subset 314 may become eligible if all of the wavefronts assigned to the active subset 316 become stalled, e.g., have outstanding address translation operations”, emphasis added. In response to determining that all of the wavefronts from the active subset have become stalled, i.e., no wavefront in the active subset is now ready for execution because all of the wavefronts from the active subset have already been scheduled and processed and are now stalled, the system would select the wavefronts from the pending subset to be scheduled for execution).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the mechanism of grouping wavefronts to form different scheduling groups from the combination of Lee and Boudier by including the grouping of different wavefronts, even those from different kernels, to form different scheduling groups from Hsu, since the wavefronts from different kernels may still have affinity to be executed as a group (see [0045] from Hsu).
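For illustration only, the active/pending fallback described in the passages of Hsu cited above can be sketched as follows. This is a simplified sketch prepared for this discussion and is not taken from Hsu; the names `Wavefront`, `Subset`, `select_subset` and the `stalled` flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Wavefront:
    wf_id: int
    kernel: str
    stalled: bool = False  # hypothetical flag, e.g. an outstanding address translation


@dataclass
class Subset:
    name: str
    wavefronts: List[Wavefront] = field(default_factory=list)

    def ready(self) -> List[Wavefront]:
        # A wavefront is treated as ready when it is not stalled (assumption).
        return [w for w in self.wavefronts if not w.stalled]


def select_subset(active: Subset, pending: Subset) -> Optional[Subset]:
    """Prefer the active subset; use the pending subset only when
    every wavefront in the active subset is stalled."""
    if active.ready():
        return active
    if pending.ready():
        return pending
    return None
```

In this sketch, the pending subset is selected only after all wavefronts of the active subset are marked stalled, which corresponds to the mapping of the active subset to the first scheduling group and the pending subset to the second scheduling group.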

The combination of Lee, Boudier and Hsu does not disclose:
the identified criteria of a corresponding kernel for creating the plurality of scheduling groups is an identified priority of a corresponding kernel;
the determination that the first scheduling group does not have one or more wavefronts ready for execution is performed prior to the first scheduling group beginning execution.
However, Yudanov discloses: create a plurality of scheduling groups including one or more scheduling groups that each comprises, wherein threads selected for inclusion in a kernels that are given higher priority at runtime and the group of the threads/wavefronts from different applications or kernels that are given lower priority at runtime).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of grouping different wavefronts from at least two kernels into different scheduling groups for execution from the combination of Lee, Boudier and Hsu by including the method of coalescing threads from different applications or kernels based on the priority of the application or kernel from Yudanov, and thus the new combination would teach the limitation wherein wavefronts selected for inclusion in a scheduling group of the one or more scheduling groups are selected based on an identified priority of a corresponding kernel, since scheduling a group of instructions/threads/jobs/tasks having the same priority together is a well-known computing task scheduling mechanism (note: Lee also discusses prioritizing threads/instructions for execution, see lines 11-16 of right side of page 5 from Lee; however, Lee does not explicitly discuss that threads/wavefronts from different kernels having the same priority can also be scheduled together as a group).

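The combination of Lee, Boudier, Hsu and Yudanov does not disclose that the determination is performed prior to the first scheduling group beginning execution.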
However, Kakadia discloses: a method of scheduling tasks for execution comprising:
in response to determining, prior to the first scheduling task beginning execution, that the first scheduling task is not ready for execution, select for scheduling a second scheduling task from a plurality of tasks (see [0056]. Before the first task having high priority begins execution, it is determined whether this first task is ready for execution; if the first task is not ready for execution, i.e., no high priority task is ready for execution, the scheduler would select a lower priority task for scheduling).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the scheduling algorithm from the combination of Lee, Boudier, Hsu and Yudanov by including a scheduling algorithm of scheduling a lower priority task if there are insufficient resources for executing higher priority tasks from Kakadia, and thus the combination of Lee, Boudier, Hsu, Yudanov and Kakadia would disclose the limitation missing from the combination of Lee, Boudier, Hsu and Yudanov (note: Kakadia may not disclose scheduling groups having multiple tasks/wavefronts; however, in view of the fact that the wavefronts of each scheduling group from the combination of Lee, Boudier, Hsu and Yudanov have the same priority level, and that both the combination of Lee, Boudier, Hsu and Yudanov, and Kakadia, are related to scheduling higher priority tasks first in the general situation, it is still reasonable to apply the scheduling mechanism for certain specific situations (such as there being insufficient resources for executing a higher priority task when the scheduler tries to schedule the higher priority task) from Kakadia to the combination of Lee, Boudier, Hsu and Yudanov), since it 

Regarding Claim 2, the rejection of Claim 1 is incorporated and the combination of Lee, Boudier, Hsu, Yudanov and Kakadia discloses: wherein, in response to determining that the first scheduling group has one or more wavefronts ready for execution, each compute unit is further configured to: schedule wavefronts for execution from the first scheduling group; and prevent wavefronts of scheduling groups other than the first scheduling group from being scheduled for execution (see [0007], [0038]-[0040] and [0044] from Hsu; “schedule the active subset for processing before scheduling the pending subset for processing”, “Wavefronts in the active subset 316 are eligible to be scheduled (by the scheduler 320), and processed (by the processing lanes 340); wavefronts assigned to the pending subset are generally ineligible for scheduling and processing. However, one or more wavefronts assigned to the pending subset 314 may become eligible if all of the wavefronts assigned to the active subset 316 become stalled, e.g., have outstanding address translation operations”. Only the wavefronts from the active subset should be selected for scheduling and execution when at least one wavefront in the active subset has not yet been scheduled or executed, i.e., wavefronts of scheduling groups other than the active subset, which is mapped to the claimed first scheduling group, are prevented from being scheduled for execution. Determining whether all of the wavefronts assigned to the active subset 316 have become stalled is at least equivalent to determining whether the active subset 316 still has one or more wavefronts ready for execution under BRI).

Regarding Claim 3, the rejection of Claim 1 is incorporated and further the combination of Lee, Boudier, Hsu, Yudanov and Kakadia discloses: each compute unit is configured to group wavefronts having a same priority together into a same scheduling group (see Fig. 9, lines 11-16 of right side of page 5 from Lee, Fig. 1 and [0033] from Yudanov).
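For illustration only, grouping wavefronts that share a kernel priority into a same scheduling group can be sketched as follows. The sketch is not taken from Lee or Yudanov; the function name `group_by_priority`, the representation of a wavefront as an `(id, kernel)` pair, and the convention that a larger number means a higher priority are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WavefrontRef = Tuple[int, str]  # (wavefront id, kernel name), hypothetical representation


def group_by_priority(
    wavefronts: List[WavefrontRef],
    kernel_priority: Dict[str, int],
) -> Dict[int, List[WavefrontRef]]:
    """Place wavefronts whose kernels share a priority into one scheduling group.
    Groups are returned ordered from highest to lowest priority
    (assumption: a larger number means a higher priority)."""
    groups: Dict[int, List[WavefrontRef]] = defaultdict(list)
    for wf_id, kernel in wavefronts:
        groups[kernel_priority[kernel]].append((wf_id, kernel))
    return dict(sorted(groups.items(), reverse=True))
```

Under these assumptions, a group keyed by one priority value may contain wavefronts from two or more kernels whenever those kernels were given the same priority.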

Regarding Claim 7, the rejection of Claim 1 is incorporated and further the combination of Lee, Boudier, Hsu, Yudanov and Kakadia discloses: wherein, in further response to determining that the second scheduling group has one or more wavefronts ready for execution, each compute unit is further configured to: schedule wavefronts for execution from the second scheduling group; and prevent wavefronts from scheduling groups other than the second scheduling group from being scheduled for execution (see [0007] and [0054] from Hsu; “a plurality of subsets comprising an active subset and a pending subset”, “The plurality of subsets may further comprises a pre-fetch subset, and the wavefront scheduler may be further configured to schedule the pre-fetch subset for processing after scheduling the active subset for processing and before scheduling the pending subset for processing” and “one or more wavefronts assigned to the pre-fetch subset 318 may become eligible if all wavefronts assigned to the active subset 518 become stalled, and one or more wavefronts assigned to the pending subset 514 may become eligible if wavefronts assigned to the active subset 516 and pre-fetch subset 518 become stalled”. In one of the embodiments, the scheduling groups in the combination system include at least an active subset, a pre-fetch subset and a pending subset, wherein the active subset is mapped to the claimed first scheduling group and the pre-fetch subset is mapped to the claimed second scheduling group. The wavefronts from the active subset are scheduled first among the three subsets/groups, the wavefronts from the pre-fetch subset are going to be scheduled after all .

Regarding Claim 8, Claim 8 is a method claim corresponding to system Claim 1 and is rejected for the same reason set forth in the rejection of Claim 1 above.

Regarding Claim 9, the rejection of Claim 8 is incorporated and further Claim 9 is a method claim corresponding to system Claim 2 and is rejected for the same reason set forth in the rejection of Claim 2 above.

Regarding Claim 10, the rejection of Claim 8 is incorporated and further Claim 10 is a method claim corresponding to system Claim 3 and is rejected for the same reason set forth in the rejection of Claim 3 above.

Regarding Claim 14, the rejection of Claim 8 is incorporated and further Claim 14 is a method claim corresponding to system Claim 7 and is rejected for the same reason set forth in the rejection of Claim 7 above.

Claims 4-5 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block scheduling-recorded on IDS submitted 9/19/2019, hereafter Lee) in view of Boudier (US PGPUB .
Boudier, Hsu, Yudanov, Fontenot and Sollars were cited in the previous Office action.

Regarding Claim 4, the rejection of Claim 1 is incorporated, and the combination of Lee, Boudier, Hsu, Yudanov and Kakadia discloses: each of the one or more scheduling groups comprises wavefronts from at least two kernels of the plurality of kernels (see Figs. 2-3 of Hsu and [0033] from Yudanov. The combination system groups threads or wavefronts into different scheduling groups based on the corresponding applications and kernels having the same priority, and thus grouping by such a method would result in every scheduling group comprising threads/wavefronts from different kernels having the same priority).
The combination of Lee, Boudier, Hsu, Yudanov and Kakadia does not disclose: wherein each compute unit is further configured to:
monitor one or more conditions indicative of resource contention on the compute unit, the one or more conditions comprising at least one of compute unit stall cycles, cache miss rates, memory access latency, and link utilization; 
generate a first measure of resource contention based on the one or more conditions being monitored; and
move a lowest priority scheduling group into a descheduled queue responsive to determining that the first measure of resource contention is greater than a first threshold, wherein:

one or more scheduling groups stored in the descheduled queue comprise wavefronts from at least two kernels of the plurality of kernels.

However, Fontenot discloses: a method comprising:
monitor one or more conditions indicative of resource contention on the compute unit, the one or more conditions comprising at least one of compute unit stall cycles, cache miss rates, memory access latency, and link utilization (see [0097]; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. Also see “When the cache miss rate for L2 cache 624 and L3 cache 626 is low, core 610 likely is able to effectively utilize additional levels of parallelism resulting in an increase in overall computes” from [0087] and “should the cache misses as counted by counter 840 exceed the lower count value of count threshold 850” from [0099], and thus the counter 840 is a measurement of resource contention); 
generate a first measure of resource contention based on the one or more conditions being monitored (see [0097]; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
reduce number of parallel execution threads responsive to determining that the first measure of resource contention is greater than a first threshold, wherein: threads from the reduced number of parallel execution threads are prevented from being scheduled for execution on the compute unit (see [0100]; “the cache misses as counted by counter 840 exceed the upper count value of count threshold 850, core 810 can remove layers of parallelism by switching from a higher SMT mode to a lower SMT mode” and “overall computes may be increased by reducing 
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the warp scheduler in the compute unit of the GPU, which schedules execution of groups of wavefronts having threads, from the combination of Lee, Boudier, Hsu, Yudanov and Kakadia by including a method of increasing or decreasing the number of parallel thread executions in response to different resource contention situations as taught by Fontenot, and thus the new combination system would teach some portions of the missing limitations mentioned above (note: in the new combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed the upper count value of count threshold 850, the combination system would reduce the number of parallel execution threads, i.e., reduce the number of scheduling groups. Since such reducing reduces the parallel execution, the wavefronts from the reduced scheduling groups would be prevented from being scheduled for execution on the GPU compute unit. In addition, since every scheduling group comprises wavefronts from different kernels having the same priority, the wavefronts from the reduced scheduling group would comprise wavefronts from different kernels), since it would provide a method of efficiently utilizing resources of the system by increasing the number of parallel thread executions when resource contention is low and decreasing the number of parallel thread executions when resource contention is high (see [0099]-[0100] from Fontenot).
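For illustration only, the threshold-based adjustment of parallelism discussed above can be sketched as follows. The sketch is a simplification and is not taken from Fontenot; the function name `adjust_parallelism`, the one-step increments, and the `min_level`/`max_level` bounds are assumptions. The only behavior drawn from the discussion above is that a cache miss count above the upper threshold reduces parallelism and a low miss count permits more parallelism.

```python
def adjust_parallelism(
    level: int,
    cache_misses: int,
    lower: int,
    upper: int,
    min_level: int = 1,   # hypothetical bound
    max_level: int = 4,   # hypothetical bound
) -> int:
    """Return the new parallelism level for one monitoring period.

    cache_misses is the count observed over the period; lower and upper
    play the role of the lower and upper count values of a threshold.
    """
    if cache_misses > upper:
        # High contention: remove one layer of parallelism.
        return max(min_level, level - 1)
    if cache_misses < lower:
        # Low contention: allow one more layer of parallelism.
        return min(max_level, level + 1)
    # Between the two count values: leave the level unchanged.
    return level
```

For example, with hypothetical thresholds of 20 and 100, a period with 120 misses lowers the level by one, a period with 10 misses raises it by one, and a period with 50 misses leaves it unchanged.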

The combination of Lee, Boudier, Hsu, Yudanov, Kakadia and Fontenot does not disclose that reducing the number of scheduling groups comprises moving a lowest priority scheduling group into a descheduled queue.

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of reducing the number of parallel thread executions when the resource contention has reached a threshold level as taught by the combination of Lee, Boudier, Hsu, Yudanov, Kakadia and Fontenot by including a method of moving the lowest priority allocated thread/context into a queue which holds deallocated threads/contexts when resource contention has reached a threshold level as taught by Sollars, whereby the new combination system would teach the limitations missing from the combination of Lee, Boudier, Hsu, Yudanov and Kakadia, since it would provide a mechanism of deallocating or descheduling only the lowest priority threads/contexts, and thus avoid deallocating the threads/contexts having higher priority (see lines 15-29 of col. 12 from Sollars).

Regarding Claim 5, the rejection of Claim 4 is incorporated and further the combination of Lee, Boudier, Hsu, Yudanov, Kakadia, Fontenot and Sollars discloses: wherein each compute unit is configured to:
wait a given amount of time after moving the lowest priority scheduling group into the descheduled queue (see Fig. 13, [0097], [0145]-[0146] from Fontenot; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. After performing one action of reducing parallel thread execution number, i.e., moving the lowest priority scheduling 
generate a second measure of resource contention based on the one or more conditions being monitored (see Fig. 13, [0097], [0145]-[0146] from Fontenot and the analysis of previous limitation; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
move a next lowest priority scheduling group into the descheduled queue responsive to determining the second measure of resource contention is greater than the first threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee, [0100] from Fontenot and lines 16-29 of col. 12 from Sollars. In the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed the upper count value of count threshold 850, the combination system would reduce the number of parallel execution threads, i.e., reduce the number of scheduling groups, by moving the current lowest priority scheduling group, i.e., the previous next lowest priority scheduling group, into the descheduled queue, and thus the wavefronts from the scheduling groups stored in the descheduled queue are prevented from being scheduled for execution on the GPU compute unit).
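For illustration only, the measure, deschedule, wait and re-measure sequence recited in the limitations above can be sketched as follows. The sketch does not reproduce any cited reference; the function name `deschedule_on_contention`, the representation of a scheduling group by its integer priority, and the `measure` and `wait` callables are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def deschedule_on_contention(
    group_priorities: List[int],
    measure: Callable[[], float],
    threshold: float,
    wait: Callable[[], None],
) -> Tuple[List[int], Deque[int]]:
    """Move the lowest priority group into a descheduled queue while the
    measured contention exceeds the threshold, waiting between measurements.
    Assumption: a smaller number means a lower priority."""
    active = sorted(group_priorities)      # lowest priority first
    descheduled: Deque[int] = deque()
    while active and measure() > threshold:
        descheduled.append(active.pop(0))  # groups in the queue are not scheduled
        wait()                             # wait a given amount of time
    return active, descheduled
```

In this sketch, each pass of the loop corresponds to generating one measure of resource contention; a second measure above the same threshold moves the next lowest priority group into the queue.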

Regarding Claim 11, the rejection of Claim 8 is incorporated and further Claim 11 is a method claim corresponding to system Claim 4 and is rejected for the same reason set forth in the rejection of Claim 4 above.

Regarding Claim 12, the rejection of Claim 11 is incorporated and further Claim 12 is a method claim corresponding to system Claim 5 and is rejected for the same reason set forth in the rejection of Claim 5 above.

Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block scheduling-recorded on IDS submitted 9/19/2019, hereafter Lee) in view of Boudier (US PGPUB 20130332702 A1), Hsu et al. (US PGPUB 20160055005 A1, hereafter Hsu), Yudanov et al. (US PGPUB 20160371082 A1, hereafter Yudanov), Kakadia et al. (US PGPUB 20150208275 A1, hereafter Kakadia), Fontenot et al. (US PGPUB 20110302372 A1, hereafter Fontenot) and Sollars (US Patent 5900025 A) and further in view of Otenko (US PGPUB 20150212794 A1).
Boudier, Hsu, Yudanov, Fontenot, Sollars and Otenko were cited in the previous office action.

Regarding Claim 6, the rejection of Claim 4 is incorporated and further the combination of Lee, Boudier, Hsu, Yudanov, Kakadia, Fontenot and Sollars discloses: wherein each compute unit is configured to:
wait a given amount of time after moving the lowest priority scheduling group into the descheduled queue (see Fig. 13, [0097], [0145]-[0146] from Fontenot; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. After performing one action of reducing parallel thread execution number, i.e., moving the lowest priority scheduling group into descheduled queue, at the combination system, the method would return step/action of monitoring the cache misses over a period of time to detect the next trigger point to perform step 1320 of Fig. 13);
generate a second measure of resource contention based on the one or more conditions being monitored (see Fig. 13, [0097], [0145]-[0146] from Fontenot and the analysis of previous limitation; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
increase number of scheduling groups responsive to determining the second measure of resource contention is less than a second threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee and [0099] from Fontenot; “should the cache misses as counted by counter 840 exceed the lower count value of count threshold 850, core 810 can add additional layers of parallelism by switching from a lower SMT mode to a higher SMT mode” and “overall computes may be increased by increasing parallelism and the contention for a contested resource. In this case, the number of parallel threads could be increased”. At the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed lower count value of count threshold 850, then the combination system would increase number of parallel execution threads, i.e., increase number of scheduling groups).

The combination of Lee, Boudier, Hsu, Yudanov, Kakadia, Fontenot and Sollars does not disclose that increasing the number of scheduling groups comprises moving a highest priority scheduling group out of the descheduled queue.
However, Otenko discloses: a method of increasing allocated tasks comprising: move a highest priority task out of the descheduled queue (see [0023]; “When the contention level is low in the system, the underlying priority queue 101 can sort the requests waiting in the priority 
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of increasing the number of parallel thread executions when the resource contention is low as taught by the combination of Lee, Boudier, Hsu, Yudanov, Kakadia, Fontenot and Sollars by including a method of scheduling the highest priority tasks in a wait queue having tasks to be scheduled as soon as possible as taught by Otenko, since it is understood that a highest priority task is scheduled first rather than a lower priority task (see [0023] from Otenko. Also see lines 11-16 of right side of page 5 from Lee).
Thereby, the combination of Lee, Boudier, Hsu, Yudanov, Kakadia, Fontenot, Sollars and Otenko discloses: move a highest priority scheduling group out of the descheduled queue responsive to determining the second measure of resource contention is less than a second threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee, [0100] from Fontenot, lines 16-29 of col. 12 from Sollars and [0023] from Otenko. At the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed lower count value of count threshold 850, then the combination system would increase number of parallel execution threads, i.e., increase number of scheduling groups, by moving the highest priority scheduling group out of the descheduled queue).
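
For illustration only, the claimed step of moving a highest priority scheduling group out of the descheduled queue when a second measure is less than a second threshold might be sketched as follows. The sketch follows the claim language ("less than a second threshold"); the function name reschedule_if_uncontended, the dictionary layout, and the numeric values are assumptions made for the example and do not come from any cited reference.

```python
def reschedule_if_uncontended(active, descheduled, measure, second_threshold):
    """Move the highest priority descheduled group back to the active list
    when the contention measure is less than the second threshold."""
    if measure < second_threshold and descheduled:
        # Larger "priority" value = higher priority (assumption).
        highest = max(descheduled, key=lambda g: g["priority"])
        descheduled.remove(highest)
        active.append(highest)
        return highest
    return None


active_groups = [{"name": "g3", "priority": 3}]
descheduled_groups = [{"name": "g1", "priority": 1}, {"name": "g2", "priority": 2}]

# Second measure taken after the wait; it is below the second threshold.
restored = reschedule_if_uncontended(
    active_groups, descheduled_groups, measure=20, second_threshold=40
)
```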

Regarding Claim 13, the rejection of Claim 11 is incorporated and further Claim 13 is a method claim corresponding to system Claim 6 and is rejected for the same reason set forth in the rejection of Claim 6 above.

Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block scheduling-recorded on IDS submitted 9/19/2019, hereafter Lee) in view of Hsu et al. (US PGPUB 20160055005 A1, hereafter Hsu), Yudanov et al. (US PGPUB 20160371082 A1, hereafter Yudanov) and Kakadia et al. (US PGPUB 20150208275 A1, hereafter Kakadia).
Hsu and Yudanov were cited in the previous office action.

Regarding Claim 15, Lee discloses: an apparatus comprising:
a memory, and a processor coupled to the memory, wherein the processor is configured to (see 2.1. Methodology of page 2; “The GPGPU that we model consists of 28 cores (or streaming multiprocessors (SMs)) connected to 8 memory controllers, as shown in Fig. 2. Each core has its own private L1 data cache, texture cache, and shared memory”):
receiving a plurality of wavefronts of a plurality of kernels (see lines 15-19, 21-23 of abstract; “the optimal number of thread blocks”, “the ‘block’ of CTAs allocated to a core” and “multiple kernels to be allocated to the same core”. In order to schedule or execute multiple thread blocks/CTAs and kernels on a same core, the method inherently requires receiving a plurality of wavefronts of a plurality of kernels. Note: see lines 7-11 of 1st paragraph of 1. Introduction for the relationship between wavefronts/warps and thread blocks/CTAs);
one kernel of the plurality of kernels, wherein wavefronts selected for inclusion in a scheduling group of the one or more scheduling groups are selected based on an identified criteria of a corresponding kernel (see lines 7-11 of 1st paragraph of 1. Introduction, Fig. 9(a), lines 1-9 of 1st paragraph of 4.2 Block CTA Scheduling (BCS), “A collection of threads are grouped to form a warp or a wavefront and the warps are combined to create a CTA (cooperative thread array) or a thread block”, “a kernel with 16X16 CTA dimension”. Wavefronts of same/common kernel are grouped into certain groups as CTAs); and
select, for scheduling, a first scheduling group from the plurality of scheduling groups; and select for scheduling a second scheduling group from the plurality of scheduling groups (see Fig. 2 at page 2, lines 7-15 of 1st paragraph of 1. Introduction; “All threads within a CTA are executed on the same core and the threads within a warp are often executed together”, “a warp (or a wavefront) scheduler to determine which warp is executed” and “a thread block or CTA scheduler to assign CTAs to cores”. The CTAs including at least first CTA and second CTA, i.e., claimed first scheduling group and claimed second scheduling group, scheduled by the CTA scheduler will be scheduled for execution).

Lee does not disclose: 
each scheduling group comprises wavefronts from at least two kernels of the plurality of kernels, the identified criteria of a corresponding kernel for creating the plurality of scheduling groups is an identified priority of a corresponding kernel;
, prior to the first scheduling group beginning execution, that the first scheduling group does not have one or more wavefronts ready for execution.
However, Hsu discloses: create a plurality of scheduling groups including one or more scheduling groups that each comprises wavefronts from at least two kernels of the plurality of kernels, wherein wavefronts selected for inclusion in a scheduling group of one or more scheduling groups are selected based on an identified criteria of a corresponding kernel (see Figs. 2-3, [0030]-[0032], [0043]-[0046]; “In this example, the application 201 launches kernels 201.1 and 201.2”, “an application may launch any number of kernels”, “the wavefront classifier 310 may assign wavefronts of the workgroups 201.1.2 and 201.2.1 to the active subset 316, while all other wavefronts are assigned to the pending subset 314”, “assign wavefronts of the kernels 201.2 and 202.1 to the active subset 316, while all other wavefronts are assigned to the pending subset 314”, “a wavefront is classified based on its application identifier. For example, the wavefront classifier 310 may assign wavefronts of the application 201 of FIG. 2 to the active subset 316, while all other wavefronts are assigned to the pending subset 314” and “wavefronts of the same application, kernel, or workgroup may be grouped together for processing”, emphasis added. In the particular example of Fig. 2, Application 202 only launches one single kernel 202.1; however, it is understood that there is a well-known example in which Application 202 also launches more than one kernel, as Application 201 does in the particular example of Fig. 2. In such an example, assigning wavefronts into active subset and pending subset based on same application identifier associated with the kernels/wavefronts would include each of active subset and pending subset comprises wavefronts from at least two kernels, wherein wavefronts selected for inclusion in one of the subsets are selected based on an active subset 316 and pending subset 314 as scheduling groups);
select, for scheduling, a first scheduling group from the plurality of scheduling groups (see Figs. 2-3, [0007], [0038]-[0040] and [0044]; “the wavefront scheduler may be further configured to schedule the active subset for processing before scheduling the pending subset for processing”);
in response to determining that the first scheduling group does not have one or more wavefronts ready for execution, select for scheduling a second scheduling group from the plurality of scheduling groups (see Figs. 2-3, [0007], [0038]-[0040] and [0044]; “the wavefront scheduler may be further configured to schedule the active subset for processing before scheduling the pending subset for processing” and “wavefronts assigned to the pending subset are generally ineligible for scheduling and processing. However, one or more wavefronts assigned to the pending subset 314 may become eligible if all of the wavefronts assigned to the active subset 316 become stalled, e.g., have outstanding address translation operations”, emphasis added. In response to determining that all of the wavefronts from the active subset become stalled, i.e., there is no wavefronts at the active subset now is ready for execution since all of the wavefronts from the active subset are already scheduled and processed and now stalled, the system would select the wavefronts from the pending subset for schedule to execute).
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the grouping mechanism of wavefronts to form different scheduling groups from Lee by including grouping different wavefronts that are even from different kernels to form different scheduling groups from Hsu, since the wavefronts from different kernels may still have affinity to be executed as a group (see [0045] from Hsu).
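
For illustration only, the active subset / pending subset selection relied upon from Hsu (the pending subset becomes eligible only when all wavefronts of the active subset are stalled) might be sketched as follows. This is a minimal sketch; the function name select_wavefronts, the dictionary layout, and the "stalled" flag are assumptions made for the example and are not Hsu's implementation.

```python
def select_wavefronts(active_subset, pending_subset):
    """Return (subset_name, ready_wavefronts).

    Wavefronts of the active subset are scheduled first. The pending
    subset is considered only when no active wavefront is ready,
    i.e., every wavefront in the active subset is stalled.
    """
    ready_active = [w for w in active_subset if not w["stalled"]]
    if ready_active:
        return "active", ready_active
    ready_pending = [w for w in pending_subset if not w["stalled"]]
    return "pending", ready_pending


active_subset = [
    {"id": "w0", "kernel": "201.1", "stalled": True},
    {"id": "w1", "kernel": "201.2", "stalled": True},
]
pending_subset = [{"id": "w2", "kernel": "202.1", "stalled": False}]

chosen_name, chosen = select_wavefronts(active_subset, pending_subset)
```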

The combination of Lee and Hsu does not disclose:
the identified criteria of a corresponding kernel for creating the plurality of scheduling groups is an identified priority of a corresponding kernel;
the determination of the first scheduling group does not have one or more wavefronts ready for execution is performed prior to the first scheduling group beginning execution.
However, Yudanov discloses: create a plurality of scheduling groups including one or more scheduling groups that each comprises, wherein threads selected for inclusion in a scheduling group of the one or more scheduling groups are selected based on an identified priority of a corresponding kernel (see [0033]; “the scheduler 230 are configured to select groups of threads for execution”, “threads may be allocated to a group if the threads are accessing the same portion of the main memory 215” and “threads may be coalesced to provide preferential access to applications or kernels that are given higher priority at runtime”. There are at least two groups of threads/wavefronts being coalesced, i.e., the group of the threads/wavefronts from different applications or kernels that are given higher priority at runtime and the group of the threads/wavefronts from different applications or kernels that are given lower priority at runtime).
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of grouping different wavefronts from at least two kernels into different scheduling groups for executions from the combination of Lee and Hsu by including a method of coalescing threads from different applications or kernels based on priority of the application or kernel from Yudanov, and thus the new combination would teach the limitation of wherein wavefronts selected for inclusion in a scheduling group of the one 

The combination of Lee, Hsu and Yudanov does not disclose: the determination of the first scheduling group does not have one or more wavefronts ready for execution is performed prior to the first scheduling group beginning execution.
However, Kakadia discloses: a method of scheduling tasks for execution comprising:
in response to determining, prior to the first scheduling task beginning execution, that the first scheduling task is not ready for execution, select for scheduling a second scheduling task from a plurality of tasks (see [0056]. Before the first task having high priority begins execution, it is determined whether this first task is ready for execution or not; if such first task is not ready for execution, i.e., no high priority task is ready for execution, then the scheduler would select a lower priority task for scheduling).
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the scheduling algorithm from the combination of Lee, Hsu and Yudanov by including a scheduling algorithm of scheduling a lower priority task if there are insufficient resources for executing higher priority tasks from Kakadia, and thus the combination of Lee, Hsu, Yudanov and Kakadia would disclose the missing limitation from the combination of Lee, Hsu and Yudanov (note: Kakadia may not disclose scheduling groups 

Regarding Claim 16, the rejection of Claim 15 is incorporated and the combination of Lee, Hsu, Yudanov and Kakadia discloses: wherein, in response to a determination that the first scheduling group has one or more wavefronts ready for execution, the processor is further configured to: schedule wavefronts for execution from the first scheduling group; and prevent wavefronts of scheduling groups other than the first scheduling group from being scheduled for execution (see [0007], [0038]-[0040] and [0044] from Hsu; “schedule the active subset for processing before scheduling the pending subset for processing”, “Wavefronts in the active subset 316 are eligible to be scheduled (by the scheduler 320), and processed (by the processing lanes 340); wavefronts assigned to the pending subset are generally ineligible for scheduling and processing. However, one or more wavefronts assigned to the pending subset 314 may become eligible if all of the wavefronts assigned to the active subset 316 become stalled, e.g., have outstanding address translation operations”. Only the wavefronts from the active subset should be selected for scheduling and executing when there is still at least one wavefront in the active subset that has not yet been scheduled or executed, i.e., prevent wavefronts of scheduling groups other than the active subset that is mapped to claimed first scheduling group from being scheduled for execution. Determination of whether all of the wavefronts assigned to the active subset 316 become stalled at least is equivalent to determining whether the active subset 316 still has one or more wavefronts ready for execution under BRI).

Regarding Claim 17, the rejection of Claim 15 is incorporated and further the combination of Lee, Hsu, Yudanov and Kakadia discloses: each compute unit is configured to group wavefronts within a same priority together into a same scheduling group (see Fig. 9, lines 11-16 of right side of page 5 from Lee, Fig. 1 and [0033] from Yudanov).
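
For illustration only, grouping wavefronts having a same kernel priority into a same scheduling group, such that a group may hold wavefronts from two or more kernels, might be sketched as follows. This is a minimal sketch; the function name group_by_kernel_priority, the wavefront and kernel identifiers, and the priority values are assumptions made for the example and do not come from any cited reference.

```python
from collections import defaultdict


def group_by_kernel_priority(wavefronts, kernel_priority):
    """Group wavefront ids by the priority of the kernel they belong to.

    wavefronts: iterable of (wavefront_id, kernel_id) pairs.
    kernel_priority: mapping of kernel_id to priority.
    """
    groups = defaultdict(list)
    for wavefront_id, kernel_id in wavefronts:
        groups[kernel_priority[kernel_id]].append(wavefront_id)
    return dict(groups)


wavefronts = [("w0", "k1"), ("w1", "k2"), ("w2", "k3"), ("w3", "k1")]
kernel_priority = {"k1": 2, "k2": 2, "k3": 1}  # hypothetical priorities

scheduling_groups = group_by_kernel_priority(wavefronts, kernel_priority)
```

Here kernels k1 and k2 share a priority, so the resulting priority-2 group contains wavefronts from two different kernels.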

Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block scheduling-recorded on IDS submitted 9/19/2019, hereafter Lee) in view of Hsu et al. (US PGPUB 20160055005 A1, hereafter Hsu), Yudanov et al. (US PGPUB 20160371082 A1, hereafter Yudanov) and Kakadia et al. (US PGPUB 20150208275 A1, hereafter Kakadia) and further in view of Fontenot et al. (US PGPUB 20110302372 A1, hereafter Fontenot) and Sollars (US Patent 5900025 A).
Hsu, Yudanov, Fontenot and Sollars were cited in the previous office action.

Regarding Claim 18, the rejection of Claim 15 is incorporated, the combination of Lee, Hsu, Yudanov and Kakadia discloses: each of the one or more scheduling groups comprises wavefronts from at least two kernels of the plurality of kernels (see Figs. 2-3 of Hsu and [0033] from Yudanov. The combination system groups threads or wavefronts to different scheduling all/every scheduling groups comprise threads/wavefronts from different kernels having same priority).

The combination of Lee, Hsu, Yudanov and Kakadia does not disclose:
monitor one or more conditions indicative of resource contention on the compute unit, the one or more conditions comprising at least one of compute unit stall cycles, cache miss rates, memory access latency, and link utilization;
generate a first measure of resource contention based on the one or more conditions being monitored; and
move a lowest priority scheduling group into a descheduled queue responsive to determining that the first measure of resource contention is greater than a first threshold, wherein:
wavefronts from scheduling groups stored in the descheduled queue are prevented from being scheduled for execution on the compute unit; and
one or more scheduling groups stored in the descheduled queue comprise wavefronts from at least two kernels of the plurality of kernels.

However, Fontenot discloses: a method comprising:
monitor one or more conditions indicative of resource contention on the compute unit, the one or more conditions comprising at least one of compute unit stall cycles, cache miss rates, memory access latency, and link utilization (see [0097]; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. Also see “When the cache miss rate 
generate a first measure of resource contention based on the one or more conditions being monitored (see [0097]; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
reduce number of parallel execution threads responsive to determining that the first measure of resource contention is greater than a first threshold, wherein: threads from the reduced number of parallel execution threads are prevented from being scheduled for execution on the compute unit (see [0100]; “the cache misses as counted by counter 840 exceed the upper count value of count threshold 850, core 810 can remove layers of parallelism by switching from a higher SMT mode to a lower SMT mode” and “overall computes may be increased by reducing parallelism and the contention for a contested resource. In this case, the number of parallel threads could be reduced”).
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the warp scheduler in the compute unit of GPU to schedule executions of groups of wavefronts having threads from the combination of Lee, Hsu, Yudanov and Kakadia by including a method of increasing or decreasing the number of parallel thread executions in response to different resource contention situations as taught by Fontenot, and thus the new combination system would teach some portions of the missing limitations mentioned above (note: at the new combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed the upper count value of count threshold 850, reduce number of scheduling groups. Since such reducing is reducing the parallel execution, the wavefronts from the reduced scheduling groups would be prevented from being scheduled for execution on the GPU compute unit. In addition, since every scheduling group comprises wavefronts from different kernels having same priority, the wavefronts from the reduced scheduling group would comprise wavefronts from different kernels), since it would provide a method of efficiently utilizing resources of the system via increasing parallel thread executions number when resource contention is low and decreasing parallel thread executions number when resource contention is high (see [0099]-[0100] from Fontenot).

The combination of Lee, Hsu, Yudanov, Kakadia and Fontenot does not disclose that reducing the number of scheduling groups comprises moving a lowest priority scheduling group into a descheduled queue.
However, Sollars discloses: a method of reducing allocated thread number comprising: move a lowest priority allocated thread into a descheduled queue (see lines 15-29 of col. 12; “deallocates and queues the lowest priority allocated context”. In addition, “If all context level control register sets 104 have been allocated” from lines 15-29 of col. 12 also indicates that resource contention has reached a threshold level).
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of reducing the number of parallel thread executions when the resource contention has reached a threshold level as taught by the combination of Lee, Hsu, Yudanov, Kakadia and Fontenot by including a method of moving the lowest priority allocated thread/context into a queue which holds deallocated thread/context 

Regarding Claim 19, the rejection of Claim 15 is incorporated and further the combination of Lee, Hsu, Yudanov, Kakadia, Fontenot and Sollars discloses: 
wait a given amount of time after moving the lowest priority scheduling group into the descheduled queue (see Fig. 13, [0097], [0145]-[0146] from Fontenot; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. After performing one action of reducing parallel thread execution number, i.e., moving the lowest priority scheduling group into descheduled queue, at the combination system, the method would return step/action of monitoring the cache misses over a period of time to detect the next trigger point to perform step 1320 of Fig. 13);
generate a second measure of resource contention based on the one or more conditions being monitored (see Fig. 13, [0097], [0145]-[0146] from Fontenot and the analysis of previous limitation; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
move a next lowest priority scheduling group into the descheduled queue responsive to determining the second measure of resource contention is greater than the first threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee, [0100] from Fontenot and lines 16-29 of col. 12 from Sollars. At the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed the upper count value of count threshold 850, then the combination system would reduce number of parallel execution threads, i.e., reduce number of scheduling groups, by moving the current lowest priority scheduling group, i.e., the previous next lowest priority scheduling group, into the descheduled queue, and thus the wavefronts from the scheduling groups stored in the descheduled queue are prevented from being scheduled for execution on the GPU compute unit).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (title: Improving GPGPU resource utilization through alternative thread block scheduling-recorded on IDS submitted 9/19/2019, hereafter Lee) in view of Hsu et al. (US PGPUB 20160055005 A1, hereafter Hsu), Yudanov et al. (US PGPUB 20160371082 A1, hereafter Yudanov), Kakadia et al. (US PGPUB 20150208275 A1, hereafter Kakadia), Fontenot et al. (US PGPUB 20110302372 A1, hereafter Fontenot) and Sollars (US Patent 5900025 A) and further in view of Otenko (US PGPUB 20150212794 A1).
Hsu, Yudanov, Fontenot, Sollars and Otenko were cited in the previous office action.

Regarding Claim 20, the rejection of Claim 15 is incorporated and further the combination of Lee, Yudanov, Hsu, Kakadia, Fontenot and Sollars discloses:
wait a given amount of time after moving the lowest priority scheduling group into the descheduled queue (see Fig. 13, [0097], [0145]-[0146] from Fontenot; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”. After performing one action of reducing parallel thread execution number, i.e., moving the lowest priority scheduling group into descheduled queue, at the combination system, the method would return step/action of monitoring the cache misses over a period of time to detect the next trigger point to perform step 1320 of Fig. 13);
generate a second measure of resource contention based on the one or more conditions being monitored (see Fig. 13, [0097], [0145]-[0146] from Fontenot and the analysis of previous limitation; “Counter 840 is a counter that tracks a number of cache misses in cache 822 over a period of time”); and
increasing number of scheduling groups responsive to determining the second measure of resource contention is less than a second threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee and [0099] from Fontenot; “should the cache misses as counted by counter 840 exceed the lower count value of count threshold 850, core 810 can add additional layers of parallelism by switching from a lower SMT mode to a higher SMT mode” and “overall computes may be increased by increasing parallelism and the contention for a contested resource. In this case, the number of parallel threads could be increased”. At the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed lower count value of count threshold 850, then the combination system would increase number of parallel execution threads, i.e., increase number of scheduling groups).

The combination of Lee, Hsu, Yudanov, Kakadia, Fontenot and Sollars does not disclose that increasing the number of scheduling groups comprises moving a highest priority scheduling group out of the descheduled queue.
However, Otenko discloses: a method of increasing allocated tasks comprising: move a highest priority task out of the descheduled queue (see [0023]; “When the contention level is low in the system, the underlying priority queue 101 can sort the requests waiting in the priority 
It would have been obvious to one with ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of increasing the number of parallel thread executions when the resource contention is low as taught by the combination of Lee, Hsu, Kakadia, Fontenot and Sollars by including a method of scheduling the highest priority tasks in a wait queue having tasks to be scheduled as soon as possible as taught by Otenko, since it is understood that a highest priority task is scheduled first rather than a lower priority task (see [0023] from Otenko. Also see lines 11-16 of right side of page 5 from Lee).
Thereby, the combination of Lee, Hsu, Yudanov, Kakadia, Fontenot, Sollars and Otenko discloses: move a highest priority scheduling group out of the descheduled queue responsive to determining the second measure of resource contention is less than a second threshold (see lines 7-15 of 1st paragraph of 1. Introduction from Lee, [0100] from Fontenot, lines 16-29 of col. 12 from Sollars and [0023] from Otenko. At the combination system, when the cache misses as counted by counter 840 for one GPU compute unit exceed lower count value of count threshold 850, then the combination system would increase number of parallel execution threads, i.e., increase number of scheduling groups, by moving the highest priority scheduling group out of the descheduled queue).

Response to Arguments
Applicant’s arguments, filed 3/3/2022, with respect to rejections of Claims 1-20 under 35 U.S.C. 103 have been fully considered. New grounds of rejection were made based on the amended limitations from the independent claims and corresponding arguments from pages 10-11 of the Remarks. Applicant’s arguments for dependent Claims 4, 11 and 18 are not persuasive.

Applicant’s arguments at pages 12-13 are summarized as the following:
For dependent Claims 4, 11 and 18, “Sollars is directed toward reallocating a context level control register set used for controlling concurrent execution of processes. Therefore, a lowest priority context/thread is a thread that is already executing, rather than a scheduling group” (see 1st paragraph of page 12 from the Remarks). In addition, “Reallocating a context level control register set used for controlling concurrent execution of processes is not equivalent to ‘move a lowest priority scheduling group into a descheduled queue’” (see 1st paragraph of page 13 from the Remarks).

The examiner respectfully disagrees.
First of all, it is not clear whether Applicant intends to argue that the lowest priority context/thread from Sollars is only one single task/process while the scheduling group from the claim is multiple tasks/processes, OR that the lowest priority context/thread from Sollars is a task/process that is already executing while the scheduling group from the claim is not a task/process that is already executing (i.e., it is a task/process not yet executed). If what Applicant is trying to argue about is the single task/process versus multiple tasks/processes, then the corresponding claim limitations are then such interpretation does not comply with the corresponding descriptions from the specification. Based on the specification (see [0027] and [0037]-[0039]) and the claim language, the reason for moving the lowest priority scheduling group into a descheduled queue is to reduce the resource contention. If the lowest priority scheduling group is a group of wavefronts that are not yet executing, then such group does not use or even reserve any resource yet. Moving such a group of wavefronts will not reduce or improve the current status or state of resource contention. Actually, performing such action of moving such not yet executing wavefronts to descheduled queue does not 

For the issue of “Reallocating a context level control register set used for controlling concurrent execution of processes is not equivalent to ‘move a lowest priority scheduling group into a descheduled queue’”, Examiner does not understand why Applicant considers that the examiner used the concept of “Reallocating a context level control register set used for controlling concurrent execution of processes” to map to the claimed “move a lowest priority scheduling group into a descheduled queue” in the corresponding rejection section. According to the corresponding rejection section, one with ordinary skill in the art would understand that the examiner used “deallocates and queues the lowest priority allocated context” from Sollars to map to the claimed concept related to “move a lowest priority scheduling group into a descheduled queue”, not the reallocating that Applicant argues about. If such a series of actions of deallocating and queuing is considered reallocating as Applicant argues, then the claimed moving to the descheduled queue, according to the corresponding descriptions from Applicant’s specification or the context of the claimed limitation, is also considered deallocating and queuing the lowest priority scheduling group to the descheduled queue. 





Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Joao et al. (US PGPUB 20180276046 A1) discloses: an instruction is stalled due to insufficient capacity of a given hardware resource or on a level of utilization of a given hardware resource (see [0034]). 
Wang et al. (US PGPUB 20020073129 A1) discloses: a process of preventing a user program from being processed by a second set of scheduler components if a first set of scheduler components determines that the operating system is able to provide enough resources for the user program (see [0038]).
Dutta et al. (US PGPUB 20150200867 A1) discloses: the scheduler attempts to schedule other tasks if insufficient computing resources are available for the first tasks (see [0055]).
Pal et al. (US PGPUB 20190258635 A1) discloses: executing a second task having a lower priority level if there are insufficient resources to execute a first task having a higher priority level (see [1344]).


Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZHI CHEN whose telephone number is (571)272-0805.  The examiner can normally be reached on Monday-Friday 9:30AM-5PM.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Zhi Chen/
Patent Examiner, AU2196

/EMERSON C PUENTE/
Supervisory Patent Examiner, Art Unit 2196