DETAILED ACTION

Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2.	This Office Action is in response to the Applicants' communication filed on February 24, 2020.  By virtue of this communication, claims 1-21 are currently presented in the instant application.

Drawings
3.	The drawings were submitted on February 24, 2020. These drawings have been reviewed and accepted by the examiner.

Information Disclosure Statement
4.	The Information Disclosure Statement (IDS), Form PTO-1449, filed on February 24, 2020, is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosed therein was considered by the examiner.

Priority
5.	Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Claim Rejections - 35 USC § 102
6.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


7.	Claims 1-4, 7 and 9-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Abdel-Majeed, Mohammad et al.: "Warped gates: Gating aware scheduling and power gating for GPGPUs", 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture, 12/7/13 (hereinafter “Mohammad”).
Regarding claim 1.  Mohammad discloses a graphics multiprocessor (Mohammad, see at least page 1, column 2, first par. and lines 1-3, “Graphics processing units (GPUs) are massively parallel processors that are designed to run workloads with thousands of concurrent threads”), comprising: 
a queue having an initial state of groups with a first group having threads of first and second instruction types and a second group having threads of the first and second instruction types (Mohammad, see page 4, column 2 and the first 5 lines of the last paragraph, “Figure 4 illustrates the shortcomings of current warp scheduling and its implication on power gating techniques. In this simplified illustration, the active warps set contains 10 warps with a mix of integer and floating point instructions”); and 


Mohammad further discloses:
a regroup engine to regroup threads into a third group having threads of the first instruction type and a fourth group having threads of the second instruction type (Mohammad, page 2, column 1, third paragraph, lines 1-6, “Gating Aware Two-level warp scheduler: To address the inefficiencies of GPGPU scheduler in extracting idle periods, we present a gating-aware Two-level warp scheduler (GATES). GATES prioritizes issuing clusters of instructions that require the same type of execution unit for longer intervals before switching to a new instruction type”).

Regarding claim 2.  Mohammad further discloses wherein the regroup engine to cause the third group to replace the first group in the queue and the fourth group to replace the second group in the queue having a regrouped state (Mohammad, see page 5, column 1, paragraph 2 and lines 7-17, “Two-level scheduler (GATES) which takes into account previously issued instruction types in determining which ready warp to issue next. GATES prioritizes issuing the same instruction type as was issued in prior issue cycle to coalesce the utilization and idle periods of integer and floating point units. GATES will keep issuing instructions from the same type as long as there are ready warps in the active warps set. GATES switches to a warp with different instruction type when there are no more ready warps in the active warps set with the same instruction type as the one issued in the previous issue cycle”). 
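
For illustration only, and not as part of the cited disclosure, the issue policy quoted above may be sketched as follows. The sketch assumes each warp is represented by a hypothetical (warp_id, instruction_type, is_ready) tuple; the function names select_next_warp and issue_order are likewise assumptions and do not appear in Mohammad.

```python
from typing import List, Optional, Tuple

# Hypothetical warp record: (warp_id, instruction_type, is_ready).
Warp = Tuple[int, str, bool]


def select_next_warp(active_warps: List[Warp],
                     last_type: Optional[str]) -> Optional[Warp]:
    """Prefer a ready warp of the previously issued type; otherwise switch types."""
    ready = [w for w in active_warps if w[2]]
    for warp in ready:
        if warp[1] == last_type:
            return warp  # keep issuing the same instruction type
    # No ready warp of the previous type remains: fall back to any ready warp.
    return ready[0] if ready else None


def issue_order(active_warps: List[Warp]) -> List[Tuple[int, str]]:
    """Issue each ready warp once and return (warp_id, type) in issue order."""
    pending = list(active_warps)
    last_type: Optional[str] = None
    order: List[Tuple[int, str]] = []
    while True:
        warp = select_next_warp(pending, last_type)
        if warp is None:
            break  # nothing ready is left to issue
        order.append((warp[0], warp[1]))
        last_type = warp[1]
        pending.remove(warp)
    return order
```

Under these assumptions, an active set that interleaves integer and floating point warps is issued as one run of integer warps followed by one run of floating point warps, which is the clustering behavior the quoted passage describes.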

Regarding claim 3.  Mohammad further discloses wherein each of the first instruction type and the second instruction type comprise one of a load/store instruction, an integer instruction, a floating point instruction, an integer mac instruction, an integer add instruction, a floating point add instruction, a floating point fma instruction, a floating point sine instruction, or a floating point cosine instruction (Mohammad, see page 3, column 1, first paragraph, lines 3-7, “The decoded instruction field includes the instruction type that determines which execution unit type that instruction requires for execution, namely an integer unit (INT), floating point unit (FP), special purpose functional unit (SFU), or load/store unit (LD/ST)”). 

Regarding claim 4.  Mohammad discloses a thread scheduler coupled to the queue; and a plurality of execution units coupled to the thread scheduler (Mohammad, see page 3, column 1 and the last 4 lines of the second paragraph, “In GTX480, two schedulers are integrated in each SM and each scheduler can issue one ready warp per cycle as long as there are no structural hazards”).

Regarding claim 7.  Mohammad further discloses wherein the regroup engine utilizes regrouping policies and an order that a new regrouped group is inserted in the queue is optimized depending on latencies (Mohammad, see page 4, paragraph 2 and lines 2-8, “The order of instructions in the set is shown in the figure at the top. We assume each instruction is a simple add instruction, each instruction has a latency of four cycles, and initiation interval is one cycle. These are the default parameters in GPGPUSim’s configuration file for Fermi [6]. The Two-level ...”). 

Regarding claim 9. Mohammad discloses a graphics processor, comprising: 
a queue having a first group of first and second instruction types and a second group having second and third instruction types (Mohammad, see page 4, column 2 and the first 5 lines of the last paragraph, “Figure 4 illustrates the shortcomings of current warp scheduling and its implication on power gating techniques. In this simplified illustration, the active warps set contains 10 warps with a mix of integer and floating point instructions”); and 
a regroup engine to regroup threads into a modified first group of the first instruction type, a modified second group of the second instruction type, and a third group of the third instruction type (Mohammad, page 2, column 1, third paragraph, lines 1-6, “Gating Aware Two-level warp scheduler: To address the inefficiencies of GPGPU scheduler in extracting idle periods, we present a gating-aware Two-level warp scheduler (GATES). GATES prioritizes issuing clusters of instructions that require the same type of execution unit for longer intervals before switching to a new instruction type” and page 5, column 2, paragraph 3, lines 1-5, “Per instruction type active warps subset: Since GATES prioritizes issuing instructions of a specific type, we propose to logically split the active warp set into four active warp subsets, namely integer (INT), floating point (FP), special function unit (SFU) and load/store (LDST) subsets”). 
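
As a non-authoritative illustration of the per-instruction-type subsets quoted above, the logical split of an active warp set may be sketched as follows. The representation of a warp as a (warp_id, instruction_type) pair and the name split_active_warps are assumptions made for this sketch only.

```python
from typing import Dict, List, Tuple

# The four subset labels named in the quoted passage.
SUBSET_TYPES = ("INT", "FP", "SFU", "LDST")


def split_active_warps(
        active_warps: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Split (warp_id, instruction_type) pairs into one subset per type.

    Warp order within each subset follows the order of the input list.
    An instruction type outside SUBSET_TYPES raises KeyError.
    """
    subsets: Dict[str, List[int]] = {t: [] for t in SUBSET_TYPES}
    for warp_id, instr_type in active_warps:
        subsets[instr_type].append(warp_id)
    return subsets
```

A mixed set such as [(0, "INT"), (1, "FP"), (2, "INT")] is thereby separated into an integer subset holding warps 0 and 2 and a floating point subset holding warp 1, with the remaining subsets empty.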

Regarding claim 10.  Mohammad further discloses wherein the regroup engine to cause the modified first group to replace the first group in the queue and the modified second group to replace the second group in the queue (Mohammad, see page 5, column 4, lines 1-8, “Instruction issue priority: The instruction issue arbiter inside the warp issue logic is modified with a simple priority-based issuing algorithm which assigns each instruction type an issue priority. We ordered the instructions in our implementation as: INT/FP, LDST, SFU, FP/INT.  The ordering implies that either INT or FP is given the highest priority first. If INT is given the highest priority, then FP will be given the lowest priority and vice-versa.”). 
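
The priority ordering quoted above may be illustrated by the following minimal sketch. It assumes a hypothetical boolean flag int_first that records whether INT or FP currently holds the highest priority, and a hypothetical mapping ready_counts from instruction type to the number of ready warps of that type; neither name is taken from Mohammad.

```python
from typing import Dict, List, Optional


def issue_priority(int_first: bool) -> List[str]:
    """Return the type order INT/FP, LDST, SFU, FP/INT for the given flag."""
    if int_first:
        return ["INT", "LDST", "SFU", "FP"]  # INT highest, FP lowest
    return ["FP", "LDST", "SFU", "INT"]      # FP highest, INT lowest


def pick_type(ready_counts: Dict[str, int], int_first: bool) -> Optional[str]:
    """Pick the highest-priority instruction type that has a ready warp."""
    for instr_type in issue_priority(int_first):
        if ready_counts.get(instr_type, 0) > 0:
            return instr_type
    return None  # no ready warp of any type
```

With int_first set, a ready floating point warp is selected only when no integer, load/store, or special function warp is ready, which reflects the "highest priority" and "lowest priority" pairing in the quoted passage.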

Regarding claim 11.  Claim 11 is rejected for the same rationale as claim 3. 
Regarding claim 12.  Claim 12 is rejected for the same rationale as claim 4.

Claim Rejections - 35 USC § 103
8.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9.	Claim 5 is rejected under 35 U.S.C. 103 as being obvious over Mohammad Abdel-Majeed et al. (“Warped Gates:  Gating Aware Scheduling and Power Gating for GPGPUs”, hereinafter “Mohammad”) as applied to claim 1 above, and further in view of Comparan et al. (US 20090113181 A1, hereinafter “Comparan”).
Regarding claim 5. Mohammad does not disclose “wherein the thread scheduler is configured to schedule the first instruction type of the third group for execution on a first execution unit with full utilization of this first execution unit”.  However, Comparan discloses wherein the thread scheduler is configured to schedule the first instruction type of the third group for execution on a first execution unit with full utilization of this first execution unit (Comparan, see at least par. [0017], “Issuing a common issue group including an instruction of a first type and an instruction of a second type may ensure that each of the execution units in the processor core are being utilized to execute instructions in each issue group (thereby increasing utilization of the processing circuitry in the processor core). As described above, the instructions may be retrieved from multiple threads being executed in the processor, thereby increasing the chances that an appropriate instruction for each execution unit is provided in the common issue group”). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the instruction types of Mohammad to include “wherein the thread scheduler is configured to schedule the first instruction type of the third group for execution on a first execution unit with full utilization of this first execution unit”, as taught by Comparan, in order “to provide the most realistic experience to the video game player, there may be a desire to have each thread perform a given function (e.g., one thread may draw a three-dimensional scene, also referred to as rendering while another thread performs a physics calculation) requiring a certain amount of processing power for a set amount of time” (Comparan, see par. [0006]).

10.	Claim 6 is rejected under 35 U.S.C. 103 as being obvious over Mohammad Abdel-Majeed et al. (“Warped Gates:  Gating Aware Scheduling and Power Gating for GPGPUs”, hereinafter “Mohammad”) as applied to claim 4 above, and further in view of Gowin, Jr. et al. (US 6606721 B1, hereinafter “Gowin”).
Regarding claim 6.   Mohammad discloses the thread scheduler, but does not explicitly disclose “wherein the thread scheduler is configured to schedule the second instruction type of the fourth group for execution on a second execution unit with full utilization of this second execution unit”.  However, Gowin discloses wherein the thread scheduler is configured to schedule the second instruction type of the fourth group for execution on a second execution unit with full utilization of this second execution unit (Gowin, see at least col. 8, lines 39-41 and 66-67, “The present invention tracks the current utilization of the floating point execution unit, which enables the instruction packer of the test generator to select a floating point instruction only when the floating point unit is free, as indicated by the relevant floating-point execution unit data structure…  Those skilled in the art will recognize that, while neither the floating point execution unit utilization nor the values in or utilization of specific general purpose registers may be reflected in the defined architectural state within the golden model, tracking these system resources to aid in the efficient scheduling of test instructions is highly useful”). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the instruction types of Mohammad to include “wherein the thread scheduler is configured to schedule the second instruction type of the fourth group for execution on a second execution unit with full utilization of this second execution unit”, as taught by Gowin, in order to “create a test environment that is more efficient and conducts a more robust test of very complex sequential circuits such as processors, the industry has moved toward the use of dynamically generated, biased pseudo-random test patterns. In dynamic testing, instructions are generated, all processor resources and facilities needed for executing the instruction are identified and initialized if required, the instruction is executed on a simulated processor, and the simulated processor state is updated to reflect the execution results” (Gowin, see col. 2, lines 4-12).

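11.	Claim 8 is rejected under 35 U.S.C. 103 as being obvious over Mohammad Abdel-Majeed et al. (“Warped Gates:  Gating Aware Scheduling and Power Gating for GPGPUs”, hereinafter “Mohammad”) as applied to claim 1 above, and further in view of BOYER et al. (US 20190370173 A1, hereinafter “BOYER”).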
Regarding claim 8.  Mohammad discloses the regroup engine, but does not explicitly disclose wherein the regroup engine utilizes regrouping policies to minimize divergence between threads.  However, BOYER discloses wherein the regroup engine utilizes regrouping policies to minimize divergence between threads (BOYER, see at least par. [0018], “Memory divergence also occurs between threads that hit in the cache and threads that miss in the cache. One suggestion for minimizing memory divergence for wavefronts that mostly include threads that miss in the cache is to artificially convert all of the threads in the mostly-miss wavefront to cache misses by bypassing the cache, thereby conserving bandwidth to the cache. Another suggestion is to modify cache insertion and eviction policies to control whether requests from subsequent threads will hit or miss in the cache”). 

Allowable Subject Matter
12.	Claim 13 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:

Regarding claim 13. The graphics multiprocessor of claim 12, wherein the scheduler is configured to schedule the first instruction type of the modified first group for execution on a first execution unit with full utilization of this first execution unit, wherein the scheduler is configured to schedule the second instruction type of the modified second group for execution on a second execution unit with full utilization of this second execution unit. 

13.	Claims 14-21 are allowed.
The following is an examiner’s statement of reasons for allowance:
Independent claims 14 and 19 have been considered in view of the best prior art found of record during the examination of the present application.
In view of the present application, the prior art made of record and considered pertinent to the applicant’s disclosure does not teach or suggest the claimed limitations.
Per claims 14 and 19, the cited prior art references, taken individually or in combination, do not teach the following claim limitations:
execution units to execute threads; and 
thread control circuitry coupled to the execution units, the thread control circuitry is configured to determine groupings of instantiated threads, to dynamically determine progress of the threads for executing a task on an execution unit, and to determine drift between threads.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIM THANH THI TRAN whose telephone number is (571)270-1408. The examiner can normally be reached Monday-Friday 8:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JENNIFER MEHMOOD, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-





/KIM THANH T TRAN/Examiner, Art Unit 2612
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2612