Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lindholm et al., US Patent 9,304,775 (hereinafter Lindholm) in view of Chen et al., US Patent Application Publication 2009/0172686 (hereinafter Chen).
	Regarding claim 1, Lindholm teaches:
A graphics processing unit (see e.g. col. 3 lines 23-26, col. 4 lines 50-55) comprising:
an instruction execution pipeline including hardware execution logic (see e.g. col. 4 lines 50-55), the hardware execution logic including hardware thread slots distributed across multiple hardware units (see e.g. col. 7 line 46 – col. 8 line 43, thread slots in processing engines);
a thread dispatcher to process a set of commands for execution and distribute threads for the set of commands to the hardware execution logic to execute the set of commands (see e.g. col. 7 line 63 – col. 8 line 43, work
and while in the second dispatch mode:
divide the threads for the set of commands into multiple groups of threads, wherein threads within a group of threads have a temporally local memory access pattern across a specific set of memory regions (see e.g. col. 7 line 46 – col. 8 line 43, threads are executing at the same time and therefore have access to shared memory at the same time);
concurrently distribute a first group of the multiple groups of hardware threads to hardware thread slots of the hardware execution logic (see e.g. col. 7 line 63 – col. 8 line 11); and
withhold distribution of threads of a second group of the multiple groups of threads until after the first group completes execution (see e.g. col. 8, additional thread groups are not activated until a current group completes - “A SIMD thread group may include fewer than P threads, in which case some of the processing engines of the particular type will be idle during cycles when that SIMD thread group is being processed”, “When execution of a thread or SIMD thread group is completed, core 208 advantageously notifies core interface 303. Core interface 303 can then initiate other processes, e.g., to retrieve output data from shared memory 306 and/or to prepare core 208 for execution of additional threads or SIMD thread groups”).
Lindholm fails to explicitly teach withholding distribution of threads to hardware thread slots that are available to execute threads of the second group.

The scheduling applied to single threads in Chen readily applies to thread groups in Lindholm. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of Lindholm and Chen to withhold distribution of threads to hardware thread slots that are available to execute threads of the second group. Doing so would have provided the advantage of preventing other threads from altering data, thereby avoiding incorrect computations, as discussed by Chen (see para. [0015]).
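For illustration only, the dispatch behavior at issue in the combination (threads of a second group are withheld until the first group completes, even while thread slots are available) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Lindholm, Chen, or the present application: the function name dispatch_by_group, the per-thread cycle counts, and the slot count are hypothetical.

```python
from collections import deque


def dispatch_by_group(groups, num_slots):
    """Simulate a dispatcher that runs one thread group at a time.

    groups: list of groups, each a list of (thread_name, cycles) tuples.
    num_slots: number of hardware thread slots (hypothetical model).
    Returns one entry per cycle: the sorted names of threads occupying slots.
    """
    timeline = []
    for group in groups:
        waiting = deque(group)
        running = {}  # thread name -> cycles remaining
        while waiting or running:
            # Fill free slots, but only with threads of the current group.
            while waiting and len(running) < num_slots:
                name, cycles = waiting.popleft()
                running[name] = cycles
            timeline.append(sorted(running))
            # Advance one cycle and retire threads that have finished.
            running = {n: c - 1 for n, c in running.items() if c > 1}
        # The next group starts only after this loop exits, i.e. after the
        # current group completes, even if slots were idle in the meantime.
    return timeline
```

In this model, with two slots, a first group of (a0, 1 cycle) and (a1, 3 cycles), and a second group of two one-cycle threads, one slot is idle during cycles 2 and 3, yet the second group is not dispatched until cycle 4.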
Regarding claim 2, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 1, additionally including a command streamer to provide the set of commands to the instruction execution pipeline (see e.g. col. 7 line 63 – col. 8 line 43). 
Regarding claim 3, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 2, wherein the thread dispatcher is to concurrently distribute one or more threads of the first group to each available hardware unit within the hardware execution logic (see e.g. col. 7 line 63 – col. 8 line 43). 
Regarding claim 4, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 2, wherein the thread dispatcher is to divide threads of first group among available hardware units within the hardware execution logic and concurrently distribute a pre-determined number of threads from the 
Regarding claim 5, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 4, the thread dispatcher additionally to concurrently distribute threads from the first group to available hardware units within the hardware execution logic up to a maximum number of threads per bank of a cache memory associated with the hardware execution logic (see e.g. fig. 1, col. 7 lines 16-45, a cache includes at least one bank and the number of hardware units necessarily imposes a maximum).
Regarding claim 6, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 1, each thread including one or more work items to be performed by the hardware execution logic, wherein the first group is complete when the one or more work items of each of the threads of the first group is complete (see e.g. Lindholm col. 7 line 63 – col. 8 line 43, Chen para. [0015], [0033]). 
	Regarding claim 7, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 1, wherein during execution of the first group, a cache memory of the graphics processing unit is to cache data associated with the first group according to a first memory access pattern and thread dispatcher is to withhold distribution of the second group to prevent contention for cache resources between the first group and the second group due to differing memory access pattern between the first group and the second group (see e.g. Lindholm col. 7 line 16 – col. 8 line 43, Chen para. [0015], [0033]). 
	Regarding claim 8, Lindholm in view of Chen teaches or suggests:

	Regarding claim 9, Lindholm in view of Chen teaches or suggests:
The graphics processing unit as in claim 8, the thread dispatcher to concurrently distribute the second group of the multiple groups of threads to available hardware thread slots of the hardware execution logic after the first group is complete (see e.g. col. 7 line 63 – col. 8 line 43).
	Claims 10-14 are rejected for reasons corresponding to those given above for claims 1-9.
Claims 15-20 are rejected for reasons corresponding to those given above for claims 1-9.

Response to Arguments
Applicant's arguments filed 2/1/21 have been fully considered but they are not persuasive.
	Applicant argues Lindholm does not teach or suggest different “modes”.
Examiner respectfully disagrees. A mode is simply a particular way of (or arrangement for) doing something. Applicant has not claimed any control to place the system in a certain mode, or any indication that a mode is exclusively active or enabled 
Applicant argues a lack of teaching of “divide the threads for the set of commands into multiple groups of threads, wherein threads within a group of threads have a temporally local memory access pattern across a specific set of memory regions”.

Examiner respectfully disagrees. Applicant has attempted to differentiate the claimed “temporally local memory access pattern” without providing a definition or explanation of the term. The specification also fails to provide a definition of the term, and only provides one example of what may be considered a temporally local access pattern (see para. [00160], “…a memory access pattern that is temporally local (e.g., the threads access generally the same memory addresses at generally the same time)”). Therefore, for the purposes of examination, this term has been given its broadest reasonable interpretation consistent with the specification, namely using or accessing a resource (in this case, memory) within a relatively short duration. Lindholm teaches threads that are executing at the same time and therefore have access to shared memory at the same time (within a relatively short duration), see e.g. col. 7 line 46 – col. 8 line 43.


In response to Applicant's argument that Chen cannot be combined with Lindholm, the test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference; nor is it that the claimed invention must be expressly suggested in any one or all of the references.  Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).
Further, in response to Applicant’s argument that Chen teaches “a different solution”, Examiner disagrees that the aspect relied upon in Chen is incompatible with Lindholm. Simply because a reference contains multiple teachings does not prevent a specific one of those teachings from being relied upon and applied in the combination of references. Chen was relied upon to teach restricting the access of threads to a shared resource until execution of a certain thread is fully completed, even if the resource is temporarily available while the certain thread is idle (see e.g. para. [0015], [0033]).
Regarding claim 7, Applicant argues a lack of teaching of “wherein during execution of the first group, a cache memory of the graphics processing unit is to cache data associated with the first group according to a first memory access pattern and thread dispatcher is to withhold distribution of the second group to prevent contention for cache resources between the first group and the second group due to differing memory access pattern between the first group and the second group”.

Examiner respectfully disagrees. Chen describes that “the shared resource released by the thread is prevented from being retrieved by other threads during the idle period rather than altering internal data of the thread under execution, resulting in obtaining incorrect computation data and incorrect computation results” (see e.g. Chen para. [0015]). Incorrect results in this context would occur with a second thread writing 



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN M LINDLOF whose telephone number is (571)270-1024.  The examiner can normally be reached on M-F 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached at (571)272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact 






/JOHN M LINDLOF/
Primary Examiner, Art Unit 2183