DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
This Application is a continuation of and claims the benefit of and priority to U.S. Application No. 16/599,175, filed October 11, 2019, which is a continuation of and claims the benefit of and priority to U.S. Application No. 15/477,030, filed April 01, 2017.

Preliminary Amendment
The preliminary amendment submitted on 01/14/2021 has been acknowledged. Claims 1-20 are cancelled and claims 21-40 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted is considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-40 are rejected under 35 U.S.C. 103 as being unpatentable over Duluk, Jr. et al. (US Publication Number 2013/0268942 A1, hereinafter “Duluk”).

(1) regarding claim 21:
As shown in fig. 1, Duluk disclosed an apparatus (para. [0023], note that FIG. 1 is a block diagram illustrating a computer system 100) comprising: 
one or more processors including a graphics processor to process data (para. [0024], note that the parallel processing subsystem 112 incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry, and constitutes a graphics processing unit (GPU)), the one or more processors including one or more units to process a plurality of shader threads (para. [0035], note that GPCs 208 can be programmed to execute processing tasks relating to a wide variety of applications, including but not limited to image rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or pixel shader programs, i.e., shader threads));
a memory to store data and instructions for processing (para. [0035], note that PPUs 202 may transfer data from system memory 104 and/or local parallel processing memories 204 into internal (on-chip) memory, process the data); and 
a local shader data cache (para. [0084], note that FIG. 5A illustrates a virtual address space 510 allocated by device driver 103 as shader local memory, according to one example embodiment of the present disclosure. Shader local memory is a per-thread private data storage stored in PP memory 204 and directly addressable by LSUs 303 via L1 cache 320); 
wherein the one or more processors are to: 
detect and prepare the local shader data cache (para. [0088], note that LSUs 303 may address the virtual memory addresses in shader local memory directly via L1 cache 320. In the event of a cache miss, a memory page will be fetched from PP memory 204 to the L1 cache 320 via memory interface 214), fetch shader data for one or more of the plurality of shader threads (para. [0089], note that each TMD 322 may require a vastly varying amount of resources depending on the particular programs (i.e., thread groups or warps) stored in the CTAs associated with the TMD 322. One TMD 322 may include a warp that requires very little shader local memory per thread). 
Duluk disclosed most of the subject matter as described above except for specifically teaching the shader data to be fetched at the local shader data cache, and cache the fetched shader data in the local shader data cache.
However, it would be obvious for Duluk to teach the shader data to be fetched at the local shader data cache, and cache the fetched shader data in the local shader data cache (para. [0047], note that each SM 310 also has access to level two (L2) caches that are shared among all GPCs 208 and may be used to transfer data between threads… Additionally, a level one-point-five (L1.5) cache 335 may be included within the GPC 208, configured to receive and hold data fetched from memory via memory interface 214 requested by SM 310, including instructions, uniform data, and constant data, and provide the requested data to SM 310. Also see para. [0085], note that each SM 310 that is activated within PPU 202 is allocated a certain amount of shader local memory (511, 512, etc.)). 
At the time of filing for the invention, it would have been obvious to a person of ordinary skill in the art for Duluk to teach the shader data to be fetched at the local shader data cache, and cache the fetched shader data in the local shader data cache. The suggestion/motivation for doing so would have been to efficiently allocate memory in parallel processors (para. [0006]). Therefore, it would have been obvious for Duluk to obtain the invention as specified in claim 21.

(2) regarding claim 22:
Duluk further disclosed the apparatus of claim 21, wherein the one or more processors are further to provide the cached shader data from the local shader data cache in shader thread processing (para. [0045], note that a "thread group" refers to a group of threads concurrently executing the same program on different input data, with one thread of the group being assigned to a different processing engine within an SM 310. A thread group may include fewer threads than the number of processing engines within the SM 310, in which case some processing engines will be idle during cycles when that thread group is being processed. Also see para. [0086]). 

(3) regarding claim 23:
Duluk further disclosed the apparatus of claim 22, wherein the cached shader data is available for processing of each of the plurality of shader threads (para. [0086], note that each allocated shader local memory (511, 512, etc.) includes three partitions divided between all of the threads scheduled on the corresponding SM 310--local memory high 521, local memory low 522, and a call return stack 523. The size of these partitions may be allocated by device driver 103 using a predetermined ratio based on the size of shader local memory allocated to each SM 310). 

(4) regarding claim 24: 
Duluk further disclosed the apparatus of claim 21, wherein the fetched shader data includes one or more push constants to be pushed to the plurality of shader threads in processing (para. [0047], note that a level one-point-five (L1.5) cache 335 may be included within the GPC 208, configured to receive and hold data fetched from memory via memory interface 214 requested by SM 310, including instructions, uniform data, and constant data, and provide the requested data to SM 310).

(5) regarding claim 25:
Duluk further disclosed the apparatus of claim 21, wherein the shader threads include three-dimensional (3D) shader threads (para. [0052], note that the thread ID, which can be defined as a one-dimensional or multi-dimensional numerical value controls various aspects of the thread's processing behavior). 

(6) regarding claim 26:
Duluk further disclosed the apparatus of claim 21, further comprising a shared local memory (SLM), the local shader data cache comprising at least a portion of the SLM (para. [0084], note that FIG. 5A illustrates a virtual address space 510 allocated by device driver 103 as shader local memory, according to one example embodiment of the present disclosure. Shader local memory is a per-thread private data storage stored in PP memory 204 and directly addressable by LSUs 303 via L1 cache 320). 

(7) regarding claim 27: 
Duluk further disclosed the apparatus of claim 21, wherein the graphics processor is co-located with an application processor on a common semiconductor package (para. [0106], note that non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored).

The proposed rejection over Duluk, as explained for apparatus claims 21-27, renders obvious the steps of the method (fig. 6) of claims 28-34 and the machine-readable medium (para. [0106]) of claims 35-40 because these steps occur in the operation of the proposed rejection as discussed above. Thus, arguments similar to those presented above for claims 21-27 are equally applicable to claims 28-40.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Linda (US Publication Number 2014/0006838 A1) disclosed methods and apparatus relating to dynamic intelligent allocation and utilization of package maximum operating current budget.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hilina K Demeter whose telephone number is (571) 270-1676.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HILINA K DEMETER/
Primary Examiner, Art Unit 2674