DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Response to Amendment
The amendment filed September 1, 2022 has been entered. Claims 1, 2, and 4-20 remain pending in this application.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 4, 10, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (US 2017/0364440) in view of Iyer et al. (US 2020/0341660), Vaidyanathan et al. (US 2018/0293703), and Dostert et al. (US 2006/0143608).
Regarding claim 1, Du teaches an apparatus to facilitate partitioning of local memory (Fig. 1, processors 102, see also [0153]), comprising:
A plurality of execution units to execute a plurality of execution threads (Figs. 3-8 discuss embodiments/operations of a graphics processor, where each depicts/discusses using execution units for operation, see [0043, 0049, 0050, 0052, 0058, 0062, 0063, 0076, 0078]; regarding the units executing threads, see “the pipelines send thread execution requests to 3D/Media subsystem 315, which includes thread dispatch logic for arbitrating and dispatching the various requests to available thread execution resources. The execution resources include an array of graphics execution units to process the 3D and media threads” [0049]; “The 3D and media pipelines process the commands by performing operations via logic within the respective pipelines or by dispatching one or more execution threads to execution unit array 414,” [0052]; “In some embodiments, each execution unit (e.g. 608A) is an individual vector processor capable of executing multiple simultaneous threads and processing multiple data elements in parallel for each thread. In some embodiments, execution unit array 608A-N includes any number of individual execution unit,” [0063]; “vertex fetcher 805 and vertex shader 807 execute vertex-processing instructions by dispatching execution threads to execution units 852A, 852B,” [0078]);
A memory device coupled to share access between the plurality of execution units (Fig. 12 level-3 cache 875 within processor 102, where “cache memory 875 has a portion characterized as SLM (e.g., SLM 206-1) which is shared with a plurality of EUs,” [0116], where Fig. 12 shows SLM’s within a super SLM within the level-3 cache, see also [0115,0128]); and
Partitioning circuitry to initiate the memory device to be used as a cache (“In another example, an apparatus is provided which comprises: means for grouping two or more work groups to form a super-workgroup; and means for partitioning a portion of a memory space into one or more Super-SLMs,” [0153], where [0110] describes embodiments of the means for performing different functions, including hardware circuitry; as Du discloses in [0115-0117] that cache 875 is partitioned into super-SLM’s and SLM’s, then the cache is necessarily initiated as a cache) and receive a thread dispatch including a command (“thread execution logic 600 includes local thread dispatcher 604 that arbitrates thread initiation requests from the graphics and media pipelines and instantiates the requested threads on one or more execution units 608A-N,” [0068], where [0085] depicts these thread requests from the pipelines as commands, teaching that execution logic is able to receive thread requests including commands) and partition the first portion of the memory device as SLM (Du discloses in [0115-0117] that cache 875 is partitioned into super-SLM’s and SLM’s, where, “In traditional OpenCL memory structure, workgroups share a respective Shared Local Memory (SLM). Workgroups consist of a defined number of work items. These work items are executed by execution units,” [0112], teaching that SLM’s are allocated/partitioned for thread execution).
Du fails to teach where the command indicates a size of shared local memory (SLM) blocks to allocate a first portion of the memory device to be used as SLM. Consequently, Du also fails to teach where the partitioned first portion of the memory device has a size based on the size of SLM blocks indicated in the command. Finally, Du also fails to teach where the partitioning circuitry remaps the first portion of the memory device from being used as SLM to be used as the cache in response to the thread dispatch being cleared.
Iyer’s disclosure is related to memory allocation for threads executing on virtual machines and as such comprises analogous art as directed to the same field of endeavor as the claimed invention’s memory management.
Iyer discloses that threads transmit requests for memory allocation, Fig. 6, step 602, where the controller device receives this request and allocates a portion of the memory to the thread, Fig. 6 step 604.  More particularly, Iyer discloses that the memory request includes code which identifies a size of memory fabric requested for allocation, see [0046].
An obvious modification can be identified: incorporating Iyer’s thread command for memory allocation.  Such a modification reads upon where the thread requests indicate a size of blocks for allocation, as well as where the partitioning of the memory device for SLM is based on the size indicated in the command.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate Iyer’s allocation commands into Du’s system, as this allows for a more versatile allocation system, where SLM’s can be allocated based on thread requirements instead of a static size allocation.
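For illustration only, the size-based partitioning described in the proposed Du/Iyer combination can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from any cited reference: the names ThreadDispatch, MemoryDevice, and partition_for are hypothetical, and the memory device is modeled simply as a byte count that starts out entirely as cache.

```python
from dataclasses import dataclass


@dataclass
class ThreadDispatch:
    """Hypothetical dispatch command carrying the requested SLM size."""
    dispatch_id: int
    slm_bytes: int  # size of SLM blocks indicated in the command


class MemoryDevice:
    """Hypothetical shared memory device that starts out entirely as cache."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.slm_by_dispatch = {}  # dispatch_id -> bytes partitioned as SLM

    @property
    def slm_bytes(self):
        return sum(self.slm_by_dispatch.values())

    @property
    def cache_bytes(self):
        # Whatever is not partitioned as SLM remains available as cache.
        return self.total_bytes - self.slm_bytes

    def partition_for(self, dispatch):
        # The partition size comes from the command, not from a static constant.
        self.slm_by_dispatch[dispatch.dispatch_id] = dispatch.slm_bytes
```

Under these assumptions, a 1024-byte device that receives a dispatch requesting 256 bytes ends up with 256 bytes of SLM and 768 bytes of cache.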
The combination of Du and Iyer fails to teach where the partitioning circuitry remaps the first portion of the memory device from being used as SLM to be used as the cache in response to the thread dispatch being cleared.
Vaidyanathan’s disclosure relates to a graphics apparatus with local memory and allocating memory for different forms of access and as such comprises analogous art.
Vaidyanathan discloses a system that includes shared local memory, see Fig. 6 local memory 614 discussed in [0129] and Fig. 7, shared local memory 721 discussed in [0133].  Vaidyanathan’s disclosure provides that some embodiments can use the SLM for other purposes, including as a cache specifically, while the SLM is meant for compute threads specifically, see “Some embodiments may advantageously use shared local memory (SLM) as extended URB memory. For example, the extended memory may be used as a vertex buffer, a tile cache storage for a 3D application, or any other suitable storage,” [0160].  In order to determine when to use SLM for the other memory purposes, Vaidyanathan teaches detecting when the SLM is idle, see “Some embodiments may detect the case when only 3D contexts are running in the GPU (e.g. determine that there is no local access to the SLM), and in that case utilize the otherwise unused SLM memory as an extended URB memory and/or tile cache,” [0161]. 
Further, Vaidyanathan discloses the presence of an MMU, see Fig. 13, where the MMU is utilized to map virtual addresses to physical addresses, including addresses for cache lines and memory pages, see [0180].
An obvious modification can be identified: incorporating Vaidyanathan’s disclosure of reconfiguring SLM to serve as a cache when the SLM is idle as well as the MMU.  This reads upon the limitation where the first portion of the memory device is remapped from being used as SLM to be used as a cache.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate Vaidyanathan’s reconfiguration of the SLM and MMU into Du’s system, as the reconfiguring provides for a more efficient use of the original L3 cache area instead of leaving the area allocated idle for SLMs, while the MMU provides a level of abstraction to the processors/execution units and only requires them to provide virtual addresses for memory access.
The combination of Du, Iyer, and Vaidyanathan still fails to teach where the remapping occurs in response to the thread dispatch being cleared.  While Vaidyanathan’s disclosure provides for a determination if the SLM area is idle/unused, this is not understood to explicitly refer to a cleared thread dispatch.
Dostert’s disclosure is related to monitoring threads and shared memory, and as such comprises analogous art.
As part of this disclosure, Dostert provides an area in shared memory, see Fig. 2, shared memory 125, as reporting slots to store thread status information, see [0029]. Figs. 3-5 show processes for processing tasks as well as example task status information. Fig. 3 at step 340 reports that a task is finished and that the thread is therefore idle, see [0037]; another example of this is seen in Fig. 5 step 545, where after a task finishes, the worker thread returns to a thread pool and updates the reporting slot to show an idle status, see [0042].
An obvious combination can be identified: incorporating Dostert’s reporting slots and thread status information, and particularly where the slots report idle threads at the end of thread execution.  Such a modification reads upon where the remapping is based upon a determination that the thread dispatch has been cleared; Vaidyanathan as earlier discussed discloses remapping SLM for cache purposes based on an idle state of the SLM, and Dostert has now provided a clear mechanism for recognizing that an area of SLM assigned to a thread is idle upon completion of the thread execution.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate Dostert’s reporting slots/thread information and reporting idle threads upon completion of a thread’s execution to Du’s system, as Dostert’s reporting structure/process further enhances Vaidyanathan’s modification – now the underlying system can be made aware of idle/available resources as soon as a thread is complete, instead of waiting for some unknown determination period.
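For illustration only, the combined behavior relied upon above (an idle status report triggering remapping of SLM to cache) can be sketched as follows. This is a sketch under stated assumptions and does not reproduce any mechanism of Vaidyanathan or Dostert: the class PartitionedMemory, the method on_status_report, and the "idle"/"running" status strings are hypothetical stand-ins for a reporting slot update.

```python
class PartitionedMemory:
    """Hypothetical memory device split between cache and SLM regions."""

    def __init__(self, total_bytes):
        self.total_bytes = total_bytes
        self.slm_regions = {}  # dispatch_id -> bytes currently used as SLM

    def cache_bytes(self):
        return self.total_bytes - sum(self.slm_regions.values())

    def allocate_slm(self, dispatch_id, size):
        self.slm_regions[dispatch_id] = size

    def on_status_report(self, dispatch_id, status):
        # Assumption: an "idle" report after the work finishes is treated
        # as the thread dispatch being cleared.
        if status == "idle" and dispatch_id in self.slm_regions:
            # Remap: the region stops being SLM and counts as cache again.
            del self.slm_regions[dispatch_id]
```

In this sketch a "running" report leaves the partition untouched, while an "idle" report immediately returns the region to cache use, rather than waiting for a separate idle-detection period.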
Regarding claim 4, the combination of Du, Iyer, Vaidyanathan, and Dostert teaches the apparatus of claim 1, and the combination further teaches wherein the command includes a header indicating the size of SLM blocks (Du Fig. 9A shows command formats; as discussed in the claim 1 rationale, the incorporation of Iyer provides for a request for memory allocation, including a size of the memory to be allocated; necessarily this size requested can be found somewhere in the command format, reading upon the header indicating the size of the SLM blocks). 
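For illustration only, a command whose header carries the requested SLM size can be sketched as follows. The layout shown (a 32-bit opcode followed by a 32-bit size, little-endian) is a hypothetical assumption chosen for the example; it is not the command format of Du Fig. 9A or of any other cited reference.

```python
import struct

# Hypothetical 8-byte header: 32-bit opcode, then 32-bit SLM size in bytes.
HEADER = struct.Struct("<II")


def pack_command(opcode, slm_bytes, payload=b""):
    """Build a command with the SLM size stored in the header."""
    return HEADER.pack(opcode, slm_bytes) + payload


def read_slm_size(command):
    """Read the SLM size back out of the header, ignoring the payload."""
    _opcode, slm_bytes = HEADER.unpack_from(command)
    return slm_bytes
```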
Claim 10 is a method claim reciting method steps identical to the functional limitations of the partitioning circuitry of claim 1 and can therefore be rejected according to the same rationale of claim 1.
Regarding claim 16, Du teaches a graphics processing unit (GPU) comprising a plurality of slices, each having a plurality of sub-slices (Fig. 1, processor 102 includes graphics processor(s) 108, where in one embodiment of Fig. 5, the graphics processor includes modular cores called slices and each has sub-cores called sub-slices, see [0061]; see also [0116]), including structural elements identical to the structural elements of claim 1 and therefore rejected according to the same rationale. 
Claims 2, 5, 6, 8, 9, 11-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Iyer, Vaidyanathan, and Dostert, and further in view of Sathe (US 2017/0061569).
Regarding claim 2, the combination of Du, Iyer, Vaidyanathan, and Dostert teaches the apparatus of claim 1, but fails to teach wherein the partitioning circuitry determines whether there is sufficient space in the memory device for the SLM blocks. 
Sathe’s disclosure is related to memory allocation for a graphics processor and as such comprises analogous art.
As part of this disclosure, Sathe depicts optimization logic in Fig. 15, where at step 1508, a determination is made if sufficient space exists in the SLM for the control block, see also [0126].  As seen in Fig. 15, if there is enough space, then space is allocated for the thread group, and if not, then the optimization logic skips optimization for the particular input program. 
An obvious modification can be identified: incorporating a check for sufficient space in Du’s cache as part of the allocation handling. This reads upon the limitation of the claim. 
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate a check for sufficient space when allocating the SLM into Du’s system, as this check would ensure that the allocation request can be successfully processed before attempting to process the allocation, potentially prematurely.
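For illustration only, the sufficient-space check discussed above can be sketched as follows. This is a sketch under stated assumptions, not Sathe's optimization logic: the function try_allocate_slm and its byte-count arguments are hypothetical.

```python
def try_allocate_slm(free_bytes, requested_bytes):
    """Return (allocated, remaining_free) for a hypothetical SLM request."""
    if requested_bytes <= free_bytes:
        # Enough space: the request is granted and free space shrinks.
        return True, free_bytes - requested_bytes
    # Not enough space: nothing is allocated and free space is unchanged.
    return False, free_bytes
```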
Regarding claim 5, the combination of Du, Iyer, Vaidyanathan, Dostert, and Sathe teaches the apparatus of claim 2, and the combination further teaches wherein the partitioning circuitry acquires the SLM blocks having a size based on the size of SLM blocks indicated in the command upon a determination that there is sufficient space in the memory device for the SLM blocks (as discussed in the claim 2 rationale, Sathe continues with processing the control blocks when there’s enough space; as incorporated into the combination, the check occurs as part of the allocation request disclosed in Iyer; necessarily, this teaches that the allocation request successfully allocates the requested memory when there’s sufficient memory).
Regarding claim 6, the combination of Du, Iyer, Vaidyanathan, Dostert, and Sathe teaches the apparatus of claim 5, and Du further teaches the apparatus further comprising dispatch circuitry (as cited in the claim 1 rationale, [0110] discloses circuitry for performing functions in Du’s disclosure) to dispatch a group of the plurality of execution threads to operate on the SLM (“thread execution logic 600 includes local thread dispatcher 604 that arbitrates thread initiation requests from the graphics and media pipelines and instantiates the requested threads on one or more execution units 608A-N,” [0068], see also “In some embodiments, the pipelines send thread execution requests to 3D/Media subsystem 315, which includes thread dispatch logic for arbitrating and dispatching the various requests to available thread execution resources,” [0049], where “In traditional OpenCL memory structure, workgroups share a respective Shared Local Memory (SLM). Workgroups consist of a defined number of work items. These work items are executed by execution units,” [0112], teaching that the threads dispatched are executed by the execution units in the work items in the SLM).    
Regarding claim 8, the combination of Du, Iyer, Vaidyanathan, Dostert, and Sathe teaches the apparatus of claim 6, and the combination further teaches wherein the partitioning circuitry tracks use of the SLM blocks by the group of execution threads (as discussed in the claim 1 rationale, Dostert provides for tracking the thread statuses in the SLM; this provides for a tracking mechanism, where tracking the status of thread execution necessarily reads upon tracking use of the SLM blocks – the threads execute within the SLM, see the claim 6 rationale, so a thread status showing it’s still executing necessarily means the SLM blocks are still in use by the thread).
Regarding claim 9, the combination of Du, Iyer, Vaidyanathan, Dostert, and Sathe teaches the apparatus of claim 6, and the combination further teaches wherein the partitioning circuitry resets the memory device to operate as the cache upon a determination that the group of execution threads are no longer active (as discussed in the claim 1 rationale, Vaidyanathan discussed utilizing unused SLM as a cache, where Dostert disclosed that an idle state can be determined upon completion of a task/thread; this reads upon the limitation of the claim, as a group of threads finishing results in all threads being idle, and the SLM would therefore be reconfigured for use as a cache).
Claims 11, 12, 13, and 14 are rejected according to the same rationale of claims 2, 4, 6, and 8, respectively.
Claims 17, 18, and 19 are rejected according to the same rationale of claims 2, 4, and 6, respectively, and claim 20 is rejected according to the same rationale of claims 8 and 9.
Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Du in view of Iyer, Vaidyanathan, Dostert, and Sathe, and further in view of Rao et al. (US 2015/0187040). 
Regarding claim 7, the combination of Du, Iyer, Vaidyanathan, Dostert, and Sathe teaches the apparatus of claim 6, but the combination fails to teach wherein the dispatch circuitry stalls the thread dispatch until the partitioning circuitry acquires the size portion of the memory device.
Rao’s disclosure relates to executing threads in a GPU and as such comprises analogous art.
As part of this, Rao discloses that when allocating among shared local memory, “if there is not enough shared local memory available, a scheduler will stall the dispatch of new threads until there is enough available shared local memory,” [0032].
An obvious modification can be identified: incorporating Rao’s teachings to stall dispatching threads when there is not enough shared local memory.  Such a modification reads upon the limitation of the claim, as the threads stall when there is not enough space, a condition discussed in the claims 2 and 5 rationale, where the claims 5 and 6 rationales teach allocating memory according to the request when there is sufficient space; i.e. threads will dispatch for execution then.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to incorporate Rao’s stalling with Du’s system, as stalling when there are insufficient resources ensures that the threads won’t run into faults due to insufficient resources, while stalling the threads also ensures that the thread eventually executes, instead of being ignored/wasted.
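For illustration only, stalling a dispatch until enough SLM is available can be sketched as follows. This is a sketch under stated assumptions, not Rao's scheduler: the Scheduler class, its in-order queue, and the submit/complete methods are hypothetical.

```python
from collections import deque


class Scheduler:
    """Hypothetical scheduler that stalls dispatches lacking SLM space."""

    def __init__(self, slm_capacity):
        self.free = slm_capacity
        self.stalled = deque()  # (dispatch_id, size) waiting for space
        self.running = {}       # dispatch_id -> SLM bytes held

    def submit(self, dispatch_id, size):
        self.stalled.append((dispatch_id, size))
        self._drain()

    def complete(self, dispatch_id):
        # A finished dispatch releases its SLM, which may unblock others.
        self.free += self.running.pop(dispatch_id)
        self._drain()

    def _drain(self):
        # Dispatch in order; stop at the first request that does not fit.
        while self.stalled and self.stalled[0][1] <= self.free:
            dispatch_id, size = self.stalled.popleft()
            self.free -= size
            self.running[dispatch_id] = size
```

In this sketch a stalled dispatch is never dropped: it stays queued and runs as soon as an earlier dispatch completes and frees enough space.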
Claim 15 is rejected according to the same rationale of claim 7. 

Response to Arguments
Applicant's arguments filed September 1, 2022 have been fully considered but are unpersuasive.
In arguing against the new Vaidyanathan reference, applicant appears to stress that Vaidyanathan fails to teach remapping a portion being used as SLM, citing [0160, 0161]. Applicant’s focus appears to be that Vaidyanathan’s citation uses the term “unused” SLM when discussing reconfiguration. However, the memory is still clearly assigned/configured as SLM, or else reconfiguration would not be necessary. Vaidyanathan reconfigures the SLM when it is idle, i.e. when it does not feature any local accesses; the memory is still allocated/assigned to be used as SLM, so it still reads upon “memory being used as SLM”. Applicant’s claim term “being used as SLM” does not provide an explicit definition for “used” that is different from the above understanding of Vaidyanathan, and applicant provides no additional details as to how Vaidyanathan’s disclosure fails to teach the limitation. This line of argumentation is unpersuasive.
In arguing against the new Dostert reference, applicant focuses on how Dostert’s task reporting is not used to remap a portion of memory.  The office action does not assert this – Dostert’s disclosure is used to modify Vaidyanathan’s conditions for determining when the SLM is idle/can be reconfigured as a cache.  The argument is unpersuasive as it focuses on Dostert in isolation instead of Dostert as part of the combination of references.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON D HO whose telephone number is (469)295-9093. The examiner can normally be reached Mon-Thur 9:00-6:00 CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Reginald Bragdon, can be reached at (571)272-4204. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.D.H./Examiner, Art Unit 2139 

/REGINALD G BRAGDON/Supervisory Patent Examiner, Art Unit 2139