Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
1. 	This application is responsive to the communication filed on August 17, 2020. Claims 1-20 are pending and presented for examination.

2. 	The Information Disclosure Statements (IDS) submitted on August 17, 2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the Information Disclosure Statements are being considered by the Examiner.

	Rejections - 35 USC § 103
3. 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


4. 	Claims 1, 3, 6, 8-10, 12-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nassi et al. (USPGPUB: 2018/0373561), hereinafter Nassi, in view of Kissell (USPGPUB: 2006/0190945).
	As per claim 1, Nassi discloses the invention substantially as claimed, including an apparatus (e.g., see Figure 2, depicting an apparatus of a computer system as a hierarchy) comprising graphics processor circuitry (e.g., see para. [0598], lines 13-15) configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. For example, Nassi discloses generating and tracking a set of pages of memory for a set of threads (virtual processors) that access sets of pages of memory (e.g., see para. [0228], lines 6-8). Nassi also clearly discloses that virtual processors are defined as threads, or vice versa (e.g., see para. [0227], line 7). Nassi further discloses maintaining a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. For example, Nassi discloses maintaining translation tables that are set up in the hardware to map guest physical addresses to “real” physical addresses 1318 (the actual physical pages of memory resident on the nodes of the cluster). Nassi further discloses that, while each application running on the guest OS has its own first level page tables, the second level page tables operate out of the same pool of memory that the guest operating system believes to be physical memory (e.g., see para. [0240], lines 7 et seq.). Nassi, however, does not particularly teach the graphics processor circuitry being configured to: execute a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads; select a private memory page from the pool in response to the request; and map the selected page in the translation table for the second thread; and execute one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
Kissell teaches a symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts. In that teaching, Kissell discloses the elements missing from Nassi that are required to arrive at Applicant’s claimed invention. Kissell teaches executing a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads. For example, Kissell teaches that a first thread context 104 running an operating system thread may restart a second thread context 104 by writing a 0 to the H bit 599 of the TCHalt Register 509 of the second thread context 104 (e.g., see para. [0079], lines 4-7; see also para. [0107], lines 11-14). Kissell further discloses the following limitations:
- selecting a private memory page from the pool in response to the request;
- mapping the selected page in the translation table for the second thread; and
- executing one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
For example, Kissell teaches selecting the same virtual page address and the same ASID, generated by two different memory maps, when two different threads run on two different CPUs/TCs 104. When the second thread accesses the TLB 1302, the TLB 1302 returns a hit and outputs the physical page translation for the memory map of the first thread. This occurs because the entry would have been allocated and filled when the first thread caused a TLB 1302 miss (e.g., see para. [0198], lines 12 et seq.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to look to the invention of Kissell and to utilize, in the system of Nassi, his teaching of executing a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads; selecting a private memory page from the pool in response to the request; mapping the selected page in the translation table for the second thread; and executing one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table. Doing so would allow mapping transactions to be interchanged and overlapped between the first thread and the second thread. One thread could perform a mapping transaction while the other thread performs its own processing, which would enhance system throughput and overall system performance.
	As per claim 3, Nassi further discloses that the graphics processor circuitry is further configured to translate the virtual address to a physical address of a location in a memory device. For example, Nassi discloses that the virtual address is converted to what the guest operating system believes to be a “physical address,” which is in actuality a guest physical address from the perspective of the hyper-kernel (e.g., see para. [0239], lines 18-21). Nassi also teaches that the guest physical address (or block of gpa, which may, for example, be a 64-bit value) returned in response to the lookup of the first level page table is then used by the virtualization hardware of the physical processor as an index into a second level page table 1316 to obtain a corresponding physical page of memory (e.g., see para. [0240], lines 1 et seq.).
	As per claim 6, see the arguments with respect to claim 1. Nassi further teaches that the graphics processor circuitry is further configured to execute a map instruction of the first thread to map the selected page in response to an allocation request instruction executed by the second thread. For example, Nassi teaches the following in response to the internal vcpu (second thread) allocation request using the first level page table (e.g., see para. [0239], lines 4-5). The second level translation tables that belong to the virtualization hardware of the physical processor (first thread) may be set up in the hardware to map guest physical addresses to “real” physical addresses 1318, i.e., the actual physical pages of memory resident on the nodes of the cluster (e.g., see para. [0240], lines 2-9).
	As per claim 8, Nassi further discloses that the graphics processor circuitry is further configured to execute multiple threads that include a utility program executable to modify the translation table and wherein the multiple threads use atomic operations to modify the translation table. For example, Nassi discloses that when a page is moved, its contents are moved, which includes allocating a physical page at the destination node and copying the contents of the page to the new node. The second level page table on the destination node is also updated (modified) so that the entry corresponding to the gpa is filled with the newly migrated physical page (e.g., see para. [0244], lines 4-10).

	As per claim 9, Nassi further discloses arbitration circuitry configured to select from among multiple threads requesting allocation of private memory from the first thread. The guest operating system’s context switching unit is considered equivalent to the claimed arbitration circuitry. For example, Nassi discloses that the guest operating system may perform thread context switching, in which the operating system switches, moves, or multiplexes guest threads into different vcpus. When the thread in a vcpu is switched, this causes a corresponding change to the FS-Base0 register value of the vcpu. For example, when the thread context switch occurs, the FS-Base0 register is switched or updated to a new value corresponding to the new thread (e.g., see para. [0257], lines 9-16).
	As per claim 10, Nassi teaches the further limitation that the graphics processor circuitry is further configured to delay de-allocation of the private memory page until the end of a group of processing work and to allow one or more other threads to use the private memory page. For example, Nassi clearly discloses that a vcpu may exist on only one node at a time. Only one vcpu thread runs the vcpu for the guest operating system, which is known to contain private memory for accessing, while the other, inactive vcpu threads wait. The surrogate vcpu threads thus act as proxies for the vcpu, handling processing on behalf of the location (node) where the vcpu is running (e.g., a vcpu thread runs a vcpu on a node, whereas the vcpu itself may run on any node). The use of surrogate threads on the nodes of the cluster prevents the need for locking and synchronization during, for example, vcpu migration (e.g., see para. [0439], lines 16 et seq.).
	As per claim 12, see the arguments with respect to claim 1. It should be noted that Nassi clearly discloses that all data structures can be pre-allocated as indexed arrays (e.g., see para. [0155], lines 4-5). Accordingly, it would have been further obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to configure any thread in Nassi’s system, including the first or second thread, to allocate (or pre-allocate) one or more additional unrequested private memory pages. Such pages would be a subset of the data structures that Nassi describes as indexed arrays. Doing so would provide additional memory space to store requested data to be accessed or processed, which would increase system throughput and reduce latency.
	As per claim 13, Nassi discloses that the apparatus is a computing device that includes an integrated circuit. The integrated circuit includes the graphics processor circuitry 202, 204, 206 and 208 (e.g., see Nassi’s Figure 2) and network interface circuitry (e.g., see para. [0599], lines 4-6; see also Kissell’s Figures 1 and 2).
	As per claim 14, Nassi discloses the invention as claimed, as detailed above with respect to claim 1. Nassi, however, does not particularly teach that the identifiers of the threads are SIMD group identifiers and that the private memory space is interleaved based on a thread's position in its SIMD group. Kissell discloses the elements missing from the system of Nassi that are required to arrive at Applicant’s claimed invention.

Kissell teaches that the identifiers of the threads are SIMD group identifiers. For example, Kissell discloses that a multithreaded microprocessor typically allows the multiple threads to share the functional units of the microprocessor (e.g., instruction fetch and decode units, caches, branch prediction units, and load/store, integer, floating-point, SIMD, etc. execution units) (e.g., see para. [0008], lines 15-19). Kissell also teaches that the private memory space is interleaved based on a thread's position in its SIMD group. For example, Kissell discloses that the scheduling policy may include, but is not limited to, a round-robin, time-division-multiplexed, or interleaved scheduling policy. Such a policy allocates a predetermined number of clock cycles or instruction issue slots to each ready thread in a rotating order (e.g., see para. [0058], lines 14-18).

Accordingly, it would have been further obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to employ Kissell’s interleaving protocol in the invention of Nassi. That protocol would interleave the memory spaces among the multithreaded units based on a thread's position in its SIMD group. Doing so keeps latency to a minimum by allowing the threads to interchange access to the private memory pages or units of the system, which increases memory access throughput and overall system performance.
	As per claim 15, Nassi discloses the invention substantially as claimed, including a method comprising generating a pool of private memory pages for a set of graphics work that includes multiple threads. For example, Nassi discloses generating and tracking a set of pages of memory for a set of threads (virtual processors) that access sets of pages of memory (e.g., see para. [0228], lines 6-8). Nassi also clearly discloses that virtual processors are defined as threads, or vice versa (e.g., see para. [0227], line 7). Nassi further discloses maintaining a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. For example, Nassi discloses maintaining translation tables that are set up in the hardware to map guest physical addresses to “real” physical addresses 1318 (the actual physical pages of memory resident on the nodes of the cluster). Nassi further discloses that, while each application running on the guest OS has its own first level page tables, the second level page tables operate out of the same pool of memory that the guest operating system believes to be physical memory (e.g., see para. [0240], lines 7 et seq.). Nassi, however, does not particularly teach the following limitations:
- executing a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads;
- selecting a private memory page from the pool in response to the request;
- mapping the selected page in the translation table for the second thread; and
- executing one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
Kissell teaches a symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts. In that teaching, Kissell discloses the elements missing from Nassi that are required to arrive at Applicant’s claimed invention. Kissell teaches executing a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads. For example, Kissell teaches that a first thread context 104 running an operating system thread may restart a second thread context 104 by writing a 0 to the H bit 599 of the TCHalt Register 509 of the second thread context 104 (e.g., see para. [0079], lines 4-7; see also para. [0107], lines 11-14). Kissell further discloses the following limitations:
- selecting a private memory page from the pool in response to the request;
- mapping the selected page in the translation table for the second thread; and
- executing one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
For example, Kissell teaches selecting the same virtual page address and the same ASID, generated by two different memory maps, when two different threads run on two different CPUs/TCs 104. When the second thread accesses the TLB 1302, the TLB 1302 returns a hit and outputs the physical page translation for the memory map of the first thread. This occurs because the entry would have been allocated and filled when the first thread caused a TLB 1302 miss (e.g., see para. [0198], lines 12 et seq.).
Accordingly, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to look to the invention of Kissell and to utilize, in the system of Nassi, his teaching of executing a first thread to receive a request to allocate a private memory page for a second thread of the multiple threads; selecting a private memory page from the pool in response to the request; mapping the selected page in the translation table for the second thread; and executing one or more instructions of the second thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table. Doing so would allow mapping transactions to be interchanged and overlapped between the first thread and the second thread. One thread could perform a mapping transaction while the other thread performs its own processing, which would enhance system throughput and overall system performance.
	As per claim 16, the combination of Nassi and Kissell discloses the invention as claimed, as detailed above with respect to claims 1 and 15. Nassi and Kissell do not particularly disclose a non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform the operations recited in claim 16. However, one of ordinary skill in the art would have recognized the following. A computer-readable medium (e.g., a floppy disk, CD-ROM, etc.) carrying computer-executable instructions for implementing a method is generally well known in the art, because such a medium facilitates transporting and installing the method on other systems. For example, a copy of the Microsoft Windows operating system can be found on a CD-ROM from which Windows can be installed onto other systems. This is much easier than running a long cable or hand-typing the software onto another system. Therefore, it would have been obvious to put the program of Nassi and Kissell on a computer-readable medium, because doing so would facilitate transporting, installing, and implementing that program on other systems.
	As per claim 17, Nassi further discloses that the graphics processor circuitry is further configured to translate the virtual address to a physical address of a location in a memory device. For example, Nassi discloses that the virtual address is converted to what the guest operating system believes to be a “physical address,” which is in actuality a guest physical address from the perspective of the hyper-kernel (e.g., see para. [0239], lines 18-21). Nassi also teaches that the guest physical address (or block of gpa, which may, for example, be a 64-bit value) returned in response to the lookup of the first level page table is then used by the virtualization hardware of the physical processor as an index into a second level page table 1316 to obtain a corresponding physical page of memory (e.g., see para. [0240], lines 1 et seq.).
	As per claim 19, see the arguments with respect to claims 1 and 16. Nassi further teaches that the graphics processor circuitry is further configured to execute a map instruction of the first thread to map the selected page in response to an allocation request instruction executed by the second thread. For example, Nassi teaches the following in response to the internal vcpu (second thread) allocation request using the first level page table (e.g., see para. [0239], lines 4-5). The second level translation tables that belong to the virtualization hardware of the physical processor (first thread) may be set up in the hardware to map guest physical addresses to “real” physical addresses 1318, i.e., the actual physical pages of memory resident on the nodes of the cluster (e.g., see para. [0240], lines 2-9).

5. 	Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Nassi et al. (USPGPUB: 2018/0373561), hereinafter Nassi, in view of Kissell (USPGPUB: 2006/0190945), and further in view of Wang et al. (USPGPUB: 2019/0042304), hereinafter Wang.
	As per claim 2, the combination of Nassi and Kissell discloses the invention as claimed, as detailed above with respect to claim 1. Nassi and Kissell do not particularly teach that the first thread is a persistent thread for which resources are allocated for an entirety of a time interval over which the set of graphics work is executed.

Wang, however, teaches an ICE architecture and mechanisms to accelerate tuple-space search with an integrated GPU. Wang discloses that his thread is implemented as a persistent thread for which resources are allocated for an entirety of a time interval over which the set of graphics work is executed. For example, Wang discloses that the persistent thread of his invention can continuously send data packets to the graphics processing unit (GPU) (e.g., see para. [0038], lines 5-7). This is considered equivalent to the claimed allocation of resources for an entirety of a time interval over which the set of graphics work is executed. As a result, the system does not have to re-launch the kernel or request large batches to hide latency (e.g., see para. [0038], lines 8 et seq.).

Accordingly, it would have been further obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to look to the invention of Wang. In view of Wang, it would have been obvious to implement, in the invention of Nassi and Kissell, the first thread as a persistent thread for which resources are allocated for an entirety of a time interval over which the set of graphics work is executed, as claimed. Doing so would avoid interruptions in system operation, since the system of Nassi and Kissell would not have to re-launch the kernel or request large batches to hide latency. This would enhance overall system reliability.

Allowable subject matter 
6. 	Claims 4, 7, 11, 18 and 20 are objected to as being dependent upon rejected base claims 1 and 16. These claims would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims (e.g., claim 6). Claim 5 would also be allowable because it depends from claim 4.

The prior art of record does not teach or disclose that the second thread is configured to request to allocate a first number of private memory pages based on shader execution state information for the second thread, wherein the shader execution state information was received from an external source prior to the execution (claims 4 and 18).

In addition, the prior art of record does not teach that the graphics processor circuitry is further configured to execute the first thread to load one or more page addresses into general purpose registers of the first thread and wherein the map instruction writes the page addresses to a translation table entry for the second thread (claims 7 and 20).

The prior art of record also does not specifically teach that the graphics processor circuitry is further configured to: determine a number of threads to be preempted in response to a preemption request; and allocate, based on the determination, a number of private memory pages to save preempted threads state (claim 11).

Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUAN V THAI whose telephone number is 571-272-4187.  The examiner can normally be reached Monday-Friday 8am-4pm.
 	Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
 	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sanjiv Shah can be reached on 571-272-4098.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-9300.  
 	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

November 04, 2022
/TUAN V THAI/Primary Examiner, Art Unit 2135