DETAILED ACTION
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The following claim language is unclear:
As per claim 1, lines 14-15 recite “receive a request from the VMM to provision a subcluster of graphics processing apparatuses” and lines 19-20 recite “provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.” It is unclear from the context of the claim whether the resources correspond to a plurality of GPU engines or whether the resources are for the GPU engines that make up the subcluster. For examination purposes, the examiner interprets the limitation as provisioning the GPU engines/devices.
Claims 2-7 are dependent on claim 1 and fail to cure the deficiencies set forth above for claim 1 and therefore are rejected under the same rationale above.
As per claim 8, it is a method claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale.
Claims 9-14 are dependent on claim 8 and fail to cure the deficiencies set forth above for claim 8 and therefore are rejected under the same rationale above.
As per claim 15, it is a media/product type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale.
Claims 16-20 are dependent on claim 15 and fail to cure the deficiencies set forth above for claim 15 and therefore are rejected under the same rationale above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (US 10,325,343 B1), in view of Wooten (US 2007/0057957 A1).

Regarding claim 1, Zhao teaches the invention substantially as claimed including a system comprising: 
a host platform (Col. 3, lines 43-45: the client systems 110);
a plurality of graphics processing apparatuses coupled to the host platform using a host fabric (Fig. 1, client systems 110-n, GPU servers 120-n, network 130; Col. 2, lines 53-59: The computing system 100 comprises a plurality (n) of client systems 110-1, 110-2, . . . , 110-n (collectively referred to as client systems 110), and a server cluster 120 (e.g., server farm) comprising a plurality (s) of GPU servers 120-1, 120-2, . . . , 120-s. The client systems 110 and server cluster 120 are operatively connected over a communications network 130.), the plurality of graphics processing apparatuses coupled together using a scale-up fabric (Col. 2, lines 59-64: The communications network 130 is configured to enable network communication between the client systems 110 and the server cluster 120, as well as to enable peer-to-peer network communication between the GPU servers 120-1, 120-2, . . . , 120-s of the server cluster 120.); 
a graphics processing apparatus from the plurality of graphics processing apparatuses (GPU Server Node 200) comprising: 
one or more graphics processing engines (Fig. 2, GPU Devices 220-1 through 220-g); 
a memory (Fig. 2, System memory 210); 
a provisioning agent (Fig. 2, Service Portal and Request Handler 225) to: 
receive a request from the VMM (Client) to provision a subcluster of graphics processing apparatuses (Col. 8, lines 16-22: The service portal and request handler 225 implements interfaces and functions to enable client/server communication between the client systems 110 and the GPU server node 200. In addition, service portal and request handler 225 comprises methods to communicate with, and pass incoming service requests for GPU services from the client systems 110; Col. 10, lines 52-55: dynamically determining a sufficient GPU grouping for a given client when a service request is received from the client for GPU resources), the subcluster including a plurality of graphics processing engines (Col. 1, lines 52-55: grouping and provisioning of GPU resources for GPUaaS; Fig. 3A subcluster 307 of the GPUs GPU0, GPU1, GPU2, and GPU3); 
provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses (Col. 9, lines 4-8: The GPUs GPU0, GPU1, GPU2, and GPU3 can be interconnected 307 using any suitable wire-based communications protocol such as NVLINK (i.e., scale-up fabric) developed by NVidia. NVLINK allows for transferring of data and control code between the GPUs; Col. 10, lines 36-42: provisioning methods are configured to dynamically group GPUs together such that most or all of the GPUs within a GPU group belong to a same interconnect domain to provide much faster communication, while avoiding the formation of GPU groups that require cross-domain interconnections, which can result in degraded performance; Col. 12, lines 27-55); and 
provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM (Client) (Fig. 7, Step 710; Col. 15, lines 23-26: the system will dynamically form a GPU group for the requesting client, which meets the target policies (block 716) and the GPU group will be provisioned to the client (block 710)).

Zhao does not expressly teach a host platform including a processing device and a virtual machine monitor (VMM), the host platform hosting one or more virtual machines (VMs); and
a memory management unit (MMU) including a GPU second level page table and a GPU dirty bit tracker.
However, Wooten teaches a host platform including a processing device and a virtual machine monitor (VMM), the host platform hosting one or more virtual machines (VMs) ([0023] Guest virtual machines A, B and C 210, 220 and 230 respectively, may include guest applications that request a guest virtual machine related request to perform a graphics function. A virtualizer, hypervisor, or VMM 240 couples the guest virtual machine graphics request to the host computer 100. The guest virtual machine application is executed with the CPU function 110 of the host computer 100. The main memory 140 may be written into by the CPU 110 to record the guest applications' graphic instructions and guest graphics virtual addresses (GrVA). Once the GPU functions 120 begin operation, the GPU reads the GrVA, processes it as needed, and accesses main memory 140 via the IOMMU which perform the GrVA to SPA translations. As a result, the guest virtual machine can achieve the graphics function desired of the guest application using integrated graphics chipset of the host computer.) and a memory management unit (MMU) including a GPU second level page table ([0006] GMMU assess required two accesses (a two-level table)) and a GPU dirty bit tracker ([0005] To prevent unlimited access to system memory by graphics (or any other IO device) systems include hardware that will filter accesses to main system memory by IO devices. This filtering is performed by the IO memory management unit (IOMMU) and may be as simple as a one-bit access check, or as complex as an address translation).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Wooten with the teachings of Zhao to have virtual machines hosted by a hypervisor (VMM) on a host platform and to request acceleration tasks from a GPU. The modification would have been motivated by the desire to allow a guest virtual machine to achieve the desired graphics functions. See Wooten, [0023].

Regarding claim 2, Zhao teaches wherein to provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, the provisioning agent further to: dynamically update a routing table in the scale-up fabric in response to the request from the VMM (client) to provision the subcluster of graphics processing apparatuses (Col. 14, line 64 through Col. 15, line 7: When a new client requests access to the GPU resources of the GPU server node, the SLA policies and topology performance metric tables are queried to determine if there are any qualified GPU groups available to provision for the client based on the SLA policies for the client. If there is only a single GPU group available (negative result in block 706), the GPU group will be formed (block 708) and then provisioned to the requesting client (block 710). The relevant metadata will then be updated to denote which GPUs are occupied by which client at which SLA setting. Such metadata can be referenced for other client bindings.).
In addition, Wooten teaches a VMM ([0023] Guest virtual machines A, B and C 210, 220 and 230 respectively, may include guest applications that request a guest virtual machine related request to perform a graphics function. A virtualizer, hypervisor, or VMM 240 couples the guest virtual machine graphics request to the host computer 100.).

Regarding claim 3, Zhao teaches provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request (Fig. 7, Step 710 Provision GPU Resources to Client System). 
In addition, Wooten teaches wherein to provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM, the provisioning agent is further to: 
update a mapping in the GPU second level page table for a portion of the memory corresponding to an amount of memory specified in the request from the VMM for the subcluster, the mapping from the portion of the memory to a guest physical memory address space ([0018] Normally, the GFx partition will describe a context with context-specific translation tables. These tables translate accesses by the GPU from the virtual address space of the graphics context into a system memory address. On a system without DMA remapping, the translation is from graphics virtual address into system physical address (actual addresses used to access system memory). On a system with DMA remapping, the graphics device is treated like any other I/O device and the addresses that are presented to memory are translated by tables created and managed by the Hypervisor. On a system with a separate graphics card, a significant portion of the memory used by the GPU is located on the graphics card itself and most memory accesses related to address translation are to that GPU-dedicated memory. In this latter type of system, the overhead of going through the DMA remapping translation required for a system memory access may be relatively insignificant; [0020]; [0022] If accesses are needed to main memory, the retrieved or processed GrVA are sent out by the GPU to perform the graphics function in a direct memory address (DMA) format. However, the GrVA format cannot be used directly by the main memory 140. A GrVA to system physical address (SPA) translation is needed. An I/O Memory Management Unit (IOMMU) 130 containing the direct memory address remapping (DMAr) tables 132 is available to provide the GrVA to SPA translation so that the GPU may operate without having to perform the address translation itself. The GPU operates independently of the address translation. Stated another way, the address translation is transparent to the GPU. 
Thus, the system 100 provides an example mechanism whereby an application 112 and driver 114 generating only GrVAs and graphics instructions 142, may produce a graphics function using a GPU that only processes GrVAs and instructions. The GrVA and graphics instruction 142 may be executed by a GPU using an IOMMU 130 as a GVA to SPA address translation device such that main memory 140 may be accessed properly.).

Regarding claim 4, Zhao teaches wherein the subcluster of the graphics processing apparatus includes at least one graphics processing engine from two or more graphics processing apparatuses (Col. 9, line 65 through Col. 10, line 3: GPU provisioning techniques as discussed herein are configured to achieve high aggregated performance (and avoid performance bottlenecks) by supporting efficient load balancing of client tasks across multiple GPUs which reside on one GPU server node, or across two or more GPU server nodes.).

Regarding claim 8, it is a method type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above.

Regarding claim 9, it is a method type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale above.

Regarding claim 10, it is a method type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale above.

Regarding claim 11, it is a method type claim having similar limitations as claim 4 above. Therefore, it is rejected under the same rationale above.

Regarding claim 15, it is a media/product type claim having similar limitations as claim 1 above. Therefore, it is rejected under the same rationale above.

Regarding claim 16, it is a media/product type claim having similar limitations as claim 2 above. Therefore, it is rejected under the same rationale above.

Regarding claim 17, it is a media/product type claim having similar limitations as claim 3 above. Therefore, it is rejected under the same rationale above.

Regarding claim 18, it is a media/product type claim having similar limitations as claim 4 above. Therefore, it is rejected under the same rationale above.

Claims 5, 6, 12, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao and Wooten, as applied to claim 1 above, in further view of Rawson et al. (US 2010/0141664 A1).

Regarding claim 5, Zhao teaches subclusters as cited above, but neither Zhao nor Wooten expressly teaches the graphics processing apparatus further comprising: 
a migration agent, the migration agent to: 
receive a request from the VMM to migrate GPU state from the subcluster of graphics processing apparatuses to a second subcluster of graphics processing apparatuses coupled to a second host platform; 
extract GPU state from the graphics processing apparatus, the GPU state including at least processor state and memory state; and 
send the GPU state to the VMM to be transferred to the second subcluster via the second host platform.
However, Rawson teaches the graphics processing apparatus further comprising: 
a migration agent, the migration agent to: 
receive a request from the VMM to migrate GPU state from the subcluster of graphics processing apparatuses to a second subcluster of graphics processing apparatuses coupled to a second host platform ([0025] The agent that initiates the transfer of the content of the GCCB 252 may be a processor 202, another GPU 231 or other hardware device. Other triggering events such as exceeding a preprogrammed processing time limit or an internal hardware error may also initiate saving of a GCCB 252 to memory 204.); 
extract GPU state from the graphics processing apparatus, the GPU state including at least processor state and memory state ([0022] The computer system 200 also provides for efficient migrating of a GPU context as a result of a context switching operation. More specifically, the efficient migrating provides each graphics device 230 with a context switch module 250 which accelerates loading and otherwise accessing context data representing a snapshot of the state of the graphics device 230. The snapshot includes both GPU state and state that may be buffered in external memory. [0023] The context data includes an ordered list of any input graphics commands that have not been completed. The context data also include intermediate results such as vertex and fragment lists, and TLB contents. This type of context data may in some cases be passed to another GPU rather than being regenerated (e.g., in the TLB contents case, the cache can be pre-warmed as long as memory resources have not moved). This information is written to a graphics context control block (GCCB) 252 which is stored within a contiguous area of memory 204. Also, in operation, the graphics device 230 can accept a pointer to a previously written GCCB 252 and a resume command from software or some other external agent. The pointer may be provided well in advance of when another graphics device 230 might be writing out to a GCCB 252. The context switch module 250 can control a set of semaphores (e.g., hardware semaphores), where the semaphores may reside in another location in memory 204. Control of the semaphores is used to synchronize access to the contents of the GCCB 252 and then to individual resources that may be referenced within the GCCB. The set of semaphores synchronize and coordinate events within each of the plurality of graphics devices.); and 
send the GPU state to the VMM to be transferred to the second subcluster via the second host platform ([0024] When granted access, the new GPU is able to read in the contents of the GCCB 252, placing the information in appropriate internal registers, translation look aside buffers (TLBs), page tables, etc. of the graphics device 230 and allows the graphics device 230 to resume processing of the context starting from the point at which the context was suspended. The memory address pointer at which the GCCB 252 is to be written or read can be supplied programmatically by software, transferred to the graphics device 230 over an attachment bus or port, or supplied from an internal register within the graphics device 230.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Rawson with the teachings of Zhao and Wooten to allow for GPU state migration. The modification would have been motivated by the desire to maintain high availability of GPU processing when triggering events occur, such as exceeding a preprogrammed processing time limit or an internal hardware error. See at least Rawson, [0025].

Regarding claim 6, Rawson teaches wherein the migration agent is further to: 
identify, based on the GPU dirty bit tracker, a first portion of the memory that was written to while the GPU state was being transferred to the second subcluster ([0008] Broadly speaking, the present invention provides a mechanism for efficiently saving the context of GPU hardware so that it may be shared among a number of different contexts and for efficient migrating of a GPU context from one GPU to another as part of a context switching operation. More specifically, the efficient migrating provides a graphics processing unit with context switch module which accelerates loading and otherwise accessing context data representing a snapshot of the state of the GPU. The snapshot includes both on-chip GPU state and state that may be buffered in external memory. [0009] The context data includes both external working data such as textures, color buffers, vertex buffers, etc. contained in system or video memory and internal state. The latter includes an ordered list of any input graphics commands that have not been completed as well as temporary data, status and configuration bits contained in registers. This internal information is written to a contiguous area of memory referred to as a graphics context control block (GCCB). Also, in certain embodiments, the GPU can accept a pointer to a previously written GCCB and a resume command from software or some other external agent.); and 
send the first portion of the memory to the VMM to be transferred to the second subcluster via the second host platform ([0024] When granted access, the new GPU is able to read in the contents of the GCCB 252, placing the information in appropriate internal registers, translation look aside buffers (TLBs), page tables, etc. of the graphics device 230 and allows the graphics device 230 to resume processing of the context starting from the point at which the context was suspended. The memory address pointer at which the GCCB 252 is to be written or read can be supplied programmatically by software, transferred to the graphics device 230 over an attachment bus or port, or supplied from an internal register within the graphics device 230.).

Regarding claim 12, it is a method type claim having similar limitations as claim 5 above. Therefore, it is rejected under the same rationale above.

Regarding claim 13, it is a method type claim having similar limitations as claim 6 above. Therefore, it is rejected under the same rationale above.

Regarding claim 19, it is a media/product type claim having similar limitations as claim 5 above. Therefore, it is rejected under the same rationale above.
Allowable Subject Matter
Claims 7, 14, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JORGE A CHU JOY-DAVILA whose telephone number is (571)270-0692. The examiner can normally be reached Monday-Friday, 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Meng-Ai T An, can be reached at (571)272-3756. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JORGE A CHU JOY-DAVILA/
Primary Examiner, Art Unit 2195