DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wilt et al. (United States Patent Application 20170132746) in view of Post et al. (United States Patent Application 20120084774).
As per claim 1, Wilt teaches the invention substantially as claimed including, a system comprising: 
	at least one computing device comprising at least one processor ([0182], computing device 3000 includes one or more processors 3010 coupled to a system memory 3020); 
	at least one memory comprising executable instructions ([0182], computing device 3000 includes one or more processors 3010 coupled to a system memory 3020), wherein the instructions, when executed by the at least one processor, cause the at least one computing device to at least: 
	monitor resources of a plurality of hosts of a datacenter cluster ([0111], The performance monitoring 1110 may determine any suitable set of performance metrics, e.g., metrics related to the use of the virtual GPU 151B by the instance 141E; and [0155], Performance metrics may include network-related metrics such as latency and bandwidth, as measured within the provider network and/or between the provider network and a client device. Performance metrics may include any other metrics related to processor use, GPU use, memory use, storage use, and so on); 
	identify a host executing a graphics processing unit (GPU)-remoting client virtual machine (VM) ([0178], A placement score for a current placement of a resource at a resource host may be generated with respect to one or more placement criteria. The placement criteria, as discussed above, may be used to optimize placement of resources in the provider network 100…placement score may reflect a score on how close the current placement is with respect to the more optimal scenario (e.g., same network router). The score may be a composite of multiple different placement criteria, considering the impact on the resource, resource host, and/or distributed system as a whole), the GPU-remoting client virtual machine comprising a GPU workload ([0175], Execution of the application using the virtual GPU may generate virtual GPU output, e.g., output produced by executing instructions or otherwise performing tasks on the virtual GPU); 	
	identify, based on the monitored resources, a destination host comprising a lower GPU-remoting latency than the host ([0177], Placement criteria may be optimized in this manner not only for newly provisioned resources but also for migration of a virtual compute instance and/or attached virtual GPU after their use has begun…When scaling is performed for GPU virtualization as discussed above, the locations of any virtual GPUs may be selected based on placement criteria, and/or the location of the virtual compute instance may be moved based on placement criteria. For example, if a virtual GPU is insufficient to meet the GPU requirements of a virtual compute instance, both the virtual GPU and the virtual compute instance may be moved to a different set of locations where a virtual GPU of a sufficiently capable class can be provisioned…if a virtual compute instance needs to be migrated to a different location, the location of an attached virtual GPU may be migrated as well to optimize one or more placement criteria. If the resource requirements for the instance type and/or GPU class change over time (based on user input and/or performance monitoring), either the virtual compute instance and/or attached virtual GPU (and often both) may be migrated to new locations for optimization of placement criteria. If resource availability changes over time, either the virtual compute instance and/or attached virtual GPU (and often both) may be migrated to new locations for optimization of placement criteria); and 
	migrate the GPU-remoting client virtual machine from the host to the destination host based on the destination host comprising the lower GPU-remoting latency than the host ([0147], a GPU location 1450A in a data center nearest the client device 180A may be 

	Although Wilt teaches monitoring network latencies ([0147], Performance metrics may include network-related metrics such as latency and bandwidth, as measured within the provider network and/or between the provider network and a client device. Performance metrics may include any other metrics related to processor use, GPU use, memory use, storage use, and so on), Wilt fails to specifically teach, determine GPU-remoting latencies for at least a subset of the plurality of hosts, the GPU-remoting latencies indicating network latencies to access a GPU resource through a GPU-remoting server virtual machine of the datacenter cluster.
	However, Post teaches, determine GPU-remoting latencies for at least a subset of the plurality of hosts, the GPU-remoting latencies indicating network latencies to access a GPU resource through a GPU-remoting server virtual machine of the datacenter cluster ([0063], the latency can be measured from the time that graphics kernel 508 issues a command to the GPU until an acknowledgment ("ACK") is received. After each ACK is received, graphics kernel 508 can send the latency associated with the request to 3D graphics service manager 404. 3D graphics service manager 404 can update a value in the table that reflects the average latency for the GPU; see also [0066], [0069]-[0070]).
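	For illustration only, the measurement quoted from Post at [0063] (time from issuing a command until the ACK is received, folded into a per-GPU average kept in a table) can be pictured with the following minimal Python sketch. The class name GpuLatencyTable, its methods, and the use of an incremental mean are assumptions made for this sketch; they are not taken from Post, Wilt, or the present application.

```python
from collections import defaultdict


class GpuLatencyTable:
    """Per-GPU running average of command-to-ACK latency (hypothetical helper)."""

    def __init__(self):
        self._count = defaultdict(int)    # number of samples seen per GPU
        self._mean = defaultdict(float)   # running mean latency per GPU, in seconds

    def record(self, gpu_id, issued_at, acked_at):
        """Fold one latency sample into the running mean and return the new mean."""
        latency = acked_at - issued_at
        if latency < 0:
            raise ValueError("ACK timestamp precedes command timestamp")
        self._count[gpu_id] += 1
        # Incremental mean: m_n = m_(n-1) + (x_n - m_(n-1)) / n
        self._mean[gpu_id] += (latency - self._mean[gpu_id]) / self._count[gpu_id]
        return self._mean[gpu_id]

    def average(self, gpu_id):
        """Return the current average latency, or None if no sample was recorded."""
        return self._mean[gpu_id] if self._count.get(gpu_id) else None
```

	In this sketch a caller would invoke record() once per acknowledged command, which loosely mirrors the table update attributed to the 3D graphics service manager in the quoted passage.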
	
	Wilt and Post are analogous because they are each related to managing virtualized GPU resources. Wilt teaches a method of load balancing for virtual GPUs (Abstract, systems, and computer-readable media for placement optimization for virtualized graphics processing are disclosed; and [0111], The performance monitoring 1110 may determine any suitable set of performance metrics, e.g., metrics related to the use of the virtual GPU 151B by the instance 141E). Post also teaches a method of load balancing for virtualized GPU resources (Abstract, Exemplary techniques for balancing 3D graphical processor unit use among virtual machines are herein disclosed. In an exemplary embodiment, a virtualization platform can load an instance of a graphics rendering module for a virtual machine; select a GPU for the graphics rendering module to run on; and configure the virtual machine to render to the selected GPU; and [0004], determine that the first 3D graphics processing unit is overcommitted based on at least an amount of time the first 3D graphics processing unit takes to respond to commands; move a first virtual machine from the group of virtual machines to a second graphics processing unit in response to at least the determination that the first 3D graphics processing unit is overcommitted). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Wilt with the overcommit detection mechanism taught by Post in order to accomplish load balancing and optimal placement of virtual GPUs. The application of Post’s known techniques to Wilt would yield predictable results. Therefore, it would have been obvious to combine the teachings of Wilt and Post.

As per claim 2, Post teaches, wherein the destination host comprises a lowest GPU remoting latency among the subset of the plurality of hosts ([0005], cause a processor to 

As per claim 3, Wilt teaches, wherein the instructions, when executed by the at least one processor, cause the at least one computing device to at least: 
	identify the at least the subset of the plurality of hosts based on a respective one of the at least the subset of the plurality of hosts comprising sufficient resources for the GPU remoting client virtual machine ([0179], Resource hosts such as physical compute instances and graphics servers (that host physical GPUs) may be evaluated to determine those resource hosts that can host a resource such as a virtual compute instance or a virtual GPU. For instance, hosts that do not satisfy certain conditions may be filtered out of consideration… The remaining available resource hosts that can host the resource may then be evaluated as potential destination hosts. For example, placement score(s) may be generated for the placement of the resource at possible destination resource host(s)).
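	For illustration only, the host filtering discussed for claim 3, together with the lowest-latency selection of claim 2 and the lower-latency comparison of claim 1, can be pictured with the following minimal Python sketch. The Host fields, the units, and the function pick_destination are hypothetical names chosen for this sketch; they do not appear in Wilt, Post, or the present application, and the sketch is not presented as either reference's implementation.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu: int                    # free vCPUs (illustrative unit)
    free_memory_mb: int              # free memory in MB (illustrative unit)
    gpu_remoting_latency_ms: float   # measured latency to reach the GPU resource


def pick_destination(current, candidates, need_cpu, need_memory_mb):
    """Return the eligible host with the lowest latency, or None if none beats current."""
    # Step 1: filter out hosts that lack sufficient resources for the client VM.
    eligible = [
        h for h in candidates
        if h.name != current.name
        and h.free_cpu >= need_cpu
        and h.free_memory_mb >= need_memory_mb
    ]
    if not eligible:
        return None
    # Step 2: among the remaining hosts, take the one with the lowest latency.
    best = min(eligible, key=lambda h: h.gpu_remoting_latency_ms)
    # Step 3: migrate only if that latency is lower than the current host's.
    if best.gpu_remoting_latency_ms < current.gpu_remoting_latency_ms:
        return best
    return None
```

	In this sketch a host with very low latency but insufficient free resources is never returned, because filtering precedes the latency comparison.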

As per claim 4, Post teaches, wherein the host is identified to execute the GPU-remoting server virtual machine ([0067], graphics service manager 404 can be configured to identify which virtual machine is under the most stress; and [0075], determining that the first 3D graphics processing unit is overcommitted based on at least an amount of time the first 3D graphics processing unit takes to respond to commands) and the GPU-remoting client virtual machine, and a resource utilization of the host is greater than a threshold utilization.
As per claim 5, Post teaches, wherein the destination host executes the GPU-remoting server virtual machine, and the destination host comprises sufficient resources for the GPU remoting client virtual machine ([0077], the decision to move virtual machine 414 to 3D GPU 504B can be based at least upon a determination that the estimated amount of graphics memory available to 3D GPU 504B is greater than a threshold, which could be based on the estimated amount of graphics memory utilized to render graphics for virtual machine 414. In this case, 3D graphics service manager 404 can be configured to move virtual machine 414 in response to a determination that 3D GPU 504 is overloaded and a determination that 3D GPU 504B can accommodate virtual machine 414).
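
For illustration only, the two determinations quoted from Post at [0075] and [0077] (the source GPU is overcommitted based on its response time, and the destination GPU has more available graphics memory than a threshold based on the virtual machine's estimated usage) can be pictured with the following minimal Python sketch. The function names, parameter names, and units are assumptions made for this sketch and are not taken from Post.

```python
def is_overcommitted(avg_response_ms, response_limit_ms):
    """Treat a GPU as overcommitted when its average response time exceeds a limit."""
    return avg_response_ms > response_limit_ms


def should_move_vm(src_avg_response_ms, response_limit_ms,
                   dest_free_graphics_mem_mb, vm_graphics_mem_estimate_mb):
    """Move only if the source is overcommitted AND the destination can hold the VM."""
    overloaded = is_overcommitted(src_avg_response_ms, response_limit_ms)
    # The threshold here is simply the VM's estimated graphics memory use (assumption).
    fits = dest_free_graphics_mem_mb > vm_graphics_mem_estimate_mb
    return overloaded and fits
```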

As per claim 6, Wilt teaches, wherein the GPU-remoting client virtual machine accesses the GPU resource based on GPU-remoting Application Programming Interface (API) calls to the GPU-remoting server virtual machine ([0058], the interface device 410 may present a graphics API to the virtual compute instance 141B and receive API calls for graphics processing (e.g., accelerated 3D graphics processing). Via the network interface, the interface device 410 may communicate with the graphics server 420 (and thus with the physical GPU 152B) over a network…the physical compute instance 142B may implement a plurality of virtual compute instances, each with its own virtual interface, and the virtual compute instances may use the interface device 410 to interact with the corresponding virtual GPUs on one or more graphics servers; and [0059], Graphics offload performed by the interface device 410 (e.g., by executing custom program code on the interface device) may translate graphics API commands into network traffic (encapsulating the graphics API commands) that is transmitted to the graphics server 420, and the graphics server 420 may execute the commands on behalf of the interface device...the interface device 410 may receive calls to a graphics API (using the custom hardware interface) and generate graphics offload traffic to be sent to the network adapter 440 (using the network interface)), wherein the GPU resource is local to the GPU-remoting server virtual machine ([0083], In one embodiment, the application 620N may still have access to graphics processing provided by a local GPU (as discussed below with respect to FIG. 9A through FIG. 11) and/or a virtual GPU that is attached to the instance 141C but is not application-specific).

As per claim 7, Wilt teaches, wherein a respective one of the GPU-remoting API calls comprises data and parameters issued by the GPU workload ([0058], the interface device 410 may present a graphics API to the virtual compute instance 141B and receive API calls for graphics processing (e.g., accelerated 3D graphics processing). Via the network interface, the interface device 410 may communicate with the graphics server 420 (and thus with the physical GPU 152B) over a network…the physical compute instance 142B may implement a plurality of virtual compute instances, each with its own virtual interface, and the virtual compute instances may use the interface device 410 to interact with the corresponding virtual GPUs on one or more graphics servers; and [0059], Graphics offload performed by the interface device 410 (e.g., by executing custom program code on the interface device) may translate graphics API commands into network traffic (encapsulating the graphics API commands) that is transmitted to the graphics server 420, and the graphics server 420 may execute the commands on behalf of the interface device...the interface device 410 may receive calls to a graphics API (using the custom hardware interface) and generate graphics offload traffic to be sent to the network adapter 440 (using the network interface)).

As per claim 8, this is the “non-transitory computer-readable medium claim” corresponding to claim 1 and is rejected for the same reasons. The same motivation used in claim 1 is applicable to the instant claim.
As per claim 9, this claim is similar to claim 2 and is rejected for the same reasons.
As per claim 10, this claim is similar to claim 3 and is rejected for the same reasons.
As per claim 11, this claim is similar to claim 4 and is rejected for the same reasons.
As per claim 12, this claim is similar to claim 5 and is rejected for the same reasons.
As per claim 13, this claim is similar to claim 6 and is rejected for the same reasons.
As per claim 14, this claim is similar to claim 7 and is rejected for the same reasons.
As per claim 15, this is the “method claim” corresponding to claim 1 and is rejected for the same reasons. The same motivation used in claim 1 is applicable to the instant claim.
As per claim 16, this claim is similar to claim 2 and is rejected for the same reasons.
As per claim 17, this claim is similar to claim 3 and is rejected for the same reasons.
As per claim 18, this claim is similar to claim 4 and is rejected for the same reasons.
As per claim 19, this claim is similar to claim 5 and is rejected for the same reasons.
As per claim 20, this claim is similar to claim 6 and is rejected for the same reasons.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MELISSA A HEADLY whose telephone number is (571)272-1972. The examiner can normally be reached Monday-Friday, 9 am-5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LEWIS A BULLOCK JR/
Supervisory Patent Examiner, Art Unit 2199

MELISSA A. HEADLY
Examiner
Art Unit 2199