DETAILED ACTION
Status of Claims:
Claims 21 – 40 are pending. 
Claims 21, 28, and 35 are amended. This rejection is FINAL. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 08/16/2021.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections
Claims 21, 28, and 35 are objected to because of the following informalities:  The newly amended claim language repeats the words “one or more” twice before the words “provisioning rules”. Appropriate correction is required.

Response to Arguments
Applicant's arguments in the amendment filed 11/04/2021 have been fully considered and are persuasive. A new ground of rejection is presented in view of the Applicant’s arguments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 21 – 40 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 9916636) in view of Risbood (US 8261295).

As per claim 21, Li discloses a method, comprising: 
performing, at one or more computing devices: 
	identifying, … one or more remote graphics processing devices of a network-accessible graphics computing service to process requests from a graphics request source (A request (asking for the provisioning details of the GPU(s) to handle the workload) is sent to the GPU-aware data analytic platform, See Fig. 7 / Col. 15, Lines 42 - 67) as part of an auto-scaled group, and wherein the obtained scaling policy comprises one or more one or more provisioning rules for adding or removing remote graphics processing devices to or from the auto-scaled group (GPU auto-scaling module operates to automatically scale-up (adding) and/or scale-down (removing) the number of GPUs used by the workflow, for example, based on the monitored resource consumption information provided by the GPU monitoring component, See Col. 16, Lines 30 - 49); 
	causing one or more network connections to be established between the graphics request source and a first remote graphics processing device of the one or more remote graphics processing devices (Data analytic platform issues a request to the GPU management component to provision the GPUs needed. In response, the GPU resource management component then issues a request via the management platform to provision the GPU (establishing connections) from the GPU resource pool, See Fig. 7 / Col. 16, Lines 3 - 19); and 
	transmitting, from the first remote graphics processing device, a result of a graphics operation requested by the graphics request source via the one or more network connections (The GPUs required are then provisioned from the resource pool. Processing on the workload (result of a graphics operation) is then initiated, See Fig. 7 / Col. 16, Lines 3 - 19).
	
	Li however does not expressly disclose:
	identifying, based at least in part on a scaling policy obtained via a programmatic interface.

	Risbood discloses:
	… (identifying) based at least in part on a scaling policy obtained via a programmatic interface (A class definition that models a virtual machine can include one or more class parameters for a scaling policy, such that the compiler can derive API calls to dynamically adjust the number of virtual machines deployed based on various performance metrics monitored by the cloud-service manager, See Col. 13, Lines 13 - 22).

	It would have been obvious to an artisan of ordinary skill at the time of the Applicant's filing date to combine Risbood’s teaching of identifying based on a scaling policy obtained via programmatic interface, along with auto-scaling graphics processing devices to improve Li’s system. Li and Risbood both disclose systems for implementing virtualized services via virtualized resources. Risbood’s system includes a scaling policy obtained using API calls for dynamically adjusting deployed virtual machines. The combination is an improvement upon the existing system because a scaling policy can be obtained via a programmatic interface, such as an API, to scale virtualized resources such as virtual machines that reside on processing devices as taught by Risbood, where the scaling policy can further be used to auto-scale the virtualized graphics processing devices via the virtual machines as taught by Li, to provide services to clients.


As per claim 22, the method as recited in claim 21, wherein the scaling policy indicates that an exclusive provisioning mode is to be employed to assign remote graphics processing devices, the method further comprising performing, at the one or more computing devices: 
	reserving, in accordance with the exclusive provisioning mode, the one or more remote graphics processing devices for exclusive use by the graphics request source (Li, Once a particular resource of a resource pool (e.g., a GPU accelerator) is associated with a given server entity, that particular resource is not available to be used to constitute another server entity … preferably a server entity (once created) is associated with one and only one data center customer (tenant). In other words, server entities preferably are not shared across tenants, See Fig. 6 / Col. 12, Lines 5 - 16).

As per claim 23, the method as recited in claim 21, wherein the scaling policy indicates that a non-exclusive provisioning mode is to be employed to assign remote graphics processing devices, the method further comprising performing, at the one or more computing devices: 
	selecting, in accordance with the non-exclusive provisioning mode, the one or more remote graphics processing devices from a pool of remote graphics processing devices which is shared among a plurality of graphics request sources (Li, The IT datacenter that provides shared (public) resources is the “provider” and a customer or company that uses these shared resources to host, store and manage its data and applications (in all forms) is the “subscriber” (or “customer” or “tenant”), See Fig. 5 / Col. 10, Lines 29 – 67).
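
For illustration only, the distinction between the exclusive and non-exclusive provisioning modes addressed in the two preceding claims can be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Li, Risbood, or the application: the class name GpuPool, its methods, and the least-loaded selection order are all hypothetical.

```python
class GpuPool:
    """Hypothetical pool of remote graphics processing devices."""

    def __init__(self, device_ids):
        # Exclusive owner of each device (None when not reserved).
        self._owner = {d: None for d in device_ids}
        # Request sources currently sharing each device.
        self._sharers = {d: set() for d in device_ids}

    def reserve_exclusive(self, source, count):
        """Exclusive mode: reserve devices that no other source is using."""
        free = [d for d in self._owner
                if self._owner[d] is None and not self._sharers[d]]
        if len(free) < count:
            raise RuntimeError("not enough unassigned devices")
        chosen = free[:count]
        for d in chosen:
            self._owner[d] = source
        return chosen

    def select_shared(self, source, count):
        """Non-exclusive mode: select devices that may serve several sources."""
        candidates = [d for d in self._owner if self._owner[d] is None]
        if len(candidates) < count:
            raise RuntimeError("not enough shareable devices")
        # Assumed heuristic: prefer the devices with the fewest sharers.
        candidates.sort(key=lambda d: len(self._sharers[d]))
        chosen = candidates[:count]
        for d in chosen:
            self._sharers[d].add(source)
        return chosen
```

In this sketch a device reserved exclusively is never returned by select_shared, while a shared device can be returned to more than one request source.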

As per claim 24, the method as recited in claim 21, wherein the scaling policy indicates a particular category of a plurality of categories of remote graphics processing devices of the graphics computing service, wherein the particular category differs from another category of the plurality of categories in a performance capability, and wherein at least one remote graphics processing device of the one or more remote graphics processing devices belongs to the particular category (Li, The GPU-aware data analytic platform comprises a GPU sizing module, which decides the number and type of GPUs to use for a particular workload. There may be one or more GPU types, such as NVIDIA® Tesla™, NVIDIA GRID™ graphic cards, or the like. The platform preferably also includes a task-to-GPU assignment component, which assigns tasks within a workload to GPUs, e.g., based on workload characteristics, a task scheduling policy, or the like. Further, the platform preferably also includes a GPU auto-scaling module, which retrieves monitoring information from the GPU monitoring module and auto scales-up or -down the GPU resources in a fine granularity given the capability of the hardware cloud, changes to the workload, See Col. 14, Line 52 – Col. 15, Line 4).

As per claim 25, the method as recited in claim 21, wherein the scaling policy indicates one or more rules to be used to modify a number of remote graphics processing devices assigned to the first graphics request source, the method further comprising performing, by the one or more computing devices: assigning, in accordance with the one or more rules, an additional remote graphics processing device to the first graphics request source (Li, GPU auto-scaling module operates to automatically scale-up and/or scale-down the number of GPUs used by the workflow, for example, based on the monitored resource consumption information provided by the GPU monitoring component. To this end, information collected by the GPU monitoring component is communicated to the GPU auto-scaling component. Based on the monitored information (and, optionally, information collected about the health and status of other resources in the cloud), the auto-scaling component performs an auto-scaling computation. As a result of the computation, the data GPU auto-scaling component then instructs the management platform to scale-up or -down the GPU resources being used for the workload, See Col. 16, Lines 30 - 49).
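
For illustration only, a scaling policy whose rules add or remove devices based on monitored resource consumption might be modeled as sketched below. This is a minimal sketch under stated assumptions and does not represent the implementation of Li, Risbood, or the application: the names ProvisioningRule, ScalingPolicy, apply_policy, and the metric key gpu_utilization are hypothetical, as is the first-matching-rule behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvisioningRule:
    """Hypothetical rule: change the device count when a metric crosses a threshold."""
    metric: str
    threshold: float
    above: bool  # True: fire when metric > threshold; False: fire when metric < threshold
    delta: int   # positive adds devices, negative removes devices


@dataclass
class ScalingPolicy:
    """Hypothetical policy bounding the size of an auto-scaled group."""
    min_devices: int
    max_devices: int
    rules: list


def apply_policy(policy, current_count, metrics):
    """Return the device count after applying the first rule that fires."""
    for rule in policy.rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not reported; rule cannot fire
        fired = value > rule.threshold if rule.above else value < rule.threshold
        if fired:
            target = current_count + rule.delta
            # Clamp to the bounds stated in the policy.
            return max(policy.min_devices, min(policy.max_devices, target))
    return current_count
```

A usage example under the same assumptions: with a rule that adds one device above 80% utilization and a rule that removes one below 20%, a group of two devices reporting 90% utilization grows to three, and the count never leaves the range set by min_devices and max_devices.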

As per claim 26, the method as recited in claim 21, further comprising performing, at the one or more computing devices: causing a transformed version of a first request packet originating at the graphics request source to be delivered to a particular remote graphics processing device, wherein the destination address of the first request packet differs from an address of the particular remote graphics processing device, and wherein the destination address of the transformed version is the address of the particular remote processing device (Li, A “micro-service” enabling data analytic workloads to automatically and transparently use GPU resources without providing (e.g., to the customer) the underlying provisioning details. As noted, the approach dynamically determines the number and the type of GPUs to use, and then during runtime auto-scales the GPUs based on workload, See Col. 18, Lines 24 - 35).
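
For illustration only, the claimed relationship between a request packet and its transformed version (the original destination address differs from the device address, and the transformed destination address equals the device address) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Li or the application: the RequestPacket type, the transform function, and the example addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RequestPacket:
    """Hypothetical request packet with source and destination addresses."""
    source: str
    destination: str
    payload: bytes


def transform(packet, device_address):
    """Return a copy of the packet whose destination is the selected device."""
    return replace(packet, destination=device_address)


# The request source addresses the packet to some other address (assumed here
# to be a front-end address), not to the device itself.
original = RequestPacket(source="10.0.0.5", destination="10.0.0.100", payload=b"draw")
transformed = transform(original, device_address="10.0.1.7")
```

The original packet is left unchanged, and only the destination field differs in the transformed version.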

As per claim 27, the method as recited in claim 21, wherein the first remote graphics processing device comprises a virtualized device instantiated at a host comprising one or more graphics hardware devices including at least one graphics processing unit (GPU) (Li, The virtual machines, applications and tenant data represent a subscriber-accessible virtualized resource management domain. Through this domain, the subscriber's employees may access and manage (using various role-based privileges) virtualized resources they have been allocated by the provider and that are backed by physical IT infrastructure, See Fig. 5 / Col. 10, Lines 29 – 67).

As per claim 28, Li discloses a system, comprising: 
one or more computing devices (This environment comprises host machines (HVs) (e.g., servers or like physical machine computing devices) connected to a physical datacenter network, See Fig. 5 / Col. 10, Lines 29 – 67); 
wherein the one or more computing devices include instructions that upon execution on or across one or more processors cause the one or more computing devices to: 
	identify, … one or more remote graphics processing devices of a network-accessible graphics computing service to process requests from a graphics request source (A request (asking for the provisioning details of the GPU(s) to handle the workload) is sent to the GPU-aware data analytic platform, See Fig. 7 / Col. 15, Lines 42 - 67) as part of an auto-scaled group, and wherein the obtained scaling policy comprises one or more one or more provisioning rules for adding or removing remote graphics processing devices to or from the auto-scaled group (GPU auto-scaling module operates to automatically scale-up (adding) and/or scale-down (removing) the number of GPUs used by the workflow, for example, based on the monitored resource consumption information provided by the GPU monitoring component, See Col. 16, Lines 30 - 49); 
	cause one or more network connections to be established between the graphics request source and a first remote graphics processing device of the one or more remote graphics processing devices (Data analytic platform issues a request to the GPU management component to provision the GPUs needed. In response, the GPU resource management component then issues a request via the management platform to provision the GPU (establishing connections) from the GPU resource pool, See Fig. 7 / Col. 16, Lines 3 – 19); and 
	transmit, from the first remote graphics processing device, a result of a graphics operation requested by the graphics request source via the one or more network connections (The GPUs required are then provisioned from the resource pool. Processing on the workload (result of a graphics operation) is then initiated, See Fig. 7 / Col. 16, Lines 3 - 19).

	Li however does not expressly disclose:
	identify, based at least in part on a scaling policy obtained via a programmatic interface.

	Risbood discloses:
	… (identify) based at least in part on a scaling policy obtained via a programmatic interface (A class definition that models a virtual machine can include one or more class parameters for a scaling policy, such that the compiler can derive API calls to dynamically adjust the number of virtual machines deployed based on various performance metrics monitored by the cloud-service manager, See Col. 13, Lines 13 - 22).

	It would have been obvious to an artisan of ordinary skill at the time of the Applicant's filing date to combine Risbood’s teaching of identifying based on a scaling policy obtained via programmatic interface, along with auto-scaling graphics processing devices to improve Li’s system. Li and Risbood both disclose systems for implementing virtualized services via virtualized resources. Risbood’s system includes a scaling policy obtained using API calls for dynamically adjusting deployed virtual machines. The combination is an improvement upon the existing system because a scaling policy can be obtained via a programmatic interface, such as an API, to scale virtualized resources such as virtual machines that reside on processing devices as taught by Risbood, where the scaling policy can further be used to auto-scale the virtualized graphics processing devices via the virtual machines as taught by Li, to provide services to clients.

As per claim 29, the system as recited in claim 28, wherein the scaling policy indicates that an exclusive provisioning mode is to be employed to assign remote graphics processing devices, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: 
	reserve, in accordance with the exclusive provisioning mode, the one or more remote graphics processing devices for exclusive use by the graphics request source (Li, Once a particular resource of a resource pool (e.g., a GPU accelerator) is associated with a given server entity, that particular resource is not available to be used to constitute another server entity … preferably a server entity (once created) is associated with one and only one data center customer (tenant). In other words, server entities preferably are not shared across tenants, See Fig. 6 / Col. 12, Lines 5 - 16).

As per claim 30, the system as recited in claim 28, wherein the scaling policy indicates that a non-exclusive provisioning mode is to be employed to assign remote graphics processing devices, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: 
	select, in accordance with the non-exclusive provisioning mode, the one or more remote graphics processing devices from a pool of remote graphics processing devices which is shared among a plurality of graphics request sources (Li, The IT datacenter that provides shared (public) resources is the “provider” and a customer or company that uses these shared resources to host, store and manage its data and applications (in all forms) is the “subscriber” (or “customer” or “tenant”), See Fig. 5 / Col. 10, Lines 29 – 67).

As per claim 31, the system as recited in claim 28, wherein the scaling policy indicates a particular category of a plurality of categories of remote graphics processing devices of the graphics computing service, wherein the particular category differs from another category of the plurality of categories in a performance capability, and wherein at least one remote graphics processing device of the one or more remote graphics processing devices belongs to the particular category (Li, The GPU-aware data analytic platform comprises a GPU sizing module, which decides the number and type of GPUs to use for a particular workload. There may be one or more GPU types, such as NVIDIA® Tesla™, NVIDIA GRID™ graphic cards, or the like. The platform preferably also includes a task-to-GPU assignment component, which assigns tasks within a workload to GPUs, e.g., based on workload characteristics, a task scheduling policy, or the like. Further, the platform preferably also includes a GPU auto-scaling module, which retrieves monitoring information from the GPU monitoring module and auto scales-up or -down the GPU resources in a fine granularity given the capability of the hardware cloud, changes to the workload, See Col. 14, Line 52 – Col. 15, Line 4).

As per claim 32, the system as recited in claim 28, wherein the scaling policy indicates one or more rules to be used to modify a number of remote graphics processing devices assigned to the first graphics request source, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: reduce, in accordance with the one or more rules, a count of remote graphics processing devices assigned to the first graphics request source (Li, A virtual compute instance may be provisioned, and a first set of one or more GPU(s) may be attached to the instance to provide graphics processing. The first set of one or more virtual GPUs may provide a particular level of graphics processing. After a change in GPU requirements for the instance is determined, the second set of one or more virtual GPU(s) may be selected and attached to the virtual compute instance to replace the graphics processing of the first virtual GPU(s) with a different level of graphics processing. The second virtual GPU(s) may be selected based on the change in GPU requirements, See Col. 13, Lines 26 - 56).

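As per claim 33, the system as recited in claim 28, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: cause a transformed version of a first request packet originating at the graphics request source to be delivered to a particular remote graphics processing device, wherein the destination address of the first request packet differs from an address of the particular remote graphics processing device, and wherein the destination address of the transformed version is the address of the particular remote processing device (Li, A “micro-service” enabling data analytic workloads to automatically and transparently use GPU resources without providing (e.g., to the customer) the underlying provisioning details. As noted, the approach dynamically determines the number and the type of GPUs to use, and then during runtime auto-scales the GPUs based on workload, See Col. 18, Lines 24 - 35).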

As per claim 34, the system as recited in claim 28, wherein the first remote graphics processing device is configured to utilize at least one graphics processing unit (GPU) to perform operations requested from the graphics request source (Li, GPUs from a GPU accelerator pool are dynamically provisioned and scaled, e.g., to handle data analytic workloads in a hardware cloud, See Fig. 7 / Col. 14, Lines 35 - 51).

As per claim 35, Li discloses one or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors cause one or more computer systems to: 
	identify, … one or more remote graphics processing devices of a network-accessible graphics computing service to process requests from a graphics request source (A request (asking for the provisioning details of the GPU(s) to handle the workload) is sent to the GPU-aware data analytic platform, See Fig. 7 / Col. 15, Lines 42 - 67) as part of an auto-scaled group, and wherein the obtained scaling policy comprises one or more one or more provisioning rules for adding or removing remote graphics processing devices to or from the auto-scaled group (GPU auto-scaling module operates to automatically scale-up (adding) and/or scale-down (removing) the number of GPUs used by the workflow, for example, based on the monitored resource consumption information provided by the GPU monitoring component, See Col. 16, Lines 30 - 49);  
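	cause one or more network connections to be established between the graphics request source and a first remote graphics processing device of the one or more remote graphics processing devices (Data analytic platform issues a request to the GPU management component to provision the GPUs needed. In response, the GPU resource management component then issues a request via the management platform to provision the GPU (establishing connections) from the GPU resource pool, See Fig. 7 / Col. 16, Lines 3 – 19); and 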
	transmit, from the first remote graphics processing device, a result of a graphics operation requested by the graphics request source via the one or more network connections (The GPUs required are then provisioned from the resource pool. Processing on the workload (result of a graphics operation) is then initiated, See Fig. 7 / Col. 16, Lines 3 - 19).

	Li however does not expressly disclose:
	identify, based at least in part on a scaling policy obtained via a programmatic interface.

	Risbood discloses:
	… (identify) based at least in part on a scaling policy obtained via a programmatic interface (A class definition that models a virtual machine can include one or more class parameters for a scaling policy, such that the compiler can derive API calls to dynamically adjust the number of virtual machines deployed based on various performance metrics monitored by the cloud-service manager, See Col. 13, Lines 13 - 22).

	It would have been obvious to an artisan of ordinary skill at the time of the Applicant's filing date to combine Risbood’s teaching of identifying based on a scaling policy obtained via programmatic interface, along with auto-scaling graphics processing devices to improve Li’s system. Li and Risbood both disclose systems for implementing virtualized services via virtualized resources. Risbood’s system includes a scaling policy obtained using API calls for dynamically adjusting deployed virtual machines. The combination is an improvement upon the existing system because a scaling policy can be obtained via a programmatic interface, such as an API, to scale virtualized resources such as virtual machines that reside on processing devices as taught by Risbood, where the scaling policy can further be used to auto-scale the virtualized graphics processing devices via the virtual machines as taught by Li, to provide services to clients.

As per claim 36, the one or more non-transitory computer-accessible storage media as recited in claim 35, wherein the scaling policy indicates that an exclusive provisioning mode is to be employed to assign remote graphics processing devices, and wherein the one or more storage media store further program instructions that when executed on or across the one or more processors further cause the one or more computer systems to: 
	reserve, in accordance with the exclusive provisioning mode, the one or more remote graphics processing devices for exclusive use by the graphics request source (Li, Once a particular resource of a resource pool (e.g., a GPU accelerator) is associated with a given server entity, that particular resource is not available to be used to constitute another server entity … preferably a server entity (once created) is associated with one and only one data center customer (tenant). In other words, server entities preferably are not shared across tenants, See Fig. 6 / Col. 12, Lines 5 - 16).

As per claim 37, the one or more non-transitory computer-accessible storage media as recited in claim 35, wherein the scaling policy indicates that a non-exclusive provisioning mode is to be employed to assign remote graphics processing devices, and wherein the one or more storage media store further program instructions that when executed on or across the one or more processors further cause the one or more computer systems to: 
	select, in accordance with the non-exclusive provisioning mode, the one or more remote graphics processing devices from a pool of remote graphics processing devices which is shared among a plurality of graphics request sources (Li, The IT datacenter that provides shared (public) resources is the “provider” and a customer or company that uses these shared resources to host, store and manage its data and applications (in all forms) is the “subscriber” (or “customer” or “tenant”), See Fig. 5 / Col. 10, Lines 29 – 67).

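As per claim 38, the one or more non-transitory computer-accessible storage media as recited in claim 35, wherein the scaling policy indicates a particular category of a plurality of categories of remote graphics processing devices of the graphics computing service, wherein the particular category differs from another category of the plurality of categories in a performance capability, and wherein at least one remote graphics processing device of the one or more remote graphics processing devices belongs to the particular category (Li, The GPU-aware data analytic platform comprises a GPU sizing module, which decides the number and type of GPUs to use for a particular workload. There may be one or more GPU types, such as NVIDIA® Tesla™, NVIDIA GRID™ graphic cards, or the like. The platform preferably also includes a task-to-GPU assignment component, which assigns tasks within a workload to GPUs, e.g., based on workload characteristics, a task scheduling policy, or the like. Further, the platform preferably also includes a GPU auto-scaling module, which retrieves monitoring information from the GPU monitoring module and auto scales-up or -down the GPU resources in a fine granularity given the capability of the hardware cloud, changes to the workload, See Col. 14, Line 52 – Col. 15, Line 4).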

As per claim 39, the one or more non-transitory computer-accessible storage media as recited in claim 35, wherein the scaling policy indicates one or more rules to be used to modify a number of remote graphics processing devices assigned to the first graphics request source, and wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: increase, in accordance with the one or more rules, a count of remote graphics processing devices assigned to the first graphics request source  (Li, GPU auto-scaling module operates to automatically scale-up and/or scale-down the number of GPUs used by the workflow, for example, based on the monitored resource consumption information provided by the GPU monitoring component. To this end, information collected by the GPU monitoring component is communicated to the GPU auto-scaling component. Based on the monitored information (and, optionally, information collected about the health and status of other resources in the cloud), the auto-scaling component performs an auto-scaling computation. As a result of the computation, the data GPU auto-scaling component then instructs the management platform to scale-up or -down the GPU resources being used for the workload, See Col. 16, Lines 30 - 49).

As per claim 40, the one or more non-transitory computer-accessible storage media as recited in claim 35, wherein the one or more computing devices include further instructions that upon execution on or across the one or more processors further cause the one or more computing devices to: cause a transformed version of a first request packet originating at the graphics request source to be delivered to a particular remote graphics processing device, wherein the destination address of the first request packet differs from an address of the particular remote graphics processing device, and wherein the destination address of the transformed version is the address of the particular remote processing device (Li, A “micro-service” enabling data analytic workloads to automatically and transparently use GPU resources without providing (e.g., to the customer) the underlying provisioning details. As noted, the approach dynamically determines the number and the type of GPUs to use, and then during runtime auto-scales the GPUs based on workload, See Col. 18, Lines 24 - 35).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kevin Bates, can be reached at 571-272-3980.  The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/NAZIA NAOREEN/Examiner, Art Unit 2458                                                                                                                                                                                                        
/KEVIN T BATES/Supervisory Patent Examiner, Art Unit 2458