DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on May 5, 2022 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on June 27, 2018. 
This action is in response to amendments and/or remarks filed on May 5, 2022. In the current amendment, claims 1, 5, and 17 are amended. No claims are cancelled. Claims 1-20 are pending. 
In response to the amendments and/or remarks, the 35 U.S.C. 103 rejection of claims 1-20 made in the previous Office action has been withdrawn.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on May 16, 2022 was filed after the mailing date of the final office action on January 6, 2022.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Applicant’s representative, Adam Stone, on May 31, 2022. 
This Examiner’s Amendment is necessary to ensure that the claims are clear and definite under 35 U.S.C. 112(b). 

The Application has been amended as follows: 
Claims 10 and 17 are amended. 

10. (Currently Amended) The method of claim 5, further comprising: 


17. (Currently Amended) A system comprising: 
storage to store an application, the application including a machine learning model; and one or more electronic devices to implement an elastic inference service, the elastic inference service including an application instance and an accelerator appliance, the elastic inference service to: 
receive, in a multi-tenant web services provider, an application instance configuration for an application instance, the application instance configuration indicating both an arithmetic precision and a processing speed to be used in determining the portion of the accelerator to provision for hardware acceleration of machine learning model inference, the arithmetic precision being one of a plurality of arithmetic precision capabilities that the elastic inference service is configured to provide for hardware acceleration of machine learning model inference, and the processing speed being one of a plurality of processing speed capabilities that the elastic inference service is configured to provide for hardware acceleration of machine learning model inference; 
determine a portion of an accelerator’s compute capacity to provision to the application instance based at least in part on the arithmetic precision and the processing speed indicated by the application instance configuration; 
provision the portion of the accelerator to the application instance, wherein the application instance is implemented using a physical compute instance in a first location, wherein the portion of the accelerator is implemented using a physical accelerator in a second location, and wherein the physical accelerator is accessible to the physical compute instance; 
load a machine learning model of the application instance onto the portion of the accelerator; and 
perform inference using the loaded machine learning model of the application using the portion of the accelerator 

Allowable Subject Matter
Claims 1-20 are allowed. 

REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance:
Independent claims 1, 5, and 17 are considered allowable because, when the claims are read in light of the specification as per MPEP 2111.01, none of the references of record, either alone or in combination, fairly discloses or suggests the limitations specified in the claims, including at least: 
In claim 1: 
determining a portion of a GPU’s compute capacity to provision to the application instance based at least in part on both the arithmetic precision and the processing speed specified by the application instance configuration;
and performing inference using the loaded machine learning model of the application instance using the portion of the GPU, wherein using the portion of the GPU to perform inference using the loaded machine model comprises multiplexing execution of inference calls from the application instance on the GPU. 

In claim 5: 
determining a portion of an accelerator’s compute capacity to provision to the application instance based at least in part on the arithmetic precision and the processing speed indicated by the application instance configuration;
and performing inference using the loaded machine learning model of the application instance using the portion of the accelerator, wherein using the portion of the accelerator to perform inference using the loaded machine model comprises multiplexing execution of inference calls from the application instance on the accelerator.

In claim 17: 
determine a portion of an accelerator’s compute capacity to provision to the application instance based at least in part on the arithmetic precision and the processing speed indicated by the application instance configuration;
and perform inference using the loaded machine learning model of the application using the portion of the accelerator by multiplexing execution of inference calls from the application instance on the accelerator. 
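
The two limitations quoted above for each independent claim (sizing a portion of an accelerator from both an arithmetic precision and a processing speed, and multiplexing inference calls on that portion) can be illustrated with a minimal Python sketch. The sketch is not taken from the application or from any reference of record. The capability table `PEAK_TOPS`, the names `InstanceConfig`, `determine_portion`, and `AcceleratorSlice`, and the proportional sizing rule are all hypothetical assumptions, and a simple FIFO queue stands in for whatever multiplexing scheme an actual service might use.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, Optional, Tuple

# ASSUMPTION: hypothetical peak throughput (in TOPS) of one whole
# accelerator at each arithmetic precision the service offers.
PEAK_TOPS: Dict[str, float] = {"fp32": 8.0, "fp16": 32.0, "int8": 64.0}


@dataclass
class InstanceConfig:
    """Hypothetical application instance configuration."""
    precision: str     # one of the offered arithmetic precisions
    speed_tops: float  # requested processing speed, in TOPS


def determine_portion(config: InstanceConfig) -> float:
    """Return the fraction of compute capacity to provision.

    ASSUMPTION: the fraction is the requested speed divided by the
    accelerator's peak speed at the requested precision.
    """
    if config.precision not in PEAK_TOPS:
        raise ValueError(f"unsupported precision: {config.precision}")
    fraction = config.speed_tops / PEAK_TOPS[config.precision]
    if not 0.0 < fraction <= 1.0:
        raise ValueError("requested speed exceeds one accelerator")
    return fraction


@dataclass
class AcceleratorSlice:
    """A provisioned portion of an accelerator (hypothetical model)."""
    fraction: float
    model: Optional[Callable[[Any], Any]] = None
    pending: Deque[Tuple[str, Any]] = field(default_factory=deque)

    def load(self, model: Callable[[Any], Any]) -> None:
        # Load the machine learning model onto the slice.
        self.model = model

    def submit(self, call_id: str, payload: Any) -> None:
        # Queue an inference call from the application instance.
        self.pending.append((call_id, payload))

    def run(self) -> Dict[str, Any]:
        # Multiplex queued calls onto the slice, here in FIFO order.
        if self.model is None:
            raise RuntimeError("no model loaded")
        results: Dict[str, Any] = {}
        while self.pending:
            call_id, payload = self.pending.popleft()
            results[call_id] = self.model(payload)
        return results
```

Under these assumptions, a configuration requesting `fp16` at 8 TOPS would be sized at one quarter of the accelerator, and calls submitted to the resulting slice would be served one after another by the loaded model.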

The closest prior art of record is Wang et al. (“Real-time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR”), which teaches reconfiguring an accelerator by using different processing elements to handle 8-bit and 16-bit precisions. Therefore, the accelerator determines a portion of the accelerator to provision based on the precision that is specified by the convolutional neural network. However, Wang is silent with regard to provisioning compute capacity based on both a specified processing speed and a specified arithmetic precision. Likewise, Wang does not multiplex execution of inference calls from application instances on a GPU. 
Fong (US 2018/0276044) teaches scheduling workloads on GPUs according to computing requirements and resources; however, Fong does not teach provisioning based on both arithmetic precision and processing speed. 
Wilt (US 2017/0132746) teaches provisioning a portion of a GPU to an application instance, and teaches that the application instance can be implemented in a first location, that a virtual GPU can be implemented using a physical GPU in another location, and that the physical GPU would be accessible through a network. However, Wilt also does not teach provisioning a portion of a GPU based on a specified processing speed and arithmetic precision. 
Taken alone or in combination, the aforementioned prior art references do not sufficiently teach or suggest the limitations recited in each of independent claims 1, 5, and 17, which include the features recited above. Therefore, the present claims are allowable. 
When taken as a whole, the dependent claims have been found allowable due to at least the above features recited in the independent claims upon which they depend. 

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571) 272-8144. The examiner can normally be reached Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.J.A./Examiner, Art Unit 2125
/BRIAN M SMITH/Primary Examiner, Art Unit 2122