DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the application filed on 6/27/2018, wherein claims 1-20 are pending.

Double Patenting
Claims 1-20 of this application are patentably indistinct from claims 1-20 of Application No. 16/020788. Pursuant to 37 CFR 1.78(f), when two or more applications filed by the same applicant or assignee contain patentably indistinct claims, elimination of such claims from all but one application may be required in the absence of good and sufficient reason for their retention during pendency in more than one application. Applicant is required to either cancel the patentably indistinct claims from all but one application or maintain a clear line of demarcation between the applications. See MPEP § 822.
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent Application No. 16/020788. Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not been patented.
Claim 1 of each application is mapped out below as an example:
Reference Application (16/020788), claim 1:
1. A computer-implemented method, comprising:
receiving, in a multi-tenant web services provider, an application instance configuration, an application of the application instance to utilize a portion of an attached graphics processing unit (GPU) during execution of a machine learning model and the application instance configuration including
an arithmetic precision of the machine learning model to be used in determining the portion of the GPU to provision;
provisioning the application instance and the portion of the GPU attached to the application instance, wherein the application instance is implemented using a physical compute instance in a first instance location, wherein the portion of the GPU is implemented using a physical GPU in the second location, and wherein the physical GPU is accessible to the physical compute instance over a network;
loading the machine learning model onto the portion of the GPU; and
performing inference using the loaded machine learning model of the application using the portion of the GPU on the attached GPU.

Instant Application (16/020776), claim 1:
1. A computer-implemented method, comprising:
receiving, in a multi-tenant web services provider, an application instance configuration, an application of the application instance to utilize a portion of an attached graphics processing unit (GPU) during execution of a machine learning model and the application instance configuration including:
an indication of the central processing unit (CPU) capability to be used,
an arithmetic precision of the machine learning model to be used,
an indication of the GPU capability to be used,
a storage location of the application, and
an indication of an amount of random access memory to use;
provisioning the application instance and the portion of the GPU attached to the application instance, wherein the application instance is implemented using a physical compute instance in a first instance location, wherein the portion of the GPU is implemented using a physical GPU in the second location, and wherein the physical GPU is accessible to the physical compute instance over a network;
attaching the portion of the GPU to the application instance;
loading the machine learning model onto the attached portion of the GPU; and
performing inference using the loaded machine learning model of the application using the portion of the GPU on the attached GPU.


Regarding Claim 1, the reference application, in claim 1, does not teach that the request includes an indication of the central processing unit capability to be used, an indication of the GPU capability to be used, a storage location of the application, and an indication of an amount of random access memory to use, or attaching the portion of the GPU to the application instance. Fong et al. (US PGPUB 2018/0276044) teaches request requirements including an indication of the central processing unit capability to be used, an indication of the GPU capability to be used, a storage location of the application, and an indication of an amount of random access memory to use (paragraph 26), and attaching the portion of the GPU to the application instance (paragraph 30). One of ordinary skill in the art would have been motivated to make this modification in order to improve network topology-aware cloud scheduling of machine learning workloads (Fong, paragraph 3).
As for claims 5 and 16, they contain limitations similar to those of claim 1 above. Thus, they are rejected under the same rationale.
As for claims 2-4, 6-15, and 17-20, they contain limitations that are similarly obvious over claims 2-4, 6-16, and 18-20 of the reference application and do not offer additional limitations that render them non-obvious in light of the reference application and Fong et al.
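
The claim 1 comparison above amounts to listing the limitations of the instant claim that have no counterpart in the reference claim. The following is a minimal Python sketch of that comparison, offered only as an illustration. The short labels are hypothetical shorthand for the mapped limitations and gloss over minor wording differences (for example, "to be used" versus "to be used in determining the portion of the GPU to provision"); the function name `limitations_not_in_reference` is likewise an assumption.

```python
# Hypothetical shorthand labels for the limitations of each claim 1 as mapped above.
REFERENCE_CLAIM_1 = [
    "configuration: arithmetic precision",
    "provision application instance and GPU portion",
    "load model onto GPU portion",
    "perform inference",
]

INSTANT_CLAIM_1 = [
    "configuration: CPU capability",
    "configuration: arithmetic precision",
    "configuration: GPU capability",
    "configuration: storage location",
    "configuration: amount of RAM",
    "provision application instance and GPU portion",
    "attach GPU portion",
    "load model onto GPU portion",
    "perform inference",
]


def limitations_not_in_reference(instant, reference):
    """Return instant-claim limitations with no counterpart in the reference claim."""
    known = set(reference)
    return [limitation for limitation in instant if limitation not in known]


if __name__ == "__main__":
    for limitation in limitations_not_in_reference(INSTANT_CLAIM_1, REFERENCE_CLAIM_1):
        print(limitation)
```

Under these labels, the remaining limitations are the four additional configuration items and the attaching step, which are the limitations for which Fong et al. is cited.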

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention.
The following claim limitations are unclear and indefinite:
As for claim 1, it is unclear what is meant by “…utilize a portion of a graphics processing unit (GPU) during execution of a machine learning model …provisioning the application instance and the portion of the GPU, wherein the application instance is implemented using a physical compute instance….wherein the portion of the GPU is implemented using a physical GPU……receiving an inference request by the application instance…transmitting the inference request data to…” First, the meaning of “provisioning the application instance” is unclear in view of the subsequent limitations “…provisioning…the portion of the GPU…wherein the application instance is implemented using a physical compute instance…wherein the portion of the GPU is implemented using a physical GPU…receiving an inference request by the application instance….”, where provisioning appears to denote different functionality for the application instance than for the portion of the GPU. In particular, the provisioned application instance appears to function as a software entity in subsequent limitations, whereas the portion of the GPU is used as a physical resource. Examiner notes that provisioning is commonly understood as giving the provisioned item as a resource. It is not understood how an application instance can be provisioned, or given, as a resource when it appears to be the workload that needs to be given resources, and the word provisioning appears to be used in a different functional capacity for the application instance than for the physical resources subsequently recited. For the purpose of examination, Examiner assumes provisioning means provisioning the hardware resources related to execution of the application instance, not allocating the application instance as a resource for use by another program.
As for claim 1, it is additionally unclear what is meant by “provisioning…the portion of the GPU…attaching the portion of the GPU….using the attached portion of the GPU…configuring a plurality of GPU portions …for use by the application instance…” because it is unclear whether the provisioned portion of the GPU is one of the subsequently claimed plurality of GPU portions, or whether the provisioned portion of the GPU is subsequently configured as a plurality of sub-portions. Examiner notes that the claim in part states “performing inference…using the attached portion of the GPU….” and yet subsequently recites “configuring a plurality of GPU portions of the physical GPU for use by the application instance”, rendering it entirely unclear whether the inference is performed on the attached portion of the GPU or on the configured “plurality of GPU portions”, and what the relationship between the two claimed terms is. Moreover, it appears that attaching is required before performing inference, and that attaching depends on first provisioning the portion of the GPU. It is therefore unclear what the meaning and scope of the step of configuring a plurality of GPU portions of the physical GPU are in relation to the provisioning and attaching steps for the portion of the GPU, i.e., whether the configuring step is separate and distinct from, a superset of, or a subset of the provisioning and attaching steps. For the purpose of examination, Examiner assumes they do not have to be the same, and that the claim limitation requires both an attached portion of the GPU and a configured plurality of GPU portions to perform inference.
As for claim 5, it is unclear what is meant by “…utilize a portion of an attached accelerator including a plurality of accelerator slots…provisioning the application instance, a physical compute instance including a configuration of a CPU, memory, storage, and networking capacity to execute the application, the plurality of accelerator slots, and an accelerator appliance including a configuration of a CPU, memory, storage, and networking capacity to execute a machine learning model of the application…wherein each accelerator slot is implemented using the physical accelerator, …hosted by the accelerator appliance…” First, the meaning of “provisioning the application instance” is unclear in view of subsequent limitations including “…wherein the application instance is implemented using the physical compute instance”, i.e., whether provisioning means the same as implementing, or whether the claim language means provisioning resources for running the application instance, which encompasses provisioning the physical compute instance, the accelerator slots, and the accelerator appliance. Examiner notes that provisioning is commonly understood as giving the provisioned item as a resource. It is not understood how an application instance can be provisioned, or given, as a resource when it appears to be the workload that needs to be given resources, nor what the distinction is between provisioning and implementing in the subsequent limitation. The word provisioning also appears to be used in a different functional capacity for the application instance than for the physical resources subsequently recited. For the purpose of examination, Examiner assumes provisioning means provisioning the hardware resources related to execution of the application instance, not allocating the application instance as a resource for use by another program.
Second, the relationship between the claimed resources is unclear, in particular what is meant by “…an attached accelerator including a plurality of accelerator slots….the plurality of accelerator slots…an accelerator appliance including a configuration of a CPU, memory, storage and networking capacity to execute a machine learning model of the application, wherein…each accelerator slot is implemented using the physical accelerator, hosted by the accelerator appliance…”. It is unclear whether the plurality of accelerator slots is part of the “configuration of a CPU, memory, storage, and networking capacity” of the accelerator appliance or a separate set of resources, because the slots are recited separately from the accelerator appliance when provisioned, and yet the physical accelerator implementing the slots is hosted by the accelerator appliance. For the purpose of examination, Examiner assumes provisioning of an accelerator appliance is separate and distinct from provisioning the plurality of accelerator slots of the accelerator, where the provisioned accelerator appliance resource is separate from the physical accelerator included in the accelerator appliance.
As for claim 16, it contains the same defects as claim 5 above. Thus, it is rejected under the same rationale.
As for claims 2-4, 6-15, and 17-20, they are rejected for failure to cure the defects of the claims upon which they depend.
The following claim limitation lacks antecedent basis:
Claims 5 and 16: “the physical accelerator”

EXAMINER COMMENTS
Examiner and Applicant's representative discussed a proposed amendment to move prosecution forward. Examiner suggested amending claim 5 according to the proposed amendment, making claim 16 correspond in scope, and either cancelling claim 1 or making it correspond in scope to claim 5 but as a product claim. However, due to time constraints, no agreement could be reached. Examiner reproduces the proposed amendment for convenience:
5. (Currently Amended) A computer-implemented method, comprising: 
receiving, in a multi-tenant web services provider, an application instance configuration, an application of the application instance to utilize a portion of an , having a plurality of accelerator slots, during execution of a machine learning model, the application instance configuration including: 
an indication of a central processing unit (CPU) capability to be used, 
an arithmetic precision of the machine learning model to be used, 
an indication of an accelerator capability to be used,
a storage location of the application, and 
an indication of an amount of random access memory to use; 
based on the received application instance configuration, provisioning the application instance wherein provisioning includes: 
provisioning a physical compute instance including a configuration of a CPU, memory, storage, and networking capacity to execute the application in a first location;
provisioning an accelerator appliance including a configuration of a CPU, memory, storage, and networking capacity to execute a machine learning model of the application in a second location, wherein the accelerator appliance comprises one or more physical accelerators; and 
provisioning the plurality of accelerator slots of the accelerator, wherein the application instance is implemented using the physical compute instanceone of the s
attaching the plurality of accelerator slots to the application instance; 
loading the machine learning model onto the attached plurality of accelerator slots; and 
performing inference using the loaded machine learning model of the application using the attached plurality of accelerator slots, by: 
receiving an inference request by the application instance; 
transmitting inference request data to the attached plurality of accelerator slots; 
receiving and using in the application, an initial response from one of the attached plurality of accelerator slots; 
processing subsequent responses that are received by discarding one or more of the subsequent responses; and 
tracking timing of the responses to determine if migration in any attached accelerator slot from the attached plurality of accelerator slots should occur, wherein:
if a timing of one or more responses is greater than a threshold, performing a migration to a different accelerator slot from the attached plurality of accelerator slots, wherein the migration includes replacing one or more underperforming accelerator slots with new one or more accelerator slots to assume operation in place of the replaced one or more accelerator slots, and
if a timing of one or more responses is less than or equal to a threshold, not perform a migration.
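
The response-handling and migration limitations at the end of the proposed claim 5 can be read as a simple control flow. The following is a minimal Python sketch of that reading, offered for illustration only and not as claim language. The names `SlotResponse`, `handle_responses`, and `plan_migration`, the millisecond timing unit, and the pool of spare slots are all hypothetical and do not appear in the application. The sketch also assumes that every subsequent response is discarded, whereas the claim requires discarding only one or more of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotResponse:
    slot_id: str       # hypothetical identifier of an attached accelerator slot
    latency_ms: float  # time from transmitting the request to receiving the response
    payload: str       # inference result returned by the slot


def handle_responses(responses):
    """Use the initial (earliest) response and discard the subsequent ones.

    Expects at least one response.
    """
    ordered = sorted(responses, key=lambda r: r.latency_ms)
    return ordered[0], ordered[1:]


def plan_migration(responses, threshold_ms, spare_slots):
    """Map each slot whose timing exceeds the threshold to a replacement slot.

    A timing less than or equal to the threshold triggers no migration.
    """
    spares = list(spare_slots)
    plan = {}
    for response in responses:
        if response.latency_ms > threshold_ms and spares:
            plan[response.slot_id] = spares.pop(0)
    return plan


if __name__ == "__main__":
    received = [
        SlotResponse("slot-a", 12.0, "label=cat"),
        SlotResponse("slot-b", 95.0, "label=cat"),
        SlotResponse("slot-c", 50.0, "label=cat"),
    ]
    initial, discarded = handle_responses(received)
    print(initial.slot_id, [r.slot_id for r in discarded])
    print(plan_migration(received, threshold_ms=50.0, spare_slots=["slot-d"]))
```

In this sketch a slot whose timing equals the threshold is left in place, matching the "less than or equal to a threshold" branch of the proposed claim.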

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
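
The timing rule in the preceding paragraph can be worked through as date arithmetic. The following is a minimal Python sketch under stated assumptions: the function names `add_months` and `shortened_period_end` are hypothetical, a month is added by keeping the day of the month (clamped to the last day of a shorter month), "within TWO MONTHS" is treated as inclusive, and extensions under 37 CFR 1.136(a) and weekend or holiday adjustments are not modeled. The dates in the example are invented, since the mailing date is not stated in this action.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` later, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(mailed, first_reply=None, advisory_mailed=None):
    """Sketch of when the shortened statutory period for reply expires."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    expiry = three_months
    replied_within_two = first_reply is not None and first_reply <= add_months(mailed, 2)
    # A first reply within two months plus a late advisory action moves the expiry
    # to the advisory action's mailing date.
    if replied_within_two and advisory_mailed is not None and advisory_mailed > three_months:
        expiry = advisory_mailed
    # In no event later than six months from the date of the final action.
    return min(expiry, six_months)


if __name__ == "__main__":
    mailed = date(2020, 1, 15)  # hypothetical mailing date
    print(shortened_period_end(mailed))
    print(shortened_period_end(mailed, date(2020, 3, 10), date(2020, 5, 1)))
```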
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN X LU whose telephone number is (571)270-1233.  The examiner can normally be reached on M-F 10am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at 571-272-3759.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEVIN X LU/
Examiner, Art Unit 2199

/LEWIS A BULLOCK JR/
Supervisory Patent Examiner, Art Unit 2199