DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on February 11, 2022 was filed on the mailing date of the application, February 11, 2022. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections

Claim 9 is objected to under 37 CFR 1.75 as being a substantial duplicate of claim 8. When two claims in an application are duplicates or else are so close in content that they both cover the same thing, despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m).

Double Patenting

The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 4-6, 8-10, 13, 16, 17, 19, and 20 are rejected on the ground of non-statutory double patenting as being unpatentable over claims 8, 11, 15, 16, 18, and 19 of U.S. Patent No. 11,282,160. Although the claims at issue are not identical, they are not patentably distinct from each other as shown in the tables below.

Present Application #17/669,647    U.S. Patent #11,282,160
Claim 1                            Claim 15
Claim 4                            Claim 16
Claim 5                            Claims 15, 18
Claim 6                            Claim 15
Claim 8                            Claims 15, 19


Present Application #17/669,647    U.S. Patent #11,282,160
Claim 9                            Claims 15, 19
Claim 10                           Claim 8
Claim 13                           Claim 8
Claim 16                           Claims 8, 11
Claim 17                           Claim 8
Claim 19                           Claims 8, 11
Claim 20                           Claim 8


Present Application #17/669,647  Claim 1
U.S. Patent #11,282,160  Claim 15
A method comprising:
A method implemented at least in part by a system that includes a specialized processing unit, the method comprising:
receiving, at a service and from a first edge device that is remote from the service, a first request to reserve a first number of cores of a specialized processing unit of the service;
receiving, from a first application hosted on a first edge node that is remote from the system, a first request to reserve a first number of cores of the specialized processing unit during a first period of time;

receiving, from a second application hosted on a second edge node that is remote from the system and remote from the first edge node, a second request to reserve a second number of the cores of the specialized processing unit during a second period of time that at least partly overlaps the first period of time;
determining, based at least in part on a parameter included in the first request, that the first number of the cores of the specialized processing unit are available for use by the first edge device;
determining that the first application is associated with a higher priority than the second application;
reserving the first number of the cores for the first edge device; and
reserving, based at least in part on the determining that the first application is associated with the higher priority, the first number of the cores during the first period of time for the first application; and
sending, to the first edge device, an indication that the first number of the cores have been reserved for use by the first edge device.
sending, to the first application, an indication that the first number of the cores have been reserved for the first period of time for the first application.


Claim 1 of the present application differs from claim 15 of the patent in that claim 1 of the present application is broader in scope than claim 15 of the patent and thus encompasses the subject matter of claim 15 of the patent.
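
For illustration only, the four steps recited in claim 1 of the present application (receive a request, determine availability based on a request parameter, reserve, send an indication) can be sketched as follows. This is a minimal sketch and not the applicant's implementation. The names `ReservationRequest`, `ReservationService`, and `duration_s`, and the choice of a requested duration as the "parameter," are assumptions made for the example and do not appear in the claims.

```python
from dataclasses import dataclass, field


@dataclass
class ReservationRequest:
    # Hypothetical request shape; the claim only requires "a parameter".
    device_id: str
    num_cores: int
    duration_s: int  # assumed parameter: how long the cores are needed


@dataclass
class ReservationService:
    total_cores: int
    reserved: dict = field(default_factory=dict)  # device_id -> reserved core count

    def free_cores(self) -> int:
        return self.total_cores - sum(self.reserved.values())

    def handle(self, req: ReservationRequest) -> dict:
        # Determine availability, based in part on the request parameter.
        available = req.duration_s > 0 and 0 < req.num_cores <= self.free_cores()
        if not available:
            return {"device_id": req.device_id, "reserved": False, "num_cores": 0}
        # Reserve the requested cores for the requesting edge device.
        self.reserved[req.device_id] = self.reserved.get(req.device_id, 0) + req.num_cores
        # The returned dict stands in for the indication sent to the edge device.
        return {"device_id": req.device_id, "reserved": True, "num_cores": req.num_cores}
```

Claim 15 of the patent adds to this flow a second, overlapping request and a priority comparison between the two requesting applications, which is why the sketch above, lacking those steps, corresponds to the broader claim.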

Present Application #17/669,647  Claim 4
U.S. Patent #11,282,160  Claim 16
The method of claim 1, wherein
The method as recited in claim 15, wherein
the specialized processing unit comprises at least one of a Graphics Processing Unit (GPU), a Tensor Processing Unit (TPU), or a Data Processing Unit (DPU).
the specialized processing unit comprises at least one of a graphics processing unit (GPU) or a tensor processing unit (TPU).


Present Application #17/669,647  Claim 5
U.S. Patent #11,282,160  Claims 15 and 18
The method of claim 1, further comprising:
A method implemented at least in part by a system that includes a specialized processing unit, the method comprising:
receiving, at the service and from a second edge device that is remote from the service, a second request to reserve a second number of the cores of the specialized processing unit during a same period of time as the first number of the cores are to be reserved for the first edge device;
… receiving, from a second application hosted on a second edge node that is remote from the system and remote from the first edge node, a second request to reserve a second number of the cores of the specialized processing unit during a second period of time that at least partly overlaps the first period of time…(claim 15)
determining that the second number of the cores of the specialized processing unit are available for use by the second edge device;
… determining that the first application is associated with a higher priority than the second application… (claim 15)
reserving the second number of the cores for the second edge device; and
… reserving, based at least in part on the determining that the first application is associated with the higher priority, the second number of the cores a third period of time for the second application, the third period of time at least one of less than the second period of time or occurring later than the second period of time…(claim 18)
sending, to the second edge device, an indication that the second number of the cores have been reserved for use by the second edge device.
… sending, to the second application, an indication that the second number of the cores have been reserved for the third period of time for the second application…(claim 18)


Present Application #17/669,647  Claim 6
U.S. Patent #11,282,160  Claim 15
The method of claim 1, further comprising:
A method implemented at least in part by a system that includes a specialized processing unit, the method comprising:
receiving, at the service and from a second edge device that is remote from the service, a second request to reserve a second number of the cores of the specialized processing unit during a same period of time as the first number of the cores are to be reserved for the first edge device;
… receiving, from a second application hosted on a second edge node that is remote from the system and remote from the first edge node, a second request to reserve a second number of the cores of the specialized processing unit during a second period of time that at least partly overlaps the first period of time…
determining, by the service, that the first edge device is associated with a higher priority than the second edge device; and
… determining that the first application is associated with a higher priority than the second application… 
wherein reserving the first number of the cores for the first edge device is based at least in part on the first edge device being associated with the higher priority than the second edge device.
… reserving, based at least in part on the determining that the first application is associated with the higher priority, the first number of the cores during the first period of time for the first application…


Present Application #17/669,647  Claim 8
U.S. Patent #11,282,160  Claims 15 and 19
The method of claim 1, wherein
A method implemented at least in part by a system that includes a specialized processing unit, the method comprising:
the parameter included in the first request is indicative of at least one of:
… receiving, from a first application hosted on a first edge node that is remote from the system, a first request…(claim 15) 
a length of time that the first number of the cores are to be reserved;
… to reserve a first number of cores of the specialized processing unit during a first period of time…(claim 15)
a task that is to be performed by the first number of the cores;
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of …a task to be performed by the first application, a task to be performed by the second application…(claim 19)
a priority associated with the task that is to be performed; or
… determining that the first request is associated with a higher priority than the second request…(claim 19)
a service level that the first number of the cores are to provide for the task.
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of…a quality of service (QoS) level associated with the first application, a QoS level associated with the second application…(claim 19)


Present Application #17/669,647  Claim 9
U.S. Patent #11,282,160  Claims 15 and 19
The method of claim 1, wherein
A method implemented at least in part by a system that includes a specialized processing unit, the method comprising:
the parameter included in the first request is indicative of at least one of:
… receiving, from a first application hosted on a first edge node that is remote from the system, a first request…(claim 15) 
a length of time that the first number of the cores are to be reserved;
… to reserve a first number of cores of the specialized processing unit during a first period of time…(claim 15)
a task that is to be performed by the first number of the cores;
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of …a task to be performed by the first application, a task to be performed by the second application…(claim 19)
a priority associated with the task that is to be performed; or
… determining that the first request is associated with a higher priority than the second request…(claim 19)
a service level that the first number of the cores are to provide for the task.
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of…a quality of service (QoS) level associated with the first application, a QoS level associated with the second application…(claim 19)


Present Application #17/669,647  Claim 10
U.S. Patent #11,282,160  Claim 8
A system comprising:
A system comprising:
one or more specialized processing units; one or more processors; and
one or more first processors including at least one of a graphics processing unit (GPU) or a tensor processing unit (TPU);
one or more non-transitory computer-readable media storing instructions that, when executed, cause the one or more processors to perform operations comprising:
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause the one or more second processors to perform acts comprising:
receiving, at a service and from a first edge device that is remote from the service, a first request to reserve a first number of cores of a specialized processing unit of the service;
receiving, from a first application hosted on a first edge node that is remote from the system, a first request to reserve a first number of cores of the at least one of the GPU or the TPU during a first period of time;

receiving, from a second application hosted on a second edge node that is remote from the system and remote from the first edge node, a second request to reserve a second number of the cores of the at least one of the GPU or the TPU during a second period of time that at least partly overlaps the first period of time;
determining, based at least in part on a parameter included in the first request, that the first number of the cores of the specialized processing unit are available for use by the first edge device;
determining that the first application is associated with a higher priority than the second application;
reserving the first number of the cores for the first edge device; and
reserving, based at least in part on the first application being associated with the higher priority, the first number of the cores during the first period of time for the first application; and
sending, to the first edge device, an indication that the first number of the cores have been reserved for use by the first edge device.
sending, to the first application, an indication that the first number of the cores have been reserved for the first period of time for the first application.


Claim 10 of the present application differs from claim 8 of the patent in that claim 10 of the present application is broader in scope than claim 8 of the patent and thus encompasses the subject matter of claim 8 of the patent.

Present Application #17/669,647  Claim 13
U.S. Patent #11,282,160  Claim 8
The system of claim 10, wherein
A system comprising:
the one or more specialized processing units include one or more Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), or Data Processing Units (DPUs).
… one or more first processors including at least one of a graphics processing unit (GPU) or a tensor processing unit (TPU)…


Present Application #17/669,647  Claim 16
U.S. Patent #11,282,160  Claims 8 and 11
The system of claim 10, wherein
The system comprising:
the parameter included in the first request is indicative of at least one of:
…receiving, at a server that includes a graphics processing unit (GPU) and from a first application hosted on a first edge node that is remote from the server, a first request…(claim 8) 
a length of time that the first number of the cores are to be reserved;
… to reserve a first number of cores of the GPU during a first period of time…(claim 8)
a task that is to be performed by the first number of the cores;
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of …a task to be performed by the first application, a task to be performed by the second application…(claim 11)
a priority associated with the task that is to be performed; or
… determining that the first request is associated with a higher priority than the second request…(claim 11)
a service level that the first number of the cores are to provide for the task.
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of…a quality of service (QoS) level associated with the first application, a QoS level associated with the second application…(claim 11)


Present Application #17/669,647  Claim 17
U.S. Patent #11,282,160  Claim 8

A system comprising:

one or more first processors including at least one of a graphics processing unit (GPU) or a tensor processing unit (TPU);
One or more non-transitory computer-readable media storing instructions that, when executed, cause one or more processors to perform operations comprising:
one or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause the one or more second processors to perform acts comprising:
receiving, at a service and from a first edge device that is remote from the service, a first request to reserve a first number of cores of a specialized processing unit of the service;
receiving, from a first application hosted on a first edge node that is remote from the system, a first request to reserve a first number of cores of the at least one of the GPU or the TPU during a first period of time;

receiving, from a second application hosted on a second edge node that is remote from the system and remote from the first edge node, a second request to reserve a second number of the cores of the at least one of the GPU or the TPU during a second period of time that at least partly overlaps the first period of time;
determining, based at least in part on a parameter included in the first request, that the first number of the cores of the specialized processing unit are available for use by the first edge device;
determining that the first application is associated with a higher priority than the second application;
reserving the first number of the cores for the first edge device; and
reserving, based at least in part on the first application being associated with the higher priority, the first number of the cores during the first period of time for the first application; and
sending, to the first edge device, an indication that the first number of the cores have been reserved for use by the first edge device.
sending, to the first application, an indication that the first number of the cores have been reserved for the first period of time for the first application.


Claim 17 of the present application differs from claim 8 of the patent in that claim 17 of the present application is broader in scope than claim 8 of the patent and thus encompasses the subject matter of claim 8 of the patent.

Present Application #17/669,647  Claim 19
U.S. Patent #11,282,160  Claims 8 and 11
The one or more non-transitory computer-readable media of claim 17, wherein
The system comprising:
the parameter included in the first request is indicative of at least one of:
…receiving, at a server that includes a graphics processing unit (GPU) and from a first application hosted on a first edge node that is remote from the server, a first request…(claim 8) 
a length of time that the first number of the cores are to be reserved;
… to reserve a first number of cores of the GPU during a first period of time…(claim 8)
a task that is to be performed by the first number of the cores;
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of …a task to be performed by the first application, a task to be performed by the second application…(claim 11)
a priority associated with the task that is to be performed; or
… determining that the first request is associated with a higher priority than the second request…(claim 11)
a service level that the first number of the cores are to provide for the task.
… determining that the first request is associated with a higher priority than the second request is based at least in part on at least one of…a quality of service (QoS) level associated with the first application, a QoS level associated with the second application…(claim 11)


Present Application #17/669,647  Claim 20
U.S. Patent #11,282,160  Claim 8
The one or more non-transitory computer-readable media of claim 17, wherein
A system comprising:
the specialized processing unit comprises at least one of a Graphics Processing Unit (GPU), a Tensor Processing Unit (TPU), or a Data Processing Unit (DPU).
…one or more first processors including at least one of a graphics processing unit (GPU) or a tensor processing unit (TPU)…


Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gandhi et al. (US 9,830,677) in view of Zhao et al. (US 2019/0132257).

Claim 1 is similar in scope to claim 10 below, and is therefore rejected under similar rationale.

As to claim 2, Gandhi et al. modified with Zhao et al. disclose the first number of the cores requested (Gandhi, e.g. GPU processing cores requested by a first application, e.g. application 210) is less than a total number of the cores of the specialized processing unit (Gandhi, column 3, lines 43-53 notes a dedicated mode, where the total amount of allocated resources between the containers should be less than or equal to the total amount of available resources of the GPU, column 4, lines 53-62 notes exposing the resource availability of GPUs 264, 274, 276, 284 to applications 210, 212, which can then decide how much of the resources each application 210, 212 wishes to utilize, then the application may request specific slices of specific GPUs; modified with Zhao, [0028] notes three types of quotas, type 1 is an absolute number, type 2 is a proportion of the total GPU resources, type 3 is a proportion of the available GPU resources, [0029] notes each application associated with a predetermined quota and quota setting, thus may be less than the total amount of GPU resources).

As to claim 3, Gandhi et al. modified with Zhao et al. disclose determining that the first number of the cores will be available for use by the first edge device after a period of time; and wherein the indication further indicates a length of the period of time before the first number of the cores will be available for use by the first edge device (Gandhi, column 3, line 54 through column 4, line 3 notes multiplexed mode (e.g. determined by GPU gatekeeper), including time-based multiplexing, where the amount of time each container can execute may be limited (e.g. the amount of time each container containing the plurality of GPU processing cores may be utilized by an application before becoming available for use by another application), and priority-based multiplexing, where lower priority containers have to wait for higher priority containers to execute; modified with Zhao, [0042] notes resource monitoring module 316 (part of server 314 and may replace server drive 216 (thus perform similar functionality)) may transmit a signal to applications with lower priorities to defer sending of subsequent requests for GPU resources, where the application may place the subsequent requests in a corresponding queue thus indicating GPU resources are currently unavailable for a period of time).

As to claim 4, Gandhi et al. modified with Zhao et al. disclose the specialized processing unit comprises at least one of a Graphics Processing Unit (GPU), a Tensor Processing Unit (TPU), or a Data Processing Unit (DPU)(Gandhi, e.g. Figure 1, graphics processing unit (GPU) 102 and/or Figure 2, GPUs 264, 274, 276, and 284, Figure 5, GPU 37, where column 7, lines 43-51 notes GPU, e.g. GPU 102 and/or GPUs 264, 274, 276, 284, as a specialized electronic circuit; modified with Zhao, Figures 2 and 3, graphics processing units (GPUs) 220/318, 222/320, 224/322, 226/324).

As to claim 5, Gandhi et al. modified with Zhao et al. disclose receiving, at the service and from a second edge device that is remote from the service (Gandhi, e.g. a second one of nodes 260, 270, 280; modified with Zhao, e.g. a second user station and/or client device), a second request (Gandhi, e.g. request from a second application, e.g. application 212 executing on a different particular node, considered a “second request,” where a request from a first application, e.g. application 210 executing on a particular node, may be considered a “first request;” modified with Zhao, e.g. request from second one of first application 202/302, second application 204/304, or third application 206/306, e.g. request from second application 204/304 may be considered a “second request”) to reserve a second number of the cores of the specialized processing unit (Gandhi, e.g. GPU processing cores of one or more of GPUs 264, 274, 276, 284; modified with Zhao, e.g. one or more cores of GPUs 220/318, 222/320, 224/322, 226/324) during a same period of time as the first number of the cores are to be reserved for the first edge device (Gandhi, e.g. request from the second application, e.g. application 212, to allocate GPU resources, e.g. GPU processing cores and/or memory, at the same time as receiving a request from the first application, e.g. application 210, to allocate GPU resources, e.g. GPU processing cores and/or memory; modified with Zhao, e.g. request from second application, e.g. second application 204/304, to allocate GPU resources at the same time as receiving a request from the first application 202/302 to allocate GPU resources) (Gandhi, e.g. Figure 3, step 304, column 5, lines 12-19 notes receiving a first request from a first application for first requested GPU resources, e.g. a minimum and/or maximum amount of resources, the GPU resources comprising a processor and a memory, the processor further comprising multiple GPU cores; step 306, column 5, lines 19-23 notes receiving a second request from a second application for second requested GPU resources, e.g. a minimum and/or maximum amount of resources, where column 4, lines 45-62 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements (from applications 210, 212) to resource availability (on GPUs 264, 274, 276, 284), where GPU distributed gatekeeper 250 may decide how much of each GPU 264, 274, 276, 284 should be given to each application 210, 212 and/or GPU distributed gatekeeper 250 may expose resource availability to applications 210, 212, which can then decide how much of the GPU resources each application 210, 212 wishes to utilize, e.g. request specific slices of specific GPUs; modified with Zhao, [0033] notes server drive 216 of server 218 receives request for a first amount of GPU resources from any one of the first application 202, second application 204, and third application 206); determining that the second number of the cores of the specialized processing unit are available for use by the second edge device (Gandhi, e.g. Figure 3, steps 308, 310, column 5, lines 23-28 notes determining whether the request(s) may be fulfilled by getting the availability of the GPU capacity, e.g. the amount of resources available; modified with Zhao, [0029] notes applications 202, 204, and 206 can send a request in a predetermined format as <AppID, quotaType, quotaSetting>, where the AppID indicates an identifier of the application, the quotaType indicates one of quota types type 1, type 2, or type 3 (see [0028]), and the quotaSetting indicates the value of the quota, [0030] notes client drive may intercept requests associated with GPU from respective applications and forward them to the server 214 for processing, [0031] notes server drive 216 of server 214 receives the requests from the client drive and extracts information associated with the quota from the request, [0032] notes server drive 216 may adopt a policy based on the request, [0033] notes the server drive 216 determines, based on the request and the maintained amount of resources already utilized by the first application 202 (or second application 204) in the GPU, the total amount of GPU resources to be utilized by the first application 202 (or second application 204), e.g. whether the total amount of GPU resources to be utilized by the first application 202 (or second application 204) does or does not exceed the predetermined quota associated with the first application 202 (or second application 204) (as determined by the request)); reserving the second number of the cores for the second edge device (Gandhi, e.g. Figure 3, step 314, column 5, lines 32-36 notes responsive to determining that the second requested amount of GPU resources are available, allocating a second slice of the GPU resources with a second requested amount to the second application, e.g. application 212; modified with Zhao, [0033] notes the server drive 216 determines, based on the request and the maintained amount of resources already utilized by the first application 202 (or second application 204) in the GPU, the total amount of GPU resources to be utilized by the first application 202 (or second application 204), and in response to the total amount of GPU resources to be utilized not exceeding the predetermined quota associated with the first application 202 (or second application 204) (as determined by the request), the requested amount of GPU resources are allocated to the first application 202); and sending, to the second edge device, an indication that the second number of the cores have been reserved for use by the second edge device (e.g. modified with Zhao, [0042] notes resource monitoring module 316 (part of server 314 and may replace server drive 216 (thus perform similar functionality)) may transmit a signal to all applications instructing them to defer sending of subsequent requests for GPU resources, thus indicating that the GPU resources are already reserved).
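
For illustration only, the quota check attributed to Zhao above (a request of the form <AppID, quotaType, quotaSetting>, three quota types per [0028], and a comparison of total utilization against the quota per [0033]) can be sketched as follows. This is a minimal sketch based solely on the characterization in this Office action, not on Zhao's actual implementation. The function names `quota_limit` and `within_quota` and the use of plain numeric resource units are assumptions.

```python
def quota_limit(quota_type: int, quota_setting: float,
                total: float, available: float) -> float:
    """Return the resource ceiling implied by a quota (hypothetical helper)."""
    if quota_type == 1:   # type 1: an absolute number
        return quota_setting
    if quota_type == 2:   # type 2: a proportion of the total GPU resources
        return quota_setting * total
    if quota_type == 3:   # type 3: a proportion of the available GPU resources
        return quota_setting * available
    raise ValueError(f"unknown quota type: {quota_type}")


def within_quota(already_used: float, requested: float, quota_type: int,
                 quota_setting: float, total: float, available: float) -> bool:
    """True when already-used plus requested resources do not exceed the quota."""
    limit = quota_limit(quota_type, quota_setting, total, available)
    return already_used + requested <= limit
```

Under these assumptions, an application holding 2 units that requests 2 more against a type 2 quota of 0.25 on a 16-unit GPU stays within its 4-unit ceiling, so the request would be granted.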

As to claim 6, Gandhi et al. modified with Zhao et al. disclose receiving, at the service and from a second edge device that is remote from the service (Gandhi, e.g. a second one of nodes 260, 270, 280; modified with Zhao, e.g. a second user station and/or client device), a second request (Gandhi, e.g. request from a second application, e.g. application 212, considered a “second request,” where a request from a first application, e.g. application 210, may be considered a “first request;” modified with Zhao, e.g. request from second one of first application 202/302, second application 204/304, or third application 206/306, e.g. request from second application 204/304 may be considered a “second request”) to reserve a second number of the cores of the specialized processing unit (Gandhi, e.g. GPU processing cores of one or more of GPUs 264, 274, 276, 284; modified with Zhao, e.g. one or more cores of GPUs 220/318, 222/320, 224/322, 226/324) during a same period of time as the first number of the cores are to be reserved for the first edge device (Gandhi, e.g. request from the second application, e.g. application 212, to allocate GPU resources, e.g. GPU processing cores and/or memory, at the same time as receiving a request from the first application, e.g. application 210, to allocate GPU resources, e.g. GPU processing cores and/or memory; modified with Zhao, e.g. request from second application, e.g. second application 204/304, to allocate GPU resources at the same time as receiving a request from the first application 202/302 to allocate GPU resources) (Gandhi, e.g. Figure 3, step 304, column 5, lines 12-19 notes receiving a first request from a first application for first requested GPU resources, e.g. a minimum and/or maximum amount of resources, the GPU resources comprising a processor and a memory, the processor further comprising multiple GPU cores; step 306, column 5, lines 19-23 notes receiving a second request from a second application for second requested GPU resources, e.g. a minimum and/or maximum amount of resources, where column 4, lines 45-62 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements (from applications 210, 212) to resource availability (on GPUs 264, 274, 276, 284), where GPU distributed gatekeeper 250 may decide how much of each GPU 264, 274, 276, 284 should be given to each application 210, 212 and/or GPU distributed gatekeeper 250 may expose resource availability to applications 210, 212, which can then decide how much of the GPU resources each application 210, 212 wishes to utilize, e.g. request specific slices of specific GPUs; modified with Zhao, [0033] notes server drive 216 of server 218 receives a request for a first amount of GPU resources from any one of the first application 202, second application 204, and third application 206); determining, by the service, that the first edge device is associated with a higher priority than the second edge device (Gandhi, column 3, line 67 through column 4, line 3 notes priority-based multiplexing, where lower priority containers may have to wait for higher priority containers (e.g. as determined by GPU distributed gatekeeper 250); modified with Zhao, [0022], [0023], [0028], [0029] note a predetermined quota associated with a priority of an application is set for the application, the predetermined quota increases with the increase of the priority, where setting the quota associated with the priority of the application for the application ensures an application with higher priority can acquire more resources of the dedicated processing unit than the application with lower priority); and wherein reserving the first number of the cores for the first edge device is based at least in part on the first edge device being associated with the higher priority than the second edge device (Gandhi, e.g. Figure 3, step 312, column 5, lines 28-32 notes responsive to determining that the first requested amount of GPU resources are available, allocating a slice of the GPU resources with a first requested amount to the first application, e.g. application 210, e.g. allocating higher priority application GPU resources over lower priority application, where the lower priority application will have to wait for the higher priority application; modified with Zhao, [0042] notes resource monitoring module 316 (which is part of server 314 and may replace server drive 216, thus performing similar functionality) may transmit a signal to applications with lower priorities to defer sending of subsequent requests for GPU resources, where the lower priority application(s) may place subsequent requests in a corresponding queue until a higher priority application no longer needs allocated GPU resources).

As to claim 7, Gandhi et al. modified with Zhao et al. disclose the first edge device is hosting an application (Gandhi, e.g. one of applications 210, 212), the application configured to utilize the cores of the specialized processing unit (Gandhi, e.g. GPU processing cores of one or more GPUs 264, 274, 276, 284) of the service to perform parallel tasks (Gandhi, column 4, line 63 through column 5, line 1 notes once mapping of applications 210, 212 to GPUs 264, 274, 276, 284 is complete, sharing at each GPU 264, 274, 276, 284 is managed by respective local gatekeepers 262, 272, 282, where each application 210, 212 may be responsible for distributing its computation and data on the multiple GPUs 264, 274, 276, 284; column 7, lines 43-51 notes a graphics processing unit (GPU), e.g. such as GPUs 264, 274, 276, 284, has a highly parallel structure where processing of large blocks of data is done in parallel) and to utilize a processing unit of the first edge device to perform non-parallel tasks (Gandhi, column 5, lines 1-6 notes applications 210, 212 may also get access to at least one central processing unit (CPU) core on each node that hosts a GPU slice; Figure 5 further illustrates a system comprising CPUs 21 and GPU 37, where CPU 21 may perform tasks while GPU 37 may perform tasks).

As to claim 8, Gandhi et al. modified with Zhao et al. disclose the parameter included in the first request (Gandhi, e.g. request by first application, e.g. application 210; modified with Zhao, e.g. request by first application, e.g. first application 202/302) is indicative of at least one of (Gandhi, column 4, lines 42-62 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements to resource availability and/or expose the resource availability of GPUs 264, 274, 276, 284 to applications 210, 212, which can then decide how much of the resources each application 210, 212 wishes to utilize, e.g. request specific slices of specific GPUs; modified with Zhao, [0029] notes applications 202, 204, and 206 can send a request in a predetermined format as <AppID, quotaType, quotaSetting>, where the AppID indicates an identifier of the application, the quotaType indicates one of quota types type 1, type 2, or type 3 (see [0028]), and the quotaSetting indicates the value of the quota): a length of time that the first number of the cores are to be reserved (Gandhi, column 3, lines 54-67 notes time-based multiplexing, where the amount of time each container can execute may be limited); a task that is to be performed by the first number of the cores (Gandhi, column 4, lines 45-48 notes applications 210, 212 specify their requirements, column 5, lines 12-14 notes the request may include a minimum and/or maximum amount of resources, where it is understood that the requirements and/or amount of resources correlate to how much work, e.g. tasks, is to be performed by the application); a priority associated with the task that is to be performed (Gandhi, column 3, line 67 through column 4, line 3 notes priority-based multiplexing, where lower priority containers have to wait for higher priority containers; modified with Zhao, [0022] and [0023] note a predetermined quota associated with a priority of an application, the predetermined quota indicating an upper limit of the resources allowed for use by the application); or a service level that the first number of the cores are to provide for the task (Gandhi, column 10, lines 13-19 notes service level management for resource allocation and management; modified with Zhao, [0040] notes Quality of Service (QoS) control of GPU resources at the server side).

Claim 9 is similar in scope to claim 8 above, and is therefore rejected under similar rationale.

As to claim 10, Gandhi et al. disclose a system (e.g. Figure 5, system 20, where systems 100 and/or 200 of Figures 1 and/or 2 may be implemented) comprising: one or more specialized processing units (e.g. Figure 1, graphics processing unit (GPU) 102 and/or Figure 2, GPUs 264, 274, 276, and 284, Figure 5, GPU 37); one or more processors (Figure 5, central processing units (CPU) 21); and one or more non-transitory computer-readable media (e.g. Figure 5, read-only memory (ROM) 22 and/or random access memory (RAM) 24) storing instructions that, when executed, cause the one or more processors to perform operations (column 10, lines 27-57 notes various types of computer readable media having computer readable program instructions thereon for causing a processor, e.g. CPU 21, to carry out the present disclosure) (Please NOTE the system of Figure 2 is cited in the rejection below for simplicity, but the rejection may also apply to the system of Figure 1) comprising: receiving, from an edge node that is remote from the system (e.g. Figure 2 illustrates various nodes 260, 270, 280 comprising local GPU gatekeepers 262, 272, 282, respectively, and GPUs 264, 274, 276, 284, respectively, for enabling resource sharing amongst the multiple GPUs 264, 274, 276, 284, where each of the nodes is included in the system 200), a request (e.g. a request from a first application, e.g. application 210 executing on a particular node, may be considered a “first request,” where a request from a second application, e.g. application 212 executing on a particular node, may be considered a “second request”) to reserve a number of cores of the one or more specialized processing units (e.g. GPU processing cores of one or more of GPUs 264, 274, 276, 284) for performing a task associated with an application (e.g. application 210, 212, where column 4, line 66 through column 5, line 1 notes application 210, 212 for distributing its computation and data on the multiple GPUs 264, 274, 276, 284) that is running on the edge node (e.g. running on a particular node) (e.g. Figure 3, step 304, column 5, lines 12-19 notes receiving a first request from a first application for first requested GPU resources, e.g. a minimum and/or maximum amount of resources, the GPU resources comprising a processor and a memory, the processor further comprising multiple GPU cores, where column 4, lines 45-62 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements (from applications 210, 212) to resource availability (on GPUs 264, 274, 276, 284), where GPU distributed gatekeeper 250 may decide how much of each GPU 264, 274, 276, 284 should be given to each application 210, 212 and/or GPU distributed gatekeeper 250 may expose resource availability to applications 210, 212, which can then decide how much of the GPU resources each application 210, 212 wishes to utilize, e.g. request specific slices of specific GPUs); determining, based at least in part on a parameter included in the request (e.g. column 4, lines 45-48 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, e.g. in the form of a request, column 5, lines 12-14 notes the request may include a minimum and/or maximum amount of resources), that the number of the cores are available to perform the task (e.g. Figure 3, steps 308, 310, column 5, lines 23-28 notes determining (e.g. via GPU distributed gatekeeper 250) whether the request(s) may be fulfilled by getting the availability of the GPU capacity, e.g. the amount of resources available, where column 4, lines 45-48 notes applications 210, 212 specify their requirements to GPU distributed gatekeeper 250, which can then perform matching of resource requirements (from applications 210, 212) to resource availability (on GPUs 264, 274, 276, 284), GPU distributed gatekeeper 250 may decide how much of each GPU 264, 274, 276, 284 should be given to each application 210, 212 and/or GPU distributed gatekeeper 250 may expose resource availability to applications 210, 212, which can then decide how much of the resources each application 210, 212 wishes to utilize, e.g. request specific slices of specific GPUs); reserving the number of the cores to perform the task (e.g. Figure 3, step 312, column 5, lines 28-32 notes responsive to determining that the first requested amount of GPU resources are available, allocating a first slice of the GPU resources with a first requested amount to the first application, e.g. application 210).

As noted in the rejection above, Gandhi et al. describe that the request received may be from nodes, e.g. nodes 260, 270, 280, of the same system 200, as illustrated in Figure 2.  However, Gandhi et al. further describe that the systems may be implemented in various forms (see column 8, line 4 through column 9, line 46), including in a cloud computing environment, which may comprise one or more cloud computing nodes 10 that may communicate over a network (Figure 6 and its associated text, e.g. column 9, lines 26-46); thus, one or more nodes may be remote from the system.  Gandhi et al. differ from the invention defined in claim 10 in that Gandhi et al. do not disclose “sending, to the edge node, an indication that the number of the cores have been reserved to perform the task.”

Zhao et al. also disclose a system (Figures 2 and 3, architecture 200/300 with server 214/314, further illustrated in Figure 5) comprising: one or more specialized processing units (e.g. graphics processing units (GPUs) 220/318, 222/320, 224/322, 226/324); one or more processors (central processing unit (CPU) 501); and one or more non-transitory computer-readable media (ROM 502 and/or RAM 503) storing instructions that, when executed, cause the one or more processors to perform operations comprising: receiving, from an edge node that is remote from the system (e.g. first application 202/302, second application 204/304, or third application 206/306 are illustrated as remote from server 214/314, e.g. executing on a user station and/or client device as illustrated in Figure 1), a request (e.g. request from one of first application 202/302, second application 204/304, or third application 206/306, e.g. request from first application 202/302 may be considered a “first request”) to reserve a number of cores of one or more specialized processing units (e.g. one or more cores of GPUs 220/318, 222/320, 224/322, 226/324) for performing a task associated with an application (e.g. first application 202/302, second application 204/304, or third application 206/306) that is running on the edge node (e.g. user station/client devices) ([0033] notes server drive 216 of server 218 receives a request for a first amount of GPU resources from any one of the first application 202, second application 204, and third application 206, where [0024] notes a GPU is known to include a massive number of cores; thus it would have been obvious that the GPU resources requested may include a number of cores of a GPU); determining, based at least in part on a parameter included in the request, that the number of cores are available to perform the task ([0029] notes applications 202, 204, and 206 can send a request in a predetermined format as <AppID, quotaType, quotaSetting>, where the AppID indicates an identifier of the application, the quotaType indicates one of quota types type 1, type 2, or type 3 (see [0028]), and the quotaSetting indicates the value of the quota, [0030] notes client drive may intercept requests associated with GPU from respective applications and forward them to the server 214 for processing, [0031] notes server drive 216 of server 214 receives the requests from the client drive and extracts information associated with the quota from the request, [0032] notes server drive 216 may adopt a policy based on the request, [0033] notes the server drive 216 determines, based on the request and the maintained amount of resources already utilized by the first application 202 in the GPU, the total amount of GPU resources to be utilized by the first application 202, e.g. whether the total amount of GPU resources to be utilized by the first application 202 does or does not exceed the predetermined quota associated with the first application 202 (as determined by the request)); reserving the number of cores to perform the task ([0033] notes the server drive 216 determines, based on the request and the maintained amount of resources already utilized by the first application 202 in the GPU, the total amount of GPU resources to be utilized by the first application 202, and in response to the total amount of GPU resources to be utilized not exceeding the predetermined quota associated with the first application 202 (as determined by the request), the requested amount of GPU resources is allocated to the first application 202); and sending, to the edge node (e.g. user station and/or client device executing applications), an indication that the number of cores have been reserved to perform the task ([0042] notes resource monitoring module 316 (which is part of server 314 and may replace server drive 216, thus performing similar functionality) can transmit a signal to all applications instructing them to defer sending subsequent requests for GPU resources, thus indicating that GPU resources are already reserved).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Gandhi et al.’s system and method of reserving cores of a GPU with Zhao et al.’s method of sending a signal to applications regarding GPU resources as a means of tracking and/or confirming the GPU resources currently in use by applications.

As to claim 11, Gandhi et al. modified with Zhao et al. disclose receiving, from the edge node, data associated with the task (Gandhi, e.g. computation and data of application 210, 212) that is to be performed by the number of the cores of the one or more specialized processing units (Gandhi, e.g. GPU processing cores of one or more GPUs 264, 274, 276, 284) (Gandhi, column 4, line 62 through column 5, line 6 notes once the mapping of applications 210, 212 to GPUs is complete, sharing at each GPU is managed by the respective local gatekeepers 262, 272, 282, e.g. each application 210, 212 may be responsible for distributing its computation and data on multiple GPUs 264, 274, 276, 284); and causing the data associated with the task to be processed by the number of the cores of the one or more specialized processing units (Gandhi, column 7, lines 43-51 notes a graphics processing unit (GPU), e.g. GPUs 264, 274, 276, 284, is designed to manipulate and alter memory, e.g. for computer graphics and image processing, to accelerate the creation of images in a frame buffer intended for output to a display).

Claim 12 is similar in scope to claim 2 above, and is therefore rejected under similar rationale.

Claim 13 is similar in scope to claim 4 above, and is therefore rejected under similar rationale.

Claim 14 is similar in scope to claim 7 above, and is therefore rejected under similar rationale.

Claim 15 is similar in scope to claim 3 above, and is therefore rejected under similar rationale.

Claim 16 is similar in scope to claim 8 above, and is therefore rejected under similar rationale.

Claim 17 is similar in scope to claim 10 above, and is therefore rejected under similar rationale.

Claim 18 is similar in scope to claim 2 above, and is therefore rejected under similar rationale.

Claim 19 is similar in scope to claim 8 above, and is therefore rejected under similar rationale.

Claim 20 is similar in scope to claim 4 above, and is therefore rejected under similar rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACINTA M CRAWFORD whose telephone number is (571)270-1539. The examiner can normally be reached 9:00 a.m. to 5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JACINTA M CRAWFORD/
Primary Examiner, Art Unit 2612