Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112 
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


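Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.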

In claim 1, the limitation “morphs the received requests into a plurality of threads, a plurality of dimensions and a plurality of memory sizes, wherein the dynamic adaptive scheduler module computes the dimensions of each of the received request” is unclear. First, the term “morphs” does not have a clear meaning in this context. The examiner believes “groups” would be a clearer term.
Second, the transition from grouping the requests (into a plurality of threads with memory requirements) to computing the dimensions is unclear. If the dimensions are already part of the grouping process, it is unclear why they must then be computed before scheduling.
Perhaps the best way to clarify this is something along the lines of what is shown in Table 1. The examiner notes that this is consistent with how grids are used in the CUDA parallel programming paradigm, in which threads are organized into blocks and blocks into grids.
Paragraph 48 of the specification mentions “The computational cores of the coprocessors are divided into a set of blocks, and a set of blocks are divided into a set of grids. The neural networks compute their matrices in thread/block/grid dimensions based on the application request matrix size.”
The best way to overcome this ambiguity is to make clear in the claim that the dimensions are computed as thread/block/grid dimensions based on the request matrix size, as described in paragraph 48 quoted above.
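For illustration only, the kind of computation paragraph 48 appears to describe can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the names `LaunchDims` and `compute_dims` and the default 16 x 16 block size are assumptions chosen for the example. It derives block and grid dimensions from a request matrix size by ceiling division, in the manner commonly used when sizing CUDA kernel launches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchDims:
    """Hypothetical container for thread/block/grid dimensions."""
    block: tuple  # threads per block, as (x, y)
    grid: tuple   # blocks per grid, as (x, y)

    @property
    def total_threads(self) -> int:
        return self.block[0] * self.block[1] * self.grid[0] * self.grid[1]


def compute_dims(rows: int, cols: int, block=(16, 16)) -> LaunchDims:
    """Derive grid dimensions from a request matrix of rows x cols.

    The block size is an assumed default; the grid is sized by ceiling
    division so that every matrix element is covered by one thread.
    """
    bx, by = block
    grid_x = (cols + bx - 1) // bx  # blocks needed along the columns
    grid_y = (rows + by - 1) // by  # blocks needed along the rows
    return LaunchDims(block=block, grid=(grid_x, grid_y))


if __name__ == "__main__":
    dims = compute_dims(rows=1000, cols=500)
    print(dims.block, dims.grid, dims.total_threads)
```

Claim language tied to a computation of this kind would make clear what “dimensions” are computed and from what input.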
Claims 7 and 14 have the same problem and are rejected for the same reasons.
The remaining claims, not specifically mentioned, are rejected for being dependent upon one of the claims above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 7, 8, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (US 2019/0312772 A1) in view of Ashbaugh (US 2020/0293380 A1).

As per claim 1, Zhao teaches a device in a network, comprising:
a non-transitory storage device having embodied therein one or more routines operable to dynamically and automatically sharing one or more resources of a coprocessor AI accelerator based on a plurality of workload changes during training and inference of a plurality of neural networks; (Zhao [0035] provisioning module 142 is configured to implement a topology aware provisioning process that is based on a "weighted" consideration of factors including current cluster topology and bandwidth usage, which enables the computing service platform 130 to provide intelligent, optimized computing infrastructures that can fully utilize state-of-the-art hardware accelerators (e.g., GPU, FPGA etc.) and better serve emerging workloads like distributed deep learning or other HPC workloads. While the exemplary scheduling and provisioning methods discussed herein can be implemented for various HPC applications, for illustrative purposes, the exemplary methods will be discussed in the context performing distributed DL training for Deep Neural Network (DNN) applications in a heterogeneous computing environment. In addition, embodiments of the invention will be discussed in the context of parallelizing DL training of a neural network using a plurality of accelerator devices (e.g., GPU devices) in a logical ring communication framework)
a plurality of client units configured to receive the morphed requests from the dynamic adaptive scheduler module, wherein each of the neural networks is mapped with at least one of the client units on a plurality of graphics processing unit (GPU) hosts; (Zhao [0004]  For example, one embodiment includes a method which comprises: receiving, by a control server node, a service request from a client system to perform a data processing job in a server cluster managed by the control server node; determining, by the control server node, candidate accelerator devices that reside in one or more server nodes of the server cluster, which can be utilized to perform the data processing job and [0015] FIG. 1 is a high-level schematic illustration of a system 100 which comprises a computing service platform that is configured to provide topology-aware provisioning of computing resources in a distributed heterogeneous computing environment, according to an embodiment of the invention. The system 100 comprises a plurality (m) of client systems 110-1, 110-2, . . . , 110-m (collectively referred to as client systems 110), a communications network 120, and a computing service platform 130 which can be accessed by the client systems 110 over the communications network 120.);
 a plurality of server units configured to receive the morphed requests from the plurality of client units; (Zhao Fig 1 Service Controller and [0020] The service controller 140 is configured to control and manage various functionalities of the computing service platform 130. For example, the service controller 140 receives service requests from the client systems 110 for executing HPC jobs on the server cluster 160 (e.g., distributed DL training, or other HPC jobs), and the received service requests are stored in the request queue 144. The service controller 140 utilizes the topology-aware provisioning system 140-1 to schedule and provision computing resources for jobs pending in the request queue 144. A service request can include various user-specified conditions and demands for executing a given job (e.g., DL training) associated with the service request. For example, a service request may specify (i) a desired number (N) of accelerator devices (e.g., GPU devices) to provision for the requested job, (ii) a specific type/model of accelerator device (e.g., NVidia P100 GPU, Tensor flow TPU, etc.) to be utilized for the requested job, (iii) whether the provisioned accelerator devices should be exclusively allocated for the requested job or can be shared with other jobs, and/or (iv) other conditions based on a service level agreement (SLA) with the given client. In addition, the provisioning of accelerator resources for pending jobs can be based on predefined policies of the service provider for handing specific types of jobs. The service request and associated provisioning specifications are stored in the request queue 144 pending scheduling by the computing resource scheduling and provisioning module 142.)
and one or more coprocessors configured to receive the morphed request from the plurality of server units, wherein the coprocessors comprise a plurality of graphics processing units (GPUs), Field Programmable Gate Arrays (FPGAs), and a plurality of Artificial Intelligence (AI) Accelerators. (Zhao Fig 1 GPU Server Nodes 160-1 to 160-n and [0015] The GPU server nodes 160-1, 160-2, . . . , 160-n comprise reporting agents 162 and GPU devices 164 (as well as other possible computing resources including, but not limited to, central processing units (CPUs), field programmable gate array (FPGA) devices, application specific integrated circuit (ASIC) devices, tensor processing units (TPUs), image processing units (IPUs), etc.). The server cluster 160 comprises a heterogeneous cluster of GPU server nodes which can have different hardware and network connection topologies/configurations, examples of which will be explained below with reference to FIGS. 3A through 3E.)

Zhao does not teach one or more processors coupled to the non-transitory storage device and operable to execute the one or more routines, wherein the one or more routines include: a dynamic adaptive scheduler module configured to receive a plurality of requests from each of neural network and a plurality of high-performance computing applications (HPCs), wherein the dynamic adaptive scheduler module morphs the received requests into a plurality of threads, a plurality of dimensions and a plurality of memory sizes, wherein the dynamic adaptive scheduler module computes the dimensions of each of the received request and dynamically assigns the dimensions for each of the received requests; 
However, Ashbaugh teaches one or more processors coupled to the non-transitory storage device and operable to execute the one or more routines, wherein the one or more routines include: a dynamic adaptive scheduler module configured to receive a plurality of requests from each of neural network and a plurality of high-performance computing applications (HPCs), wherein the dynamic adaptive scheduler module morphs the received requests into a plurality of threads, a plurality of dimensions and a plurality of memory sizes, wherein the dynamic adaptive scheduler module computes the dimensions of each of the received request and dynamically assigns the dimensions for each of the received requests (Ashbaugh [0040] In some embodiments, a graphics processing unit (GPU) is communicatively coupled to host/processor cores to accelerate graphics operations, machine-learning operations, pattern analysis operations, and various general-purpose GPU (GPGPU) functions. [0163] The architecture described above can be applied to perform training and inference operations using machine learning models. Machine learning has been successful at solving many kinds of tasks. The computations that arise when training and using machine learning algorithms (e.g., neural networks) lend themselves naturally to efficient parallel implementations. [0222] FIG. 15 is an illustration of scheduling of thread groups for graphics processing utilizing cache [memory size needed to store thread information] locality according to some embodiments. As illustrated in FIG. 15, a grid 1500 represents the scheduling of thread groups to processors. Certain sub-sets of thread assignments, such as sub-set 1510, are cached together. In some embodiments, in contrast with the conventional scheduling of thread groups illustrated in FIG. 14, thread groups are assigned utilizing cache locality.
For example, in a particular instance thread groups 0 to 3 are assigned in a manner to follow the cache locality established for thread group assignments. [0224] In some embodiments, the bias is a hint regarding cache locality that may be utilized in thread group assignment, such as for kernels with regular access patterns. The hint may direct that there be an attempt to execute on a similar cache domain but allowing failure if this doesn't make sense under the circumstances, such as when there is a timeout that occurs or when following the hint would impair performance. The hint could take several forms, such as a preference to keep an N×M block of thread groups together, or a preference to schedule along one dimension versus another, or a combination of preferences for the scheduling of groups of threads.)
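For illustration only, the hint described in the quoted paragraph [0224], “a preference to keep an N×M block of thread groups together,” can be sketched as a tiled traversal of a two-dimensional grid of thread groups. This is a minimal sketch under stated assumptions, not Ashbaugh's implementation: the function name `tiled_order` and its parameters are hypothetical, and the timeout and fallback behavior mentioned in the reference is not modeled.

```python
def tiled_order(grid_w, grid_h, tile_w, tile_h):
    """Return thread-group coordinates (x, y) in a tile-by-tile order.

    All thread groups inside one tile_w x tile_h tile are emitted
    consecutively, so a scheduler that assigns them in this order would
    tend to keep each tile on the same cache domain. Tiles at the right
    and bottom edges are clipped to the grid bounds.
    """
    order = []
    for tile_y in range(0, grid_h, tile_h):
        for tile_x in range(0, grid_w, tile_w):
            for y in range(tile_y, min(tile_y + tile_h, grid_h)):
                for x in range(tile_x, min(tile_x + tile_w, grid_w)):
                    order.append((x, y))
    return order


if __name__ == "__main__":
    # A 4 x 2 grid of thread groups visited in 2 x 2 tiles.
    print(tiled_order(4, 2, 2, 2))
```

A conventional row-major order would instead interleave thread groups from different tiles, which is the contrast the reference draws between its FIG. 14 and FIG. 15.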

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Ashbaugh with the system of Zhao to schedule dynamically using dimensions. One having ordinary skill in the art would have been motivated to incorporate Ashbaugh into the system of Zhao for the purpose of thread group scheduling for graphics processing. (Ashbaugh paragraph 01)

As to claims 7, 8 and 14, they are rejected for the same reasons as claim 1.
As to claim 15, it is rejected for the same reasons as claim 8.

Claims 2, 5, 9, 12, 16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (US 2019/0312772 A1) in view of Ashbaugh (US 2020/0293380 A1) and further in view of Maiyuran (US 2020/0293369 A1).

As per claim 2, Zhao and Ashbaugh do not teach the dynamic adaptive scheduler module computes a number of threads, and memory size required and allocates cores and memory on the coprocessors.
However, Maiyuran teaches the dynamic adaptive scheduler module computes a number of threads, and memory size required and allocates cores and memory on the coprocessors. (Maiyuran [0213] A scheduling optimization determines a number of available threads for a graphics multiprocessor, a memory size needed for a thread, and then schedules threads dynamically based on available memory space. Thus, the thread dispatch is dependent on shared memory allocation. [0214] At operation 1602, commands from a command queue in system memory are sent to a command parser (e.g., command parser 1710 of FIG. 17). At operation 1604, the command parser sends parsed commands to the thread dispatcher circuitry (e.g., thread dispatcher circuitry 1720). At operation 1606, the thread dispatcher circuitry determines a number of available threads for execution units (e.g., execution units 1740) of a graphics multiprocessor (e.g., graphics multiprocessor 1780) and a memory size needed for a thread and/or a thread-group. At operation 1608, the thread dispatcher dynamically determines whether memory (e.g., shared memory, cache) has any available space and if so a size of the available space. At operation 1610, when memory space for a new thread (or a thread-group) and available execution unit are available, the thread dispatcher circuitry dispatches the new thread to the available execution unit.)
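For illustration only, the dispatch condition described in the quoted operations 1606 through 1610 (a thread or thread-group is dispatched when both an execution unit and sufficient memory space are available) can be sketched as follows. This is a minimal sketch under stated assumptions, not Maiyuran's circuitry: the function name `dispatch`, the in-order queue policy, and the use of plain integers for memory sizes are all choices made for the example.

```python
from collections import deque


def dispatch(pending, free_units, free_memory):
    """Dispatch thread groups in order while resources remain.

    pending     -- list of (name, memory_needed) pairs, in queue order
    free_units  -- number of available execution units
    free_memory -- available memory space (e.g., shared memory), same
                   units as memory_needed

    Returns (dispatched_names, still_waiting). The in-order policy is an
    assumption: the head of the queue blocks later entries if it does
    not fit.
    """
    queue = deque(pending)
    dispatched = []
    while queue and free_units > 0:
        name, memory_needed = queue[0]
        if memory_needed > free_memory:
            break  # not enough memory space for the next thread group
        queue.popleft()
        free_units -= 1
        free_memory -= memory_needed
        dispatched.append(name)
    return dispatched, list(queue)


if __name__ == "__main__":
    done, waiting = dispatch([("a", 4), ("b", 8), ("c", 2)],
                             free_units=2, free_memory=10)
    print(done, waiting)
```

The sketch makes the dependency explicit: thread dispatch is gated on memory allocation as well as on execution-unit availability.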

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Maiyuran with the system of Zhao and Ashbaugh to compute the number of threads and the memory size. One having ordinary skill in the art would have been motivated to incorporate Maiyuran into the system of Zhao and Ashbaugh for the purpose of scheduling optimization, that is, dynamically scheduling threads based on dynamically determined memory space (e.g., cache, shared memory) of a graphics processing unit. (Maiyuran paragraph 20)

As per claim 5, Zhao and Ashbaugh do not teach wherein one or more resources of a coprocessor comprises cores, threads, and memory.
However, Maiyuran teaches wherein one or more resources of a coprocessor comprises cores, threads, and memory. (Maiyuran [0213] A scheduling optimization determines a number of available threads for a graphics multiprocessor, a memory size needed for a thread, and then schedules threads dynamically based on available memory space. Thus, the thread dispatch is dependent on shared memory allocation. [0214] At operation 1602, commands from a command queue in system memory are sent to a command parser (e.g., command parser 1710 of FIG. 17). At operation 1604, the command parser sends parsed commands to the thread dispatcher circuitry (e.g., thread dispatcher circuitry 1720). At operation 1606, the thread dispatcher circuitry determines a number of available threads for execution units (e.g., execution units 1740) of a graphics multiprocessor (e.g., graphics multiprocessor 1780) and a memory size needed for a thread and/or a thread-group. At operation 1608, the thread dispatcher dynamically determines whether memory (e.g., shared memory, cache) has any available space and if so a size of the available space. At operation 1610, when memory space for a new thread (or a thread-group) and available execution unit are available, the thread dispatcher circuitry dispatches the new thread to the available execution unit.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Maiyuran with the system of Zhao and Ashbaugh to use the resources of a coprocessor. One having ordinary skill in the art would have been motivated to incorporate Maiyuran into the system of Zhao and Ashbaugh for the purpose of scheduling optimization, that is, dynamically scheduling threads based on dynamically determined memory space (e.g., cache, shared memory) of a graphics processing unit. (Maiyuran paragraph 20)
As to claims 9 and 16, they are rejected for the same reasons as claim 2.
As to claims 12 and 19, they are rejected for the same reasons as claim 5.

Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (US 2019/0312772 A1) in view of Ashbaugh (US 2020/0293380 A1) and further in view of Sridharan (US 2019/0205745 A1).

As per claim 6, Zhao and Ashbaugh do not teach wherein the plurality of graphics processing units (GPUs), Field Programmable Gate Arrays (FPGAs), and Artificial Intelligence (AI) Accelerators are collected as a GPU pool by an orchestrator.
However, Sridharan teaches wherein the plurality of graphics processing units (GPUs), Field Programmable Gate Arrays (FPGAs), and Artificial Intelligence (AI) Accelerators are collected as a GPU pool by an orchestrator. (Sridharan [0262] As shown in FIG. 29C a process 2930 for configuring server-less accelerator pools for inferencing can be implemented by a CPU server. Once an accelerator pool is configured, the CPU server can be removed from the inferencing critical path. Alternatively, a single CPU server can be coupled to multiple accelerator pools. While GPU pools are illustrated in FIGS. 29A-29B, accelerator pools can be configured to use various types of machine learning optimized processing elements, including but not limited to GPGPUs, FPGAs, ASICs, or other types of computing elements that are optimized for machine learning compute.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Sridharan with the system of Zhao and Ashbaugh to use a GPU pool. One having ordinary skill in the art would have been motivated to incorporate Sridharan into the system of Zhao and Ashbaugh for the purpose of achieving optimizations for distributed machine learning (Sridharan paragraph 01).

As to claims 13 and 20, they are rejected for the same reasons as claim 6.
Allowable Subject Matter
Claims 3, 4, 10, 11, 17 and 18 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 20220100566 A1 – discloses an apparatus to facilitate metrics-based scheduling for hardware accelerator resources in a service mesh environment. The apparatus includes processors to collect metrics corresponding to communication links between microservices of a service managed by a service mesh; determine, based on analysis of the metrics, that a workload of the service can be accelerated by offload to a hardware accelerator device; generate a rebalancing request to cause the workload to be assigned to the hardware accelerator device for execution of the service; cause the workload to be annotated to indicate execution by the hardware accelerator device; and deploy, based on the annotation, the workload to the hardware accelerator device for execution in accordance with a restart policy corresponding to the service.

US 20220083389 A1 – discloses node resource scheduling. AI inference services described herein may receive a request to execute a machine learning model in a clustered edge system. To determine which hardware resource comprising computing nodes of the clustered edge system on which to execute the machine learning model, AI inference services may compare the computational workload of the machine learning model, with the computational abilities and functions of the hardware resources. In examples, the comparison is based on a scheduling algorithm, including an identification stage to identify candidate hardware resources capable of executing the machine learning model, and a scoring stage to select the best candidate hardware resource for executing the machine learning model. A scheduler may assign the machine learning model to the selected hardware resource for execution by the AI inference services.

US 20210216375 A1 – discloses workload selection and placement in systems that include graphics processing units (GPUs) that are virtual GPU (vGPU) enabled. In some aspects, workloads are assigned to virtual graphics processing unit (vGPU)-enabled graphics processing units (GPUs) based on a variety of vGPU placement models. A number of vGPU placement neural networks are trained to maximize a composite efficiency metric based on workload data and GPU data for the plurality of vGPU placement models. A combined neural network selector is generated using the vGPU placement neural networks, and utilized to assign a workload to a vGPU-enabled GPU.

US 20210026696 A1 – discloses a method and apparatus for scheduling a plurality of available graphics processing units (GPUs). Multiple GPU pools may be set, wherein each GPU pool is configured to serve one or more jobs requiring the same number of GPUs. Available GPUs may be assigned to each GPU pool. A job and job information related to the job may be received, wherein the job information indicates a number of GPUs required for performing the job. A corresponding GPU pool may be selected from the multiple GPU pools based at least on the job information. Available GPUs to be scheduled to the job in the selected GPU pool may be determined based at least on the job information. In addition, the determined available GPUs may be scheduled to the job.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEHRAN KAMRAN whose telephone number is (571)272-3401.  The examiner can normally be reached on 9-5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente, can be reached on (571)272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MEHRAN KAMRAN/           Primary Examiner, Art Unit 2196