DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (hereinafter Zhao) (US 2019/0312772 A1) in view of Guim Bernat et al. (hereinafter Guim Bernat) (US 2021/0144517 A1).

As to claim 1, Zhao teaches a method for scheduling tasks and allocating resources to perform a machine-learning workload using hardware accelerators that are each configured to implement a neural network comprising a plurality of neural network layers (scheduling and provisioning the selected group of accelerator devices to execute the data processing job using a neural network with multiple hidden layers) ([0004]; [0036]; Abstract), the method comprising: 
receiving a request to perform the machine-learning (ML) workload (the service controller 140 receives service requests from the client systems 110 for executing HPC jobs on the server cluster 160 (e.g., distributed DL training, distributed machine learning (ML), or other HPC jobs)) ([0020]; [0002]; [0040]; [0057]); 
determining, based on the request, resource information to perform the ML workload at a distributed processing system comprising a plurality of hosts (one or more GPU server nodes 160-1, 160-2,…, 160-n are hosts), each host in the plurality of hosts comprising a respective plurality of hardware accelerators (the service controller 140 utilizes the topology-aware provisioning system 140-1 to schedule and provision computing resources for jobs pending in the request queue 144) ([0019]-[0020]; Fig. 1); 
determining, based on the resource information and the respective plurality of hardware accelerators for each host, a quantity of hosts that are each assigned to execute a respective task from a set of tasks that form the ML workload (a service request may specify (i) a desired number (N) of accelerator devices (e.g., GPU devices) to provision for the requested job, (ii) a specified type/model of accelerator device (e.g., Nvidia P100 GPU, Tensor flow TPU, etc.) to be utilized for the requested job, (iii) whether the provisioned accelerator devices should be exclusively allocated for the requested job or can be shared with other jobs, based on resource information) ([0014]; [0020]); 
for each host in the quantity of hosts: 
generating, based on a memory access topology of the host, a respective task specification that specifies the task assigned to be executed at the host using resources of the host that include the respective plurality of hardware accelerators (the system topology view 420 includes information which indicates that (i) 4 GPUs were detected in the example topology 400) ([0064]); and 
providing the respective task specification to the host in the quantity of hosts (the resource allocation and provisioning module 142 will communicate with the topology graph generator and analysis module 146 to request a set of resources to allocate and provision for a new workload based on, e.g., an application ID, constraints, policies, user-specified conditions, etc.) ([0033]); and 
performing the ML workload by executing, by each host in the quantity of hosts, the task specified in the respective task specification for the host (The computing resource scheduling and provisioning module 142 will allocate either a single GPU server node or multiple GPU server nodes within the cluster of GPU server nodes 160 to handle a given service request depending on, e.g., the available GPU devices and processing resources of the GPU server nodes, the nature of the GPU processing tasks associated with the service request.) ([0021]).
As shown above, Zhao teaches assigning tasks to execute based on resource information, but Zhao does not explicitly make clear that the resource information is a resource requirement. However, Guim Bernat teaches executing computer processes based on resource requirements or service level agreements (SLA) or better resource locality ([0134]-[0135]; [0168]; [0481]; [0550]; [0568]; [1010]; [1044]; [1070]). It would have been obvious to one of ordinary skill in the art to modify Zhao’s resource allocation system and method such that it would include the feature of process execution based on resource requirements, as taught and suggested in Guim Bernat. The suggestion/motivation for doing so would have been to provide the predicted result of assuring end-to-end service with mandatory and expected standards that could be used to estimate how many resources are needed in a particular location. Additionally, this service management framework is service aware and naturally balances the service delivery requirements with the capability and availability of the resources and the access for the data upload the data analytics systems. If the network transports degrade, fail or change to a higher cost or lower bandwidth function, service policy monitoring functions provide alternative analytics and service delivery mechanisms within the privacy or cost constraints of the user. With these features, the policies can trigger the invocation of analytics and dashboard services at the edge, ensuring continuous service availability at reduced fidelity or granularity. Once network transports are re-established, regular data collection, upload and analytics services can resume (Guim Bernat - [0125]; [0139]; [0192]).

As to claim 2, Zhao teaches wherein: the memory access topology of each host comprises a respective non-uniform memory access (NUMA) topology that includes a respective memory that is local to the host; and the respective memory includes a socket interface that couples the respective memory to each hardware accelerator of the respective plurality of hardware accelerators and one or more other resources of the host ([0025]; [0032]-[0034]).

As to claim 3, Zhao teaches wherein executing the task specified in the respective task specification comprises: performing multiple neural network computations to generate an output for each neural network layer of the plurality of neural network layers in response to assigning respective portions of the multiple neural network computations to each hardware accelerator in the respective plurality of hardware accelerators ([0040]).

As to claim 4, Zhao teaches wherein performing the ML workload comprises: processing instructions for the respective task specification using each resource of a control group of the host and based on data exchanged between the respective memory, the hardware accelerator, and a respective processor that is included among the resources of the host (Fig. 1).

As to claim 5, Zhao teaches wherein performing the ML workload comprises: executing tasks specified in the respective task specification in response to processing the instructions based on the data being exchanged via a hardware socket that links each resource of the control group of the host, wherein the hardware socket defines a local communication bus that is shared among multiple resources managed by the host ([0034]; [0052]-[0053]).

As to claim 6, Guim Bernat teaches wherein a respective NUMA topology for a first host is based in part on: i) a respective first memory in a respective configuration of resources (system configurations) that is local to the first host (local region, etc.); and ii) a respective second, different memory in a respective configuration of resources that is local to a second, different host, but that is remote to the first host (different geographic region or locality, etc.) ([0125]; [0559]; [0590]-[0591]; [0687]; [1044]).

As to claim 7, Guim Bernat teaches wherein determining the quantity of hosts comprises: obtaining a system file that describes a configuration of resources that are managed by each host of the plurality of hosts; and determining the quantity of hosts based on the configuration of resources described in the system file for each host of the plurality of hosts ([0222]; [0283]).

As to claim 8, Zhao teaches identifying one or more sockets that couple resources of the host based on a system file that describes a mapping of NUMA sockets for each host of the plurality of hosts; and forming a control group of the host based on the one or more sockets that couple the resources of the host ([0025]).

As to claim 9, Zhao teaches assigning an ML task of the task specification to the control group of the host based on one or more socket interfaces for accelerators in the control group, wherein the socket interfaces are included in the mapping of NUMA sockets described in the system file; and using the accelerators in the control group to execute the ML task as a process under the control group ([0025]).

As to claim 10, it is rejected for the same reasons as stated in the rejection of claim 1.

As to claim 11, it is rejected for the same reasons as stated in the rejection of claim 2.

As to claim 12, it is rejected for the same reasons as stated in the rejection of claim 3.

As to claim 13, it is rejected for the same reasons as stated in the rejection of claim 4.

As to claim 14, it is rejected for the same reasons as stated in the rejection of claim 5.

As to claim 15, it is rejected for the same reasons as stated in the rejection of claim 6.

As to claim 16, it is rejected for the same reasons as stated in the rejection of claim 7.

As to claim 17, it is rejected for the same reasons as stated in the rejection of claim 8.

As to claim 18, it is rejected for the same reasons as stated in the rejection of claim 9.

As to claim 19, it is rejected for the same reasons as stated in the rejection of claim 1.

As to claim 20, it is rejected for the same reasons as stated in the rejection of claim 2.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Goglin teaches managing the topology of heterogeneous cluster nodes with hardware locality with NUMA architecture.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNETH TANG whose telephone number is (571) 272-3772. The examiner can normally be reached Monday-Friday 7AM-3PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lewis Bullock, can be reached at 571-272-3759. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KENNETH TANG/
Primary Examiner, Art Unit 2199