DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant’s submission filed on January 26, 2022 has been entered.

Status
The instant application, No. 16/454,026, has claims 1-4, 7-8, 10-11, 14-15, 17-18, and 21 pending.
Claims 5-6, 9, 12-13, 16, and 19-20 are cancelled. 

Claim Objections
An objection is made to claims 1, 8, and 15 for the following reason: minor informalities. Please see the objections below, along with suggested amendments. 
Claim 1 – improper formatting of recited condition “responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors”, followed by sub-steps of “computing” and “assigning”. See the suggested language below.
“(Currently Amended) A computer-implemented method, comprising: 
receiving, at a resource and power management (RPM) processor core, a request to execute a deep neural network (DNN) workload; 
dividing, by way of the RPM processor core, the DNN workload into a plurality of workload fragments; 
determining, by way of the RPM processor core, whether a workload fragment of the plurality of the workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors, the DNN processors comprising a plurality of neural processing elements; 
responsive to determining that the workload fragment is to be statically assigned to one of the plurality of DNN processors, assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors; 
storing an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of DNN processors; 
responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors:[[,]] 
computing an estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors based on the execution time scoreboard; and 
assigning, by way of the RPM processor core, the workload fragment to a DNN processor of the plurality of DNN processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors; 
determining, by way of the RPM processor core, that one or more of the plurality of DNN processors are idle; and 
removing power from the one or more of the plurality of DNN processors that are determined to be idle.”
Claim 8 – improper formatting of recited condition “responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors”, followed by sub-steps of “computing” and “assigning”; also has improper tense on the verb “assigning”.
“(Currently Amended) A computing system, comprising: 
a plurality of deep neural network (DNN) processors; and 
a plurality of processor cores, one of the plurality of processor cores comprising a resource and power management (RPM) processor core, the RPM processor core configured to: 
receive a request to execute a deep neural network (DNN) workload from one of the plurality of processor cores;  
divide the DNN workload into a plurality of workload fragments; 
determine whether a workload fragment of the plurality workload fragments is to be statically or dynamically allocated to one of the plurality of DNN processors; 
responsive to determining that the workload fragment is to be statically allocated to one of the plurality of DNN processors,  assign the workload fragment to a predetermined DNN processor of the plurality of DNN processors; 
store an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of DNN processors; 
responsive to determining that the workload fragment is to be dynamically allocated to one of the plurality of DNN processors:[[,]] 
compute an estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors based on the execution time scoreboard; and 
assign the workload fragment of the plurality of workload fragments to one of the plurality of DNN processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors; 
determine that one or more of the plurality of DNN processors are idle; and 
remove power from the one or more of the plurality of DNN processors that are determined to be idle.”
Claim 15 – improper formatting of recited condition “responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors”, followed by sub-steps of “computing” and “assigning”. 
“(Currently Amended) A processor core configured to: 
receive a request to execute a deep neural network (DNN) workload; 
divide the DNN workload into a plurality of workload fragments; 
determine whether a workload fragment of the plurality of workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors; 
responsive to determining that the workload fragment is to be statically assigned to one of the plurality of DNN processors, assign the workload fragment to a predetermined DNN processor of the plurality of DNN processors; 
store an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of DNN processors; 
responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors:[[,]] 
compute an estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors based on the execution time scoreboard; and 
assign a workload fragment of the plurality of workload fragments to one of the plurality of DNN processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of DNN processors;
determine that one or more of the plurality of DNN processors are idle; and 
cause power to be removed from the one or more of the plurality of DNN processors that are determined to be idle.”

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-2, 7-8, 10, 14-15, 17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (Pub. No. US2020/0012531 filed on April 1, 2017; hereinafter Li) in view of Campos et al. (Pub. No. US2018/0293498; hereinafter Campos) in view of Shimizu et al. (Pub. No. US2008/0115143; hereinafter Shimizu) in view of Potter et al. (Pub. No. US2003/0023885; hereinafter Potter).
Regarding claim 1, Li discloses the following: 
A computer-implemented method, comprising: 
receiving, at a resource and power management (RPM) processor core, a request to execute a deep neural network (DNN) workload; 
(Li teaches receiving, at a resource and power management (RPM) processor core [0254-0255], a request to execute a deep neural network (DNN) [0046, 0192] workload [0058-0059], e.g. “receives commands directed to performing processing operations” [0058]. 

“processor 1900 may also include a set of one or more bus controller units 1916 and a system agent core 1910 ... System agent core 1910 provides management functionality for the various processor components” [0254])
dividing, by way of the RPM processor core, the DNN workload into a plurality of workload fragments; 
(Li teaches dividing, by way of the RPM processor core [0254-0255], the DNN workload into a plurality of workload fragments [0064, 0230], e.g. “divide the processing workload into approximately equal sized tasks, to better enable distribution of the graphics processing operations to multiple clusters 214A-214N of the processing cluster array 212” [0064])
determining, by way of the RPM processor core, whether a workload fragment of the plurality of the workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors, 
(Li teaches determining, by way of the RPM processor core [0254-0255], whether a workload fragment of the plurality of the workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors [0164-0165] – this is done by “detection/observation logic” [0164] and “a status query to check on the shared function pipeline's status before dispatching any workload on it” [0165])
responsive to determining that the workload fragment is to be statically assigned to one of the plurality of DNN processors, assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors 
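(Li discloses responsive to determining that the workload fragment is to be statically assigned – “conventional static workload scheduling techniques” [0171] – to one of the plurality of DNN processors, e.g. “parallel processors such as general-purpose graphic processing units (GPGPUs) have played a significant role in the practical implementation of deep neural networks” [0004], assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors [0004, 0169-0171])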
responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors, 
assigning the workload fragment to a DNN processor of the plurality of DNN processors based upon an estimated time to completion of workload fragments executing on the plurality of DNN processors.  
(Li discloses responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of DNN processors [0060, 0168], e.g. “The scheduling can be handled dynamically by the scheduler 210” [0060], assigning the workload fragment to a DNN processor of the plurality of DNN processors [0113, 0199], e.g. “the general-purpose processing unit (GPGPU) 1100 can be configured to be particularly efficient in processing the type of computational workloads associated with training deep neural networks” [0199], based upon an estimated time to completion of workload fragments [0122] executing on the plurality of DNN processors [0199], e.g. “the graphics acceleration module 446 may adhere to the following requirements: ... 2) An application's job request is guaranteed by the graphics acceleration module 446 to complete in a specified amount of time, including any translation faults, or the graphics acceleration module 446 provides the ability to preempt the processing of the job” [0122]) 

However, Li does not disclose the following:
(1)	the DNN processors comprising a plurality of neural processing elements; and 
(2)	DNN processors are a type of processor
Nonetheless, these features would have been made obvious, as evidenced by Campos.
(1) (Campos discloses a set of the DNN processors comprising a plurality of neural processing elements, e.g. “a network of intelligent processing nodes making up an AI object” [0044]) 
(2) (Campos discloses that DNN processors [0044] are a type of processor, e.g. “one or more processors in the one or more computing platforms” [0047])
These prior art elements of Campos can be substituted for the prior art elements of Li, in order to yield an expected outcome for the system of Li. 
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li with the teachings of Campos. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale B.  Simple substitution of one known, equivalent element for another to obtain predictable results.
The predictable result would have been as follows: “Training can include teaching a network of intelligent processing nodes to get one or more outcomes, for example, on a simulator” [0195 – Campos].

However, Li in view of Campos does not disclose the following:
(1)	storing an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors; 
(2)	responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of processors, 
computing an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard and 
assigning, by way of the RPM processor core, the workload fragment to a processor of the plurality of processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors
Nonetheless, these features would have been made obvious, as evidenced by Shimizu.
(1) (Shimizu discloses a well-known method of storing an execution time scoreboard, e.g. “a job completion ratio” [0053, 0055] comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors, e.g. “Since tasks are processed one by one in each of the nodes, a job completion ratio can be calculated by counting the number of tasks that have been processed every time a task is processed and comparing the number of tasks with the total number of tasks in a job. After predetermined time has elapsed after starting execution of a job, 2% of tasks have been completed in the node 1, and no task has been processed in the node 2, as shown in (A) of FIG. 3. However, after the job is divided and transferred in the node 1, 3% of tasks have been completed in the node 1, and 1% of tasks have been completed in the node 2, as shown in (B) of FIG. 3” [0053]) 
(2) (Shimizu discloses a well-known method of responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of processors, 
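computing an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard, e.g. “Then, completion time T.sub.E necessary to complete a job is estimated from the progress rate p and the elapsed time T according to equation T.sub.E=T(1-p)/p (step 710). Then, remaining time until deadline time is calculated with reference to data of the deadline time stored in the memory unit 122 and is compared with the completion time T.sub.E (step 715). When the remaining time is longer than the completion time T.sub.E, it is determined that division is not necessary, so that the process of determining whether to perform division is completed, and the process proceeds to step 620 in FIG. 6. When the remaining time is not longer than the completion time T.sub.E, an inquiry is sent to the execution information management unit 115 in the center console 110 to acquire a transfer destination node (step 720)” [0063], and 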
assigning, by way of the RPM processor core or “execution adaptor unit” [0046], the workload fragment to a processor of the plurality of processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors, e.g. “In a node A, when it is determined that tasks need to be divided, setup time T.sub.SI necessary for setup, such as copying of divided data, and acquisition time T.sub.SO necessary to receive a result of processing after the processing in a remote node B is completed are first estimated. Then, the setup time T.sub.SI and the acquisition time T.sub.SO are added to estimated completion time T.sub.E, and the result is divided into two parts, so that time, out of the completion time T.sub.E, necessary for a first group of divided tasks, such as the first half of the tasks, to be left in the node A is necessary time T.sub.1, and remaining time, out of the completion time T.sub.E, necessary for a second group of the divided tasks, such as the second half of the tasks, to be delegated to the node B is necessary time T.sub.2.” [0065]) 
These well-known methods performed on the processors of Shimizu can also be applied to the similar DNN processors of Li in view of Campos in the same way, for improvement in a DNN processing environment. 
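At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos with the teachings of Shimizu. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.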

Using the known storing, computing, and assigning techniques of Shimizu would be beneficial to the DNN processing environment of Li in view of Campos for the following reason: “Scalable distributed job execution can be implemented by appropriately dividing tasks on each of the nodes, transferring some of the tasks to another node, and executing some of the tasks on the other node, as necessary, in response to the operational status of the tasks on each of the nodes, scalable distributed job execution involving appropriate task division” [0032 – Shimizu].
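For illustration only, the estimate relied upon from Shimizu [0063], T.sub.E=T(1-p)/p, can be sketched in Python together with a selection of the processor whose current work is expected to finish first. This is a minimal sketch under stated assumptions: the scoreboard layout (processor id mapped to elapsed time and completion fraction), the function names, and the choice of the smallest estimate are hypothetical and are not taken from the claims or from any cited reference.

```python
def estimated_completion_time(elapsed, progress):
    """Shimizu [0063]: T_E = T(1 - p) / p, with progress rate p in (0, 1]."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed * (1.0 - progress) / progress


def pick_processor(scoreboard):
    """Return the id whose currently executing work is estimated to finish first.

    scoreboard maps a processor id to (elapsed_time, completion_fraction);
    this layout is an assumption made for illustration only.
    """
    return min(
        scoreboard,
        key=lambda pid: estimated_completion_time(*scoreboard[pid]),
    )


# Hypothetical scoreboard contents.
scoreboard = {"dnn0": (10.0, 0.25), "dnn1": (10.0, 0.50), "dnn2": (4.0, 0.80)}
chosen = pick_processor(scoreboard)
```

With these hypothetical values, a processor that is 25% complete after 10 time units is estimated to need 30 more, so a processor further along in its work is preferred.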

However, Li in view of Campos in view of Shimizu does not disclose the following:
(1)	determining, by way of the RPM processor core, that one or more of the plurality of processors are idle; and 
(2)	removing power from the one or more of the plurality of processors that are determined to be idle.
Nonetheless, these features would have been made obvious, as evidenced by Potter.
(1) (Potter discloses determining, by way of the RPM processor core or “master power management agent (PMA) 206” [0032] that “can request the power usage values from each TPC to determine the overall power usage by the rotation group 220” [0034], that one or more of the plurality of processors [0022-0023, 0027; FIG. 2, Element 232; Claim 20 of Potter] are idle [0041], e.g. “Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message” [0041]) 
(2) (Potter discloses removing power from the one or more of the plurality of processors that are determined to be idle [0041], e.g. “Once that TPC has ceased receiving transactions and is in an “idle” state (i.e., not processing a transaction), the master PMA 206 can then command that TPC's slave PMA 228 to the off state. The master PMA 206 may determine the appropriate time to turn off the TPC in accordance with any suitable technique. Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message” [0041])
These well-known techniques performed on the processor of Potter can also be applied to the improved DNN processing environment of Li in view of Campos in view of Shimizu.
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos in view of Shimizu with the teachings of Potter. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.
The well-known techniques of Potter would be suitable to the DNN processing environment of Li in view of Campos in view of Shimizu, as they improve the capabilities of an RPM processor, enabling it to “determine the appropriate time to turn off the TPC” and turn off its processing elements accordingly [0041 – Potter].
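The idle-then-power-off sequence attributed to Potter [0041] can be sketched as follows. This is a hypothetical Python illustration of the polling variant only: the `Processor` class, its `pending` counter, and its `powered` flag are assumed names that appear in neither Potter nor the claims.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    pending: int = 0      # transactions still being processed (assumed field)
    powered: bool = True  # assumed power-state flag

    def is_idle(self):
        # A processor with no pending transactions is treated as idle.
        return self.pending == 0


def power_down_idle(processors):
    """Poll each powered processor and remove power from those reporting idle."""
    turned_off = []
    for proc in processors:
        if proc.powered and proc.is_idle():
            proc.powered = False
            turned_off.append(proc.name)
    return turned_off


# Hypothetical pool: one busy processor and two idle ones.
procs = [Processor("tpc0", pending=3), Processor("tpc1"), Processor("tpc2")]
```

Calling `power_down_idle(procs)` switches off only the processors with no pending work and leaves the busy one powered.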
Regarding claims 2, 10, and 17, Li in view of Campos in view of Shimizu in view of Potter discloses the following: 
wherein each of the plurality of DNN processors is configured to maintain a workload fragment queue, and wherein assigning the workload fragment to one of the plurality of DNN processors comprises enqueuing the workload fragment on a workload fragment queue associated with the one of the plurality of DNN processors.  
(Li discloses features wherein each of the plurality of DNN processors is configured to maintain a workload fragment queue [0059, 0112-0113], and wherein assigning the workload fragment to one of the plurality of DNN processors comprises enqueuing the workload fragment on a workload fragment queue [0125; FIG. 4D, Elements 483-484] associated with the one of the plurality of DNN processors [0125])
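The recited relationship between assignment and enqueuing can be sketched in Python as shown here. This is a minimal sketch, assuming a first-in, first-out queue per processor; the `DnnProcessor` class and the `assign` helper are hypothetical names that are not drawn from Li or from the claims.

```python
from collections import deque


class DnnProcessor:
    """Hypothetical processor model that maintains its own fragment queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def enqueue(self, fragment):
        self.queue.append(fragment)

    def next_fragment(self):
        # FIFO order is an assumption; returns None when the queue is empty.
        return self.queue.popleft() if self.queue else None


def assign(fragment, processor):
    """Assigning a fragment amounts to enqueuing it on that processor's queue."""
    processor.enqueue(fragment)


# Hypothetical usage: two fragments assigned to one processor.
dnn0 = DnnProcessor("dnn0")
assign("fragment-0", dnn0)
assign("fragment-1", dnn0)
```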
Regarding claims 7, 14, and 21, Li in view of Campos in view of Shimizu in view of Potter discloses the following: 
a DNN processor is a processor
(Li discloses that a DNN processor is a processor [0164-0165])

However, Li does not disclose the following:
wherein the workload fragments are configured to store the data indicating the percentage completion of execution of the workload fragments executing on the plurality of processors in the execution time scoreboard.
Nonetheless, this feature would have been made obvious, as evidenced by Shimizu.
(Shimizu teaches that the workload fragments on each node are configured to store the data indicating the percentage completion of execution, evidence by “the received data of the job completion ratio of each of the nodes” [0050], of the workload fragments executing on the plurality of processors in the execution time scoreboard [0049-0050, 0053, 0055])
These teachings of Shimizu are applicable to the workload fragments of Li in view of Campos.
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos with the teachings of Shimizu. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale G. Teaching, Suggestion, and Motivation. 
Shimizu].
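The job completion ratio described in Shimizu [0053], obtained by counting processed tasks against the total number of tasks in a job, can be illustrated with the Python sketch below. The `CompletionScoreboard` class and its method names are hypothetical, and the in-memory dictionaries stand in for whatever storage an actual system would use.

```python
class CompletionScoreboard:
    """Hypothetical scoreboard updated each time a node finishes a task."""

    def __init__(self):
        self._done = {}
        self._total = {}

    def register(self, node, total_tasks):
        if total_tasks <= 0:
            raise ValueError("total_tasks must be positive")
        self._total[node] = total_tasks
        self._done[node] = 0

    def task_finished(self, node):
        # Count one more processed task for this node.
        self._done[node] += 1

    def percent_complete(self, node):
        # Processed tasks compared with the total number of tasks in the job.
        return 100.0 * self._done[node] / self._total[node]


# Hypothetical job of 50 tasks on node 1, one of which has finished.
board = CompletionScoreboard()
board.register("node1", 50)
board.task_finished("node1")
```

Here one finished task out of 50 gives a completion of 2%, matching the order of magnitude in the FIG. 3 example quoted from Shimizu.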
Regarding claim 8, Li discloses the following: 
A computing system, comprising: 
a plurality of deep neural network (DNN) processors; and 
(Li discloses a plurality of processors [0045, 0057] for a deep neural network (DNN) [0046])
a plurality of processor cores, one of the plurality of processor cores comprising a resource and power management (RPM) processor core, 
(Li discloses a plurality of processor cores [0047, 0051], one of the plurality of processor cores [0051] comprising a resource and power management (RPM) processor core – see system agent core [0254-0255])
the RPM processor core configured to 
receive a request to execute a deep neural network (DNN) workload from one of the plurality of processor cores; [[,]]
(Li teaches receiving, at a resource and power management (RPM) processor core [0254-0255], a request to execute a deep neural network (DNN) [0046, 0192] workload [0058-0059] from one of the plurality of processor cores [0047, 0051], e.g. “receives commands directed to performing processing operations” [0058]. 
For further evidence of a resource and power management (RPM) processor core, see the following citations: 
“processor 1900 may also include a set of one or more bus controller units 1916 and a system agent core 1910 ... System agent core 1910 provides management functionality for the various processor components” [0254]
“System agent core 1910 may additionally include a power control unit (PCU), which includes logic and components to regulate the power state of processor cores 1902A-1902N and graphics processor 1908” [0255])
divide the DNN workload into a plurality of workload fragments; [[,]] and 
(Li teaches dividing the DNN workload into a plurality of workload fragments [0064, 0230], e.g. “divide the processing workload into approximately equal sized tasks, to better enable distribution of the graphics processing operations to multiple clusters 214A-214N of the processing cluster array 212” [0064])
determine whether a workload fragment of the plurality workload fragments is to be statically or dynamically allocated to one of the plurality of DNN processors; [[,]] 
(Li teaches determining whether a workload fragment of the plurality of the workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors [0164-0165] – this is done by “detection/observation logic” [0164] and “a status query to check on the shared function pipeline's status before dispatching any workload on it” [0165])
responsive to determining that the workload fragment is to be statically allocated to one of the plurality of DNN processors, assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors; [[, and]]  
(Li discloses responsive to determining that the workload fragment is to be statically allocated – “conventional static workload scheduling techniques” [0171] – to one of the plurality of DNN processors, e.g. “parallel processors such as general-purpose graphic processing units (GPGPUs) have played a significant role in the practical implementation of deep neural networks” [0004], assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors for processing by the neural processing elements [0004, 0169-0171])
assign the workload fragment of the plurality of workload fragments to one of the plurality of DNN processors based upon [[an]] the estimated time to completion of the execution of the workload fragments executing on the plurality of DNN processors.  
(Li teaches assigning the workload fragment to a DNN processor of the plurality of DNN processors [0060, 0113, 0199], e.g. “the general-purpose processing unit (GPGPU) 1100 can be configured to be particularly efficient in processing the type of computational workloads associated with training deep neural networks” [0199], based upon the estimated time to completion of the execution of the workload fragments [0122] executing on the plurality of DNN processors [0199], e.g. “the graphics acceleration module 446 may adhere to the following requirements: ... 2) An application's job request is guaranteed by the graphics acceleration module 446 to complete in a specified amount of time, including any translation faults, or the graphics acceleration module 446 provides the ability to preempt the processing of the job” [0122]) 

However, Li does not disclose the following:
DNN processors are a type of processor
Nonetheless, this feature would have been made obvious, as evidenced by Campos.
(Campos discloses that DNN processors [0044] are a type of processor, e.g. “one or more processors in the one or more computing platforms” [0047])
These prior art elements of Campos can be substituted for the prior art elements of Li, in order to yield an expected outcome for the system of Li. 
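At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li with the teachings of Campos. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale B.  Simple substitution of one known, equivalent element for another to obtain predictable results.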

The predictable result would have been as follows: “Training can include teaching a network of intelligent processing nodes to get one or more outcomes, for example, on a simulator” [0195 – Campos].

However, Li in view of Campos does not disclose the following:
(1)	store an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors; 
(2)	responsive to determining that the workload fragment is to be dynamically allocated to one of the plurality of processors, 
compute an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard and 
assign the workload fragment of the plurality of workload fragments to one of the plurality of processors based upon [[an]] the estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors[[,]]
Nonetheless, these features would have been made obvious, as evidenced by Shimizu.
(1) (Shimizu discloses a well-known method of storing an execution time scoreboard, e.g. “a job completion ratio” [0053, 0055] comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors, e.g. “Since tasks are processed one by one in each of the nodes, a job completion ratio can be calculated by counting the number of tasks that have been processed every time a task is processed and comparing the number of tasks with the total number of tasks in a job. After predetermined time has elapsed after starting execution of a job, 2% of tasks have been completed in the node 1, and no task has been processed in the node 2, as shown in (A) of FIG. 3. However, after the job is divided and transferred in the node 1, 3% of tasks have been completed in the node 1, and 1% of tasks have been completed in the node 2, as shown in (B) of FIG. 3” [0053]) 
(2) (Shimizu discloses a well-known method of responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of processors, 
computing an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard, e.g. “Then, completion time T.sub.E necessary to complete a job is estimated from the progress rate p and the elapsed time T according to equation T.sub.E=T(1-p)/p (step 710). Then, remaining time until deadline time is calculated with reference to data of the deadline time stored in the memory unit 122 and is compared with the completion time T.sub.E (step 715). When the remaining time is longer than the completion time T.sub.E, it is determined that division is not necessary, so that the process of determining whether to perform division is completed, and the process proceeds to step 620 in FIG. 6. When the remaining time is not longer than the completion time T.sub.E, an inquiry is sent to the execution information management unit 115 in the center console 110 to acquire a transfer destination node (step 720)” [0063], and 
assigning, by way of the RPM processor core or “execution adaptor unit” [0046], the workload fragment to a processor of the plurality of processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors, e.g. “In a node A, when it is determined that tasks need to be divided, setup time T.sub.SI necessary for setup, such as copying of divided data, and acquisition time T.sub.SO necessary to receive a result of processing after the processing in a remote node B is completed are first estimated. Then, the setup time T.sub.SI and the acquisition time T.sub.SO are added to estimated completion time T.sub.E, and the result is divided into two parts, so that time, out of the completion time T.sub.E, necessary for a first group of divided tasks, such as the first half of the tasks, to be left in the node A is necessary time T.sub.1, and remaining time, out of the completion time T.sub.E, necessary for a second group of the divided tasks, such as the second half of the tasks, to be delegated to the node B is necessary time T.sub.2.” [0065]) 
These well-known methods performed on the processors of Shimizu can also be applied to the similar DNN processors of Li in view of Campos in the same way, for improvement in a DNN processing environment. 
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos with the teachings of Shimizu. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.
Using the known storing, computing, and assigning techniques of Shimizu would be beneficial to the DNN processing environment of Li in view of Campos for the following reason: “Scalable distributed job execution can be implemented by appropriately dividing tasks on each of the nodes, transferring some of the tasks to another node, and executing some of the tasks on the other node, as necessary, in response to the operational status of the tasks on each of the nodes, scalable distributed job execution involving appropriate task division” [0032 – Shimizu].
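The division decision quoted from Shimizu [0063] (steps 710 through 720) compares the remaining time until the deadline with the estimated completion time T.sub.E. The following Python sketch shows that comparison only; the function name and its parameters are hypothetical, and the acquisition of a transfer destination node is deliberately left out.

```python
def needs_division(elapsed, progress, deadline, now):
    """Return True when the job is not expected to finish before its deadline.

    elapsed  : time T spent on the job so far
    progress : progress rate p in (0, 1]
    deadline : deadline time of the job
    now      : current time, on the same clock as the deadline
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    t_e = elapsed * (1.0 - progress) / progress  # T_E = T(1 - p) / p (step 710)
    remaining = deadline - now                   # remaining time (step 715)
    # Division is unnecessary only when the remaining time is longer than T_E.
    return not (remaining > t_e)


# Hypothetical values: 20 time units remain before the deadline.
late = needs_division(elapsed=10.0, progress=0.25, deadline=100.0, now=80.0)
on_time = needs_division(elapsed=10.0, progress=0.50, deadline=100.0, now=80.0)
```

In the first call the estimate is 30 time units against 20 remaining, so division is indicated; in the second the estimate is 10, so it is not.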

However, Li in view of Campos in view of Shimizu does not disclose the following:
(1)	determine that one or more of the plurality of processors are idle; and 
(2)	remove power from the one or more of the plurality of processors that are determined to be idle.
Nonetheless, this feature would have been made obvious, as evidenced by Potter.
(1) (Potter discloses determining, by way of the RPM processor core or “master power management agent (PMA) 206” [0032] that “can request the power usage values from each TPC to determine the overall power usage by the rotation group 220” [0034], that one or more of the plurality of processors [0022-0023, 0027; FIG. 2, Element 232; Claim 20 of Potter] are idle [0041], e.g. “Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message” [0041]) 
(2) (Potter discloses removing power from the one or more of the plurality of processors that are determined to be idle [0041], e.g. "Once that TPC has ceased receiving transactions and is in an “idle” state (i.e., not processing a transaction), the master PMA 206 can then command that TPC's slave PMA 228 to the off state. The master PMA 206 may determine the appropriate time to turn off the TPC in accordance with any suitable technique. Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message" [0041])
These well-known techniques performed on the processor of Potter can also be applied to the improved DNN processing environment of Li in view of Campos in view of Shimizu.
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos in view of Shimizu with the teachings of Potter. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.
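The well-known techniques of Potter would be suitable to the DNN processing environment of Li in view of Campos in view of Shimizu, as they improve capabilities of an RPM processor, enabling it with a feature to “determine the appropriate time to turn off the TPC” and turn off its processing elements accordingly [0041 – Potter].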
Regarding claim 15, Li discloses the following: 
A processor core configured to: 
receive a request to execute a deep neural network (DNN) workload; 
(Li teaches receiving, by a processor core [0254-0255], a request to execute a deep neural network (DNN) [0046, 0192] workload [0058-0059] from one of the plurality of processor cores [0047, 0051], e.g. “receives commands directed to performing processing operations” [0058]. 
For more evidence of processor core, see the following citations below: 
“processor 1900 may also include a set of one or more bus controller units 1916 and a system agent core 1910 ... System agent core 1910 provides management functionality for the various processor components” [0254]
“System agent core 1910 may additionally include a power control unit (PCU), which includes logic and components to regulate the power state of processor cores 1902A-1902N and graphics processor 1908” [0255])
divide the DNN workload into a plurality of workload fragments; and 
(Li teaches dividing the DNN workload into a plurality of workload fragments [0064, 0230], e.g. “divide the processing workload into approximately equal sized tasks, to better enable distribution of the graphics processing operations to multiple clusters 214A-214N of the processing cluster array 212” [0064])
determine whether a workload fragment of the plurality of workload fragments is to be statically or dynamically allocated to one of a plurality of DNN processors; 
(Li teaches determining whether a workload fragment of the plurality of the workload fragments is to be statically or dynamically assigned to one of a plurality of DNN processors [0164-0165] – this is done by  
responsive to determining that the workload fragment is to be statically assigned to one of the plurality of DNN processors, assign the workload fragment to a predetermined DNN processor of the plurality of DNN processors, and 
(Li discloses responsive to determining that the workload fragment is to be statically assigned – “conventional static workload scheduling techniques” [0171] – to one of the plurality of DNN processors, e.g. “parallel processors such as general-purpose graphic processing units (GPGPUs) have played a significant role in the practical implementation of deep neural networks” [0004], assigning the workload fragment to a predetermined DNN processor of the plurality of DNN processors [0004, 0169-0171])
assign a selected workload fragment of the plurality of workload fragments to one of a plurality of DNN processors based upon an estimated time to completion of workload fragments currently executing on the plurality of DNN processors.  
(Li teaches assigning a selected workload fragment of the plurality of workload fragments to one of a plurality of DNN processors [0060, 0113, 0199], e.g. “the general-purpose processing unit (GPGPU) 1100 can be configured to be particularly efficient in processing the type of computational workloads associated with training deep neural networks” [0199], based upon an estimated time to completion of workload fragments [0122] executing on the plurality of DNN processors [0199], e.g. “the graphics acceleration module 446 may adhere to the following requirements: ... 2) An application's job request is guaranteed by the graphics acceleration module 446 to complete in a specified amount of time, including any translation faults, or the graphics acceleration module 446 provides the ability to preempt the processing of the job” [0122]) 
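For illustration only, the division of a workload into “approximately equal sized tasks” cited from Li [0064] can be sketched as below. The function name split_workload and the list-based representation of a workload are hypothetical and are not taken from Li.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_workload(items: Sequence[T], num_fragments: int) -> List[List[T]]:
    """Split items into num_fragments contiguous fragments whose sizes
    differ by at most one element."""
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    base, extra = divmod(len(items), num_fragments)
    fragments: List[List[T]] = []
    start = 0
    for i in range(num_fragments):
        # The first `extra` fragments each take one additional element.
        size = base + (1 if i < extra else 0)
        fragments.append(list(items[start:start + size]))
        start += size
    return fragments


print(split_workload(list(range(10)), 3))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```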

Li does not disclose the following:
DNN processors are a type of processor
Nonetheless, this feature would have been made obvious, as evidenced by Campos.
(Campos discloses that DNN processors [0044] are a type of processor, e.g. “one or more processors in the one or more computing platforms” [0047])
These prior art elements of Campos can be substituted for the prior art elements of Li, in order to yield an expected outcome for the system of Li. 
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li with the teachings of Campos. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale B.  Simple substitution of one known, equivalent element for another to obtain predictable results.
The predictable result would have been as follows: “Training can include teaching a network of intelligent processing nodes to get one or more outcomes, for example, on a simulator” [0195 – Campos].

However, Li in view of Campos does not disclose the following:
(1)	store an execution time scoreboard comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors;
(2)	responsive to determining that the workload fragment is to [[by]] be dynamically assigned to one of the plurality of processors, 
compute an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard and 
assign a workload fragment of the plurality of workload fragments to one of the plurality of processors based upon the [[an]] estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors,
Nonetheless, this feature would have been made obvious, as evidenced by Shimizu.
(1) (Shimizu discloses a well-known method of storing an execution time scoreboard, e.g. “a job completion ratio” [0053, 0055] comprising data indicating a percentage completion of the execution of workload fragments currently executing on the plurality of processors, e.g. “Since tasks are processed one by one in each of the nodes, a job completion ratio can be calculated by counting the number of tasks that have been processed every time a task is processed and comparing the number of tasks with the total number of tasks in a job. After predetermined time has elapsed after starting execution of a job, 2% of tasks have been completed in the node 1, and no task has been processed in the node 2, as shown in (A) of FIG. 3. However, after the job is divided and transferred in the node 1, 3% of tasks have been completed in the node 1, and 1% of tasks have been completed in the node 2, as shown in (B) of FIG. 3” [0053]) 
(2) (Shimizu discloses a well-known method of responsive to determining that the workload fragment is to be dynamically assigned to one of the plurality of processors, 
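computing an estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors based on the execution time scoreboard, e.g. “Then, completion time T.sub.E necessary to complete a job is estimated from the progress rate p and the elapsed time T according to equation T.sub.E=T(1-p)/p (step 710). Then, remaining time until deadline time is calculated with reference to data of the deadline time stored in the memory unit 122 and is compared with the completion time T.sub.E (step 715). When the remaining time is longer than the completion time T.sub.E, it is determined that division is not necessary, so that the process of determining whether to perform division is completed, and the process proceeds to step 620 in FIG. 6. When the remaining time is not longer than the completion time T.sub.E, an inquiry is sent to the execution information management unit 115 in the center console 110 to acquire a transfer destination node (step 720)” [0063], and 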
assigning, by way of the RPM processor core or “execution adaptor unit” [0046], the workload fragment to a processor of the plurality of processors based upon the estimated time to completion of the execution of the workload fragments currently executing on the plurality of processors, e.g. “In a node A, when it is determined that tasks need to be divided, setup time T.sub.SI necessary for setup, such as copying of divided data, and acquisition time T.sub.SO necessary to receive a result of processing after the processing in a remote node B is completed are first estimated. Then, the setup time T.sub.SI and the acquisition time T.sub.SO are added to estimated completion time T.sub.E, and the result is divided into two parts, so that time, out of the completion time T.sub.E, necessary for a first group of divided tasks, such as the first half of the tasks, to be left in the node A is necessary time T.sub.1, and remaining time, out of the completion time T.sub.E, necessary for a second group of the divided tasks, such as the second half of the tasks, to be delegated to the node B is necessary time T.sub.2.” [0065]) 
These well-known methods performed on the processors of Shimizu can also be applied to the similar DNN processors of Li in view of Campos in the same way, for improvement in a DNN processing environment. 
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos with the teachings of Shimizu. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.

Using the known storing, computing, and assigning techniques of Shimizu would be beneficial to the DNN processing environment of Li in view of Campos for the following reason: “Scalable distributed job execution can be implemented by appropriately dividing tasks on each of the nodes, transferring some of the tasks to another node, and executing some of the tasks on the other node, as necessary, in response to the operational status of the tasks on each of the nodes, scalable distributed job execution involving appropriate task division” [0032 – Shimizu].
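For illustration only, a scoreboard of per-processor completion ratios and an assignment based on estimated time to completion can be sketched as below. The names Entry, pick_processor, and the processor labels are hypothetical. The completion ratio follows the task-counting approach quoted from Shimizu [0053], and the estimate reuses the equation quoted from Shimizu [0063]. Selecting the processor with the smallest estimate is an assumption made for this sketch, not a policy stated in the cited references.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    """One scoreboard row; total_tasks is assumed to be greater than zero."""
    total_tasks: int
    completed_tasks: int = 0
    elapsed: float = 0.0

    @property
    def completion_ratio(self) -> float:
        # Completed tasks compared with the total number of tasks.
        return self.completed_tasks / self.total_tasks

    def estimated_remaining(self) -> float:
        # T_E = T * (1 - p) / p; no estimate is possible before any progress.
        p = self.completion_ratio
        if p == 0.0:
            return float("inf")
        return self.elapsed * (1.0 - p) / p


def pick_processor(scoreboard: Dict[str, Entry]) -> str:
    """Return the processor whose current work is estimated to finish first."""
    return min(scoreboard, key=lambda name: scoreboard[name].estimated_remaining())


scoreboard = {
    "proc0": Entry(total_tasks=100, completed_tasks=50, elapsed=10.0),
    "proc1": Entry(total_tasks=100, completed_tasks=80, elapsed=8.0),
    "proc2": Entry(total_tasks=100, completed_tasks=0, elapsed=5.0),
}
print(pick_processor(scoreboard))  # proc1
```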

However, Li in view of Campos in view of Shimizu does not disclose the following:
(1)	determine that one or more of the plurality of processors are idle; and 
(2)	cause power to be removed from the one or more of the plurality of processors that are determined to be idle.
Nonetheless, this feature would have been made obvious, as evidenced by Potter.
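(1) (Potter discloses determining, by way of the RPM processor core or “master power management agent (PMA) 206” [0032] that “can request the power usage values from each TPC to determine the overall power usage by the rotation group 220” [0034], that one or more of the plurality of processors [0022-0023, 0027; FIG. 2, Element 232; Claim 20 of Potter] are idle [0041], e.g. “Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message” [0041]) 
(2) (Potter discloses causing power to be removed from the one or more of the plurality of processors that are determined to be idle [0041], e.g. "Once that TPC has ceased receiving transactions and is in an “idle” state (i.e., not processing a transaction), the master PMA 206 can then command that TPC's slave PMA 228 to the off state. The master PMA 206 may determine the appropriate time to turn off the TPC in accordance with any suitable technique. Examples of such techniques include the master PMA waiting a predetermined sufficient period of time to permit the TPC to become idle, the master PMA receiving a message from the load balancer 202 that the TPC is idle, or the master PMA polling the TPC until the TPC reports it has completed processing all pending transactions or equivalent message" [0041])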
These well-known techniques performed on the processor of Potter can also be applied to the improved DNN processing environment of Li in view of Campos in view of Shimizu.
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos in view of Shimizu with the teachings of Potter. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Rationale C.  Use of known techniques to improve a similar processing device in the same way.
The well-known techniques of Potter would be suitable to the DNN processing environment of Li in view of Campos in view of Shimizu, as they improve capabilities of an RPM processor, enabling it with a feature to “determine the appropriate time to turn off the TPC” and turn off its processing elements accordingly [0041 – Potter].
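For illustration only, the polling technique quoted from Potter [0041] (polling until the processor reports that all pending transactions are complete, then commanding it to the off state) can be sketched as below. The Processor class, its pending counter, and the max_polls limit are hypothetical simulation details and are not taken from Potter.

```python
class Processor:
    """Hypothetical stand-in for a processor that has pending transactions."""

    def __init__(self, name: str, pending: int) -> None:
        self.name = name
        self.pending = pending
        self.powered = True

    def is_idle(self) -> bool:
        # Idle means no transaction is being processed or waiting.
        return self.pending == 0

    def step(self) -> None:
        # Simulation only: finish one pending transaction between polls.
        if self.pending > 0:
            self.pending -= 1


def power_off_when_idle(proc: Processor, max_polls: int) -> bool:
    """Poll the processor up to max_polls times and remove power once it
    reports idle. Return True if power was removed."""
    for _ in range(max_polls):
        if proc.is_idle():
            proc.powered = False
            return True
        proc.step()
    return False


fast = Processor("tpc0", pending=3)
slow = Processor("tpc1", pending=5)
print(power_off_when_idle(fast, max_polls=10), fast.powered)  # True False
print(power_off_when_idle(slow, max_polls=3), slow.powered)   # False True
```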
Claim(s) 3, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Campos in view of Shimizu in view of Potter in view of Doniwa et al. (Pub. No. US2017/0169329 published on June 15, 2017; hereinafter Doniwa).
Regarding claims 3, 11, and 18, Li in view of Campos in view of Shimizu in view of Potter does not disclose the following: 
wherein each of the plurality of DNN processors is configured to generate an interrupt upon completing execution of a workload fragment, and wherein a message indicating that the workload fragment has completed execution is transmitted to the RPM processor core responsive to the interrupt.  
Nonetheless, this feature would have been made obvious, as evidenced by Doniwa.
(Doniwa teaches that each of the plurality of DNN processors [0017, 0026] is configured to generate an interrupt upon completing execution of a workload fragment [0037], e.g. “If it is determined that the index is not greater than the threshold, monitoring of the index is continued until the learning is completed (step S35). If it is determined that the index is greater than the threshold, the learning is immediately interrupted (step S36). If it is determined in step S35 that the learning has been completed, or it is determined in step S36 that the learning has been interrupted, the result of learning (in the case of the interruption of learning, data indicating the interrupt and the result of learning assumed when the learning was interrupted) is transmitted to the manager 11 (step S37)” [0037], and wherein a message indicating that the workload fragment has completed execution is transmitted to the RPM processor core, cited as the manager [0037], responsive to the interrupt, e.g. “processing performed by each worker 12-i when it has an interrupt processing function” [0037])
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos in view of Shimizu in view of Potter with the teachings of Doniwa. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Apply the teachings of Doniwa in accordance with the DNN processors and RPM processor core of Li in view of Campos in view of Shimizu in view of Potter. 
The motivation would have been as follows: “the result of learning is an index indicating performance that is assumed to be, for example, a recognition ratio, an error ratio or cross-entropy” [0037 – Doniwa].
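For illustration only, the worker flow quoted from Doniwa [0037] (monitor an index, interrupt when it exceeds a threshold, and transmit the result to the manager together with data indicating the interrupt) can be sketched as below. The function run_worker, the message fields, and the list used as the manager's inbox are hypothetical and are not taken from Doniwa.

```python
from typing import Callable, Dict, Iterable, List


def run_worker(
    indices: Iterable[float],
    threshold: float,
    notify_manager: Callable[[Dict], None],
) -> None:
    """Monitor the index until learning completes or the index exceeds the
    threshold, then transmit the result to the manager in either case."""
    last = None
    interrupted = False
    for index in indices:
        last = index
        if index > threshold:
            interrupted = True  # corresponds to the interruption at step S36
            break
    # Corresponds to the transmission to the manager at step S37.
    notify_manager({"interrupted": interrupted, "result": last})


inbox: List[Dict] = []
run_worker([0.2, 0.3, 0.9, 0.1], threshold=0.5, notify_manager=inbox.append)
run_worker([0.2, 0.3], threshold=0.5, notify_manager=inbox.append)
print(inbox)
# [{'interrupted': True, 'result': 0.9}, {'interrupted': False, 'result': 0.3}]
```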
Claim(s) 4-5, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Li in view of Campos in view of Shimizu in view of Potter in view of Koker et al. (Pub. No. US2018/0293205 filed on April 9, 2017; hereinafter Koker).
Regarding claim 4, Li in view of Campos in view of Shimizu in view of Potter does not disclose the following: 
wherein the request to execute the DNN workload is generated by one of a plurality of processor cores communicatively coupled to the RPM processor core.  
Nonetheless, this feature would have been made obvious, as evidenced by Koker.
(Koker teaches that the request to execute the DNN workload is generated [0047, 0052-0053], e.g. “the scheduler 210 can be configured to divide the processing workload into approximately equal sized tasks, to better enable distribution of the graphics processing operations to multiple clusters 214A-214N of the processing cluster array 212” [0052] and “the processing cluster array 212 can receive processing tasks to be executed via the scheduler 210, which receives commands defining processing tasks from front end 208 … the workload specified by incoming command buffers (e.g., batch-buffers, push buffers, etc.) is initiated” [0053], by one of a plurality of processor cores communicatively coupled to the RPM processor core [0039, 0052-0053, 0068-0069, 0147])
At a time prior to the effective filing date of Applicant’s claimed invention, it would have been obvious to modify Li in view of Campos in view of Shimizu in view of Potter with the teachings of Koker. 
One of ordinary skill in the art would recognize the desirability of performing the following modification: Apply this teaching of Koker with respect to DNN workload, processor cores, and RPM core of Li in view of Campos in view of Shimizu in view of Potter. 
The motivation would have been to receive a request, in order “to distribute commands or other work items to a processing cluster array 212” [0047 – Koker].

Response to Amendments
Applicant’s arguments, see “REMARKS”, filed January 26, 2022, with respect to claims 1-4, 7-8, 10-11, 14-15, 17-18, and 21 have been considered but are moot in view of the new ground(s) of rejection for claims 1-4, 7-8, 10-11, 14-15, 17-18, and 21.
With respect to prior art teachings for obviousness, Examiner has removed the disclosures and teachings of Manula (Pub. No. US2020/0174828 filed on May 31, 2019). Examiner has now applied cited evidence from the newly cited prior art references of Campos et al. (Pub. No. US2018/0293498), Shimizu et al. (Pub. No. US2008/0115143), and Potter et al. (Pub. No. US2003/0023885) to establish prima facie cases of obviousness showing that the claims, as amended, are unpatentable under 35 U.S.C. 103.
Examiner maintains rejection of all claims under 35 U.S.C. 103.
Examiner recommends that Applicant further amend the claims to overcome the rejection set forth, along with the prior art of record.

Conclusion
The prior art references relied upon in this Office action were the most pertinent to this rejection. 

Contact Information
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to Gilles Kepnang whose telephone number is (571) 270-7417. Business hours for Examiner are Monday – Friday (8:00 AM – 5:00 PM).
If attempts to reach the Examiner by telephone are unsuccessful, please contact Lewis Bullock at (571) 272-3759. 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic 
/GILLES R KEPNANG/Examiner, Art Unit 2199
January 29, 2022

/LEWIS A BULLOCK  JR/Supervisory Patent Examiner, Art Unit 2199