Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are presented for examination.

Allowable Subject Matter
Claims 2-3, 5, 11-14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US 20190114548 A1), in view of Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”), further in view of Dolan (“Compiler Support for Lightweight Context Switching”) and Bieiweiss (US 20190205737 A1).

Regarding Claim 1, Wu (US 20190114548 A1) teaches
A method comprising: 
receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases).

Wu did not specifically teach
determining, based at least in part on a cost of transferring the operations between the multiple processors, and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.
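
For illustration only, the two limitations above (a cost-minimizing assignment followed by a per-operation annotation) can be sketched as a small dynamic program over operations that are already in sorted order. This is a minimal sketch under stated assumptions, not the applicant's disclosed method and not taken from any cited reference: the function name assign_processors, the processor labels "cpu" and "gpu", and all cost values are hypothetical.

```python
from math import inf

def assign_processors(exec_cost, transfer_cost):
    """Pick one processor per operation so the summed cost is minimal.

    exec_cost: list with one dict per operation (in sorted order) mapping
        a processor name to the cost of running that operation there.
        Processors that cannot run the operation are simply omitted.
    transfer_cost: dict mapping (src, dst) to the cost of moving from
        processor src to processor dst; staying put costs nothing.
    Returns (total_cost, annotations), where annotations[i] names the
    processor assigned to operation i.
    """
    if not exec_cost:
        return 0.0, []
    # best[p] = (cheapest total so far with the latest op on p, assignment)
    best = {p: (c, [p]) for p, c in exec_cost[0].items()}
    for costs in exec_cost[1:]:
        nxt = {}
        for p, c in costs.items():
            cand_cost, cand_path = inf, None
            for q, (q_cost, q_path) in best.items():
                hop = 0.0 if q == p else transfer_cost[(q, p)]
                total = q_cost + hop + c
                if total < cand_cost:
                    cand_cost, cand_path = total, q_path + [p]
            nxt[p] = (cand_cost, cand_path)
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])

# Hypothetical costs: the first op is cheap on the CPU, the second on the GPU.
ops = [{"cpu": 1.0, "gpu": 10.0}, {"cpu": 10.0, "gpu": 1.0}]
transfer = {("cpu", "gpu"): 2.0, ("gpu", "cpu"): 2.0}
total, annotations = assign_processors(ops, transfer)
print(total, annotations)  # 4.0 ['cpu', 'gpu']
```

In this sketch the transfer cost of 2.0 is outweighed by the savings in execution cost, so the assignment switches processors. With a larger transfer cost the same routine would keep both operations on one processor.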

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph,  Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

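It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).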

Wu and Joseph did not teach
and a cost of performing the operations at the respective processors
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations  (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).

Wu, Joseph and Dolan did not specifically teach
and a cost of performing the operations at the respective processors.

However, Bieiweiss teaches
and a cost of performing the operations at the respective processors (Para [0228], FIG. 22 illustrates one embodiment of a scheduling process implemented at a machine learning acceleration mechanism. At processing block 2205, computation costs of graph nodes and sub-graphs are determined. In one embodiment, scheduler 2015 performs this determination based on available information regarding a computation cost of the each of the DNN operators on CPU).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Bieiweiss’ in order to facilitate acceleration of machine learning operations that has accelerator circuitry communicatively coupled to processor by performing compute operations for neural network (Bieiweiss [Summary]).

Regarding Claim 4, Wu, Joseph, Dolan and Bieiweiss teach
The method of Claim 1.

Wu, Joseph and Dolan did not specifically teach
wherein the cost of transferring the operations comprises an amount of latency for transferring the operations between the multiple processors.

However, Bieiweiss teaches
wherein the cost of transferring the operations comprises an amount of latency for transferring the operations between the multiple processors (Para [0240], Other FA configurations may be implemented based on the goals and requirements for a particular project (e.g., what domain being targeted, performance/energy requirements, process technology target(s), bandwidth/latency characteristics of multi-chip interfaces, etc.)).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Bieiweiss’ in order to facilitate acceleration of machine learning operations that has accelerator circuitry communicatively coupled to processor by performing compute operations for neural network (Bieiweiss [Summary]).

Regarding Claim 10, Wu teaches
A system comprising: a processor; a memory device containing instructions, which when executed by the processor cause the processor to: 
receive a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sort the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases). 

Wu did not specifically teach
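determine, based at least in part on a cost of transferring the operations between the multiple processors, and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,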
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
determine, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph,  Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).

Wu and Joseph did not teach
and a cost of performing the operations at the respective processors
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).

Wu, Joseph and Dolan did not specifically teach
and a cost of performing the operations at the respective processors.

However, Bieiweiss teaches
and a cost of performing the operations at the respective processors (Para [0228], FIG. 22 illustrates one embodiment of a scheduling process implemented at a machine learning acceleration mechanism. At processing block 2205, computation costs of graph nodes and sub-graphs are determined. In one embodiment, scheduler 2015 performs this determination based on available information regarding a computation cost of the each of the DNN operators on CPU).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Bieiweiss’ in order to facilitate acceleration of machine learning operations that has accelerator circuitry communicatively coupled to processor by performing compute operations for neural network (Bieiweiss [Summary]).


Regarding Claim 19, Wu (US 20190114548 A1) teaches
 A non-transitory computer-readable medium comprising instructions, which when executed by a computing device, cause the computing device to perform operations comprising: 
receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations, (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases).

Wu did not specifically teach
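determining, based at least in part on a cost of transferring the operations between the multiple processors, and a cost of performing the operations at the respective processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,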
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph,  Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).

Wu and Joseph did not teach
and a cost of performing the operations at the respective processors
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations  (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).

Wu, Joseph and Dolan did not specifically teach
and a cost of performing the operations at the respective processors.

However, Bieiweiss teaches
and a cost of performing the operations at the respective processors (Para [0228], FIG. 22 illustrates one embodiment of a scheduling process implemented at a machine learning acceleration mechanism. At processing block 2205, computation costs of graph nodes and sub-graphs are determined. In one embodiment, scheduler 2015 performs this determination based on available information regarding a computation cost of the each of the DNN operators on CPU).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Bieiweiss’ in order to facilitate acceleration of machine learning operations that has accelerator circuitry communicatively coupled to processor by performing compute operations for neural network (Bieiweiss [Summary]).


Claims 6-9 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US 20190114548 A1), in view of Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”), Dolan (“Compiler Support for Lightweight Context Switching”) and Bieiweiss (US 20190205737 A1), further in view of Yang (US 20190095212 A1).

Regarding Claim 6, Wu, Joseph, Dolan and Bieiweiss teach


Wu, Joseph, Dolan and Bieiweiss did not teach
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor.

However, Yang (US 20190095212 A1) teaches 
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 7, Wu, Joseph, Dolan, Bieiweiss and Yang teach
The method of claim 6.

Wu, Joseph, Dolan and Bieiweiss did not teach
wherein the neural processor is configured to perform operations related to neural network models.

However, Yang (US 20190095212 A1) teaches 
wherein the neural processor is configured to perform operations related to neural network models (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 8, Wu, Joseph, Dolan, Bieiweiss and Yang teach
The method of claim 7.

Wu, Joseph, Dolan and Bieiweiss did not teach
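wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.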

However, Yang teaches 
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations (Para [0104], if the computing load is increased and the computing capability is sufficient, the size of each of the neural network inputs NNI_1 through NNI_4 may be increased. Alternatively, if the computing load is decreased, the size of each of the neural network inputs NNI_1 through NNI_4 may be decreased, considering instantaneous power consumption).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 9, Wu, Joseph, Dolan and Bieiweiss teach
The method of claim 1.

Wu, Joseph, Dolan and Bieiweiss did not teach
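wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.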

However, Yang (US 20190095212 A1) teaches 
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations (Para [0033], In an embodiment, the electronic system 100 of FIG. 1 is an application processor (AP) located within a mobile device; Para [0035], The electronic system 100 may be defined to include a neural network system NNS in that the electronic system 100 performs a neural network computing function. The neural network system NNS may include at least some elements from among elements included in the electronic system 100, the at least some elements being associated with a neural network operation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 15,  Wu, Joseph, Dolan and Bieiweiss teach


Wu, Joseph, Dolan and Bieiweiss did not teach
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor.

However, Yang (US 20190095212 A1) teaches 
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 16, Wu, Joseph, Dolan, Bieiweiss and Yang teach
The system of claim 15.

Wu, Joseph, Dolan and Bieiweiss did not teach
wherein the neural processor is configured to perform operations related to neural network models.

However, Yang (US 20190095212 A1) teaches 
wherein the neural processor is configured to perform operations related to neural network models (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 17, Wu, Joseph, Dolan, Bieiweiss and Yang teach
The system of claim 16.

Wu, Joseph, Dolan and Bieiweiss did not teach
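wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.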

However, Yang teaches 
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations (Para [0104], if the computing load is increased and the computing capability is sufficient, the size of each of the neural network inputs NNI_1 through NNI_4 may be increased. Alternatively, if the computing load is decreased, the size of each of the neural network inputs NNI_1 through NNI_4 may be decreased, considering instantaneous power consumption).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 18, Wu, Joseph, Dolan and Bieiweiss teach
The system of claim 10.

Wu, Joseph, Dolan and Bieiweiss did not teach
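wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.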

However, Yang (US 20190095212 A1) teaches 
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations (Para [0033], In an embodiment, the electronic system 100 of FIG. 1 is an application processor (AP) located within a mobile device; Para [0035], The electronic system 100 may be defined to include a neural network system NNS in that the electronic system 100 performs a neural network computing function. The neural network system NNS may include at least some elements from among elements included in the electronic system 100, the at least some elements being associated with a neural network operation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph, Dolan and Bieiweiss teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Response to Arguments


	
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMIR SOLTANZADEH whose telephone number is (571)272-3451. The examiner can normally be reached M-F, 9am - 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Zhen can be reached on (571) 272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AMIR SOLTANZADEH/Examiner, Art Unit 2191    

/WEI Y ZHEN/Supervisory Patent Examiner, Art Unit 2191