Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-20 are presented for examination.

Allowable Subject Matter
Claims 2-5, 11-14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US 20190114548 A1), in view of Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”), further in view of Dolan (“Compiler Support for Lightweight Context Switching”).

Regarding Claim 1, Wu (US 20190114548 A1) teaches
A method comprising: 
receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases).

Wu did not specifically teach
determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
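determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph, Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).
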
Wu and Joseph did not teach
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).
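
For illustration only, the determining and annotating limitations recited in claim 1 can be pictured with the following minimal sketch. It is not taken from the application or from Wu, Joseph, or Dolan. The names `Operation`, `exec_cost`, `annotation`, and `assign_processors`, the flat `transfer_cost`, and all cost values are hypothetical. The sort is a plain stable sort by layer index and does not model the grouping by processor recited in the sorting limitation. A dynamic program over the sorted operations is one possible way to minimize execution cost plus switching cost.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    layer: int
    exec_cost: dict       # hypothetical: processor name -> cost of running there
    annotation: str = ""  # set to the assigned processor by assign_processors


def assign_processors(ops, transfer_cost):
    """Choose one processor per operation so that the total of execution
    costs plus a flat cost for each switch between processors is minimal."""
    # Stable sort by layer keeps the given order of operations inside a layer.
    ops = sorted(ops, key=lambda op: op.layer)
    # best[p] = (lowest total cost, processor choices) for plans ending on p.
    best = {p: (cost, [p]) for p, cost in ops[0].exec_cost.items()}
    for op in ops[1:]:
        nxt = {}
        for p, cost in op.exec_cost.items():
            candidates = []
            for q, (prev_cost, plan) in best.items():
                switch = 0 if q == p else transfer_cost
                candidates.append((prev_cost + switch + cost, plan + [p]))
            nxt[p] = min(candidates, key=lambda c: c[0])
        best = nxt
    total, plan = min(best.values(), key=lambda c: c[0])
    # Annotate every operation with the processor chosen for it.
    for op, proc in zip(ops, plan):
        op.annotation = proc
    return total, ops


if __name__ == "__main__":
    model = [
        Operation("c", 1, {"CPU": 10, "GPU": 2, "NPU": 1}),
        Operation("a", 0, {"CPU": 10, "GPU": 2, "NPU": 1}),
        Operation("b", 0, {"CPU": 1, "GPU": 1}),  # not executable on the NPU
    ]
    total, annotated = assign_processors(model, transfer_cost=5)
    for op in annotated:
        print(op.layer, op.name, op.annotation)
    print("total cost:", total)
```

With these hypothetical costs, choosing the cheapest processor for each operation in isolation would incur two transfers, whereas the sketch keeps all three operations on one processor because the transfer cost outweighs the per-operation savings.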

Regarding Claim 10, Wu teaches
A system comprising: a processor; a memory device containing instructions, which when executed by the processor cause the processor to: 
receive a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sort the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases). 

Wu did not specifically teach
determine, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
determine, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph, Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).

Wu and Joseph did not teach
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
and for each layer of the NN model, include an annotation to indicate the processor assigned for each of the operations (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).

Regarding Claim 19, Wu (US 20190114548 A1) teaches
A non-transitory computer-readable medium comprising instructions, which when executed by a computing device, cause the computing device to perform operations comprising: 
receiving a neural network (NN) model to be executed on a target platform, the NN model including multiple layers that include operations (Para [0007], receiving a model defining a sequential order of a plurality of functions performed when executing at least one layer in the neural network where the neural network comprises a plurality of layers),
at least some of the operations being executable on multiple processors of the target platform (Para [0053], By subdividing scheduling into multiple levels, the compiler and scheduler can generate hardware level code (e.g., RTL code) which configures a hardware system such that the different blocks, software functions/methods, and processing elements operate concurrently); 
sorting the operations from the multiple layers in a particular order based at least in part on grouping the operations that are executable by a particular processor of the multiple processors (Para [0028], The layers are defined in a sequential order such that Layer 1 is performed before Layer 2, Layer 2 is performed before Layer 3, and so forth. Thus, there exists a data dependency between the lower layers and the upper layer(s). Although Layer 2 waits to receive data from Layer 1, in one embodiment, the neural network 100 can be parallelized such that each layer can operate concurrently. … Thus, implementing the layers in hardware to form a parallel pipeline can vastly increase the throughput of the neural network when compared to operating the layers one at a time. The timing benefits of scheduling the layers in a massively parallel hardware system improve further as the number of layers in the neural network 100 increases).

Wu did not specifically teach
determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations,
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) teaches 
determining, based at least in part on a cost of transferring the operations between the multiple processors, an assignment of one of the multiple processors for each of the sorted operations of each of the layers in a manner that minimizes a total cost of executing the operations (Page 546, right Col, last paragraph, Along with reducing the delay, proposed system tries to reduce the power consumption by reducing the number of context switches in the new scheduling algorithms. Each process in the cloud must be executed in a VM. Hence cost and energy usage is related to the number of context switches. Cost and energy increases as the size of processes increases, because larger process causes more context switches than smaller process).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu’s teaching to Joseph’s in order to increase the efficient utilization of huge collection of high-end resources at low cost by utilizing an algorithm to reduce the waiting time as well as the number of context switches (Joseph [abstract]).

Wu and Joseph did not teach
and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations.

However, Dolan (“Compiler Support for Lightweight Context Switching”) teaches
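and for each layer of the NN model, including an annotation to indicate the processor assigned for each of the operations (Page 36:7, We extend the LLVM intermediate representation to allow a function to be marked nocalleesave, which indicates that it may not preserve the values of the standard callee-save registers. Our compiler marks all functions containing a context switch with this annotation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu and Joseph’s teaching to Dolan’s in order to provide efficient context switching and message passing between lightweight threads of control by using a new language-neutral primitive for the LLVM compiler (Dolan [Abstract]).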

Claims 6-9 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wu (US 20190114548 A1), in view of Joseph (“Scheduling to Minimize Context Switches for Reduced Power Consumption and Delay in the Cloud”) and Dolan (“Compiler Support for Lightweight Context Switching”), further in view of Yang (US 20190095212 A1).

Regarding Claim 6, Wu, Joseph and Dolan teach
The method of claim 1.

Wu, Joseph and Dolan did not teach
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor.

However, Yang (US 20190095212 A1) teaches 
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 7, Wu, Joseph, Dolan and Yang teach
The method of claim 6.

Wu, Joseph and Dolan did not teach
wherein the neural processor is configured to perform operations related to neural network models.

However, Yang (US 20190095212 A1) teaches 
wherein the neural processor is configured to perform operations related to neural network models (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 8, Wu, Joseph, Dolan and Yang teach
The method of claim 7.

Wu, Joseph and Dolan did not teach
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

However, Yang teaches 
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations (Para [0104], if the computing load is increased and the computing capability is sufficient, the size of each of the neural network inputs NNI_1 through NNI_4 may be increased. Alternatively, if the computing load is decreased, the size of each of the neural network inputs NNI_1 through NNI_4 may be decreased, considering instantaneous power consumption).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 9, Wu, Joseph and Dolan teach
The method of claim 1.

Wu, Joseph and Dolan did not teach
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

However, Yang (US 20190095212 A1) teaches 
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations (Para [0033], In an embodiment, the electronic system 100 of FIG. 1 is an application processor (AP) located within a mobile device; Para [0035], The electronic system 100 may be defined to include a neural network system NNS in that the electronic system 100 performs a neural network computing function. The neural network system NNS may include at least some elements from among elements included in the electronic system 100, the at least some elements being associated with a neural network operation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 15, Wu, Joseph and Dolan teach
The system of claim 10.

Wu, Joseph and Dolan did not teach
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor.

However, Yang (US 20190095212 A1) teaches 
wherein the multiple processors comprise at least a CPU, a GPU, and a neural processor (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 16, Wu, Joseph, Dolan and Yang teach
The system of claim 15.

Wu, Joseph and Dolan did not teach
wherein the neural processor is configured to perform operations related to neural network models.

However, Yang (US 20190095212 A1) teaches 
wherein the neural processor is configured to perform operations related to neural network models (Para [0135], The neural network device 340 is a processor that performs a computation based on a second algorithm (i.e., neural network model). The neural network device 340 may perform a second computation on the plurality of candidate images CI1, CI2, and CI3 received from the VRA 330. The neural network device 340 may be one of a CPU, a GPU, an NPU, and a DSP or may be a dedicated processor for a neural network computation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 17, Wu, Joseph, Dolan and Yang teach
The system of claim 16.

Wu, Joseph and Dolan did not teach
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations.

However, Yang teaches 
wherein the neural processor utilizes a lower amount of power when performing the operations when compared to the CPU or the GPU performing the operations (Para [0104], if the computing load is increased and the computing capability is sufficient, the size of each of the neural network inputs NNI_1 through NNI_4 may be increased. Alternatively, if the computing load is decreased, the size of each of the neural network inputs NNI_1 through NNI_4 may be decreased, considering instantaneous power consumption).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Regarding Claim 18, Wu, Joseph and Dolan teach
The system of claim 10.

Wu, Joseph and Dolan did not teach
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations.

However, Yang (US 20190095212 A1) teaches 
wherein the target platform comprises a mobile electronic device, and the mobile electronic device executes the NN model based at least in part on the annotation to indicate the processor assigned for each of the operations (Para [0033], In an embodiment, the electronic system 100 of FIG. 1 is an application processor (AP) located within a mobile device; Para [0035], The electronic system 100 may be defined to include a neural network system NNS in that the electronic system 100 performs a neural network computing function. The neural network system NNS may include at least some elements from among elements included in the electronic system 100, the at least some elements being associated with a neural network operation).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have combined Wu, Joseph and Dolan’s teaching to Yang’s in order to enhance the performance of the electronic system or the neural network system by determining a computing parameter in an adaptive manner based on one of a computing load and a computing capability of the neural network system (Yang [Summary]).

Notice of References Cited
NG (US 20190114535 A1) discloses a neural network processing system that includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that, when executed, causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters.

Conclusion

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Zhen, can be reached at (571) 272-3708. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AMIR SOLTANZADEH/
Examiner, Art Unit 2191


Supervisory Patent Examiner, Art Unit 2191