DETAILED ACTION
Claims 1-10 are pending.
The office acknowledges the following papers:
Drawings, specification, and remarks filed on 5/27/2020.

Priority
The effective filing date for the subject matter defined in the pending claims in this application is 9/9/2019.

Drawings
The Examiner contends that the drawings submitted on 5/27/2020 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “On-chip heterogeneous AI processor with distributed task queues allowing parallel task execution”.
Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. See In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent is shown to be commonly owned with this application. See 37 CFR 1.130(b).
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

Applicants can file an eTerminal Disclaimer (eTD) in utility applications filed under 35 U.S.C. 111(a) or in compliance with 35 U.S.C. 371, and design applications. Filing an eTD via EFS-Web is highly recommended due to an extensive backlog for processing paper TDs. However, applicants may still file a TD for manual review.
Claims 1-2, 5, 7, and 10 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of copending Application No. 16/812,817 in view of Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521).
Instant Application
Patent / Copending Application
1. A configurable heterogeneous Artificial Intelligence (AI) processor, comprising:
1. An on-chip heterogeneous Artificial Intelligence (AI) processor, comprising:
at least two different architectural types of computation units, wherein each of the computation units is associated with a task queue; and 
at least two different architectural types of computation units, each of the computation units being associated with a task queue configured to store 

a storage unit configured to store data associated with executing the plurality of computation subtasks; and
a controller, wherein the controller comprises a task scheduling module, a task synchronization module and an access control module, and wherein:

the task scheduling module is configured to partition, according to a configuration option indicating task allocation, a computation graph associated with a neural network into a plurality of computation subtasks, distribute the computation subtasks to the respective task queues of the computation units, and set a dependency among the computation subtasks;
a controller configured to partition a received computation graph associated with a neural network into a plurality of computation subtasks and distribute the plurality of computation subtasks to the respective task queues associated with the computation units;
the task synchronization module is configured to realize the synchronization of the computation subtasks according to the set dependency; and

the access control module is configured to control access to data involved in the computation subtasks on the storage unit and an off-chip memory.
an access interface configured to access an off-chip memory.

The limitations not recited in the claims of copending Application No. 16/812,817 are addressed by Che and Wu as set forth in the 35 U.S.C. 103 rejection below.
This is a provisional obviousness-type double patenting rejection.
Claims 3-4, 6, and 8-9 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of copending Application No. 16/812,817 in view of Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521), in view of Official Notice.

This is a provisional obviousness-type double patenting rejection.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 7, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521).
As per claim 1:
Che and Wu disclosed a configurable heterogeneous Artificial Intelligence (AI) processor, comprising:
at least two different architectural types of computation units (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.), wherein each of the computation units is associated with a task queue (Wu: Figures 1, 3, 
a storage unit (Wu: Figure 1 element 620, paragraph 102)(Che: Figures 1-2 elements 116 and 220, paragraphs 19-20 and 28-29)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In addition, the combination allows for Che to include caches for each target device.); and 
a controller, wherein the controller comprises a task scheduling module (Che: Figure 2 element 210, paragraph 29), a task synchronization module (Che: Figures 2-3 elements 210 and 214-215, paragraphs 29, 42-43, and 55) and an access control module (Che: Figure 1 element 106, paragraphs 19-20), and wherein:
the task scheduling module is configured to partition, according to a configuration option indicating task allocation, a computation graph associated with a neural network into a plurality of computation subtasks (Che: Figures 2-3 elements 210, 311-313, and 403, paragraphs 14, 32-33, 35, 37-40, and 42)(The graph partitioner of the scheduler partitions a received computation graph into a plurality of subsets. Neural network models can be graphically represented by such computation graphs. The scheduler can be configured to create supernodes that are offloaded to target devices.), distribute the computation subtasks to the respective task queues of the computation units (Wu: 
the task synchronization module is configured to realize the synchronization of the computation subtasks according to the set dependency (Wu: Figures 3 and 7 elements S303 and S703, paragraphs 116-117, 153-156, and 159-160)(Che: Figure 3 elements 214-215, paragraphs 42-43 and 55)(The combination allows for Che to set reference counts (i.e. set a dependency) based on the number of task dependencies within the computation graph. Wu disclosed that executing and finishing tasks updates the reference counts of the remaining dependent tasks to be executed. The combination further allows for 
the access control module is configured to control access to data involved in the computation subtasks on the storage unit and an off-chip memory (Wu: Figure 1 element 620, paragraph 102)(Che: Figure 1 element 106, paragraphs 19-20)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In addition, the combination allows for Che to include caches for each target device (i.e. local memory).).
The advantage of work queues buffering tasks is that tasks waiting to be executed can be stored and load balanced. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the work queues of Wu in the processing system of Che for the above advantage.
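For illustration only, the mechanism relied upon in the combination above (per-unit work queues together with reference counts that track unfinished dependencies) can be modeled with the minimal sketch below. The function name run_subtasks, the unit names "asic" and "gpgpu", the subtask names, and the round-robin service order are assumptions made for this example; none of them is taken from Che, Wu, or the instant claims.

```python
from collections import deque


def run_subtasks(deps, assignment):
    """Simulate dependency-ordered execution over per-unit work queues.

    deps:       hypothetical mapping of subtask -> list of prerequisite subtasks
    assignment: hypothetical mapping of subtask -> computation unit name
    Returns the (unit, subtask) pairs in the order they were executed.
    """
    # Reference count = number of prerequisites that have not finished yet.
    ref_count = {task: len(prereqs) for task, prereqs in deps.items()}

    # Reverse edges: which subtasks are waiting on a given subtask.
    dependents = {task: [] for task in deps}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    # One work queue per computation unit.
    queues = {unit: deque() for unit in set(assignment.values())}

    # Subtasks with no prerequisites are ready immediately.
    for task in sorted(deps):
        if ref_count[task] == 0:
            queues[assignment[task]].append(task)

    executed = []
    while any(queues.values()):
        # Assumed policy: serve the units round-robin, one subtask each.
        for unit in sorted(queues):
            if not queues[unit]:
                continue
            task = queues[unit].popleft()
            executed.append((unit, task))
            # Finishing a subtask decrements the count of each dependent;
            # a dependent becomes ready when its count reaches zero.
            for waiting in dependents[task]:
                ref_count[waiting] -= 1
                if ref_count[waiting] == 0:
                    queues[assignment[waiting]].append(waiting)
    return executed


if __name__ == "__main__":
    deps = {"conv": [], "pool": ["conv"], "fc": ["pool"], "aux": []}
    assignment = {"conv": "asic", "pool": "gpgpu", "fc": "asic", "aux": "gpgpu"}
    for unit, task in run_subtasks(deps, assignment):
        print(unit, task)
```

In this sketch a dependent subtask never enters a queue until every prerequisite has finished, while independent subtasks such as "aux" can be buffered and served by another unit in the meantime.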
As per claim 2:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to set, according to a configuration option indicating an operation mode, an operation mode for the computation units, the operation mode comprising an independent parallel mode, a cooperative parallel mode or an interactive cooperation mode (Wu: Figures 1, 3, 7, and 9 elements 620, S303, and S703, paragraphs 102, 116-117, 153-156, and 159-160)(Che: Figures 1-3 elements 100, 214-215, and 220, paragraphs 14, 26, 30, 42-43 and 55)(The combination adds the work queues of Wu into the neural network processing unit of Che such that the target devices include work queues. The combination allows for Che to set reference counts (i.e. set a dependency) based on the number of task dependencies 
in the independent parallel mode, the computation subtasks of the computation units are executed independently and in parallel with each other (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(Che disclosed task allocation to target devices, but doesn’t explicitly state whether task execution is performed sequentially or in parallel. Wu disclosed distributing tasks that can be executed in parallel to parallel work queues for execution. The combination allows for executing a plurality of independent tasks in the computation graph of Che in parallel when the tasks aren’t dependent upon each other.);
in the cooperative parallel mode, the computation subtasks of the computation units are executed cooperatively in a pipelined manner (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(Che disclosed task allocation to target devices, but doesn’t explicitly state whether task execution is performed sequentially or in parallel. Wu disclosed distributing tasks that can be executed in parallel to parallel work queues for execution. The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other.); and
in the interactive cooperation mode, a first one of the computation units, during the execution of a computation subtask distributed to the first one of the computation units, 
As per claim 5:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to perform, according to a configuration option indicating operator fusion, operator fusion on the computation subtasks allocated to a computation unit (Che: Figures 3-4 elements 212 and 411, paragraph 37)(The graph optimizer allows for fusing subgraphs of the computation graph for scheduling to a single target device.).
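As a simplified illustration of fusing operators that are scheduled to a single target device, the sketch below merges consecutive operators of a linear chain when they are assigned to the same unit. The function name fuse_chain, the operator names, and the restriction to a linear chain are assumptions for this example; the cited graph optimizer operates on subgraphs of a computation graph and is not limited to this form.

```python
def fuse_chain(ops, assignment):
    """Group consecutive operators that share a computation unit.

    ops:        hypothetical list of operator names in execution order
    assignment: hypothetical mapping of operator name -> computation unit
    Returns a list of (unit, [operators]) groups, one per fused subtask.
    """
    fused = []
    for op in ops:
        unit = assignment[op]
        if fused and fused[-1][0] == unit:
            # Same unit as the previous group: extend that fused subtask.
            fused[-1][1].append(op)
        else:
            # Different unit: start a new fused subtask.
            fused.append((unit, [op]))
    return fused


if __name__ == "__main__":
    ops = ["conv", "bn", "relu", "softmax"]
    assignment = {"conv": "asic", "bn": "asic", "relu": "asic", "softmax": "gpgpu"}
    print(fuse_chain(ops, assignment))
```

Under these assumptions the three operators assigned to "asic" become one fused subtask, so only one dispatch to that unit is needed instead of three.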
As per claim 7:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the architectural types of the computation units include one Application Specific Integrated Circuit (ASIC), General-Purpose Graphics Processing Unit (GPGPU), Field-Programmable Gate Array (FPGA), or Digital Signal Processor (DSP) (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.).
As per claim 10:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to distribute the computation subtasks to the respective computation units according to the capabilities of the computation units (Che: Figure 2 element 210, paragraphs 14 and 29)(The scheduler takes into consideration target device capabilities to process received tasks for execution.).
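A scheduler that takes device capabilities into account could, for example, restrict each subtask to the units that support its operation and then choose among them. In the sketch below, the function name distribute, the capability sets, and the least-loaded tie-breaking rule are assumptions for illustration only and are not described in Che or Wu.

```python
def distribute(subtasks, capabilities):
    """Place each subtask on a unit that supports its operation kind.

    subtasks:     hypothetical mapping of subtask name -> operation kind
    capabilities: hypothetical mapping of unit name -> set of supported kinds
    Returns a mapping of subtask name -> chosen unit.
    """
    load = {unit: 0 for unit in capabilities}
    placement = {}
    for name, kind in subtasks.items():
        # Only units whose capability set covers this operation qualify.
        candidates = [u for u in sorted(capabilities) if kind in capabilities[u]]
        if not candidates:
            raise ValueError(f"no computation unit supports {kind!r}")
        # Assumed policy: pick the least-loaded capable unit.
        unit = min(candidates, key=lambda u: load[u])
        placement[name] = unit
        load[unit] += 1
    return placement


if __name__ == "__main__":
    capabilities = {
        "asic": {"conv", "matmul"},
        "gpgpu": {"conv", "matmul", "softmax"},
    }
    subtasks = {"c1": "conv", "c2": "conv", "s": "softmax"}
    print(distribute(subtasks, capabilities))
```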

Claims 3-4, 6, and 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521), in view of Official Notice.
As per claim 3:
Che and Wu disclosed the heterogeneous AI processor according to claim 2, wherein the storage unit comprises a cache memory and a scratch-pad memory (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include shared cache and scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement shared on-chip cache and scratch-pad memories in Che.).
As per claim 4:
Che and Wu disclosed the heterogeneous AI processor according to claim 3, wherein the access control module is configured to set, according to the set operation 
in the independent parallel mode, the storage location is set on the off-chip memory (Wu: Figures 1, 3, and 8 elements 104 and S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 19, 29, and 48)(The combination allows for executing a plurality of independent tasks in the computation graph of Che in parallel when the tasks aren’t dependent upon each other. Task data and instructions are stored in the shared host memory.);
in the cooperative parallel mode, the storage location is set on the scratch-pad memory (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other. In view of the above official notice, the heterogeneous system includes a shared scratch-pad memory.); and
in the interactive cooperation mode, the storage location is set on the cache memory (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 19, 29, and 48)(The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other. Task data and instructions are stored in the global memory. Official notice is given that the global 
As per claim 6:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the storage unit comprises a scratch-pad memory (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement off-chip scratch-pad memories in Che.) and the task scheduling module is further configured to notify, according to a configuration option indicating inter-layer fusion, the access control module to store outputs from one or more intermediate layers of the neural network in the scratch-pad memory (Wu: Figure 1 element 620, paragraph 102)(Che: Figure 1 element 106, paragraphs 19-20 and 28-29)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In view of the above official notice, a scratch-pad memory is added to Che to store data. Official notice is given that task execution results can be written back to memory for storage for the advantage of saving results for later processing. Thus, it would have been obvious to one of ordinary skill in the art to implement task execution result writeback to memory, including the added scratch-pad memory.).
As per claim 8:

As per claim 9:
Che and Wu disclosed the heterogeneous AI processor according to claim 8, wherein the computation units comprise a computation unit of an Application Specific Integrated Circuit ASIC architecture (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.) and a computation unit of a General-Purpose Graphics Processing Unit GPGPU architecture (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Dash et al. (U.S. 2020/0043123) taught scheduling of graphics tasks.
Li et al. (U.S. 2021/0064425) taught task queues and task processing.
Balakrishnan et al. (U.S. 9,841,998) taught task classification and class queues.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183