DETAILED ACTION
Claims 1-12 are pending.
The office acknowledges the following papers:
Drawings, specification, and remarks filed on 5/27/2020.

Priority
The effective filing date for the subject matter defined in the pending claims in this application is 9/9/2019.

Drawings
The Examiner contends that the drawings submitted on 5/27/2020 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “On-chip heterogeneous AI processor with distributed task queues allowing for parallel task execution”.
Appropriate correction is required.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. See In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent is shown to be commonly owned with this application. See 37 CFR 1.130(b).
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

Applicants can file an eTerminal Disclaimer (eTD) in utility applications filed under 35 U.S.C. 111(a) or in compliance with 35 U.S.C. 371, and design applications. Filing an eTD via EFS-Web is highly recommended due to an extensive backlog for processing paper TDs. However, applicants may still file a TD for manual review.
Claims 1-8 are rejected under the judicially created doctrine of obviousness-type double patenting as being unpatentable over claims 1-4 and 7-10 of U.S. Patent Application No. 16/812,832. Although the conflicting claims are not identical, they are not patentably distinct from each other because the claims of U.S. Patent Application No. 16/812,832 contain every element of claims 1-8 of the instant application and thus anticipate the claims of the instant application. The claims of the instant application therefore are not patentably distinct from the earlier claims and as such are unpatentable under obviousness-type double patenting. A later application claim is not patentably distinct from an earlier claim if the later claim is anticipated by the earlier claim.

Claim comparison: Instant Application vs. Copending Application No. 16/812,832
Instant claim 1: An on-chip heterogeneous Artificial Intelligence (AI) processor, comprising:
Copending claim 1: A configurable heterogeneous Artificial Intelligence (AI) processor, comprising:
Instant: at least two different architectural types of computation units, each of the computation units being associated with a task queue configured to store computation subtasks to be executed by the computation unit;
Copending: at least two different architectural types of computation units, wherein each of the computation units is associated with a task queue; and a controller, wherein the controller comprises a task scheduling module, a task synchronization module and an access control module, and wherein:
Instant: a controller configured to partition a received computation graph associated with a neural network into a plurality of computation subtasks and distribute the plurality of computation subtasks to the respective task queues associated with the computation units;
Copending: the task scheduling module is configured to partition, according to a configuration option indicating task allocation, a computation graph associated with a neural network into a plurality of computation subtasks, distribute the computation subtasks to the respective task queues of the computation units, and set a dependency among the computation subtasks;
Instant: a storage unit configured to store data associated with executing the plurality of computation subtasks; and
Copending: a storage unit;
Instant: an access interface configured to access an off-chip memory.
Copending: the task synchronization module is configured to realize the synchronization of the computation subtasks according to the set dependency; and
Copending: the access control module is configured to control access to data involved in the computation subtasks on the storage unit and an off-chip memory.

Dependent claims 2-8 are read upon by the dependent claims 2-3 and 7-10 of copending application 16/812,832.
This is a provisional obviousness-type double patenting rejection.  
Claims 9-12 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 7-8 of copending Application No. 
This is a provisional obviousness-type double patenting rejection.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521).
As per claim 1:
Che and Wu disclosed an on-chip heterogeneous Artificial Intelligence (AI) processor, comprising:
at least two different architectural types of computation units (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.), each of the computation units being associated with a task queue configured to store 
a controller configured to partition a received computation graph associated with a neural network into a plurality of computation subtasks (Che: Figures 2-3 elements 210, 311-313, and 403, paragraphs 14, 32-33, 38-40, and 42)(The graph partitioner of the scheduler partitions a received computation graph into a plurality of subsets. Neural network models can be graphically represented by such computation graphs.) and distribute the plurality of computation subtasks to the respective task queues associated with the computation units (Wu: Figures 1, 3, 7, and 9 elements 620, S302-303, and S702-703, paragraphs 102, 111-117, and 158-159)(Che: Figures 2-3 elements 210-220, paragraphs 29, 42-43, 49, and 55)(Che disclosed that the task allocation generator & optimizer assigns tasks of the computation graph to target devices for execution. Wu disclosed distributing tasks of a directed acyclic graph (DAG) to a distributed set of work queues for parallel execution on processor cores. The combination adds the work queues of Wu into the neural network processing unit of Che such that the target devices include work queues. The combination allows for the assigned tasks to be added to the work queues prior to execution on the target devices.); 
a storage unit configured to store data associated with executing the plurality of 
an access interface configured to access an off-chip memory (Che: Figure 1 element 106, paragraphs 19-20)(The memory controller accesses the off-chip host memory.).
The advantage of work queues buffering tasks is that tasks waiting to be executed can be stored and load balanced. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the work queues of Wu into the processing system of Che for the above advantage.
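For illustration only, the arrangement discussed for claim 1 (a computation graph partitioned into subtasks that are then placed in per-unit task queues) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Che, Wu, or the instant application: the names `topological_order`, `distribute`, and `preferred_unit`, the unit labels `asic` and `gpgpu`, and the four-node graph are all hypothetical, and each graph node is simply treated as one subtask.

```python
from collections import deque


def topological_order(graph):
    """Return the nodes of a DAG (node -> list of successors) in dependency order."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for succ in successors:
            indegree[succ] += 1
    # Start with nodes that have no unmet dependencies (Kahn's algorithm).
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for succ in graph[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if len(order) != len(graph):
        raise ValueError("graph has a cycle")
    return order


def distribute(graph, preferred_unit, units):
    """Place each subtask in the queue of its (assumed) preferred computation unit."""
    queues = {unit: deque() for unit in units}
    for node in topological_order(graph):
        queues[preferred_unit[node]].append(node)
    return queues


# Hypothetical four-operation network; every name here is illustrative only.
graph = {"conv": ["relu"], "relu": ["pool"], "pool": ["fc"], "fc": []}
preferred_unit = {"conv": "asic", "relu": "gpgpu", "pool": "gpgpu", "fc": "asic"}
queues = distribute(graph, preferred_unit, ["asic", "gpgpu"])
```

Each queue holds its subtasks in dependency order, so a unit draining its own queue never starts a subtask ahead of an earlier subtask in the same queue. Dependencies that cross queues are not enforced by this sketch.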
As per claim 2:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein the architectural types include at least one of the Application Specific Integrated Circuit (ASIC), General-Purpose Graphics Processing Unit (GPGPU), Field-Programmable Gate Array (FPGA), or Digital Signal Processor (DSP) (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.).
As per claim 7:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein the computation units are configured to support one or more of an independent parallel mode, a cooperative parallel mode, or an interactive cooperation 
in the independent parallel mode, at least two of the plurality of computation subtasks are executed independently and in parallel with each other (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(Che disclosed task allocation to target devices, but doesn’t explicitly state whether task execution is performed sequentially or in parallel. Wu disclosed distributing tasks that can be executed in parallel to parallel work queues for execution. The combination allows for executing a plurality of independent tasks in the computation graph of Che in parallel when the tasks aren’t dependent upon each other.);
in the cooperative parallel mode, at least two of the plurality of computation subtasks are executed cooperatively in a pipelined manner; and
in the interactive cooperation mode, a first one of the computation units, during the execution of a computation subtask distributed to the first one of the computation units, needs to wait for or depends on results generated by a second one of the computation units executing a computation subtask distributed to the second one of the computation units.
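For illustration only, the difference between the independent parallel mode and the interactive cooperation mode recited in claim 7 can be sketched with worker threads standing in for computation units. This is a minimal sketch under stated assumptions, not the mechanism of Che, Wu, or the instant application: `run_independent` and `run_dependent` are hypothetical helpers, the subtasks are trivial arithmetic placeholders, and the pipelined cooperative parallel mode is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent(subtasks):
    """Independent parallel mode: subtasks share no data and run side by side."""
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = [pool.submit(task) for task in subtasks]
        # Results are collected in submission order.
        return [future.result() for future in futures]


def run_dependent(producer, consumer):
    """Interactive cooperation mode: the consumer waits for the producer's result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        produced = pool.submit(producer)
        # The second worker blocks on produced.result() until the first finishes.
        consumed = pool.submit(lambda: consumer(produced.result()))
        return consumed.result()


# Placeholder subtasks; real subtasks would be neural network operations.
independent_results = run_independent([lambda: 1 + 1, lambda: 2 * 3])
dependent_result = run_dependent(lambda: 4, lambda x: x * 10)
```

Two workers are used in `run_dependent` so that the waiting consumer cannot starve the producer of a thread.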
As per claim 8:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein the controller distributes the plurality of computation subtasks to the computation units according to the capabilities of the computation units (Che: Figure 2 element 210, paragraphs 14 and 29)(The scheduler takes into consideration target device capabilities to process received tasks for execution.).

Claims 3-6 and 9-12 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521), in view of Official Notice.
As per claim 3:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein a first one of the computation units is a customized computation unit for a particular AI algorithm or operation (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators. Official notice is given that such accelerators can be used for processing AI algorithms/operations for the advantage of increased performance of such execution. Thus, it would have been obvious to one of ordinary skill in the art to implement execution of AI algorithms/operations on the accelerators of Che.), and a second one of the computation units is a programmable computation unit (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.).
As per claim 4:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 3, wherein the computation units comprise a computation unit of an Application Specific Integrated Circuit (ASIC) architecture (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.) and a computation unit of a General-Purpose Graphics Processing Unit (GPGPU) architecture (Che: Figures 1-2 
As per claim 5:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein the storage unit comprises a cache memory and a scratch-pad memory (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include cache and scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement off-chip cache and scratch-pad memories in Che.).
As per claim 6:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 1, wherein the scratch-pad memory is shared by the computation units (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement off-chip scratch-pad memories in Che. In view of the above official notice, the added scratch-pad memory is shared by the target devices for task data storage.).
As per claim 9:
Che and Wu disclosed the on-chip heterogeneous AI processor according to claim 
As per claim 10:
Che and Wu disclosed an on-chip heterogeneous Artificial Intelligence (AI) 
a plurality of computation clusters connected through an on-chip data exchange network (Wu: Figure 2 elements 710-720, paragraph 13)(Che: Figure 2 element 220, paragraph 29)(Che disclosed a single heterogeneous platform for executing tasks. Wu disclosed multiple processing clusters for parallel task execution. The combination allows for Che to include multiple heterogeneous platforms for task execution.), each of the computation clusters comprising:
at least two different architectural types of computation units (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.), each of the computation units being associated with a task queue configured to store computation subtasks to be executed by the computation unit (Wu: Figures 1, 3, 7, and 9 elements 620, S302-303, and S702-703, paragraphs 102, 111-117, and 158-159)(Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(Wu disclosed distributing tasks of a directed acyclic graph (DAG) to a distributed set of work queues for parallel execution on processor cores. The combination adds the work queues of Wu into the neural network processing unit of Che such that the target devices include work queues.);
an access control module (Wu: Figure 2 elements 710-720, paragraph 103)(Che: Figure 2 element 220, paragraph 29)(Che disclosed a single heterogeneous platform for executing tasks. Wu disclosed multiple processing clusters for parallel task execution, each including processor cores and caches. 
a cache and an on-chip memory shared by the computation units (Wu: Figure 2 elements 710-720, paragraph 13)(Che: Figure 2 element 220, paragraph 29)(Che disclosed a single heterogeneous platform for executing tasks. Wu disclosed multiple processing clusters for parallel task execution, each including processor cores and caches. The combination allows for Che to include multiple heterogeneous platforms for task execution. Official notice is given that processing systems can include multiple on-chip memories for the advantage of increased memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement added on-chip memories in the combination.);
a controller configured to partition a received computation graph associated with a neural network into a plurality of computation subtasks (Che: Figures 2-3 elements 210, 311-313, and 403, paragraphs 14, 32-33, 38-40, and 42)(The graph partitioner of the scheduler partitions a received computation graph into a plurality of subsets. Neural network models can be graphically represented by such computation graphs.) and distribute the plurality of computation subtasks to respective task queues associated with the computation units in each computation cluster (Wu: Figures 2-3, 7, and 9 elements 710-720, S302-303, and S702-703, paragraphs 102, 111-117, 151, and 158-159)(Che: Figures 2-3 elements 210-220, paragraphs 29, 42-43, 49, and 55)(Che disclosed the task 
an access interface configured to access an off-chip memory (Che: Figure 1 element 106, paragraphs 19-20)(The memory controller accesses the off-chip host memory.); and
a host interface configured to interact with an off-chip host processor (Che: Figure 2, paragraphs 29-30)(Official notice is given that accelerators can receive offloaded tasks for execution from a host processor and host interface for the advantage of allowing parallel execution on a host processor and faster execution of offloaded tasks on an accelerator. Thus, it would have been obvious to one of ordinary skill in the art to implement a host processor and host interface in Che for the above advantages.).
The advantage of work queues buffering tasks is that tasks waiting to be executed can be stored and load balanced. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the work queues of Wu into the processing system of Che for the above advantage.
As per claim 11:
The additional limitation(s) of claim 11 basically recite the additional limitation(s) of 
As per claim 12:
The additional limitation(s) of claim 12 basically recite the additional limitation(s) of claim 2. Therefore, claim 12 is rejected for the same reason(s) as claim 2.

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Dash et al. (U.S. 2020/0043123), taught scheduling of graphic tasks.
Li et al. (U.S. 2021/0064425), taught task queues and task processing.
Balakrishnan et al. (U.S. 9,841,998), taught task classification and class queues.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183