DETAILED ACTION
Claims 1-7 and 9-10 are pending.
The office acknowledges the following papers:
Claims, specification, and remarks filed on 12/21/2021.

	Withdrawn objections and rejections
The specification objections have been withdrawn due to amendment.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. See In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent is shown to be commonly owned with this application. See 37 CFR 1.130(b).
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).

Applicants can file an eTerminal Disclaimer (eTD) in utility applications filed under 35 U.S.C. 111(a) or in compliance with 35 U.S.C. 371, and design applications. Filing an eTD via EFS-Web is highly recommended due to an extensive backlog for processing paper TDs. However, applicants may still file a TD for manual review.
Claims 1-2, 5, 7, and 10 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of copending Application No. 
Instant Application | Copending Application
1. A configurable heterogeneous Artificial Intelligence (AI) processor, comprising: | 1. An on-chip heterogeneous Artificial Intelligence (AI) processor, comprising:
at least two different architectural types of computation units, wherein each of the computation units is associated with a task queue, at least one of the computation units is a customized computation unit for a particular AI algorithm or operation, and at least another one of the computation units is a programmable computation unit; | at least two different architectural types of computation units, each of the computation units being associated with a task queue configured to store computation subtasks to be executed by the computation unit, wherein a first one of the computation units is a customized computation unit for a particular AI algorithm or operation, and a second one of the computation units is a programmable computation unit;
a storage unit; | a storage unit configured to store data associated with executing the plurality of computation subtasks; and
a controller, wherein the controller comprises a task scheduling module, a task synchronization module and an access control module, and wherein: | 
the task scheduling module is configured to partition, according to a configuration option indicating one or more task allocations, a computation graph associated with a neural network into a plurality of computation subtasks, distribute the computation subtasks to the respective task queues of the computation units, and set a dependency among the computation subtasks, wherein at least one of the task allocations is based on type matching and the controller is configured to distribute each of the computation subtasks according to a type of the computation subtask to the task queue associated with a computation unit that is configured to | 
the task synchronization module is configured to realize the synchronization of the computation subtasks according to the set dependency; and | 
the access control module is configured to control access to data involved in the computation subtasks on the storage unit and an off-chip memory. | an access interface configured to access an off-chip memory.

The limitations not shown by the claims in copending application 16/812,817 are read upon as specified by the rejection below.
This is a provisional obviousness-type double patenting rejection.  
Claims 3-4, 6, and 9 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-10 of copending Application No. 16/812,817 in view of Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521), in view of Official Notice.
The limitations not shown by the claims in copending application 16/812,817 are read upon as specified by the rejection below.
This is a provisional obviousness-type double patenting rejection.  

New Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Che et al. (U.S. 2020/0249998), in view of Wu et al. (U.S. 2020/0012521), in view of Official Notice.
As per claim 1:
Che and Wu disclosed a configurable heterogeneous Artificial Intelligence (AI) processor, comprising:
at least two different architectural types of computation units (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.), wherein each of the computation units is associated with a task queue (Wu: Figures 1, 3, 7, and 9 elements 620, S302-303, and S702-703, paragraphs 102, 111-117, and 158-159)(Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(Wu disclosed distributing tasks of a directed acyclic graph (DAG) to a distributed set of work queues for parallel execution on processor cores. The combination adds the work queues of Wu into the neural network processing unit of Che such that the target devices include work queues.), at least one of the computation units is a customized computation unit for a particular AI algorithm or operation (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators. Official notice is given that such 
a storage unit (Wu: Figure 1 element 620, paragraph 102)(Che: Figures 1-2 elements 116 and 220, paragraphs 19-20 and 28-29)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In addition, the combination allows for Che to include caches for each target device.); and 
a controller, wherein the controller comprises a task scheduling module (Che: Figure 2 element 210, paragraph 29), a task synchronization module (Che: Figures 2-3 elements 210 and 214-215, paragraphs 29, 42-43, and 55) and an access control module (Che: Figure 1 element 106, paragraphs 19-20), and wherein:
the task scheduling module is configured to partition, according to a configuration option indicating one or more task allocations, a computation graph associated with a neural network into a plurality of computation subtasks (Che: Figures 2-3 elements 210, 311-313, and 403, paragraphs 14, 32-33, 35, 37-40, and 42)(The graph partitioner of the scheduler partitions a received computation graph into a plurality of subsets. Neural network models can be graphically represented by such computation graphs. The scheduler can be configured to create supernodes that are offloaded to target devices.), 
the task synchronization module is configured to realize the synchronization of the computation subtasks according to the set dependency (Wu: Figures 3 and 7 elements S303 and S703, paragraphs 116-117, 153-156, and 159-160)(Che: Figure 3 elements 214-215, paragraphs 42-43 and 55)(The combination allows for Che to set reference counts (i.e. set a dependency) based on the number of task dependencies within the computation graph. Wu disclosed that executing and finishing tasks updates the reference counts of the remaining dependent tasks to be executed. The combination further allows for Che to schedule execution of a task based on a reference count of the task reaching zero.); and
the access control module is configured to control access to data involved in the computation subtasks on the storage unit and an off-chip memory (Wu: Figure 1 element 620, paragraph 102)(Che: Figure 1 element 106, paragraphs 19-20)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In addition, the combination allows for Che to include caches for each target device (i.e. local memory).).
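For illustration only, the reference-count synchronization attributed above to the combination of Che and Wu can be sketched as follows. This is a minimal sketch and is not taken from either reference: the function name, the task names, and the dictionary-based graph representation are assumptions made for the example. Each task starts with a reference count equal to its number of prerequisites, finishing a task decrements the counts of its dependents, and a task is scheduled once its count reaches zero.

```python
from collections import deque


def run_with_reference_counts(dependencies):
    """Return an execution order for {task: [prerequisite tasks]}.

    Hypothetical illustration: every prerequisite must itself be a key of
    `dependencies`. A task becomes ready when its reference count is zero.
    """
    # Reference count = number of prerequisites that have not finished yet.
    ref_count = {task: len(prereqs) for task, prereqs in dependencies.items()}

    # Reverse edges: which tasks are waiting on a given task.
    dependents = {task: [] for task in dependencies}
    for task, prereqs in dependencies.items():
        for prereq in prereqs:
            dependents[prereq].append(task)

    ready = deque(task for task, count in ref_count.items() if count == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # stand-in for actually executing the task
        for child in dependents[task]:
            ref_count[child] -= 1  # finishing a task updates its dependents
            if ref_count[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("dependency cycle detected")
    return order


example_graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
example_order = run_with_reference_counts(example_graph)
```

In this example, task "d" is not scheduled until both "b" and "c" have finished, which corresponds to its reference count dropping from two to zero.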

As per claim 2:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to set, according to a configuration option indicating an operation mode, an operation mode for the computation units, the operation mode comprising an independent parallel mode, a cooperative parallel mode or an interactive cooperation mode (Wu: Figures 1, 3, 7, and 9 elements 620, S303, and S703, paragraphs 102, 116-117, 153-156, and 159-160)(Che: Figures 1-3 elements 100, 214-215, and 220, paragraphs 14, 26, 30, 42-43 and 55)(The combination adds the work queues of Wu into the neural network processing unit of Che such that the target devices include work queues. The combination allows for Che to set reference counts (i.e. set a dependency) based on the number of task dependencies within the computation graph. Wu disclosed that executing and finishing tasks updates the reference counts of the remaining dependent tasks to be executed. The combination further allows for Che to schedule execution of a task based on a reference count of the task reaching zero. The operation mode for the target devices is based on the tasks in queue and their dependencies.), wherein:
in the independent parallel mode, the computation subtasks of the computation units are executed independently and in parallel with each other (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 
in the cooperative parallel mode, the computation subtasks of the computation units are executed cooperatively in a pipelined manner (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(Che disclosed task allocation to target devices, but doesn’t explicitly state whether task execution is performed sequentially or in parallel. Wu disclosed distributing tasks that can be executed in parallel to parallel work queues for execution. The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other.); and
in the interactive cooperation mode, a first one of the computation units, during the execution of a computation subtask distributed to the first one of the computation units, waits for or depends on results generated by a second one of the computation units executing a computation subtask distributed to the second one of the computation unit (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(Che disclosed task allocation to target devices, but doesn’t explicitly state whether task execution is performed sequentially or in parallel. Wu disclosed distributing tasks that can be executed in parallel to parallel work queues for execution. The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent 
As per claim 3:
Che and Wu disclosed the heterogeneous AI processor according to claim 2, wherein the storage unit comprises a cache memory and a scratch-pad memory (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include shared cache and scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement shared on-chip cache and scratch-pad memories in Che.).
As per claim 4:
Che and Wu disclosed the heterogeneous AI processor according to claim 3, wherein the access control module is configured to set, according to the set operation mode for the computation units, a storage location for data shared among the computation units (Wu: Figure 1 element 620, paragraph 102)(Che: Figure 1 element 106, paragraphs 19-20)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip.), wherein:
in the independent parallel mode, the storage location is set on the off-chip memory (Wu: Figures 1, 3, and 8 elements 104 and S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 19, 29, and 48)(The combination allows for executing a plurality of independent tasks in the computation graph of Che in parallel when the tasks aren’t dependent upon each other. 
in the cooperative parallel mode, the storage location is set on the scratch-pad memory (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 29 and 48)(The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other. In view of the above official notice, the heterogeneous system includes a shared scratch-pad memory.); and
in the interactive cooperation mode, the storage location is set on the cache memory (Wu: Figures 3 and 8 elements S302-S303, paragraphs 111-116, 142-145)(Che: Figures 2 and 4 elements 210-220 and 403, paragraphs 19, 29, and 48)(The combination allows for executing a plurality of dependent tasks in the computation graph of Che in sequential order when the tasks are dependent upon each other. Task data and instructions are stored in the global memory. Official notice is given that the global memory can be implemented as a higher-level cache memory for the advantage of increased memory access speeds. Thus, it would have been obvious to one of ordinary skill in the art to implement the global memory as a cache.).
As per claim 5:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to perform, according to a configuration option indicating operator fusion, operator fusion on the computation subtasks allocated to a computation unit (Che: Figures 3-4 elements 212 and 411, paragraph 37)(The graph optimizer allows for fusing subgraphs of the computation graph for scheduling to a single target device.).
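For illustration only, the fusing of subtasks that are scheduled to a single target device can be sketched in a simplified form. The sketch below assumes a linear chain of operations that have already been assigned to devices; the function name, operation names, and device labels are hypothetical, and it is not a reproduction of the graph optimizer of Che.

```python
from itertools import groupby


def fuse_adjacent_ops(assigned_ops):
    """Fuse consecutive operations that share a target device.

    `assigned_ops` is a list of (op_name, device) pairs in execution order.
    Returns a list of (tuple_of_op_names, device) fused subtasks.
    Hypothetical, simplified linear-chain version of operator fusion.
    """
    fused = []
    # groupby only merges neighbours, so execution order is preserved.
    for device, group in groupby(assigned_ops, key=lambda item: item[1]):
        fused.append((tuple(name for name, _ in group), device))
    return fused


example_chain = [
    ("conv", "asic"),
    ("relu", "asic"),
    ("sort", "gpu"),
    ("matmul", "asic"),
]
example_fused = fuse_adjacent_ops(example_chain)
```

Here "conv" and "relu" are merged into one subtask for the same device, while "matmul" stays separate because an operation on a different device sits between it and the earlier pair.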
As per claim 6:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the storage unit comprises a scratch-pad memory (Che: Figure 1 element 104, paragraphs 19 and 28)(The host memory is used to load instructions and data of tasks to the cores of the neural network processing unit for task execution. Official notice is given that storage elements can include scratch-pad memories for the advantage of faster memory access. Thus, it would have been obvious to one of ordinary skill in the art to implement off-chip scratch-pad memories in Che.) and the task scheduling module is further configured to notify, according to a configuration option indicating inter-layer fusion, the access control module to store outputs from one or more intermediate layers of the neural network in the scratch-pad memory (Wu: Figure 1 element 620, paragraph 102)(Che: Figure 1 element 106, paragraphs 19-20 and 28-29)(The memory controller is used to read/write data from/to the global memory from/to external memory outside of the chip and local internal memory within the chip. In view of the above official notice, a scratch-pad memory is added to Che to store data. Official notice is given that task execution results can be written back to memory for storage for the advantage of saving results for later processing. Thus, it would have been obvious to one of ordinary skill in the art to implement task execution result writeback to memory, including the added scratch-pad memory.).
As per claim 7:
Che and Wu disclosed the heterogeneous Al processor according to claim 1, wherein the architectural types of the computation units include one Application Specific Integrated Circuit (ASIC), General-Purpose Graphics Processing Unit (GPGPU), Field-
As per claim 9:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the computation units comprise a computation unit of an Application Specific Integrated Circuit ASIC architecture (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.) and a computation unit of a General-Purpose Graphics Processing Unit GPGPU architecture (Che: Figures 1-2 elements 100 and 220, paragraphs 14, 26, and 30)(The neural network processing unit is a heterogeneous platform that can include SIMD/GPU/FPGA/ASIC accelerators.).
As per claim 10:
Che and Wu disclosed the heterogeneous AI processor according to claim 1, wherein the task scheduling module is further configured to distribute the computation subtasks to the respective computation units according to the capabilities of the computation units (Che: Figure 2 element 210, paragraphs 14 and 29)(The scheduler takes into consideration target device capabilities to process received tasks for execution.).
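For illustration only, distributing subtasks to per-unit task queues according to unit capabilities can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the unit labels, the subtask types, and the tie-breaking rule (the least-loaded capable unit is chosen) are all hypothetical and are not drawn from Che or Wu.

```python
from collections import deque


def distribute_by_capability(subtasks, unit_capabilities):
    """Place each subtask in the queue of a unit able to execute its type.

    `subtasks` is a list of (name, task_type) pairs.
    `unit_capabilities` maps a unit label to the set of types it supports.
    Among capable units, the one with the shortest queue is chosen
    (an assumed policy for this example).
    """
    queues = {unit: deque() for unit in unit_capabilities}
    for name, task_type in subtasks:
        candidates = [
            unit for unit, caps in unit_capabilities.items() if task_type in caps
        ]
        if not candidates:
            raise ValueError(f"no unit can execute subtask type {task_type!r}")
        target = min(candidates, key=lambda unit: len(queues[unit]))
        queues[target].append(name)
    return queues


example_units = {"asic": {"conv"}, "fpga": {"conv", "sort"}}
example_subtasks = [("c1", "conv"), ("s1", "sort"), ("c2", "conv")]
example_queues = distribute_by_capability(example_subtasks, example_units)
```

In this example, the "sort" subtask can only go to the programmable unit, while the "conv" subtasks are eligible for either unit and are placed by queue length.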

Response to Arguments
The arguments presented by Applicant in the response received on 12/21/2021 are not considered persuasive.
Applicant argues for claim 1:
“Che may generally mention that a heterogeneous platform may include various accelerators such as GPUs, FPGAs, and ASICs, and that each of the accelerators may be used to process operations associated with machine-learning (ML) or a deep-learning (DL) model. All of accelerators in Che, however, are specific to the ML or DL operations. This is evident from at least Figure 1 of Che, which shows that the top layer of cores provides circuitry representing an input layer to a neural network, while the second layer of cores provides circuitry representing a hidden layer of the neural network. Accordingly, Che merely contemplates that the various types of accelerators are all used for AI related tasks such as machine-learning, deep-learning, graphics processing, etc. 
In contrast, the solutions recited by amended claim 1 provide not only that two different architectural types of computation units are included in a processor, but also that at least one of the computation units is a customized computation unit for a particular AI algorithm or operation, and at least another one of the computation units is a programmable computation unit (e.g., such as a computation unit configured to perform various common computation tasks). Che never teaches or suggests that a processor may include computation units of different architectural types, let alone that at least one of the computation units is a customized computation unit for a particular AI algorithm or operation while at least another one of the computation units is a programmable computation unit. Therefore, Che cannot render claim 1 obvious under 35 U.S.C. 103.”  

This argument is not found to be persuasive for the following reason. Che disclosed a heterogeneous platform that includes various accelerators (e.g. GPUs, FPGAs, ASICs, etc.). Che alone doesn’t specify what types of applications/software programs are executed by the various accelerators. Official notice was given that it’s well-known to one of ordinary skill in the art that these types of accelerators (e.g. GPU/ASIC) execute AI algorithms/operations. As such, an accelerator executing such a program reads upon the customized computation unit. The FPGA of the heterogeneous platform reads upon the programmable computation unit. Thus, the cited art reads upon the claimed limitations.
Applicant argues for claim 1:
“The deficiencies of Che cannot be cured by Wu. Wu may generally discuss distributing a plurality of tasks to a plurality of work queues of a processor 

This argument is not found to be persuasive for the following reason. Wu disclosed distributing tasks to work queues of processor cores with the capability of executing a particular algorithm (i.e. type). The combination adds the work queues and the task capability distribution method of Wu into Che. Thus, the combination reads upon the newly claimed limitation.
Applicant argues for claim 1:
“Last but not least, Che is directed towards producing a sequence of nodes for representing an execution order of operations and a sequence of target devices corresponding to the sequence of nodes. No work queue is required in Che. So, those skilled in the art would not have any motivation to combine the work queue of Wu with the heterogeneous platform of Che.” 

This argument is not found to be persuasive for the following reason. Proper motivation was given for the advantage of implementing work queues (i.e. buffering work and load balancing). Thus, the combination is proper.

	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183