DETAILED ACTION
1.	This communication is in response to the amendments filed on June 29, 2022 for Application No. 15/971,872 in which claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
3.	The amendments filed on June 29, 2022 have been considered. Claims 1-18 and 20 have been amended. Claims 1-20 are pending and presented for examination.

4.	Applicant’s arguments with respect to the 35 U.S.C. 102(a)(1) rejection of Claims 1-2, 6, 8-12, 16, and 18-20, and with respect to the 35 U.S.C. 103 rejection of Claims 3-5, 7, 13-15, and 17, have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Although the Henry reference does not explicitly disclose “a plurality of task queue circuits referencing a plurality of tasks”, the Chung reference of record (US PG-PUB 20160379109) teaches a plurality of task queue circuits referencing a plurality of tasks (Chung, Par. [0248], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804. Each queue is associated with a different respective model. For example, queue 1 is associated with model 1, queue 2 is associated with model 2, queue 3 is associated with model 3, and so on.”; therefore, the queue manager component comprises a plurality of task queue circuits which reference a plurality of tasks). Further, the newly added limitation “retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits” is taught by the newly introduced Talpes reference (US PG-PUB 20190026237) (Talpes, Par. [0032], “Arbiter 123 queues the read requests and determines when each read request may be granted access to memory 102. In various embodiments, the request are queued in a first-in-first-out manner by arbiter 123. In some embodiments, the requests are queued by arbiter 123 by associating a priority with each request.”; therefore, the task arbiter is able to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits). 
Similarly, the newly added limitation “dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks” is also taught by the Talpes reference (Talpes, Par. [0150], “At 1109, a read request is dequeued from the read queue. In various embodiments, the read request corresponds to a read request queued at 1103. For example, one or more read requests are queued in a read queue at 1103 and the first arrived request is dequeued at 1109. The first arrived request corresponds to the request that arrived the earliest. In some embodiments, the request with the highest priority is dequeued and may not correspond to the request that arrived the earliest. In some embodiments, the request is a memory request for a subset of elements located consecutively in memory. In various embodiments, once a read request is dequeued, the read corresponding to the request is performed to retrieve the data requested from memory.”; thus, the arbiter is configured to dequeue one or more tasks from the task queue circuits according to their priority). The combination of Henry in view of Chung, further in view of Talpes, teaches the limitations of newly amended Claims 1-20. Thus, Claims 1-20 are rejected under 35 U.S.C. 103. The updated claim limitation mapping and motivation to combine are presented in the 35 U.S.C. 103 section below. 

5.	Examiner acknowledges Applicant’s response regarding the provisional nonstatutory double patenting rejection of Claims 1-6, 8-9, 11-15, and 20 as being unpatentable over Claims 1-5, 7, 10-16, and 20 of copending Application No. 15/971,276 (reference application). The double patenting rejection below has been updated to reflect the amendments filed in both the instant and copending applications. However, the provisional double patenting rejection is maintained because pending Claims 1-20 remain rejected under 35 U.S.C. 103 in the section below. 

Double Patenting
5. 	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
6. 	Claims 1-6, 8, 9, 11-15, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-5, 7, 10-16, and 20 of copending Application No. 15/971,276 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the instant application are anticipated by the claims of the copending application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
	The subject matter claimed in the instant application is fully disclosed in, and covered by, the copending application, since the instant and copending applications claim common subject matter. With respect to the claims of the instant application, please refer to the following table. The bolded portions below highlight the differences between the instant and copending applications, which illustrate the obvious and anticipatory relationship of the claim limitations at issue:

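Instant Application (15/971,872) vs. Copending Application (15/971,276); in each pair below, the instant claim is listed first, followed by the corresponding copending claim.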
Instant Claim 1: A neural processor circuit, comprising: 
a neural engine circuit; and 
a neural task manager circuit coupled to the neural engine, the neural task manager including: 
a plurality of task queue circuits referencing a plurality of tasks, a task queue circuit configured to store a reference to a task list including tasks that instantiate a neural network, a task comprising configuration data of the task stored in a location of a memory external to the neural processor circuit; and 
a task arbiter circuit coupled to the plurality of task queue circuits, the task arbiter circuit configured to: 
retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits;
dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks, dequeuing the one or more tasks comprising causing the neural task manager circuit to retrieve the configuration data of a dequeued task from the location of the memory external to the neural processor circuit based on the reference to the task list stored in the task queue circuit from which the dequeued task is dequeued; and 
cause the neural task manager circuit to provide a portion of the configuration data of the dequeued task to the neural engine circuit, the portion of the configuration data programming the neural engine circuit to execute the dequeued task.
Copending Claim 1: A neural processor circuit, comprising:
a neural engine circuit configured to perform neural operations; and 
a neural task manager circuit coupled to the neural engine circuit to program the neural engine circuit, the neural task manager circuit including: 
a first task queue circuit configured to store a reference to a first task list of first tasks for instantiating a first neural network by the neural engine circuit, a second task queue circuit configured to store a reference to a second task list of second tasks for instantiating a second neural network by the neural engine circuit, and 
a task arbiter circuit configured to perform a task switch, the task arbiter circuit configured to, 
during execution of one of the first tasks by the neural engine circuit and prior to execution of another one of the first tasks by the neural engine circuit, 
send configuration data for one of the second tasks from a memory external to the neural processor circuit to the neural engine circuit for programming the neural engine circuit to instantiate at least a portion of the second neural network by executing the one of the second tasks.
Instant Claim 2: The neural processor circuit of claim 1, wherein the dequeued task when executed instantiates a single network layer of the neural network, multiple network layers of the neural network, or a portion of a network layer of the neural network.
Copending Claim 2: The neural processor circuit of claim 1, wherein: each of the first tasks, when executed, instantiates a single network layer of the first neural network, multiple network layers of the first neural network, or a portion of a network layer of the first neural network; and each of the second tasks, when executed, instantiates a single network layer of the second neural network, multiple network layers of the second neural network, or a portion of a network layer of the second neural network.
Instant Claim 3: The neural processor circuit of claim 1, wherein: 
the task arbiter circuit is further configured to store the configuration data in a configuration queue of the neural task manager circuit, the configuration queue coupled to the neural engine circuit and configured to provide the portion of the configuration data to the neural engine circuit; and 
the neural processor circuit further includes: 
a kernel direct memory access (DMA) configured to retrieve kernel data of the dequeued task from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue; and 
a buffer direct memory access (DMA) configured to retrieve input data of the dequeued task from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue.
Copending Claim 4: The neural processor circuit of claim 1, wherein:
task arbiter circuit is further configured to store the configuration data in a configuration queue of the neural task manager circuit, the configuration queue coupled to the neural engine circuit and configured to provide the configuration data to the neural engine circuit; and 
the neural processor circuit further includes: a kernel direct memory access (DMA) configured to retrieve kernel data of the one of the second tasks from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue; and 
a buffer direct memory access (DMA) configured to retrieve input data of the one of the second tasks from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue.
Instant Claim 4: The neural processor circuit of claim 3, wherein: the neural task manager circuit further includes: 
a fetch queue coupled to the configuration queue; and 
a task manager direct memory access (DMA) coupled to the fetch queue and the task arbiter circuit; and 
the task arbiter circuit is further configured to retrieve the configuration data of the dequeued task from the location of the memory external to the neural processor circuit via the task manager DMA and store the configuration data in the fetch queue, the fetch queue providing the configuration data to the configuration queue when second configuration data of an executed task is removed from the configuration queue.
Copending Claim 5: The neural processor circuit of claim 4, wherein: the neural task manager circuit further includes: 
a fetch queue coupled to the configuration queue; and 
a task manager direct memory access (DMA) coupled to the fetch queue and the task arbiter circuit; and 
the task arbiter circuit is further configured to retrieve the configuration data of the one of the second tasks from the memory external to the neural processor circuit via the task manager DMA and store the configuration data in the fetch queue, the fetch queue providing the configuration data to the configuration queue when second configuration data of an executed task is removed from the configuration queue.
Instant Claim 5: The neural processor circuit of claim 4, wherein: 
The plurality of task queue circuits comprise a first task queue circuit and a second task queue circuit, the first task queue circuit having a first priority and the second task queue circuit having a second priority; and 
the task arbiter circuit is configured to store one of (i) the configuration data of the dequeued task or (ii) another configuration data of another task in the fetch queue, based on a comparison of the first priority and the second priority.
Copending Claim 3: The neural processor circuit of claim 1, wherein:
the second task list includes the configuration data of the one of the second tasks stored in a location of the memory external to the neural processor circuit; and 
the task arbiter circuit retrieves the configuration data of the one of the second tasks from the memory external to the neural processor circuit by referencing the second task list stored in the second task queue circuit according to the second task queue having a higher priority than the first task queue.
Instant Claim 6: The neural processor circuit of claim 1, further comprising a data buffer coupled to the memory external to the neural processor circuit and the neural engine circuit, and wherein the task arbiter circuit is further configured to provide another portion of the configuration data to the data buffer, the other portion of the configuration data programming the data buffer to broadcast a work unit of input data of the dequeued task to the neural engine circuit.
Copending Claim 7: The neural processor circuit of claim 1, wherein the neural engine circuit is configured to: receive input data of the one of the first tasks from a data buffer coupled to the neural engine circuit; generate output data of the one of the first tasks from the input data; provide the output data to the memory external to the neural processor circuit; receive second input data of the one of the second tasks from the memory external to the neural processor circuit; generate second output data of the one of the second tasks from the second input data; and provide the second output data to the data buffer.
Instant Claim 8: The neural processor circuit of claim 1, wherein: the neural engine circuit includes: 
an input buffer circuit coupled to the neural task manager; and 
a multiply-add (MAD) circuit coupled to the input buffer circuit; and 
the portion of the configuration data programs the input buffer circuit to provide a portion of input data of the dequeued task stored in the input buffer circuit to the MAD circuit.
Copending Claim 10: The neural processor circuit of claim 1, wherein: the neural engine circuit includes: 
an input buffer circuit coupled to the neural task manager circuit; and 
a multiply-add (MAD) circuit coupled to the input buffer circuit; and 
the configuration data programs the input buffer circuit to provide a portion of input data of the one of the second tasks stored in the input buffer circuit to the MAD circuit.
Instant Claim 9: The neural processor circuit of claim 8, further comprising a data buffer coupled to the memory external to the neural processor circuit and the neural engine circuit. wherein the neural engine further includes an output circuit, and wherein the portion of the configuration data programs the output circuit to provide output data from the MAD circuit to the data buffer.
Copending Claim 11: The neural processor circuit of claim 10, further comprising a data buffer coupled to the memory external to the neural processor circuit and the neural engine circuit, wherein the neural engine further includes an output circuit, and wherein the configuration data programs the output circuit to provide output data from the MAD circuit to the data buffer circuit.
Instant Claim 11: A method of managing tasks in a neural processor circuit, comprising: 
referencing a plurality of tasks in a plurality of task queue circuits;
storing, in a task queue circuit of the neural processor circuit, a reference to a task list of tasks that instantiates a neural network, a task comprising configuration data of the task stored in a location of a memory external to the neural processor circuit; 
retrieving, at a task arbiter circuit coupled to the plurality of task queue circuits, priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits: 
dequeuing one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks, dequeuing the one or more tasks comprising causing the neural task manager circuit to retrieve the configuration data of a dequeued task from the location of the memory external to the neural processor circuit based on the reference to the task list stored in the task queue circuit; and 
cause the neural task manager circuit to provide a portion of the configuration data of the dequeued task to a neural engine circuit of the neural processor circuit, the portion of the configuration data programming the neural engine circuit to execute the dequeued task.
Copending Claim 12: A method of task switching in a neural processor circuit, comprising:
storing, in a first task queue circuit of a neural task manager circuit of the neural processor circuit, a reference to a first task list of first tasks that instantiate a first neural network by a neural engine circuit; 
storing, in a second task queue circuit of the neural task manager circuit, a reference to a second task list of second tasks that instantiate a second neural network by the neural engine circuit; and 
performing a task switch comprising, during execution of one of the first tasks by the neural engine circuit and prior to execution of another one of the first tasks by the neural engine circuit, 
sending, by a task arbiter circuit of the neural task manager circuit coupled to the first and second task queue circuits, configuration data for one of the second tasks, from a memory external to the neural processor circuit, for programming the neural engine circuit to instantiate at least a portion of the second neural network by executing the one of the second tasks.
Instant Claim 12: The method of claim 11, wherein the dequeued task when executed instantiates a single network layer of the neural network, multiple network layers of the neural network, or a portion of a network layer of the neural network.
Copending Claim 13: The method of claim 12, wherein: each of the first tasks, when executed, instantiates a single network layer of the first neural network, multiple network layers of the first neural network, or a portion of a network layer of the first neural network; and each of the second tasks, when executed, instantiates a single network layer of the second neural network, multiple network layers of the second neural network, or a portion of a network layer of the second neural network.
Instant Claim 13: The method of claim 11, further comprising: 
storing the configuration data in a configuration queue of the neural processor circuit; and 
providing the portion of the configuration data to the neural engine circuit from the configuration queue; 
retrieving kernel data of the dequeued task from the external memory when the configuration data is stored in the configuration queue; and 
retrieving input data of the dequeued task from the external memory when the configuration data is stored in the configuration queue.
Copending Claim 15: The method of claim 12, further comprising: 
storing, by the task arbiter circuit, the configuration data in a configuration queue of the neural task manager circuit, the configuration queue coupled to the neural engine circuit and configured to provide the configuration data to the neural engine circuit; 
retrieving, by a kernel direct memory access (DMA), kernel data of the one of the second tasks from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue; and 
retrieving, by a buffer direct memory access (DMA), input data of the one of the second tasks from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue.
Instant Claim 14: The method of claim 13, further comprising: 
retrieving the configuration data of the dequeued task from the location of the external memory via a task manager direct memory access (DMA) of the neural processor circuit; and 
storing the configuration data in a fetch queue of the neural processor circuit; and 
providing the configuration data from the fetch queue to the configuration queue when another configuration data of an executed task is removed from the configuration queue.
Copending Claim 16: The method of claim 15, wherein 
the task arbiter circuit retrieves the configuration data of the one of the second tasks via a task manager direct memory access (DMA), and further comprising:
storing the configuration data in a fetch queue, the fetch queue coupled to the configuration queue; and 
providing, by the fetch queue, the configuration data to the configuration queue when second configuration data of an executed task is removed from the configuration queue.
Instant Claim 15: The method of claim 14, further comprising: 
storing, a first priority corresponding to a first task queue circuit and a second priority corresponding to a second task queue circuit; and 
storing one of (i) the configuration data of the dequeued task or (ii) other configuration data of another task in the fetch queue based on a comparison of the first priority and the second priority.
Copending Claim 14: The method of claim 12, wherein: 
the second task list includes the configuration data of the one of the second tasks stored in a location of the memory external to the neural processor circuit; and 
the method further includes retrieving, by the task arbiter circuit, the configuration data of the one of the second tasks from the memory external to the neural processor circuit by referencing the second task list stored in the second task queue circuit according to the second task list having a higher priority than the first task list.
Instant Claim 20: An integrated circuit (IC) system comprising a neural processor circuit, the neural processor circuit comprising: 
a neural engine circuit; and 
a neural task manager circuit coupled to the neural engine circuit, the neural task manager circuit including: 
a plurality of task queue circuits referencing a plurality of tasks, a task queue circuit configured to store a reference to a task list of tasks that instantiates a neural network, a task comprising configuration data of the task stored in a location of a memory external to the neural processor circuit; and 
a task arbiter circuit coupled to the plurality of task queue circuits, the task arbiter circuit configured to: 
retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits;
dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks, dequeuing the one or more tasks comprising causing the neural task manager circuit to retrieve the configuration data of a dequeued task from the location of the memory external to the neural processor circuit based on the reference to the task list stored in the task queue circuit from which the dequeued task is dequeued; and 
cause the neural task manager circuit to provide at least a portion of the configuration data of the dequeued task to the neural engine circuit, the at least a portion of the configuration data programming the neural engine circuit to execute the dequeued task.
Copending Claim 20: An integrated circuit (IC) system comprising a neural processor circuit, the neural processor circuit comprising:
a neural engine circuit configured to preform neural operations; and 
a neural task manager circuit coupled to the neural engine circuit to program the neural engine circuit, the neural task manager circuit including: 
a first task queue circuit configured to store a first task list of first tasks for instantiating a first neural network by the neural engine circuit, a second task queue circuit configured to store a second task list of second tasks for instantiating a second neural network by the neural engine circuit, and 
a task arbiter circuit configured to perform a task switch, the task arbiter configured to, during execution of one of the first tasks by the neural engine circuit and prior to execution of another one of the first tasks by the neural engine circuit, 
send configuration data for one of the second tasks from a memory external to the neural processor circuit, to the neural engine circuit for programming the neural engine circuit to instantiate at least a portion of the second neural network by executing the one of the second tasks.


Claim Rejections - 35 USC § 103
 7.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

8.	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Henry et al. (hereinafter Henry) (US PG-PUB 20180225116), in view of Chung et al. (hereinafter Chung) (US PG-PUB 20160379109), further in view of Talpes et al. (hereinafter Talpes) (US PG-PUB 20190026237).
Regarding Claim 1, Henry teaches a neural processor circuit, comprising: 
a neural engine circuit (Henry, Figs. 1 & 2, N Neural Processing Units (NPU) 126); 
and a neural task manager circuit coupled to the neural engine, the neural task manager including (Henry, Fig. 1, comprising Instruction Fetch Unit 101 and Instruction Cache 102): 
a task queue circuit configured to store a reference to a task list including tasks that instantiate a neural network, a task comprising configuration data of the task stored in a location of a memory external to the neural processor circuit (Henry, Par. [0096], “The instruction cache 102 caches the architectural instructions 103 fetched from a system memory that is coupled to the processor 100. The architectural instructions 103 include a move to neural network (MTNN) instruction and a move from neural network (MFNN) instruction…”; thus, the instruction cache (i.e., the task queue circuit) is able to store a reference to a task list that instantiates a neural network, and these instructions are located in the system memory, which is external to the neural processor circuit); 
and a task arbiter circuit (Henry, Fig. 1, Instruction Fetch Unit 101, which functions analogously to the task arbiter circuit) coupled to the plurality of task queue circuits (see the introduction of the Chung reference below, which teaches a plurality of task queue circuits), the task arbiter circuit configured to: 
dequeuing the one or more tasks (see the introduction of the Talpes reference below, which teaches a task arbiter able to dequeue one or more tasks according to retrieved priority parameters) comprising causing the neural task manager circuit to retrieve the configuration data of a dequeued task from the location of the memory external to the neural processor circuit based on the reference to the task list stored in the task queue circuit from which the dequeued task is dequeued (Henry, Par. [0095], “The instruction fetch unit 101 controls the fetching of architectural instructions 103 from system memory (not shown) into the instruction cache 102. The instruction fetch unit 101 provides a fetch address to the instruction cache 102 that specifies a memory address at which the processor 100 fetches a cache line of architectural instruction bytes into the instruction cache”; thus, the instruction fetch unit is able to retrieve the architectural instructions from the system memory by interacting with the instruction cache); and 
cause the neural task manager circuit to provide a portion of the configuration data of the dequeued task (see the introduction of the Talpes reference below, which teaches a task arbiter able to dequeue one or more tasks according to retrieved priority parameters) to the neural engine circuit, the portion of the configuration data programming the neural engine circuit to execute the dequeued task (Henry, Par. [0101], “The reservation stations 108 hold microinstructions 105 until they are ready to be issued to an execution unit 112/121 for execution. A microinstruction 105 is ready to be issued when all of its source operands are available and an execution 112/121 is available to execute it”; Fig. 1 further shows that the reservation stations 108 receive cascaded input from the instruction fetch unit 101 and feed the neural network unit (NNU) 121 for execution).

Although Henry discloses a task queue circuit, Henry does not disclose a plurality of task queue circuits referencing a plurality of tasks.
However, Chung discloses a plurality of task queue circuits referencing a plurality of tasks (Chung, Par. [0248], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804. Each queue is associated with a different respective model. For example, queue 1 is associated with model 1, queue 2 is associated with model 2, queue 3 is associated with model 3, and so on.”; therefore, the queue manager component comprises a plurality of task queue circuits which reference a plurality of tasks).

Henry does not disclose the task arbiter circuit configured to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits.
However, Talpes teaches the task arbiter circuit configured to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits (Talpes, Par. [0032], “Arbiter 123 queues the read requests and determines when each read request may be granted access to memory 102. In various embodiments, the request are queued in a first-in-first-out manner by arbiter 123. In some embodiments, the requests are queued by arbiter 123 by associating a priority with each request.”, therefore, the task arbiter is able to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits).

Henry does not disclose the task arbiter circuit configured to dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks.
However, Talpes teaches the task arbiter circuit configured to dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks (Talpes, Par. [0150], “At 1109, a read request is dequeued from the read queue. In various embodiments, the read request corresponds to a read request queued at 1103. For example, one or more read requests are queued in a read queue at 1103 and the first arrived request is dequeued at 1109. The first arrived request corresponds to the request that arrived the earliest. In some embodiments, the request with the highest priority is dequeued and may not correspond to the request that arrived the earliest. In some embodiments, the request is a memory request for a subset of elements located consecutively in memory. In various embodiments, once a read request is dequeued, the read corresponding to the request is performed to retrieve the data requested from memory.”, thus, the arbiter is configured to dequeue one or more tasks from the task queue circuits according to their priority).

While Henry discloses a single task queue circuit, Henry does not explicitly disclose a plurality of task queue circuits referencing a plurality of tasks. However, Chung teaches a hardware acceleration component for implementing a neural network that includes a queue manager component, comprising a request processing component and a model loading component, which maintains a plurality of task queue circuits referencing a plurality of tasks (Chung, Par. [0248], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804. Each queue is associated with a different respective model. For example, queue 1 is associated with model 1, queue 2 is associated with model 2, queue 3 is associated with model 3, and so on.”, therefore, the queue manager component consists of a plurality of task queue circuits which reference a plurality of tasks). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network unit disclosed by Henry to include the queue manager component of Chung with its plurality of task queue circuits referencing a plurality of tasks. One of ordinary skill in the art would have been motivated to make this modification to produce a neural processing circuit with enhanced queue management that allows the circuit to efficiently handle multiple machine learning operations at once and to process queues on a priority basis to increase efficiency (Chung, Pars. [0248]-[0249], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804” & “Request processing component 3806 also selects among the queues to process based on any policy, such as by selecting among queues on a round-robin basis, queue-fullness basis, priority basis, etc., or any combination thereof. Such a policy may generally seek to fairly arbitrate among queues and requests, while also reducing the frequency at which new queues are selected (and consequently, the frequency at which new models are loaded)”).

Henry in view of Chung does not explicitly disclose the task arbiter circuit configured to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits and the task arbiter circuit configured to dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks. However, Talpes teaches the task arbiter circuit configured to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits (Talpes, Par. [0032], “Arbiter 123 queues the read requests and determines when each read request may be granted access to memory 102. In various embodiments, the request are queued in a first-in-first-out manner by arbiter 123. In some embodiments, the requests are queued by arbiter 123 by associating a priority with each request.”, therefore, the task arbiter is able to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits) and the task arbiter circuit configured to dequeue one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks (Talpes, Par. [0150], “At 1109, a read request is dequeued from the read queue. In various embodiments, the read request corresponds to a read request queued at 1103. For example, one or more read requests are queued in a read queue at 1103 and the first arrived request is dequeued at 1109. The first arrived request corresponds to the request that arrived the earliest. In some embodiments, the request with the highest priority is dequeued and may not correspond to the request that arrived the earliest. In some embodiments, the request is a memory request for a subset of elements located consecutively in memory. In various embodiments, once a read request is dequeued, the read corresponding to the request is performed to retrieve the data requested from memory.”, thus, the arbiter is configured to dequeue one or more tasks from the task queue circuits according to their priority). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the neural network unit with a plurality of task queue circuits as disclosed by Henry in view of Chung to include the retrieval of priority parameters and the dequeuing of tasks according to the priority parameters as disclosed by Talpes. One of ordinary skill in the art would have been motivated to make this modification to produce a neural network unit with a task arbiter that is able to queue and dequeue tasks on a priority basis to reduce power consumption and increase efficiency (Talpes, Par. [0002], “Traditionally, these operations may be implemented using a generic microprocessor system that loads the computation data from memory before performing a computational array instruction. While the data is loading, the microprocessor system often sits idle. The software platform running these applications will initiate the computational array instruction once the data has completed loading. The length of stalls and the time required to synchronize the computational operation with the retrieved data can be particularly long for when accessing variable latency memory. Stalls and synchronization efforts by the software platform reduce the efficiency of the microprocessor system and result in higher power consumption and lower throughput. Therefore, there exists a need for a microprocessor system with increased throughput that performs array computational operations using variable latency memory access.”).

Regarding Claim 2, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 1, wherein the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) when executed instantiates a single network layer of the neural network, multiple network layers of the neural network, or a portion of a network layer of the neural network (Henry, Pars. [0029] & [0030], “FIG. 14 is a block diagram illustrating a move to neural network (MTNN) architectural instruction and its operation with respect to portions of the NNU of FIG. 1. FIG. 15 is a block diagram illustrating a move from neural network (MFNN) architectural instruction and its operation with respect to portions of the NNU of FIG. 1.”, thus, the architectural instructions may be configured such that a single network layer, multiple layers, or a portion of the network can be instantiated).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 3, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 1, wherein: 
the task arbiter circuit is further configured to store the configuration data in a configuration queue of the neural task manager circuit, the configuration queue coupled to the neural engine circuit and configured to provide the portion of the configuration data to the neural engine circuit (Chung, Par. [0249], “Queue manager component 3802 includes a request processing component 3806 and a model loading component 3808…In operation, request processing component 3806 adds each incoming request to an appropriate queue…”, thus, the request processing component manages adding requests to appropriate queues, similar to the configuration queue); 
and the neural processor circuit further includes: 
a kernel direct memory access (DMA) configured to retrieve kernel data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue (Henry, Par. [0596], “The store queue 6324 receives master store requests from the NNU 121 (e.g., from a DMAC 6602) to store data to a ring bus 4024 agent (e.g., system memory) from the data RAM 122 or weight RAM 124”, thus, a DMAC is used in the interaction between the NNU and system memory); 
and a buffer direct memory access (DMA) configured to retrieve input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the memory external to the neural processor circuit when the configuration data is stored in the configuration queue (Henry, Par. [0597], “The NNU 121 also includes a first direct memory access controller (DMAC0) 6602-0, a second direct memory access controller (DMAC1) 6602-1.”, & Fig. 67 depicts the interaction between the DMAC and the NNU in which input data may be retrieved from system memory or sent to the NNU).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 4, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 3, wherein: 
the neural task manager circuit further includes: 
a fetch queue coupled to the configuration queue (Chung, Par. [0249], “Upon switching to a new queue (e.g., having z unprocessed requests therein), model loading component 3808 loads the model associated with that queue into acceleration components 3810…”, thus, the model loading component fetches the data associated with a particular queue); and 
a task manager direct memory access (DMA) coupled to the fetch queue and the task arbiter circuit (Chung, Par. [0249], “Model loading component 3808 may be implemented with one or computer processors with memory store instructions, or dedicated logic gate arrays implemented, for example, in an FPGA, ASIC, or other similar device”, the model loading component may also be coupled with devices that access memory store instructions, such as a DMAC which is utilized throughout the NNU); and 
the task arbiter circuit is further configured to retrieve the configuration data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the location of the memory external to the neural processor circuit via the task manager DMA and store the configuration data in the fetch queue, the fetch queue providing the configuration data to the configuration queue when second configuration data of an executed task is removed from the configuration queue (Chung, Par. [0249], “Upon switching to a new queue (e.g., having z unprocessed requests therein), model loading component 3808 loads the model associated with that queue into acceleration components 3810, and then submits the requests in the queue to acceleration components 3810 for processing based on the loaded new model”, the model loading component is capable of loading models according to the status of each request that is in the queue & the model associated with it).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 5, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 4, wherein: 
the plurality of task queue circuits comprise a first task queue circuit and a second task queue circuit (Chung, Par. [0248], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804. Each queue is associated with a different respective model. For example, queue 1 is associated with model 1, queue 2 is associated with model 2, queue 3 is associated with model 3, and so on.”), the first task queue circuit having a first priority and the second task queue circuit having a second priority (Talpes, Par. [0162], “At 1217, a read is dequeued and the corresponding data element(s) are retrieved from memory. In various embodiments, the read corresponds to the next data read in a read queue. In some embodiments, the next read to be dequeued corresponds to the data read that arrives first. For example, the next read is based on the time the data read is queued in the read queue. In some embodiments, the next read is based on the data read with the highest priority.”, therefore, each task queue is associated with a different priority, where the first queue circuit has a first priority and a second queue circuit has a second priority); and 
the task arbiter circuit is configured to store one of (i) the configuration data of the dequeued task or (ii) another configuration data of another task in the fetch queue, based on a comparison of the first priority and the second priority (Talpes, Par. [0019], “To address these limitations, a microprocessor system for performing high throughput array computational operations is disclosed. In some embodiments, a microprocessor system includes a hardware arbiter to manage memory requests and is in communication with a control unit and a control queue to synchronize computational operations associated with the memory requests. The hardware arbiter queues memory read requests to retrieve data from memory with variable access latency. Each request is queued until the request is granted access to memory and the request can be serviced. A control queue queues a control operation that corresponds to the memory request and describes a computational operation. The dequeueing of the control operation is synchronized with the availability of the data retrieved via the memory read request. The synchronization allows the data retrieved from memory and the control operation to be synchronized and provided to a computational array together to perform a computational operation.”, therefore, the arbiter is configured to store configuration data of the dequeued task or another configuration data of another task based on comparing first and second priority).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 6, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 1, further comprising a data buffer coupled to the memory external to the neural processor circuit and the neural engine circuit, and wherein the task arbiter circuit is further configured to provide another portion of the configuration data to the data buffer, the other portion of the configuration data programming the data buffer to broadcast a work unit of input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) to the neural engine circuit (Henry, Par. [0102] & Fig. 1, “The execution units 112 include one or more load/store units (not shown) that load data from the memory subsystem 114 and store data to the memory subsystem 114. Preferably, the memory subsystem 114 includes a memory management unit (not shown), which may include, e.g., translation lookaside buffers…”, thus, a data buffer may be included as part of execution unit 112, which is coupled to the memory subsystem 114 and may interact with program memory 129 inside of NNU 121 to send and receive configuration data).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 7, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 6, further comprising a buffer direct memory access (DMA) coupled to the data buffer and the memory external to the neural processor circuit, and wherein (Chung, Par. [0196], “FIG. 27 shows functionality by which a local host component 2702 may forward information to its local acceleration component 2704 via host interface 2524 shown in FIG. 25 (e.g., using PCIe in conjunction with DMA memory transfer”): 
the task arbiter circuit is further configured to provide a third portion of the configuration data to the buffer DMA, the third portion of the configuration data programming the buffer DMA to retrieve a tile of the input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the memory external to the neural processor circuit and store the tile in the data buffer; and the tile includes multiple work units (Chung, Par. [0267] & Figs. 44-45, “Input buffer array 4404 includes input data buffers IPD.sub.0, IPD.sub.1, . . . , IPD.sub.N-1 for receiving N slices Slice.sub.0, Slice.sub.1, . . . , Slices.sub.N-1, respectively, of D-dimensional input data. For simplicity, the remaining discussion will assume that D=3. In an implementation, input data are segmented into horizontal slices of input data, such as graphically illustrated in FIG. 45.”, the figures depict an input buffer that is used to send and retrieve slices of input data across multiple work units. Furthermore, splitting input data into tiles is straightforward in the case of neural networks, and is known in the art).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 8, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 1, wherein: 
the neural engine circuit includes: 
an input buffer circuit coupled to the neural task manager (Henry, Fig. 2, depicts a register 205 and mux-reg 208 which in conjunction hold inputs from the weight ram and data ram respectively); and 
a multiply-add (MAD) circuit coupled to the input buffer circuit (Henry, Fig. 2, depicts a multiplier 242 and adder 244 comprising arithmetic logic unit (ALU) 204 located within neural processing unit 126); and 
the portion of the configuration data programs the input buffer circuit to provide a portion of input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) stored in the input buffer circuit to the MAD circuit (Henry, Fig. 2, depicts inputs 203 and 209 which feed from the register 205 and mux-reg 208 into the ALU 204).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 9, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 8, further comprising a data buffer coupled to the memory external to the neural processor circuit and the neural engine circuit, wherein the neural engine further includes an output circuit, and wherein the portion of the configuration data programs the output circuit to provide output data from the MAD circuit to the data buffer (Henry, Figs. 1 & 2, depict the output of the multiplier-adder feeding into an accumulator 202, then into an activation function unit (AFU) 212, and from there to the data RAM and weight RAM; as shown in Fig. 1, data RAM 122 and weight RAM 124 are able to interact with media registers 118 and execution units 112 outside of the NNU).

Regarding Claim 10, Henry in view of Chung further in view of Talpes teaches the neural processor circuit of claim 1, further comprising a kernel direct memory access (DMA) coupled to the memory external to the neural processor circuit and the neural engine circuit, and wherein (Henry, Par. [0596], “The store queue receives master store requests from the NNU 121 (e.g., from a DMAC 6602) to store data to a ring bus 4024 agent (e.g. system memory) from the data RAM 122 or weight RAM 124.”, thus, a DMAC is present to interact with the system memory): 
the task arbiter circuit is further configured to provide another portion of the configuration data to the kernel DMA, the other portion of the configuration data programming the kernel DMA to retrieve kernel data from the memory external to the neural processor circuit and provide the kernel data to the neural engine circuit to execute the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) (Henry, Par. [0596], “When the ring bus 4024 responds with an acknowledge (e.g., from system memory) of the data, it is received in buffer 6564. The store queue 6324 then provides the acknowledge to the NNU 121 to notify it that the store has been performed, and the FSM updates the entry 6522 state to available. Preferably, the store queue 6324 does not have to arbitrate to provide the acknowledge to the NNU 121 (e.g., there is a DMAC 6602 for each store queue 6324, as in the embodiment of FIG. 66”, thus, the DMA interacts with the system memory and the NNU to retrieve and send kernel data for task execution).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 11, Henry teaches a method of managing tasks in a neural processor circuit, comprising: 
storing, in a task queue circuit of the neural processor circuit, a reference to a task list of tasks that instantiates a neural network, a task comprising configuration data of the task stored in a location of a memory external to the neural processor circuit (Henry, Par. [0096], “The instruction cache 102 caches the architectural instructions 103 fetched from a system memory that is coupled to the processor 100. The architectural instructions 103 include a move to neural network (MTNN) instruction and a move from neural network (MFNN) instruction…”, thus, the instruction cache (task queue circuit) is able to store a reference to a task list to instantiate a neural network, and these instructions are located in the system memory, external to the neural processor circuit); 
dequeuing the one or more tasks (See below for Talpes teaching of dequeuing tasks according to priority parameters) comprising causing the neural task manager circuit to retrieve the configuration data of a dequeued task from the location of the memory external to the neural processor circuit based on the reference to the task list stored in the task queue circuit (Henry, Par. [0095], “The instruction fetch unit 101 controls the fetching of architectural instructions 103 from system memory (not shown) into the instruction cache 102. The instruction fetch unit 101 provides a fetch address to the instruction cache 102 that specifies a memory address at which the processor 100 fetches a cache line of architectural instruction bytes into the instruction cache”, thus, the instruction fetch unit is able to retrieve the architectural instructions from the system memory by interacting with the instruction cache); and 
cause the neural task manager circuit to provide a portion of the configuration data of the dequeued task (See introduction of Talpes reference below, which teaches a task arbiter able to dequeue one or more tasks according to retrieved priority parameters) to a neural engine circuit of the neural processor circuit, the portion of the configuration data programming the neural engine circuit to execute the dequeued task (Henry, Par. [0101], “The reservation stations 108 hold microinstructions 105 until they are ready to be issued to an execution unit 112/121 for execution. A microinstruction 105 is ready to be issued when all of its source operands are available and an execution 112/121 is available to execute it”, and Fig. 1 further shows that the reservation stations 108 receive cascaded input from the instruction fetch unit 101 and feed into the neural network unit (NNU) 121 for execution).

Despite Henry disclosing a task queue circuit, Henry does not disclose referencing a plurality of tasks in a plurality of task queue circuits.
However, Chung teaches referencing a plurality of tasks in a plurality of task queue circuits (Chung, Par. [0248], “More specifically, queue manager component 3802 may maintain multiple queues in local memory 3804. Each queue is associated with a different respective model. For example, queue 1 is associated with model 1, queue 2 is associated with model 2, queue 3 is associated with model 3, and so on.”, therefore, the queue manager component consists of a plurality of task queue circuits which reference a plurality of tasks).

Henry does not disclose retrieving, at a task arbiter circuit coupled to the plurality of task queue circuits, priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits.
However, Talpes teaches retrieving, at a task arbiter circuit coupled to the plurality of task queue circuits, priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits (Talpes, Par. [0032], “Arbiter 123 queues the read requests and determines when each read request may be granted access to memory 102. In various embodiments, the request are queued in a first-in-first-out manner by arbiter 123. In some embodiments, the requests are queued by arbiter 123 by associating a priority with each request.”, therefore, the task arbiter is able to retrieve priority parameters associated with the plurality of tasks referenced in the plurality of task queue circuits).

Henry does not disclose dequeuing one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks.
However, Talpes teaches dequeuing one or more tasks from one or more task queue circuits according to the priority parameters of the plurality of tasks (Talpes, Par. [0150], “At 1109, a read request is dequeued from the read queue. In various embodiments, the read request corresponds to a read request queued at 1103. For example, one or more read requests are queued in a read queue at 1103 and the first arrived request is dequeued at 1109. The first arrived request corresponds to the request that arrived the earliest. In some embodiments, the request with the highest priority is dequeued and may not correspond to the request that arrived the earliest. In some embodiments, the request is a memory request for a subset of elements located consecutively in memory. In various embodiments, once a read request is dequeued, the read corresponding to the request is performed to retrieve the data requested from memory.”, thus, the arbiter is configured to dequeue one or more tasks from the task queue circuits according to their priority).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 12, Henry in view of Chung further in view of Talpes teaches the method of claim 11, wherein the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) when executed instantiates a single network layer of the neural network, multiple network layers of the neural network, or a portion of a network layer of the neural network (Henry, Pars. [0029] & [0030], “FIG. 14 is a block diagram illustrating a move to neural network (MTNN) architectural instruction and its operation with respect to portions of the NNU of FIG. 1. FIG. 15 is a block diagram illustrating a move from neural network (MFNN) architectural instruction and its operation with respect to portions of the NNU of FIG. 1.”, thus, the architectural instructions may be configured such that a single network layer, multiple layers, or a portion of the network can be instantiated).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 13, Henry in view of Chung further in view of Talpes teaches the method of claim 11, further comprising: 
storing the configuration data in a configuration queue of the neural processor circuit; and providing the portion of the configuration data to the neural engine circuit from the configuration queue (Chung, Par. [0249], “Queue manager component 3802 includes a request processing component 3806 and a model loading component 3808…In operation, request processing component 3806 adds each incoming request to an appropriate queue…”, thus, the request processing component manages adding requests to appropriate queues, similar to the configuration queue); 
retrieving kernel data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the external memory when the configuration data is stored in the configuration queue (Henry, Par. [0596], “The store queue 6324 receives master store requests from the NNU 121 (e.g., from a DMAC 6602) to store data to a ring bus 4024 agent (e.g., system memory) from the data RAM 122 or weight RAM 124”, thus, a DMAC is used in the interaction between the NNU and system memory); and 
retrieving input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the external memory when the configuration data is stored in the configuration queue (Henry, Par. [0597], “The NNU 121 also includes a first direct memory access controller (DMAC0) 6602-0, a second direct memory access controller (DMAC1) 6602-1.”, & Fig. 67 depicts the interaction between the DMAC and the NNU in which input data may be retrieved from system memory or sent to the NNU).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 14, Henry in view of Chung further in view of Talpes teaches the method of claim 13, further comprising: 
retrieving the configuration data of dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the location of the external memory via a task manager direct memory access (DMA) of the neural processor circuit (Chung, Par. [0249], “Model loading component 3808 may be implemented with one or computer processors with memory store instructions, or dedicated logic gate arrays implemented, for example, in an FPGA, ASIC, or other similar device”, the model loading component may also be coupled with devices that access memory store instructions, such as a DMAC which is utilized throughout the NNU); and 
storing the configuration data in a fetch queue of the neural processor circuit (Chung, Par. [0249], “Upon switching to a new queue (e.g., having z unprocessed requests therein), model loading component 3808 loads the model associated with that queue into acceleration components 3810…”, thus, the model loading component fetches the data associated with a particular queue); and 
providing the configuration data from the fetch queue to the configuration queue when another configuration data of an executed task is removed from the configuration queue (Chung, Par. [0249], “Upon switching to a new queue (e.g., having z unprocessed requests therein), model loading component 3808 loads the model associated with that queue into acceleration components 3810, and then submits the requests in the queue to acceleration components 3810 for processing based on the loaded new model”, the model loading component is capable of loading models according to the status of each request that is in the queue & the model associated with it).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 15, Henry in view of Chung further in view of Talpes teaches the method of claim 14, further comprising: 
storing a first priority corresponding to a first task queue circuit and a second priority corresponding to a second task queue circuit (Talpes, Par. [0162], “At 1217, a read is dequeued and the corresponding data element(s) are retrieved from memory. In various embodiments, the read corresponds to the next data read in a read queue. In some embodiments, the next read to be dequeued corresponds to the data read that arrives first. For example, the next read is based on the time the data read is queued in the read queue. In some embodiments, the next read is based on the data read with the highest priority.”, therefore, each task queue is associated with a different priority, where the first task queue circuit has a first priority and the second task queue circuit has a second priority); and 
storing one of (i) the configuration data of the dequeued task or (ii) other configuration data of another task in the fetch queue based on a comparison of the first priority and the second priority (Talpes, Par. [0019], “To address these limitations, a microprocessor system for performing high throughput array computational operations is disclosed. In some embodiments, a microprocessor system includes a hardware arbiter to manage memory requests and is in communication with a control unit and a control queue to synchronize computational operations associated with the memory requests. The hardware arbiter queues memory read requests to retrieve data from memory with variable access latency. Each request is queued until the request is granted access to memory and the request can be serviced. A control queue queues a control operation that corresponds to the memory request and describes a computational operation. The dequeueing of the control operation is synchronized with the availability of the data retrieved via the memory read request. The synchronization allows the data retrieved from memory and the control operation to be synchronized and provided to a computational array together to perform a computational operation.”, therefore, the arbiter is configured to store the configuration data of the dequeued task or other configuration data of another task based on a comparison of the first and second priorities).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 16, Henry in view of Chung further in view of Talpes teaches the method of claim 11, further comprising providing another portion of the configuration data to a data buffer of the neural processor circuit, the other portion of the configuration data programming the data buffer to broadcast a work unit of input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) to the neural engine circuit (Henry, Par. [0102] & Fig. 1, “The execution units 112 include one or more load/store units (not shown) that load data from the memory subsystem 114 and store data to the memory subsystem 114. Preferably, the memory subsystem 114 includes a memory management unit (not shown), which may include, e.g., translation lookaside buffers…”, thus, a data buffer may be included as part of execution unit 112, which is coupled to the memory subsystem 114 and may interact with program memory 129 inside of NNU 121 to send and receive configuration data).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 17, Henry in view of Chung further in view of Talpes teaches the method of claim 16, further comprising providing a third portion of the configuration data to a buffer direct memory access (DMA) of the neural processor circuit coupled to the data buffer and the external memory (Chung, Par. [0196], “FIG. 27 shows functionality by which a local host component 2702 may forward information to its local acceleration component 2704 via host interface 2524 shown in FIG. 25 (e.g., using PCIe in conjunction with DMA memory transfer”), the third portion of the configuration data programming the buffer DMA to retrieve a tile of the input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) from the external memory and store the tile in the data buffer, the tile including multiple work units (Chung, Par. [0267] & Figs. 44-45, “Input buffer array 4404 includes input data buffers IPD.sub.0, IPD.sub.1, . . . , IPD.sub.N-1 for receiving N slices Slice.sub.0, Slice.sub.1, . . . , Slices.sub.N-1, respectively, of Ddimensional input data. For simplicity, the remaining discussion will assume that D=3. In an implementation, input data are segmented into horizontal slices of input data, such as graphically illustrated in FIG. 45.”, the figures depict an input buffer that is used to send and retrieve slices of input data across multiple work units. Furthermore, splitting input data into tiles is straightforward in the case of neural networks, and is known in the art).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 18, Henry in view of Chung further in view of Talpes teaches the method of claim 11, wherein: 
the portion of the configuration data programs an input buffer circuit (Henry, Fig. 2 depicts a register 205 and a mux-reg 208, which together hold inputs from the weight RAM and the data RAM, respectively) of the neural engine circuit to provide a portion of input data of the dequeued task (See Claim 1 for Talpes teaching of dequeuing tasks according to priority parameters) stored in the input buffer circuit to a multiply-add (MAD) circuit (Henry, Fig. 2 depicts a multiplier 242 and an adder 244 comprising arithmetic logic unit (ALU) 204 located within neural processing unit 126) of the neural engine circuit (Henry, Fig. 2 depicts inputs 203 and 209, which feed from the register 205 and the mux-reg 208 into the ALU 204); and 
the portion of the configuration data programs an output circuit of the neural engine circuit to provide output data from the MAD circuit to a data buffer of the neural processor circuit (Henry, Figs. 1 & 2 depict the output of the multiplier-adder feeding into an accumulator 202, then into an activation function unit (AFU) 212, and from there to the data RAM and the weight RAM; referring to Fig. 1, data RAM 122 and weight RAM 124 are able to interact with media registers 118 and execution units 112 outside of the NNU).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Regarding Claim 19, Henry in view of Chung further in view of Talpes teaches the method of claim 11, further comprising providing another portion of the configuration data to a kernel direct memory access (DMA) coupled to the memory external to the neural processor circuit and the neural engine circuit, the other portion of the configuration data programming the kernel DMA to retrieve kernel data from the memory external to the neural processor circuit and provide the kernel data to the neural engine circuit to execute the task (Henry, Par. [0596], “When the ring bus 4024 responds with an acknowledge (e.g., from system memory) of the data, it is received in buffer 6564. The store queue 6324 then provides the acknowledge to the NNU 121 to notify it that the store has been performed, and the FSM updates the entry 6522 state to available. Preferably, the store queue 6324 does not have to arbitrate to provide the acknowledge to the NNU 121 (e.g., there is a DMAC 6602 for each store queue 6324, as in the embodiment of FIG. 66”, thus, the DMAC interacts with the system memory and the NNU to retrieve and send kernel data for task execution).
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Claim 20 recites substantially the same limitations as Claim 1 in the form of an integrated circuit (IC) system; therefore, it is rejected under the same rationale (Henry, Par. [0566], “Preferably, the multi-core processor and NNU 121 are fabricated on a single integrated circuit.”). 
The reasons of obviousness have been noted in the rejection of Claim 1 above and are applicable herein.

Conclusion
9.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Dally et al. (US PG-PUB 20180046900) discloses a sparse convolutional neural network accelerator that implements first-in-first-out (FIFO) queuing.
Lie et al. (US PG-PUB 20190258920) discloses a deep learning accelerator with a scheduler that implements scheduling policies for priority processing.
Wang et al. (US PG-PUB 20160028635) discloses a traffic control method that allocates a queue for a plurality of data packets and determines a priority of each queue.
Rossbach et al. (US PG-PUB 20150007182) discloses an accelerator with a plurality of accelerator tasks that may be dequeued according to an assigned priority.

10.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Devika S Maharaj whose telephone number is (571)272-0829. The examiner can normally be reached Monday - Thursday 7:30am - 4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/D.S.M./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123