DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s response and amendment on 10/21/2021 have been fully considered but are not persuasive. The rejections in the Non-Final Action are maintained. All previous minor claim objections have been corrected by applicant in the same response. Burns et al. 20100229172 was used to supplement the teaching of the reorder buffer to schedule the command packets based on an original sequence of the command packets at each of the task queues and the AFU to return the executed command packets to the core based on the original sequence. (See Page 9 of the Non-Final Action on 8/26/2021). No specific argument regarding Burns can be found in applicant’s response. This action addresses all claims including amended claims and the examiner’s response to applicant’s remarks.
In the remarks, applicant argued that: 1) As now recited in claims 1 and 11, the acceleration interface instruction is transmitted after the core generates the new command packets and pushes them into the corresponding task queues, which is not disclosed by Chang and Qi.
Examiner’s Response:
As to applicant’s remark 1) above, applicant is referring to the amended feature of:
“wherein the acceleration interface instruction is transmitted after the core generates the new command packets and pushes them into the corresponding task queues.” (see amended claim 1)

“wherein the acceleration interface instruction [one of the TLVS_CMDs] is transmitted (e.g. by the remote DMA through the arbitrator) after the core [one of the Core_1, Core_2, Core_3, Core_4] generates the new command packets [TLVS_CMD from Core_1] [TLVS_CMD from Core_2] [TLVS_CMD from Core_3] [TLVS_CMD from Core_4] and pushes them into the corresponding task queues [CQ1, CQ2, CQ3, CQ4].”
(see fig.7, para [0051], after the commands are generated from the corresponding cores and stored in the corresponding queues, the remote DMA function of the control circuit 122 (e.g., device's micro-processor) can get one compound command frame TLVS_CMD from one of the command queues CQ1-CQ4 in the storage device 114 through an arbitrator 702)
Applicant is reminded that claim 1 (and claim 11), although amended to include part of the feature of objected dependent claim 2 (and claim 12), does not include all the limitations of objected dependent claim 2 (and claim 12) as indicated in the Non-Final Action of 8/26/2021. Upon further review and consideration, Chang teaches the partial feature as indicated by the examiner above, but does not teach all the features of objected dependent claim 2.
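The ordering discussed in remark 1) can be pictured with a minimal sketch. It is illustrative only and is not code from Chang or from the application: the names `CommandQueues`, `push`, and `arbiter_fetch`, and the round-robin arbitration policy, are assumptions made for the example. Each core pushes command frames into its own queue, and a frame can be fetched through the arbiter only after it has been pushed.

```python
from collections import deque


class CommandQueues:
    """Hypothetical model: one FIFO command queue per core (CQ1..CQ4)."""

    def __init__(self, num_cores=4):
        self.queues = [deque() for _ in range(num_cores)]
        self._next = 0  # round-robin pointer (assumed arbitration policy)

    def push(self, core_id, command):
        # A core generates a command frame and pushes it into its own queue.
        self.queues[core_id].append(command)

    def arbiter_fetch(self):
        # Scan the queues round-robin and take one frame from the first
        # non-empty queue found.
        n = len(self.queues)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.queues[idx]:
                self._next = (idx + 1) % n
                return idx, self.queues[idx].popleft()
        return None  # nothing has been pushed, so nothing can be fetched
```

In this sketch a fetch before any push returns `None`, which mirrors the point that the frame is obtained only after the commands are stored in the queues.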
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. 20160232111 in view of Qi et al. (cited as “WANG ZHENG” on the 892 Form; Qi is the name of one of the inventors and is used in this action) CN107423135A, published 12/1/2017 (English translation of the Abstract and Description included).
As to amended claim 1, Chang teaches a processing system for dispatching tasks [commands] (see the system in fig.7, [0051], see also the algorithm for fetching, decoding, and executing the current command in fig.6, [0038-0048]), comprising: 
at least one core (see one of the [Core_1] – [Core_4] in fig.7), used to run a plurality of processes [TLVS_CMD] and develop at least one task queue [CQ1]-[CQ4] corresponding to each of the processes [TLVS_CMD], 
wherein the core (see one of the [Core_1] – [Core_4] in fig.7) generates several command packets [TLVS_CMD] and pushes them into the corresponding task queues [CQ1]-[CQ4] (see [0051], the four command queues CQ1-CQ4 allocated for the processor cores Core_1-Core_4, respectively); 

an acceleration interface [Arbiter 702], arranged between the core [112] and the AFU [control circuit 122] to receive an acceleration interface instruction [TLVS_CMD] from the core [112] (see [0051], the remote DMA function of the control circuit 122 (e.g., device's micro-processor) can get one compound command frame TLVS_CMD from one of the command queues CQ1-CQ4 in the storage device 114 through an arbitrator 702), and 
wherein the acceleration interface instruction [one of the TLVS_CMDs] is transmitted (e.g. by the remote DMA through the arbitrator) after the core [one of the Core_1, Core_2, Core_3, Core_4] generates the new command packets [TLVS_CMD from Core_1] [TLVS_CMD from Core_2] [TLVS_CMD from Core_3] [TLVS_CMD from Core_4]  and pushes them into the corresponding task queues [CQ1, CQ2, CQ3, CQ4].
(see fig.7, para [0051], after the commands are stored in the corresponding queues, the remote DMA function of the control circuit 122 (e.g., device's micro-processor) can get one compound command frame TLVS_CMD from one of the command queues CQ1-CQ4 in the storage device 114 through an arbitrator 702)
Chang does not but Qi teaches:
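establish (e.g. by modifying the bitmap synchronously) a bit map [P bitmap 542, Q bitmap 543] based on the acceleration interface instruction [instruction packet] (see Page 4, second paragraph from the bottom: after the command decoder interprets the instruction packet, the interpreted microinstructions are placed in each command queue and the P bitmap 542 and the Q bitmap 543 are modified synchronously), wherein 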
the bit map [P bitmap 542, Q bitmap 543] is used to indicate (e.g. synchronized with the microinstructions placed in each command queue) which task queue [command queue] contains the command packets [microinstructions] that have been generated (see fig.5, Page 4, the command decoder 530 interprets the instruction packet and performs a permission check in the permission table 541. After the command decoder 530 interprets the instruction packet as a plurality of micro instructions, the interpreted plurality of microinstructions are placed in each command queue and the P bitmap 542 and the Q bitmap 543 are modified synchronously).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to establish a bit map based on the acceleration interface instruction, wherein the bit map is used to indicate which task queue contains the command packets that have been generated, as claimed (see the details of the claim mapping above), because one of ordinary skill in the art should be able to recognize the application of a known technique, such as the synchronously modified bitmap with the microinstructions placed in each command queue of Qi, to a known device/method, such as the plurality of the processing cores with respective task queues for fetching and executing the command of Chang, for the purpose of dispatching the corresponding command queue to the hardware accelerator according to the bitmap (See Qi, Page 5, second paragraph from the bottom. MPEP 2143 KSR Example D).
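The bitmap technique relied on in the mapping above can be sketched as follows. This is a simplified illustration and not Qi's implementation: it uses a single bitmap rather than Qi's separate P bitmap and Q bitmap (whose distinct roles are not modeled), and the class name `QueueBitmap`, the method names, and the lowest-set-bit selection policy are assumptions. Bit i is updated together with queue i, so the bitmap always indicates which queues hold pending entries.

```python
class QueueBitmap:
    """Hypothetical model: bit i is set while command queue i has pending entries."""

    def __init__(self, num_queues=4):
        self.queues = [[] for _ in range(num_queues)]
        self.bitmap = 0

    def enqueue(self, queue_id, micro_op):
        # Place the entry in its queue and set the matching bit in the same step.
        self.queues[queue_id].append(micro_op)
        self.bitmap |= 1 << queue_id

    def dispatch(self):
        # Consult only the bitmap to find a queue with work (assumed policy:
        # lowest set bit first), then clear the bit once that queue is empty.
        if self.bitmap == 0:
            return None
        queue_id = (self.bitmap & -self.bitmap).bit_length() - 1
        micro_op = self.queues[queue_id].pop(0)
        if not self.queues[queue_id]:
            self.bitmap &= ~(1 << queue_id)
        return queue_id, micro_op
```

The dispatcher never scans the queues themselves; it reads the bitmap, which is the purpose the rationale attributes to the combination.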

running a plurality of processes [TLVS_CMD] and developing at least one task queue [CQ1]-[CQ4] corresponding to each of the processes [TLVS_CMD] (see fig.7 for queues and the corresponding commands; see also the algorithm for fetching, decoding, and executing the current command in fig.6, [0038-0048]); 
generating several command packets [TLVS_CMD] and pushing them into the corresponding task queues [CQ1]-[CQ4] (see [0051], the four command queues CQ1-CQ4 allocated for the processor cores Core_1-Core_4, respectively); 
receiving an acceleration interface instruction [TLVS_CMD] about the task queue from the core [112] (see [0051], the remote DMA function of the control circuit 122 (e.g., device's micro-processor) can get one compound command frame TLVS_CMD from one of the command queues CQ1-CQ4 in the storage device 114 through an arbitrator 702; the command queues receive the commands from the cores, [0051]); and 
wherein the acceleration interface instruction [one of the TLVS_CMDs] is transmitted (e.g. by the remote DMA through the arbitrator) after the core [one of the Core_1, Core_2, Core_3, Core_4] generates the new command packets [TLVS_CMD from Core_1] [TLVS_CMD from Core_2] [TLVS_CMD from Core_3] [TLVS_CMD from Core_4] and pushes them into the corresponding task queues [CQ1, CQ2, CQ3, CQ4].
(see fig.7, para [0051], after the commands are stored in the corresponding queues, the remote DMA function of the control circuit 122 (e.g., device's micro-processor) can get one compound command frame TLVS_CMD from one of the command queues CQ1-CQ4 in the storage device 114 through an arbitrator 702)
Chang does not but Qi teaches:
establish (e.g. by modify the bitmap synchronously) a bit map [P bitmap 542, Q bitmap 543] based on the acceleration interface instruction [instruction packet] (see after command decoder interprets the instruction packet, the interpreted microinstructions are placed in each command queue and the P bitmap 542 and the Q bitmap 543 are modified synchronously in Page 4, second paragraph from the bottom), wherein 
the bit map [P bitmap 542, Q bitmap 543] is used to indicate (e.g. synchronized with the microinstructions placed in each command queue) which task queue [command queue] contains the command packets [microinstructions] that have been generated (see fig.5, Page 4, the command decoder 530 interprets the instruction packet and performs a permission check in the permission table 541. After the command decoder 530 interprets the instruction packet as a plurality of micro instructions, the interpreted plurality of microinstructions are placed in each command queue and the P bitmap 542 and the Q bitmap 543 are modified synchronously), as claimed.
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. 20160232111 in view of Qi et al. (cited as “WANG ZHENG” on the 892 Form; Qi is the name of one of the inventors and is used in this action) CN107423135A, published 12/1/2017 (English translation of the Abstract and Description included), as applied to claims 1 and 11 above, and further in view of Burns et al. 20100229172.
As to claim 5, neither Chang nor Qi teaches, but Burns teaches, wherein each of the task queues [ROB 464] has a reorder buffer [ROB 464] to schedule the command packets [instructions] based on an original sequence [original program order] of the command packets [instructions] at each of the task queues [ROB] (see [0050]: the retirement logic of the processing core 430 reorders the instructions, executed in an out-of-order manner, back to the original program order, and the execution core may include more than one reorder buffer 464), and the AFU [core 430: retirement logic] returns the executed command packets [instructions] to the core based on the original sequence (see [0050]: the execution core 430 may include retirement logic that reorders the instructions, executed in an out-of-order manner, back to the original program order, and the execution core may include more than one reorder buffer 464), as claimed. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include wherein each of the task queues has a reorder buffer to schedule the command packets based on an original sequence of the command packets at each of the task queues, and the AFU returns the executed command packets to the core based 
As to claim 15, claim 15 includes and corresponds to claim 5, and is rejected under the same reason as set forth in claim 5 above.
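The reordering behavior cited from Burns (instructions executed out of order are returned in the original program order) can be illustrated with a minimal sketch. It is not taken from Burns: the class `ReorderBuffer`, its method names, and the use of integer sequence numbers are assumptions made for the example.

```python
class ReorderBuffer:
    """Hypothetical model: results complete in any order, retire in issue order."""

    def __init__(self):
        self.next_seq = 0      # sequence number given to the next issued entry
        self.next_retire = 0   # sequence number that must retire next
        self.completed = {}    # seq -> result, for entries finished out of order

    def issue(self):
        # Record the original order by handing out increasing sequence numbers.
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def complete(self, seq, result):
        # Execution may finish in any order.
        self.completed[seq] = result

    def retire(self):
        # Release results only while the oldest outstanding entry is complete.
        retired = []
        while self.next_retire in self.completed:
            retired.append(self.completed.pop(self.next_retire))
            self.next_retire += 1
        return retired
```

If the third entry finishes first, `retire()` returns nothing until the first and second entries have also completed, so results leave the buffer in the original sequence.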
Allowable Subject Matter
Claims 2-4, 6-10, 12-14, and 16-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
None of the prior art of record teaches: a) The accelerator interface front end receiving and decoding the acceleration interface instruction to set the bit map; the accelerator interface back end updating an active list based on the bit map, the accelerator interface back end selects one of the command packets from one of the task queues based on the active list, and dispatches the selected command packet to the corresponding AFU for execution.  (Claim 2. See also similarly recited claim 12)
b) The time at which each of the command packets is executed by the AFU is different. (Claims 6, 16)
c) The ROB is arranged at a SRAM of the acceleration interface. (Claims 7, 17)

e) The ROB further includes a release indicator and a return indicator, the release indicator is used to indicate the next command packet to be dispatched to the AFU for execution, and the return indicator is used to indicate the next command packet to be returned to the core. (Claims 9, 19)
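The release indicator and return indicator recited in item e) can be pictured with a short sketch. It models only the claim language quoted above and is not drawn from the application's specification or from any cited reference; the class `IndicatorBuffer`, its fields, and its method names are hypothetical.

```python
class IndicatorBuffer:
    """Hypothetical model of a reorder buffer with two indicators."""

    def __init__(self):
        self.packets = []   # command packets in their original order
        self.done = []      # done[i] is True once packets[i] has executed
        self.release = 0    # release indicator: next packet to dispatch
        self.ret = 0        # return indicator: next packet to return

    def push(self, packet):
        self.packets.append(packet)
        self.done.append(False)

    def dispatch_next(self):
        # Hand out the packet at the release indicator, then advance it.
        if self.release == len(self.packets):
            return None
        idx = self.release
        self.release += 1
        return idx, self.packets[idx]

    def mark_done(self, idx):
        # Execution units may finish packets in any order.
        self.done[idx] = True

    def return_next(self):
        # Return the packet at the return indicator only once it has been
        # dispatched and has finished, which preserves the original order.
        if self.ret < self.release and self.done[self.ret]:
            packet = self.packets[self.ret]
            self.ret += 1
            return packet
        return None
```

The two indicators advance independently: the release indicator moves as packets are dispatched, while the return indicator waits on the oldest unfinished packet.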
The prior art made of record cited in the previous action and not relied upon is considered pertinent to applicant's disclosure.  
a)  Duran et al. 20160283278 is cited for the teaching of a plurality of cores, each has a buffer and a thread mapping unit (see fig.1, [0031][0032]) and the teaching of a pipeline scheduling (also known as a dispatch or issue) in [0060].
b) Lahteenmaki 20150205614 is cited for the plurality of task queues for optimizing the tasks scheduling and execution of the threads/tasks by the processing cores (see fig.3 [0099],  each queue of the Q1 and Q2 has a group of threads; see each group of threads in a queue as a process).
c) Matsuzaki et al. 20080077928 is cited for the teaching of registering the task into the task information table, which is referenced by a task management section receiving a request to execute a new task from a scheduler. (See [0104], [0105])
d) Qi et al. 10331494 is cited for the teaching of the bitmap of the command and queue (see fig.5, col.9, lines 61-67, col.10, lines 1-3). The published Chinese patent CN107423135A (12/1/2017) is being used in the rejection of claims 1, 11 in this action.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL H PAN whose telephone number is (571)272-4172. The examiner can normally be reached M-F 8:30 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571 270 3995. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


DANIEL H. PAN
Examiner
Art Unit 2182



/DANIEL H PAN/Primary Examiner, Art Unit 2182