DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 15, and 18 have been amended.
Claims 1-8, 15, 16, 18, 19, and 26 have been examined.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 12, 2022 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7, 8, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication No. 2010/0333075 by Asai et al. (hereinafter referred to as “Asai”) in view of Cell Broadband Engine Programming Handbook (hereinafter referred to as “CBE”). 
Regarding claim 1, Asai discloses:
an apparatus comprising: a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA) (Asai discloses, at ¶ [0011], a processor having a plurality of cores having different ISAs.); 
…instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0013], code units, which are instructions (see ¶ [0011]) that include an identifier indicating which core should execute the code unit.); 
…translate the one or more decoded instructions into one or more instructions or operations of the ISA corresponding to the specified accelerator core, collate the one or more instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more decoded instructions… (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core, which is interpreted as translating, and also discloses, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core. As disclosed at ¶ [0011], the ISA of the secondary core is different from the ISA of the primary core that is packaging and offloading the code.). 
Asai does not explicitly disclose a fetch circuit to fetch one or more instructions, a decode circuit to decode the one or more fetched instructions; an issue circuit, and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG), a collectives engine (CENG) to perform collective operations, a queue engine (QENG), and a chain management unit (CMU).
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
a fetch circuit to fetch one or more instructions, a decode circuit to decode the one or more fetched instructions, and an issue circuit (CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue.);
and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG) (CBE discloses, at p. 75, a direct memory access controller (DMAC), which is interpreted as a memory engine.), 
a collectives engine (CENG) to perform collective operations (CBE discloses, at p. 72, a memory flow controller (MFC), which supports synchronization between synergistic processing elements (SPEs) and is considered a collectives engine. As disclosed at p. 518, the MFC uses commands including barrier, which is a collective operation.), 
a queue engine (QENG) (CBE discloses, at p. 73, a SPE command queue, which is considered a queue engine.), and 
a chain management unit (CMU) (CBE discloses, at p. 74, the MFC uses software to use tag group IDs to group commands, which is considered a chain management unit.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
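By way of illustration only, the translate, collate, and issue sequence mapped above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Asai or CBE: the names AcceleratorCore, opcode_map, translate, collate, and issue, and all opcode strings, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AcceleratorCore:
    """Hypothetical model of one accelerator core with its own ISA."""
    name: str
    opcode_map: dict  # host opcode -> native opcode (assumed mapping)
    inbox: list = field(default_factory=list)  # packets issued to this core


def translate(decoded, core):
    """Map each decoded host instruction to the target core's native opcode."""
    return [(core.opcode_map[op], operands) for op, operands in decoded]


def collate(native_ops, core):
    """Bundle the translated operations into a single instruction packet."""
    return {"target": core.name, "ops": tuple(native_ops)}


def issue(packet, cores):
    """Deliver the packet to the core named in the packet."""
    cores[packet["target"]].inbox.append(packet)


cores = {
    "MENG": AcceleratorCore("MENG", {"LOAD": "m.rd", "STORE": "m.wr"}),
    "QENG": AcceleratorCore("QENG", {"PUSH": "q.add", "POP": "q.rem"}),
}
decoded = [("LOAD", (0x10,)), ("STORE", (0x20, 7))]  # both specify MENG
target = cores["MENG"]
packet = collate(translate(decoded, target), target)
issue(packet, cores)
```

In this sketch the host opcodes and the native opcodes differ, which corresponds to the limitation that the ISA of the specified core is different from the ISA of the decoded instructions.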

Regarding claim 2, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
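For illustration only, selecting a core by the address carried in an MMIO instruction may be sketched as follows. The address ranges, the ADDRESS_MAP table, and the function core_for_address are hypothetical and are not drawn from CBE or from the claims.

```python
# Hypothetical address map: (first address, last address, core name).
ADDRESS_MAP = [
    (0x1000, 0x1FFF, "MENG"),
    (0x2000, 0x2FFF, "CENG"),
    (0x3000, 0x3FFF, "QENG"),
]


def core_for_address(addr):
    """Return the core whose mapped range contains addr, or None if unmapped."""
    for first, last, name in ADDRESS_MAP:
        if first <= addr <= last:
            return name
    return None
```

Under these assumed ranges, an access to address 0x2004 would specify the core labeled CENG, and an access below 0x1000 would specify no core.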

Regarding claim 3, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai also discloses:
an execution circuit (Asai discloses, at ¶ [0010], a primary core that executes code, and is therefore an execution circuit.); and
…another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., does not specify a core.).
Asai does not explicitly disclose wherein the fetch circuit further fetches the other instruction, wherein the one or more instructions specifying the one accelerator core are non-blocking, wherein the decode circuit is further to decode the other fetched instruction; and wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
the fetch circuit fetches instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.); 
non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.), 
the decode circuit decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.); and 
executing decoded instructions without awaiting completion of execution of other instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.

Regarding claim 4, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.

Regarding claim 7, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO).
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
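For illustration only, a queue with a selectable queue type and with add and remove operations may be sketched as follows. The class QueueModel and its method names are hypothetical; the sketch models the claim language in software and does not represent the command queue described in CBE.

```python
from collections import deque


class QueueModel:
    """Hypothetical software model of a queue with a selectable discipline."""

    def __init__(self, kind):
        # FILO and LIFO describe the same removal order.
        if kind not in ("FIFO", "LIFO", "FILO"):
            raise ValueError(f"unsupported queue type: {kind}")
        self.kind = kind
        self._items = deque()

    def add(self, datum):
        """Add a datum to the queue."""
        self._items.append(datum)

    def remove(self):
        """Remove a datum: oldest first for FIFO, newest first otherwise."""
        if self.kind == "FIFO":
            return self._items.popleft()
        return self._items.pop()
```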

Regarding claim 8, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
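For illustration only, the ordering behavior relied upon above (chained instructions wait for their predecessor, while unchained instructions remain free to proceed) may be sketched as follows. The function ready_commands and the command and chain identifiers are hypothetical and do not reflect the actual MFC tag-group mechanism of CBE.

```python
def ready_commands(commands, completed):
    """Return the IDs of commands that may start now.

    commands:  list of (command_id, chain_id) in program order; chain_id is
               None for a command that belongs to no chain.
    completed: set of command IDs that have already finished.
    """
    ready = []
    stalled_chains = set()
    for cmd_id, chain in commands:
        if cmd_id in completed:
            continue
        if chain is None:
            ready.append(cmd_id)       # unchained: never stalled
        elif chain not in stalled_chains:
            ready.append(cmd_id)       # oldest unfinished command in its chain
            stalled_chains.add(chain)  # later commands in this chain must wait
    return ready


commands = [("a", 1), ("b", 1), ("c", None), ("d", 2)]
```

With nothing completed, "b" is held back behind "a" while "c" and "d" may proceed; once "a" completes, "b" becomes eligible.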

Regarding claim 15, Asai discloses:
a system comprising: a memory; a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA) (Asai discloses, at ¶ [0011], a processor (which encompasses a memory) having a plurality of cores having different ISAs.);  
…one or more instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0013], code units, which are instructions (see ¶ [0011]) that include an identifier indicating which core should execute the code unit.);  
…translating the one or more decoded instructions into one or more instructions or operations of the ISA corresponding to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more decoded instructions; …collating the one or more translated instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet; and…issuing the instruction packet to the specified accelerator core (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core, which is interpreted as translating, and also discloses, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core. As disclosed at ¶ [0011], the ISA of the secondary core is different from the ISA of the primary core that is packaging and offloading the code.). 
Asai does not explicitly disclose fetch circuitry to fetch, decode circuitry to decode the one or more fetched instructions, circuitry to translate, circuitry to collate, circuitry to issue, and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG), a collectives engine (CENG) to perform collective operations, a queue engine (QENG), and a chain management unit (CMU).
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
fetch circuitry to fetch, decode circuitry to decode the one or more fetched instructions, means for translating, circuitry to translate, circuitry to collate, circuitry to issue (CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue.);
and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG) (CBE discloses, at p. 75, a direct memory access controller (DMAC), which is interpreted as a memory engine.), 
a collectives engine (CENG) to perform collective operations (CBE discloses, at p. 72, a memory flow controller (MFC), which supports synchronization between synergistic processing elements (SPEs) and is considered a collectives engine. As disclosed at p. 518, the MFC uses commands including barrier, which is a collective operation.), 
a queue engine (QENG) (CBE discloses, at p. 73, a SPE command queue, which is considered a queue engine.), and 
a chain management unit (CMU) (CBE discloses, at p. 74, the MFC uses software to use tag group IDs to group commands, which is considered a chain management unit.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.

Regarding claim 18, Asai discloses:
a method of executing instructions using an execution circuit and a plurality of accelerator cores each having a corresponding instruction set architecture (ISA), the method comprising: (Asai discloses, at ¶ [0011], a processor having a plurality of cores having different ISAs.);
…one or more instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0013], code units, which are instructions (see ¶ [0011]) that include an identifier indicating which core should execute the code unit.);  
…translating…the one or more decoded instructions into one or more instructions or operations of the ISA corresponding to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more decoded instructions; collating…the one or more translated instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet; and issuing the instruction packet to the specified accelerator core (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core, which is interpreted as translating, and also discloses, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core. As disclosed at ¶ [0011], the ISA of the secondary core is different from the ISA of the primary core that is packaging and offloading the code.).
Asai does not explicitly disclose a fetch circuit, decoding, using a decode circuit; an issue circuit performs the aforementioned translating and collating, and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG), a collectives engine (CENG) to perform collective operations, a queue engine (QENG), and a chain management unit (CMU).
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
a fetch circuit to fetch one or more instructions, a decode circuit to decode the one or more fetched instructions, and an issue circuit (CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue.);
and wherein the plurality of accelerator cores comprise separate components for a memory engine (MENG) (CBE discloses, at p. 75, a direct memory access controller (DMAC), which is interpreted as a memory engine.), 
a collectives engine (CENG) to perform collective operations (CBE discloses, at p. 72, a memory flow controller (MFC), which supports synchronization between synergistic processing elements (SPEs) and is considered a collectives engine. As disclosed at p. 518, the MFC uses commands including barrier, which is a collective operation.), 
a queue engine (QENG) (CBE discloses, at p. 73, a SPE command queue, which is considered a queue engine.), and 
a chain management unit (CMU) (CBE discloses, at p. 74, the MFC uses software to use tag group IDs to group commands, which is considered a chain management unit.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.

Claims 5, 6, 16, 19, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Asai in view of CBE in view of MPI: A Message-Passing Interface Standard (hereinafter referred to as “MPI”). 
Regarding claim 5, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination, an arithmetic operation, and a block size, wherein the MENG copies a block of data according to the block size from the specified source to the specified destination, and wherein the MENG further performs the arithmetic operation on each datum of the data block before copying resulting datum to the specified destination.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination,…and a block size, wherein the MENG copies a block of data according to the block size from the specified source to the specified destination (CBE discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
Also, in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the DMA instruction taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1. When CBE, as modified by MPI, is combined with the architecture of Asai, all limitations of claim 5 are disclosed.
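For illustration only, a block copy that applies an arithmetic operation to each datum before writing the result to the destination may be sketched as follows. The function dma_with_op, its parameters, and the flat list used as memory are hypothetical; the sketch is neither the CBE DMA command format nor the MPI get_accumulate interface.

```python
import operator


def dma_with_op(memory, src, dst, block_size, op, operand):
    """Copy block_size data from src to dst, applying op to each datum first."""
    # Compute the whole result block before writing so that overlapping
    # source and destination ranges do not corrupt the input.
    block = [op(memory[src + i], operand) for i in range(block_size)]
    memory[dst:dst + block_size] = block


memory = [1, 2, 3, 4, 0, 0, 0, 0]
dma_with_op(memory, src=0, dst=4, block_size=4, op=operator.add, operand=10)
```

After the call, the source block is unchanged and the destination block holds each source datum plus 10.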

Regarding claim 6, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations.
However, in the same field of endeavor (e.g., coprocessors) MPI discloses:
wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.
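For illustration only, several of the recited collective operations may be sketched over a list holding one value per participant. The function names below are hypothetical and are not the MPI interface; the sketch shows only the results such operations produce, not how they are communicated.

```python
from functools import reduce
from itertools import accumulate
import operator


def reduce_to_root(values, op):
    """Reduction: combine every participant's value into one result."""
    return reduce(op, values)


def all_reduce(values, op):
    """All-reduction: every participant receives the combined result."""
    result = reduce(op, values)
    return [result] * len(values)


def broadcast(value, participants):
    """Broadcast: every participant receives the same value."""
    return [value] * participants


def parallel_prefix(values, op):
    """Parallel prefix: participant i receives the combination of values 0..i."""
    return list(accumulate(values, op))
```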

Regarding claim 16, Asai, as modified, discloses the elements of claim 15, as discussed above. Asai also discloses:
…fetching another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., does not specify a core.). 
Asai does not explicitly disclose wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core; wherein the fetch circuitry fetches instructions; wherein the one or more instructions specifying the one accelerator core are non-blocking; wherein the decode circuitry decodes fetched instructions; wherein an execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet; wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write; wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination, an arithmetic operation, and a block size, wherein the MENG copies a block of data according to the block size from the specified source to the specified destination, and wherein the MENG further performs the arithmetic operation on each datum of the data block before copying resulting datum to the specified destination; wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations; wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO); and wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.); 
wherein the fetch circuitry fetches instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.);
wherein the decode circuitry decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.); 
wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.); 
wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination…wherein the MENG copies a block of data according to the block size from the specified source to the specified destination… (CBE discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.); 
wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.); and 
wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
Also, in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.);
wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.

Regarding claim 19, Asai, as modified, discloses the elements of claim 18, as discussed above. Asai also discloses:
…fetching another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., does not specify a core.). 
Asai does not explicitly disclose wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core; wherein the fetch circuit further fetches instructions; wherein the one or more instructions specifying the one accelerator core are non-blocking; wherein the decode circuit is further to decode the other fetched instruction; wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet; wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write; wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination, an arithmetic operation, and a block size, wherein the MENG copies a block of data according to the block size from the specified source to the specified destination, and wherein the MENG further performs the arithmetic operation on each datum of the data block before copying resulting datum to the specified destination; wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations; wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO); and wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.); 
wherein the fetch circuit fetches instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.);
wherein the decode circuit decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.); 
wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.); 
wherein the ISA corresponding to the MENG includes a direct memory access (DMA) instruction specifying a source, a destination…wherein the MENG copies a block of data according to the block size from the specified source to the specified destination… (CBE discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.); 
wherein the QENG comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.); and 
wherein a subset of the one or more instructions is part of a chain, and wherein the CMU stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
Also, in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.);
wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.

Regarding claim 26, Asai, as modified, discloses the elements of claim 1, as discussed above. Asai does not explicitly disclose wherein the CENG includes circuitry to maintain one or more state machines to machine execution of the collective operations including one or more of reduction, broadcast, gather, and scatter operations.
However, in the same field of endeavor (e.g., accelerators) CBE discloses:
wherein the CENG includes circuitry… (CBE discloses, at p. 72, a memory flow controller (MFC), which discloses circuitry.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by Asai to include the circuitry and engines disclosed by CBE in order to improve performance by implementing support for a broad range of applications through the use of specialized heterogeneous cores. See CBE, pp. 40-41.
Also, in the same field of endeavor (e.g., coprocessors) MPI discloses:
maintaining one or more state machines to machine execution of the collective operations including one or more of reduction, broadcast, gather, and scatter operations (MPI discloses, at p. 141, collective operations including reduction, broadcast, gather, and scatter operations. The execution of these instructions discloses performing the various determinations and operations that make up state machines to perform the functions of the instructions.).

Response to Arguments
On page 12 of the response filed January 12, 2022 (“response”), the Applicant argues, “the combination does not at least describe "an issue circuit to translate the one or more decoded instructions into one or more instructions or operations of the ISA corresponding to the specified accelerator core, collate the one or more instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more decoded instructions."” In support of this position, the Applicant presents a number of arguments, starting with, “First, it appears the Office admits that Asai does not perform any translation between different ISAs.” 
Though fully considered, the Examiner respectfully disagrees. The previous office action dated July 13, 2021 noted that the claims did not recite translating between different ISAs, but noted that such translation is extremely well known. The office action did not make the admission alleged by the Applicant.

On page 12 of the response the Applicant argues, “Further, Asai, as noted previously performs no ISA translation of any sort.” 
Though fully considered, the Examiner respectfully disagrees. As discussed previously, Asai discloses a primary core, which has a first ISA, packaging code for a secondary core that has a second ISA. As disclosed at ¶ [0024] of Asai, “Packaging the code unit comprises transforming instructions and data in the code unit to conform to the ISA of the secondary core.” The Examiner maintains that this discloses the claimed translating. As previously noted, the claims recite translating at an extremely high level. The plain meaning of translating, as used in the claims, reads on the transforming of instructions and data disclosed by Asai.

On page 12 of the response the Applicant argues, “how is anything "platform-independent" translated? Platform-independent means that there is no underlying code that is "dependent" upon where it gets executed.” 
Though fully considered, the Examiner respectfully disagrees. As noted, Asai discloses that each of a plurality of cores has its own ISA. Asai also discloses packaging the code to conform to those ISAs. Therefore, the packaging of the code is dependent on the ISA of the core on which the code will be executed. 

On pages 12-13 of the response the Applicant argues, “Asai's "primary interpreter" interprets "platform-independent code 121" (which comprises a plurality of code units as detailed in paragraph [0011]) by simply figuring out which core "code unit 124" is to be executed on. In paragraph [0013] this determination is made based on an identifier. In paragraph [0014], this determination is made based on the capabilities of the secondary core which may including using a LUT have capabilities and ISA information for each secondary core. There is no translation during this interpreting. Asai's interpreter is akin to a physical mail sorter ... it just figures out a destination. It does not open up the mail and then translate it to a different language dependent upon the figured out destination. For what it's worth, Applicant's understanding is consistent with the claims of Asai too which Applicant believes lends credence to Applicant's understanding of what Asai thought he was describing. In paragraph [0015], this "primary interpreter" packages the code unit 123. In this paragraph, the primary interpreter 105 "may take into account data alignment, memory alignment, byte order, parameter passing mechanism, stack alignment, pointer size, etc." Applicant is still confused as to why that is "translating" from one ISA to another which is what the claim requires. However, Applicant has amended the claim language to make this clearer. However, Applicant requests that the Office find support, in any reference, that "data alignment, memory alignment, byte order, parameter passing mechanism, stack alignment, pointer size" mean taking an instruction from one ISA and converting or translating into one or more instructions or operations of a second ISA. Rather, those are all considerations one may take into account for bytecode.”
Though fully considered, the Examiner respectfully disagrees. As noted, Asai’s interpreter is not limited to determining which core to select and forwarding the code to that core. Asai’s interpreter also packages the code based on the ISA of that core. Paragraph 15 gives examples of what may be involved in such packaging, and paragraph 24 explicitly defines such packaging as transforming instructions and data in the code to conform to the ISA of that core. The Examiner disagrees that this is akin to sorting the mail or just figuring out a destination.  

On page 14 of the response the Applicant argues, “there is no rationale for combining the references. Asai states that in some embodiments its processor is a Cell processor and CBE describes the Cell processor. It is established that there can be no rationale for combining when the references teach the same thing. Applicant disagrees with the Office Action's statement that "the features disclosed by CBE does not mean those features are inherent in all cell processors." The Cell processor is a trademarked name for an old IBM processor and the citations from CBE are core aspects of the Cell processor. If the Office wants to assert that those aspects are not core (and therefore not inherent), the Office is required to prove it. Would it help for Applicant to submit IEEE or other articles describing the Cell processor or its first use case the PS3? The motivation to combine cannot be already found in the primary reference (see, e.g., Ex parte Gama and Ex parte Hansen). In this case, it is not disputed that Asai discusses the cell processor which CBE describes. How could Asai be modified to already use what it talks about?” 
Though fully considered, the Examiner respectfully disagrees. Despite the Applicant’s assertion that the features for which CBE is cited are inherent in Asai, the Examiner again notes that the core processor need not even be a Cell processor. That is just one example. Asai explicitly states that “embodiments may be implemented in other multi-core processors.” See Asai, ¶ [0009]. Since Asai is not limited to using a Cell processor, it cannot be said that the features described in CBE are inherent. However, the features are known, and it would have been obvious to utilize such features in the system disclosed by Asai for reasons given above.  

On page 15 of the response the Applicant argues the remaining claims are allowable for similar reasons as those addressed above and due to dependency.
Though fully considered, the Examiner respectfully disagrees. The remarks and rejections presented above apply similarly to these claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/Primary Examiner, Art Unit 2183