DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1, 4-8, 15, 16, 18, 19, and 26 have been amended.
Claims 1-8, 15, 16, 18, 19, and 26 have been examined.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 3, 15, 16, and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 2 recites, at lines 2-3, “the one or more instructions.” The antecedent basis for this limitation is unclear. Claim 1 recites both “one or more machine level instructions specifying...” and “one or more machine level instructions or operations of the ISA.” The limitation in question does not exactly match either of the possible antecedents. For purposes of examination, this limitation is interpreted as referring back to the former, i.e., as though the limitation read, “the one or more machine level instructions specifying....” Claims 3, 15, 16, and 19 include similar issues and are similarly rejected.
Claim 16 recites, at lines 20 and 22, “the MENG.” As there are multiple possible antecedent bases for this limitation, i.e., at line 13 and again at line 18, the claim is indefinite. For purposes of examination, the claim is interpreted as though there were only one possible antecedent basis.
Claim 19 recites, at line 10, “the decode circuit.” The antecedent basis for this limitation was removed by amendment, rendering the claim indefinite. For purposes of examination, the claim is interpreted as though the limitation recited “a decode circuit.”
Claim 19 recites, at line 13, “the MENG.” There is insufficient antecedent basis for this limitation. For purposes of examination, the claim is interpreted as though there were proper antecedent basis.
Claim 19 recites, at lines 20 and 22, “the MENG.” As there are multiple possible antecedent bases for this limitation, i.e., at line 13 and again at line 18, the claim is indefinite. For purposes of examination, the claim is interpreted as though there were only one possible antecedent basis.
Claim 16 is rejected as depending from a rejected base claim and failing to cure the indefiniteness of that base claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7, 8, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cell Broadband Engine Programming Handbook (hereinafter referred to as “CBE”) in view of US Publication No. 2010/0333075 by Asai et al. (hereinafter referred to as “Asai”) in view of US Patent No. 7,734,895 by Agarwal et al. (previously cited in the non-final office action mailed May 20, 2022 and hereinafter referred to as “Agarwal”). 
Regarding claim 1, CBE discloses:
an apparatus comprising: a plurality of accelerator cores... (CBE discloses, at p. 40, a processor having a plurality of cores, i.e., accelerator cores.); 
a fetch circuit to fetch one or more machine-level instructions specifying one of the accelerator cores; a decode circuit to decode the one or more fetched machine-level instructions; and an issue circuit... (CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue of machine-level instructions. As disclosed at p. 122, the instructions specify one or more of the cores using memory-mapped I/O.)
CBE does not explicitly disclose the aforementioned cores each having a corresponding instruction set architecture (ISA) and that the aforementioned circuitry is to: translate the one or more machine-level instructions specifying one of the accelerator cores into one or more machine-level instructions or operations of the ISA corresponding to the specified accelerator core, collate the one or more machine-level instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more instructions specifying one of the accelerator cores.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
cores each having a corresponding instruction set architecture (ISA) (Asai discloses, at ¶ [0011], a processor having a plurality of cores having different ISAs.); and
collating...instructions...into an instruction packet, and issuing the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core and, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to using different ISAs taught by Asai because using different cores optimized for different tasks is a well-known, or typical, method of improving performance. See, e.g., Asai, ¶ [0002].
Also in the same field of endeavor (e.g., multicore processing) Agarwal discloses:
translating one or more machine-level instructions into one or more machine-level instructions or operations of another ISA (Agarwal discloses, at col. 15, line 17 to col. 16, line 2, translating machine level instructions from one ISA to another. The specific example discussed is translating x86 binary, i.e., machine code, into instructions of the Raw ISA, though Agarwal discloses that the techniques are applicable to multiple different ISAs.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to translating to different ISAs taught by Agarwal to improve performance by increasing compatibility between different systems, such as legacy systems. See, e.g., Agarwal, col. 6, lines 31-36.
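For illustration only, and not as a characterization of any cited reference or of the claimed apparatus, the translate, collate, and issue steps discussed above can be sketched as follows. Every name in the sketch (HostInstr, TRANSLATION_TABLE, the opcode strings, and the core identifiers) is a hypothetical assumption introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostInstr:
    core_id: str      # accelerator core the instruction specifies (hypothetical)
    opcode: str       # opcode in the host ISA (hypothetical)
    operands: tuple


# Hypothetical per-core tables: host opcode -> list of opcodes in that core's ISA.
TRANSLATION_TABLE = {
    "meng": {"COPY": ["M_LOAD", "M_STORE"]},
    "qeng": {"PUSH": ["Q_ENQ"]},
}


def translate(instr):
    """Translate one host instruction into operations of the specified core's ISA."""
    ops = TRANSLATION_TABLE[instr.core_id][instr.opcode]
    return [(op, instr.operands) for op in ops]


def collate(core_id, instrs):
    """Collate the translated operations destined for one core into a single packet."""
    ops = []
    for instr in instrs:
        if instr.core_id == core_id:
            ops.extend(translate(instr))
    return {"core": core_id, "ops": ops}


def issue(packet, cores):
    """Issue the packet to the inbox of the core it names."""
    cores[packet["core"]].append(packet)
```

In this sketch the host opcode and the core opcodes come from different vocabularies, which mirrors the limitation that the two ISAs differ; the table lookup is only one assumed way of performing such a translation.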

Regarding claim 2, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.).
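For illustration only, selecting a core by the address of an MMIO instruction can be sketched as a range lookup. The address ranges and core names below are hypothetical assumptions and are not taken from CBE or from the application.

```python
# Hypothetical address map: (base, limit, core name). Values are illustrative only.
ADDRESS_MAP = [
    (0x1000, 0x1FFF, "core0"),
    (0x2000, 0x2FFF, "core1"),
]


def core_for_address(addr):
    """Return the core whose address range contains addr, or None if unmapped."""
    for base, limit, core in ADDRESS_MAP:
        if base <= addr <= limit:
            return core
    return None
```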

Regarding claim 3, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
an execution circuit wherein the fetch circuit fetches... [instructions] (CBE discloses, at p. 52, an instruction unit that performs execution, fetch, and decode.); 
...non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.), 
the decode circuit decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.); and 
executing decoded instructions without awaiting completion of execution of other instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.).
CBE does not explicitly disclose another instruction not specifying any accelerator core.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
…another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., that does not specify a core.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to platform independent code taught by Asai because doing so can improve performance by increasing flexibility regarding where code is executed. See, e.g., Asai, ¶ [0010].

Regarding claim 4, CBE discloses:
wherein the ISA corresponding to a memory engine (MENG) accelerator includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 75, a direct memory access controller (DMAC), which is interpreted as a memory engine. CBE also discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.).

Regarding claim 7, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
wherein a queue engine (QENG) accelerator comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.).
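For illustration only, a queue whose removal order depends on a selected queue type can be sketched as follows. The class name HwQueue and its methods are hypothetical assumptions; the sketch is a software model and does not represent the hardware-managed queue of the claim or the command queue of CBE.

```python
from collections import deque


class HwQueue:
    """Software model of a queue with a selectable type (hypothetical)."""

    def __init__(self, kind):
        if kind not in ("FIFO", "LIFO"):
            raise ValueError("kind must be 'FIFO' or 'LIFO'")
        self.kind = kind
        self._items = deque()

    def add(self, item):
        """Add data to the queue."""
        self._items.append(item)

    def remove(self):
        """Remove data: oldest entry first for FIFO, newest entry first for LIFO."""
        if self.kind == "FIFO":
            return self._items.popleft()
        return self._items.pop()
```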

Regarding claim 8, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
wherein a subset of the one or more instructions is part of a chain, and wherein a chain management unit (CMU) accelerator stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
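For illustration only, the ordering behavior described above (chained instructions wait for the preceding instruction in the same chain, while other instructions proceed in parallel) can be sketched with a simple cycle-by-cycle model. The function name, the tuple layout, and the assumption that every instruction completes in one cycle are hypothetical and are introduced only for this example.

```python
def schedule(instrs):
    """Group instructions into cycles.

    instrs is a list of (name, chain_id) pairs, where chain_id is None for an
    unchained instruction. Assumes each issued instruction completes within
    its cycle (a simplifying assumption).
    """
    pending = list(instrs)
    cycles = []
    while pending:
        issued, stalled, chains_busy = [], [], set()
        for name, chain in pending:
            if chain is None:
                issued.append(name)            # unchained: never stalled
            elif chain not in chains_busy:
                chains_busy.add(chain)         # oldest pending member of its chain
                issued.append(name)
            else:
                stalled.append((name, chain))  # waits for its predecessor
        cycles.append(issued)
        pending = stalled
    return cycles
```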

Regarding claim 15, CBE discloses:
a system comprising: a memory; a plurality of accelerator cores... (CBE discloses, at p. 40, a processor, which discloses memory, having a plurality of cores, i.e., accelerator cores.); 
fetch circuitry to fetch one or more machine-level instructions specifying one of the accelerator cores; a decode circuit to decode the one or more fetched machine-level instructions; and an issue circuit... (CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue of machine-level instructions. As disclosed at p. 122, the instructions specify one or more of the cores using memory-mapped I/O.). 
CBE does not explicitly disclose the aforementioned cores each having a corresponding instruction set architecture (ISA) and that the aforementioned circuitry is to: translate the one or more instructions specifying one of the accelerator cores into one or more machine-level instructions or operations of the ISA corresponding to the specified accelerator core, collate the one or more machine-level instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more instructions specifying one of the accelerator cores.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
cores each having a corresponding instruction set architecture (ISA) (Asai discloses, at ¶ [0011], a processor having a plurality of cores having different ISAs.); and
collating...instructions...into an instruction packet, and issuing the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core and, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to using different ISAs taught by Asai because using different cores optimized for different tasks is a well-known, or typical, method of improving performance. See, e.g., Asai, ¶ [0002].
Also in the same field of endeavor (e.g., multicore processing) Agarwal discloses:
translating one or more machine-level instructions into one or more machine-level instructions or operations of another ISA (Agarwal discloses, at col. 15, line 17 to col. 16, line 2, translating machine level instructions from one ISA to another. The specific example discussed is translating x86 binary, i.e., machine code, into instructions of the Raw ISA, though Agarwal discloses that the techniques are applicable to multiple different ISAs.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to translating to different ISAs taught by Agarwal to improve performance by increasing compatibility between different systems, such as legacy systems. See, e.g., Agarwal, col. 6, lines 31-36.

Regarding claim 18, CBE discloses:
a method of executing instructions using an execution circuit and a plurality of accelerator cores ... (CBE discloses, at p. 40, a processor, which executes instructions using execution circuitry, having a plurality of cores, i.e., accelerator cores.); 
fetching, by a fetch circuit, one or more machine-level instructions specifying one of the accelerator cores...an issue circuit...(CBE discloses, at p. 52, an instruction unit that performs fetch, decode, and issue of machine-level instructions. As disclosed at p. 122, the instructions specify one or more of the cores using memory-mapped I/O.).
CBE does not explicitly disclose the aforementioned cores each having a corresponding instruction set architecture (ISA) and translating, using an issue circuit, the one or more machine-level instructions specifying one of the accelerator cores into one or more machine-level instructions or operations of the ISA corresponding to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more machine-level instructions; collating, by the issue circuit, the one or more translated machine-level instructions or operations of the ISA corresponding to the specified accelerator core into an instruction packet; and issuing the instruction packet to the specified accelerator core.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
cores each having a corresponding instruction set architecture (ISA) (Asai discloses, at ¶ [0011], a processor having a plurality of cores having different ISAs.); and
collating...instructions...into an instruction packet, and issuing the instruction packet to the specified accelerator core, wherein the ISA corresponding to the specified accelerator core is different than the ISA of the one or more instructions specifying one of the accelerator cores (Asai discloses, at ¶ [0015], an interpreter packaging (collating) a code unit (instructions) according to the ISA of the secondary core and, at ¶ [0016], offloading (issuing) the packaged code unit to the secondary core.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to using different ISAs taught by Asai because using different cores optimized for different tasks is a well-known, or typical, method of improving performance. See, e.g., Asai, ¶ [0002].
Also in the same field of endeavor (e.g., multicore processing) Agarwal discloses:
translating one or more machine-level instructions into one or more machine-level instructions or operations of another ISA (Agarwal discloses, at col. 15, line 17 to col. 16, line 2, translating machine level instructions from one ISA to another. The specific example discussed is translating x86 binary, i.e., machine code, into instructions of the Raw ISA, though Agarwal discloses that the techniques are applicable to multiple different ISAs.). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to translating to different ISAs taught by Agarwal to improve performance by increasing compatibility between different systems, such as legacy systems. See, e.g., Agarwal, col. 6, lines 31-36.

Claims 5, 6, 16, 19, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over CBE in view of Asai in view of Agarwal in view of MPI: A Message-Passing Interface Standard (hereinafter referred to as “MPI”). 
Regarding claim 5, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
wherein the ISA corresponding to a memory engine (MENG) accelerator includes a direct memory access (DMA) instruction specifying a source, a destination,…and a block size, wherein the MENG copies a block of data according to the block size from the specified source to the specified destination (CBE discloses, at p. 75, a direct memory access controller (DMAC), which is interpreted as a memory engine. CBE also discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.).
CBE does not explicitly disclose an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination.
However, in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the DMA instruction taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1. 
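For illustration only, a block copy that applies an arithmetic operation to each datum before writing the result to the destination can be sketched as follows. The function name, the flat list used as memory, and the single scalar operand are hypothetical assumptions; the sketch does not reproduce the semantics of the MPI get_accumulate function or of any CBE DMA command.

```python
import operator


def dma_copy_with_op(memory, src, dst, block_size, op, operand):
    """Copy block_size data from src to dst, applying op(datum, operand) to each.

    The source block is read in full before any write so that overlapping
    source and destination ranges behave predictably.
    """
    block = memory[src:src + block_size]
    memory[dst:dst + block_size] = [op(x, operand) for x in block]
```

For example, with a hypothetical eight-entry memory, dma_copy_with_op(mem, 0, 4, 4, operator.add, 10) would write each of the first four values plus ten into the last four entries.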

Regarding claim 6, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
a collectives engine (CENG) accelerator to perform collective operations (CBE discloses, at p. 72, a memory flow controller (MFC), which supports synchronization between synergistic processing elements (SPEs) and is considered a collectives engine. As disclosed at p. 518, the MFC uses commands including barrier, which is a collective operation.)
CBE does not explicitly disclose wherein the ISA corresponding to the CENG accelerator includes collective operations including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations.
However, in the same field of endeavor (e.g., coprocessors) MPI discloses:
collective operations including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.

Regarding claim 16, CBE, as modified, discloses the elements of claim 15, as discussed above. CBE also discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.); 
wherein the fetch circuitry fetches instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.);
wherein the decode circuitry decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.); 
wherein the ISA corresponding to a memory engine (MENG) accelerator includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.); 
wherein the ISA corresponding to a memory engine (MENG) accelerator includes a direct memory access (DMA) instruction specifying a source, a destination…wherein the MENG copies a block of data according to the block size from the specified source to the specified destination… (CBE discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.); 
...wherein a queue engine (QENG) accelerator comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.); and 
wherein a subset of the one or more instructions is part of a chain, and wherein a chain management unit (CMU) accelerator stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
CBE does not explicitly disclose fetching another instruction not specifying any accelerator core, an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination, and wherein the ISA corresponding to a collective engine (CENG) accelerator includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
…another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., that does not specify a core.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to platform independent code taught by Asai because doing so can improve performance by increasing flexibility regarding where code is executed. See, e.g., Asai, ¶ [0010].
Also in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.);
wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.

Regarding claim 19, CBE, as modified, discloses the elements of claim 18, as discussed above. CBE also discloses:
wherein each of the plurality of accelerator cores is memory-mapped to an address range, and wherein the one or more instructions are memory-mapped input/output (MMIO) instructions having an address to specify the one accelerator core (CBE discloses, at p. 122 et seq., the cores are mapped to address ranges which are accessed by instructions, as disclosed at p. 121.); 
wherein the fetch circuit fetches instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
non-blocking instructions (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.);
wherein the decode circuit decodes fetched instructions (CBE discloses, at p. 52, an instruction unit that performs fetch and decode.);
wherein the execution circuit is to execute the decoded other instruction without awaiting completion of execution of the instruction packet (CBE discloses, at p. 447, executing non-blocking instructions and that doing so does not introduce delay.); 
wherein the ISA corresponding to the MENG includes dual-memory instructions, each of the dual-memory instructions comprising one of Dual_read_read, Dual_read_write, Dual_write_write, Dual_xchg_read, Dual_xchg_write, Dual_cmpxchg_read, Dual_cmpxchg_write, Dual_compare&read_read, and Dual_compare&read_write (CBE discloses, at p. 598, the SPE ISA includes a putllc command that is a conditional store that writes to an address and updates a bit, as disclosed at p. 599. This corresponds to a dual write write instruction.); 
wherein the ISA corresponding to a memory engine (MENG) accelerator includes a direct memory access (DMA) instruction specifying a source, a destination…wherein the MENG copies a block of data according to the block size from the specified source to the specified destination… (CBE discloses, at p. 516, DMA commands that specify a source, destination, and size, and copy data of the particular size from the source to the destination.); 
wherein a queue engine (QENG) accelerator comprises a hardware-managed queue having an arbitrary queue type, and wherein the ISA corresponding to the QENG includes instructions for adding data to the queue and removing data from the queue, and wherein the arbitrary queue type is one of last-in-first-out (LIFO), first-in last-out (FILO) and first-in-first-out (FIFO) (CBE discloses, at p. 74, instructions to enqueue commands in a command queue and, at p. 844, queues can be implemented in a FIFO manner.); and 
wherein a subset of the one or more instructions is part of a chain, and wherein a chain management unit (CMU) accelerator stalls execution of each chained instruction until completion of a preceding chained instruction, and wherein other instructions of the one or more instructions can execute in parallel (CBE discloses, at p. 190, enforcing ordering for instructions in a tag group, which means that newer instructions are stalled until completion of older instructions, without affecting those instructions that are not part of the group.).
CBE does not explicitly disclose fetching another instruction not specifying any accelerator core, an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination, and wherein the ISA corresponding to a collective engine (CENG) accelerator includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations.
However, in the same field of endeavor (e.g., accelerators) Asai discloses:
…another instruction not specifying any accelerator core (Asai discloses, at ¶ [0022], code (instructions) that is platform independent, i.e., that does not specify a core.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the features related to platform independent code taught by Asai because doing so can improve performance by increasing flexibility regarding where code is executed. See, e.g., Asai, ¶ [0010].
Also in the same field of endeavor (e.g., coprocessors) MPI discloses:
an arithmetic operation and performing the arithmetic operation on each datum of a data block before copying resulting datum to a specified destination (MPI discloses, at p. 426, a get_accumulate function that specifies an operation to be performed on each element of data being written.);
wherein the ISA corresponding to the CENG includes collective operations, including reductions, all-reductions (reduction-2-all), broadcasts, gathers, scatters, barriers, and parallel prefix operations (MPI discloses, at p. 141, collective operations.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.

Regarding claim 26, CBE, as modified, discloses the elements of claim 1, as discussed above. CBE also discloses:
wherein the CENG includes circuitry… (CBE discloses, at p. 72, a memory flow controller (MFC), which discloses circuitry.).
CBE does not explicitly disclose one or more state machines to manage execution of the collective operations including one or more of reduction, broadcast, gather, and scatter operations.
However, in the same field of endeavor (e.g., coprocessors) MPI discloses:
maintaining one or more state machines to manage execution of the collective operations including one or more of reduction, broadcast, gather, and scatter operations (MPI discloses, at p. 141, collective operations including reduction, broadcast, gather, and scatter operations. The execution of these instructions discloses performing the various determinations and operations that make up state machines to perform the functions of the instructions.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the architecture taught by CBE to include the functionality disclosed by MPI to improve performance by allowing overlap between computation and communication. See MPI, p. 1.
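By way of illustration only, the reasoning above (that executing a collective operation entails the determinations and transitions of a state machine) can be sketched for a reduction as follows. This is a minimal sketch under stated assumptions; the class ReductionStateMachine, its three states, and the contribute method are hypothetical and are not taken from CBE, MPI, or the application.

```python
import operator
from enum import Enum, auto
from typing import Callable, Optional, Set


class State(Enum):
    IDLE = auto()        # no contribution received yet
    COLLECTING = auto()  # some, but not all, participants have contributed
    DONE = auto()        # every participant has contributed


class ReductionStateMachine:
    """Hypothetical state machine tracking one reduction across participants."""

    def __init__(self, num_participants: int,
                 op: Callable[[int, int], int] = operator.add) -> None:
        if num_participants < 1:
            raise ValueError("at least one participant is required")
        self.num_participants = num_participants
        self.op = op
        self.state = State.IDLE
        self.seen: Set[int] = set()
        self.result: Optional[int] = None

    def contribute(self, rank: int, value: int) -> State:
        """Fold one participant's value into the result and advance the state."""
        if self.state is State.DONE:
            raise RuntimeError("reduction already complete")
        if not 0 <= rank < self.num_participants or rank in self.seen:
            raise ValueError("invalid or duplicate participant")
        # The first contribution seeds the result; later ones are combined.
        if self.state is State.IDLE:
            self.result = value
        else:
            self.result = self.op(self.result, value)
        self.seen.add(rank)
        all_in = len(self.seen) == self.num_participants
        self.state = State.DONE if all_in else State.COLLECTING
        return self.state


sm = ReductionStateMachine(3)
for rank, value in [(0, 5), (2, 7), (1, 1)]:
    sm.contribute(rank, value)
print(sm.state.name, sm.result)
```

The sketch tracks only completion of a single reduction; broadcast, gather, and scatter would need their own states and are omitted.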

Response to Arguments
On pages 13-15 of the response filed October 19, 2022 (“response”), the Applicant argues that the proposed combination of references does not disclose translating machine-level instructions into machine-level instructions of an ISA corresponding to a selected accelerator core. In support of this position, the Applicant argues that Asai discloses operating on higher level instructions that are not machine-level instructions.
These remarks have been fully considered and, in light of the claim amendments presented in the response, are deemed persuasive. Please see above for new grounds of rejection of the amended claims. Specifically, it is well known to translate machine instructions into different ISAs, as disclosed by Agarwal. See, e.g., Agarwal, col. 15.

On page 15 of the response the Applicant argues that the remaining claims are allowable for similar reasons. 
These remarks have been fully considered, but the Examiner respectfully disagrees. The reasons set forth in the remarks and rejections presented above, including those regarding the independent claims, are applicable to these claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAWN DOMAN whose telephone number is (571)270-5677.  The examiner can normally be reached on Monday through Friday 8:30am-6pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jyoti Mehta, can be reached on 571-270-3995.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHAWN DOMAN/
Primary Examiner, Art Unit 2183