Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to Applicant’s Application filed on 12/30/2020.
Claims 1-20 are pending for this examination.

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 10-11, and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Fleming, Jr. et al. (US 10,572,376), herein referred to as Fleming ‘376.
Referring to claim 1, Fleming ‘376 teaches an apparatus (see Fig. 1, system 100), comprising:
a local memory (see Fig. 1A, memory subsystem 110, see Fig. 4 for a more detailed version wherein memory subsystem 110 includes cache 12 and memory 18);
a first hardware accelerator (HWA) (see Fig. 1, acceleration hardware 102; see Col. 5, lines 6-17, wherein the acceleration hardware 102 may be coarse grained spatial architecture made up of lightweight processing elements or other types of processing components connected by an inter-processing element network or another type of inter-component network);

a spare scheduler to manage, in response to the spare scheduler inserted in the flexible data pipeline, data movement between the first HWA and the second HWA through the local memory and a memory (see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data; see Fig. 15, wherein the system 1500 can comprise multiple processing elements 1570 and 1580 connectable to each other 1550 and through bridge / chipset 1590, local memories 1532, 1534 for each processor, and an external memory 1528; Examiner also points out Fleming ‘376 Col. 6, lines 3-7, wherein it is stated that the acceleration hardware 102 can be an external programmable chip such as an FPGA or CGRA and memory ordering circuit 105 interfaces with the acceleration hardware through an I/O hub or the like, i.e. the hardware accelerators can be external elements to the system as a whole and are connected to and operate with the system through a hub as would be seen in Figs. 15-16, or a multiprocessor core embodiment such as Fig. 17).

Examiner points out that Fleming ‘376 specifically teaches that the acceleration hardware 102 can be an external programmable chip such as an FPGA or CGRA and that memory ordering circuit 105 interfaces with the acceleration hardware through an I/O hub or the like, see Col. 6, lines 3-7, i.e. the hardware accelerators can be external elements to the system as a whole and are connected to and operate with the system through a hub as would be seen in Figs. 15-16, or a multiprocessor core embodiment such as Fig. 17. Examiner further points out that each processor could include its own hardware accelerator block, as each processor has its own memory / cache, i.e. local memory, and a shared memory accessible to all processors, and execution of instructions would be run through a pipeline using a scheduler to implement in-order or out-of-order processing, see Fig. 13.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Fleming ‘376 to have a hardware accelerator per processor or processing core, as implied by the teachings of Fleming ‘376. A person of ordinary skill in the art would have recognized that hardware accelerator elements such as FPGAs or CGRAs are commonly used in the art, and that having at least one accelerator per processor would be desirable to increase efficiency and parallel operation, resulting in faster processing of instructions. As such, a person of ordinary skill in the art would have found having at least one accelerator per processor to be an obvious variation.
As to claim 2, Fleming ‘376 teaches the apparatus of claim 1, wherein the data movement is performed by a direct memory access (DMA) controller (see Fig. 17, wherein the system can implement DMA 1732 to move data).
As to claim 3, Fleming ‘376 teaches the apparatus of claim 2, wherein the spare scheduler sends a DMA trigger instruction to the DMA controller to send data between the local memory and the memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline).
As to claim 4, Fleming ‘376 teaches the apparatus of claim 1, further including a memory mapped register to configure the flexible data pipeline (see Fig. 13A, wherein the pipeline for executing instructions includes caches, TLBs, register files, etc., which can be considered as memory mapped registers).
As to claim 5, Fleming ‘376 teaches the apparatus of claim 1, wherein the memory is an on-chip memory or an external memory (see Figs. 15-17, wherein each processor / processor core includes its own cache units 1704, is connected to its own memory 1532, 1534, 1632, 1634, and can connect to a separate data storage 1528, 1730).
As to claim 10, Fleming ‘376 teaches the apparatus of claim 1, further including a series of spare schedulers inserted in the flexible data pipeline (see Fig. 4, operations manager circuit 

Referring to claim 11, Fleming ‘376 teaches a method to manage a spare scheduler (see Abstract), the method comprising: 
managing, in response to obtaining a manage instruction, data movement between a first HWA and a second HWA (see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data; see Fig. 15, wherein the system 1500 can comprise multiple processing elements 1570 and 1580 connectable to each other 1550 and through bridge / chipset 1590, local memories 1532, 1534 for each processor, and an external memory 1528; Examiner also points out Fleming ‘376 Col. 6, lines 3-7, wherein it is stated that the acceleration hardware 102 can be an external programmable chip such as an FPGA or CGRA and memory ordering circuit 105 interfaces with the acceleration hardware through an I/O hub or the like, i.e. the hardware accelerators can be external elements to the system as a whole and are connected to and operate with the system through a hub as would be seen in Figs. 15-16, or a multiprocessor core embodiment such as Fig. 17);
sending, in response to obtaining a data swap-out instruction, a first DMA trigger instruction to a direct memory access (DMA) controller to transfer first data produced by the first HWA from a local memory to a memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and 
sending, in response to obtaining a data swap-in instruction, a second DMA trigger instruction to the DMA controller to transfer second data to be consumed by the second HWA from the memory to the local memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline; Examiner points out that a swap instruction can simply be multiple move instructions in order to move data to and from the cache without overwriting / erasing data).
However, Examiner recognizes that Fleming ‘376 discloses only a single hardware accelerator block, whereas the instant claims recite two distinct hardware accelerators connectable to memory and a pipeline.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Fleming ‘376 to have a hardware accelerator per processor or processing core, as implied by the teachings of Fleming ‘376. A person of ordinary skill in the art would have recognized that hardware accelerator elements such as FPGAs or CGRAs are commonly used in the art, and that having at least one accelerator per processor would be desirable to increase efficiency and parallel operation, resulting in faster processing of instructions. As such, a person of ordinary skill in the art would have found having at least one accelerator per processor to be an obvious variation.
As to claim 14, Fleming ‘376 teaches the method of claim 11, wherein the memory is an on-chip memory (see Fig. 1, memory subsystem 110; see Fig. 17 where each processor core includes a cache unit 1704).
As to claim 15, Fleming ‘376 teaches the method of claim 11, wherein the manage instruction indicates the memory is used for the data movement between the first HWA and the second HWA (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline).
As to claim 16, Fleming ‘376 teaches the method of claim 11, wherein the first data is produced by the first HWA in response to completion of a task by the first HWA, the first data being either data block or data line (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline; also see Fig. 5, wherein queues such as operation queue 412 and completion queues 420 are used by the memory ordering circuit to track and schedule instructions to perform read / write operations).
As to claim 17, Fleming ‘376 teaches the method of claim 11, wherein the second data is consumed by the second HWA in response to initiation of a task by the second HWA, the second data being either data line or data block (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline; also see Fig. 5, wherein queues such as operation queue 412 and completion queues 420 are used by the memory ordering circuit to track and schedule instructions to perform read / write operations).
As to claim 18, Fleming ‘376 teaches the method of claim 11, wherein the data swap-out instruction is sent in response to a set of first data aggregated in the memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline; also see Fig. 5, wherein queues such as operation queue 412 and completion queues 420 are used by the memory ordering circuit to track and schedule 
As to claim 19, Fleming ‘376 teaches the method of claim 11, wherein the data swap-in instruction is sent in response to a set of second data aggregated in the memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline; also see Fig. 5, wherein queues such as operation queue 412 and completion queues 420 are used by the memory ordering circuit to track and schedule instructions to perform read / write operations; Examiner points out that a swap instruction can simply be multiple move instructions in order to move data to and from the cache without overwriting / erasing data).

Allowable Subject Matter
Claims 6-9, 12-13, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.


Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fleming, Jr. et al. (US 10,474,375) teaches a memory accelerator hardware system for executing instructions using load / store / completion queues.
Nurvitadhi et al. (US 2018/0189239) teaches a heterogeneous hardware accelerator system with memory connected to sparse tiles, each sparse tile having processing elements, memory, and schedulers that can be used to schedule operations to a memory through an interconnect.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724.  The examiner can normally be reached on Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL SUN/Primary Examiner, Art Unit 2183