Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
This Office Action is in response to Applicant’s Amendment filed on 4/21/2022.
Claims 1-11 and 13-20 are pending for this examination.
Claims 11 and 13 were amended.
Claim 12 was cancelled.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 3/03/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Fleming, Jr. et al. (US 10,572,376), herein referred to as Fleming ‘376.
Referring to claim 1, Fleming ‘376 teaches an apparatus (see Fig. 1, system 100), comprising:
a local memory (see Fig. 1A, memory subsystem 110, see Fig. 4 for a more detailed version wherein memory subsystem 110 includes cache 12 and memory 18);
a first hardware accelerator (HWA) (see Fig. 1, acceleration hardware 102; see Col. 5, lines 6-17, wherein the acceleration hardware 102 may be coarse grained spatial architecture made up of lightweight processing elements or other types of processing components connected by an inter-processing element network or another type of inter-component network);
a second HWA (see Fig. 15, wherein the system 1500 can comprise multiple processing elements 1570 and 1580 connectable to each other 1550 and through bridge / chipset 1590; see Fig. 1, acceleration hardware 102; see Col. 5, lines 6-17, wherein the acceleration hardware 102 may be coarse grained spatial architecture made up of lightweight processing elements or other types of processing components connected by an inter-processing element network or another type of inter-component network), the second HWA and the first HWA connected in a flexible data pipeline (see Fig. 13A, wherein the system implements in-order and out-of-order pipelines according to one embodiment to schedule and execute instructions, which connects to memory to fetch instructions / data); and
a spare scheduler to manage, in response to the spare scheduler inserted in the flexible data pipeline, data movement between the first HWA and the second HWA through the local memory and a memory (see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data; see Fig. 15, wherein the system 1500 can comprise multiple processing elements 1570 and 1580 connectable to each other 1550 and through bridge / chipset 1590 and local memories to each processor 1532, 1534, and an external memory 1528; Examiner also points out Fleming ‘376 Col. 6, lines 3-7, wherein it is stated that the acceleration hardware 102 can be an external programmable chip such as an FPGA or CGRA and memory ordering circuit 105 interfaces with the acceleration hardware through an I/O hub or the like, i.e. the hardware accelerators can be external elements to the system as a whole and are connected to and operate with the system through a hub as would be seen in Fig. 15-16, or a multiprocessor core embodiment such as Fig. 17).
However, Examiner recognizes that Fleming ‘376 describes only a single hardware accelerator block, whereas the instant claims recite two distinct hardware accelerators connectable to memory and a pipeline.
Examiner points out that Fleming ‘376 specifically teaches that the acceleration hardware 102 can be an external programmable chip such as an FPGA or CGRA and that memory ordering circuit 105 interfaces with the acceleration hardware through an I/O hub or the like, see Col. 6, lines 3-7, i.e. the hardware accelerators can be external elements to the system as a whole and are connected to and operate with the system through a hub as would be seen in Fig. 15-16, or a multiprocessor core embodiment such as Fig. 17. Examiner further points out that each processor could include its own hardware accelerator block, as each processor has its own memory / cache, i.e. local memory, and a shared memory accessible to all processors, and execution of instructions would be run through a pipeline using a scheduler to implement in-order or out-of-order processing, see Fig. 13.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Fleming ‘376 system to have a hardware accelerator per processor or processing core, as implied by the teachings of Fleming ‘376. A person of ordinary skill in the art would have recognized that hardware accelerator elements such as FPGAs or CGRAs are commonly used in the art and that it would be desirable to have at least one accelerator per processor to increase efficiency and parallel operation, which would result in faster processing of instructions. As such, a person of ordinary skill in the art would have found having at least one accelerator per processor to be an obvious variation.
As to claim 2, Fleming ‘376 teaches the apparatus of claim 1, wherein the data movement is performed by a direct memory access (DMA) controller (see Fig. 17, wherein the system can implement DMA 1732 to move data).
As to claim 3, Fleming ‘376 teaches the apparatus of claim 2, wherein the spare scheduler sends a DMA trigger instruction to the DMA controller to send data between the local memory and the memory (see Fig. 17, wherein the system can implement DMA 1732 to move data; see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Col. 9, lines 38-67, wherein the queues and buffers of the memory ordering circuit 105 are used to implement and execute instructions to move / transfer / read / write data, wherein Examiner points out that an instruction to perform a DMA read / write would be an instruction that would trigger movement of data from memories, i.e. prefetching data from memory to cache or vice versa, see Fig. 13A, wherein a prefetcher can move data from memory 1370 to cache units to allow for execution of instructions through the pipeline).
As to claim 4, Fleming ‘376 teaches the apparatus of claim 1, further including a memory mapped register to configure the flexible data pipeline (see Fig. 13A, wherein the pipeline for executing instructions includes caches, TLBs, register files, etc., which can be considered as memory mapped registers).
As to claim 5, Fleming ‘376 teaches the apparatus of claim 1, wherein the memory is an on-chip memory or an external memory (see Figs. 15-17, wherein each processor / processor core includes its own cache units 1704, is connected to its own memory 1532, 1534, 1632, 1634, and can connect to a separate data storage 1528, 1730).
As to claim 10, Fleming ‘376 teaches the apparatus of claim 1, further including a series of spare schedulers inserted in the flexible data pipeline (see Fig. 4, operations manager circuit 430 including scheduler 432 and execution circuit 434; see Fig. 13A, wherein the system implements in-order and out-of-order pipelines according to one embodiment to schedule and execute instructions, which connects to memory to fetch instructions / data).

Allowable Subject Matter
Claims 6-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
In claims 6-9, Examiner points out that having the schedulers in the claimed invention having and implementing pattern adapters and pattern instructions is different from other prior art systems.


Claims 11 and 13-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
The prior art teaches scheduling systems and methods where data is moved between host / master devices or accelerators using DMA and transferring data from local memory to global / system / shared memories, and systems and methods for performing context switching to swap out data / contexts. However, the prior art does not fairly teach or suggest, individually or in combination, a method for managing a spare scheduler where a pattern adapter instruction is sent to a spare scheduler, then managing data movement between a first hardware accelerator and second hardware accelerator, and sending in response to a data swap-out instruction a first DMA trigger to a DMA controller to transfer first data from the first hardware accelerator from local memory to memory, and sending in response to a data swap-in instruction a second DMA trigger to a DMA controller to transfer second data to be consumed by the second hardware accelerator from memory to the local memory as claimed. Examiner specifically points out that the prior art teaches context switching where data is moved out from local memory to a persistent memory and from persistent memory to local memory, i.e. swap out and swap in, and the prior art also teaches systems for scheduling and load balancing multiple schedulers, but the prior art does not teach a system and method using a spare scheduler and using pattern adapter instructions with the spare scheduler in a system that is swapping data in / out to multiple hardware accelerators as is being claimed. Applicant argues these limitations on Page 7 of Applicant’s Remarks filed 4/21/2022. The prior art of record neither anticipates nor renders obvious the above recited combination.
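The sequence recited above (pattern adapter instruction, then a swap-out DMA trigger, then a swap-in DMA trigger) can be pictured with a minimal sketch. It is offered for illustration only: the class names, method names, instruction strings, and buffer keys below are hypothetical and are taken from neither the claims nor the prior art of record, and the memories are modeled as simple dictionaries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DmaController:
    """Hypothetical DMA controller that moves named buffers between two memories."""
    log: list = field(default_factory=list)

    def transfer(self, key, src, dst):
        # Move (not copy) the buffer so it leaves the source memory.
        dst[key] = src.pop(key)
        self.log.append(key)


@dataclass
class SpareScheduler:
    """Hypothetical spare scheduler that reacts to three instruction types."""
    dma: DmaController
    local_memory: dict
    memory: dict
    pattern: Optional[str] = None

    def handle(self, instruction, key=None, pattern=None):
        if instruction == "pattern_adapter":
            # Configure how later data movement is to be interpreted.
            self.pattern = pattern
        elif instruction == "swap_out":
            # First DMA trigger: local memory -> memory.
            self.dma.transfer(key, self.local_memory, self.memory)
        elif instruction == "swap_in":
            # Second DMA trigger: memory -> local memory.
            self.dma.transfer(key, self.memory, self.local_memory)
        else:
            raise ValueError(f"unknown instruction: {instruction}")


def run_example():
    # "hwa1_out" stands for data produced by the first accelerator,
    # "hwa2_in" for data to be consumed by the second accelerator.
    local_memory = {"hwa1_out": [1, 2, 3]}
    memory = {"hwa2_in": [4, 5, 6]}
    scheduler = SpareScheduler(DmaController(), local_memory, memory)
    scheduler.handle("pattern_adapter", pattern="linear")
    scheduler.handle("swap_out", key="hwa1_out")
    scheduler.handle("swap_in", key="hwa2_in")
    return scheduler
```

After `run_example()`, the first accelerator's output resides in `memory` and the second accelerator's input resides in `local_memory`, with the two DMA transfers recorded in the order they were triggered.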

As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with.  See 37 CFR 1.111(b) and MPEP § 707.07(a).

Response to Arguments
Applicant’s arguments, filed 4/21/2022, have been fully considered but are not deemed to be persuasive.

Applicant argues that the modification of Fleming ‘376 in Examiner’s mapping, which includes an official notice taken, would not explicitly teach the claimed subject matter. Applicant indicates that Fleming ‘376 teaches that a hardware accelerator can be an external component or a part of a system on a chip (Col. 6, ll. 1-7), and that this would not suggest that each processor of a multiprocessor core would / could include a hardware accelerator as Examiner suggests, because nothing in Fleming ‘376 suggests how the hardware accelerators would interact, much less teaches a flexible data pipeline coupling a first and second hardware accelerator (see Pages 5-6). These arguments are deemed to be unpersuasive.
Examiner points out that the rejection using Fleming ‘376 mapped out that an operations manager circuit 430 that includes a scheduler 432 is used to manage queues and buffers of the memory ordering circuit, which directs and executes instructions for moving / transferring data, see Fig. 4, and that the memory ordering circuit connects and communicates with acceleration hardware, which may be an external element, through an I/O hub such as in the embodiments in Figs. 15-17. The rejection was made under 35 U.S.C. 103 because Fleming ‘376 only mentions a hardware accelerator as a singular element without mentioning multiples, i.e. not two hardware accelerators. The obviousness rejection was made using just Fleming ‘376 with the logic of having an accelerator for each processor or processor core of a system, which is in line with the idea of singular vs. plural, or having multiples of the hardware elements arranged in the same exact way doing the same exact thing, which would not be teaching away from what is taught by Fleming ‘376 and would be something those of ordinary skill in the art would have found to be an obvious variation. More specifically, Examiner points out that the Fleming ‘376 system, which connects a processor with a hardware accelerator, could be duplicated to create a second exact setup with a second processor and second hardware accelerator and still fit within the teaching of Fleming ‘376 without needing extra experimentation to work, and other embodiments of Fleming ‘376 depict multiple processor setups or multi-core processor setups, see Figs. 15-17.
As to why one would think of using the same hardware setup multiple times, Examiner points out that this was the original intent of parallel processing, to which those of ordinary skill in the art would recognize that the art as a whole would try to simplify this basic parallelism by combining redundant elements into a singular element, but the basis of parallelism is to essentially duplicate the hardware over and over so that multiple processes can be done at the same time in their own hardware path.  As such, Examiner points out that singular vs. multiple is often seen as an obvious variation and finds Applicant’s arguments to be unpersuasive.
Furthermore, Examiner points out that the argument about how the hardware accelerators interact with each other is a broad argument that is not specifically supported by the current claim language. Just having a flexible pipeline coupling a first and second hardware accelerator does not imply that the hardware accelerators are communicating directly with each other as is being argued, i.e. multiple elements connecting to a common bus / interconnect would not necessarily need to directly communicate with each other but could just communicate with a central unit such as a CPU / arbiter / scheduler / controller, which in turn would communicate with each element. Examiner specifically points out that the claim language indicates that the spare scheduler is coupled to the flexible data pipeline and used to manage data movements between the first and second hardware accelerators THROUGH the local memory and a memory, which does not implicitly or explicitly indicate that the HWAs are communicating with each other directly; rather, this current claim language suggests that the scheduler is the unit that communicates with the individual HWAs and memory / local memory, not that the HWAs are communicating with each other. As such, these arguments directed to the communication between the HWAs and how Fleming ‘376 does not teach communication between accelerators are unpersuasive, as this is not clearly indicated / claimed in the claim language and, as such, would not be the only possible interpretation under the broadest interpretation of the claim language.
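The interpretation above, in which a central scheduler moves data between accelerators through memory without the accelerators communicating directly, can be sketched as follows. This is a hypothetical illustration only: the `Accelerator` and `Scheduler` classes, the buffer keys, and the two example operations are assumptions introduced here and do not come from the claims or from Fleming ‘376.

```python
class Accelerator:
    """Hypothetical HWA that only reads and writes local memory.

    It holds no reference to any other accelerator.
    """

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, local_memory, in_key, out_key):
        local_memory[out_key] = self.fn(local_memory[in_key])


class Scheduler:
    """Hypothetical central unit that owns all data movement."""

    def __init__(self):
        self.local_memory = {}
        self.memory = {}

    def run_stage(self, hwa, in_key, out_key):
        # Swap in: bring the stage input from memory into local memory.
        if in_key not in self.local_memory:
            self.local_memory[in_key] = self.memory.pop(in_key)
        hwa.run(self.local_memory, in_key, out_key)
        # Swap out: move the stage output from local memory to memory.
        self.memory[out_key] = self.local_memory.pop(out_key)
        del self.local_memory[in_key]


def pipeline(values):
    scheduler = Scheduler()
    scheduler.memory["input"] = values
    hwa1 = Accelerator("hwa1", lambda xs: [x * 2 for x in xs])
    hwa2 = Accelerator("hwa2", lambda xs: sum(xs))
    # The second stage consumes the first stage's output only via memory.
    scheduler.run_stage(hwa1, "input", "stage1")
    scheduler.run_stage(hwa2, "stage1", "result")
    return scheduler.memory["result"]
```

In this sketch, `pipeline([1, 2, 3])` doubles the values in the first stage and sums them in the second, and at no point does either accelerator object call or reference the other.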

In summary, Fleming ‘376 teaches the claimed invention as set forth above.

Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Fleming, Jr. et al. (US 10,474,375) teaches a memory accelerator hardware system for executing instructions using load / store / completion queues.
Nurvitadhi et al. (US 2018/0189239) teaches a heterogeneous hardware accelerator system with memory connected to sparse tiles, each sparse tile with processing elements, memory and schedulers that can be used to schedule operations to a memory through an interconnect.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL SUN whose telephone number is (571)270-1724.  The examiner can normally be reached on Monday-Friday 8am-4pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li, can be reached on 571-272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL SUN/Primary Examiner, Art Unit 2183