Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office action is in response to application papers filed on May 20, 2021.
Claims 1-23 are pending in the application with claims 1, 11, 19, and 23 being independent claims.
Continuation Application 
This application is a continuation of U.S. Patent Application No. 16/536,192, filed August 8, 2019, now U.S. Patent No. 11,080,227.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees.   A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. 
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-9, 11-13, 15, 17-19, and 23 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-21 of U.S. Patent No. 11,080,227 (hereafter ‘227). Although the conflicting claims are not identical, they are not patentably distinct from each other because the rejected claims of the instant application define obvious variations of the invention claimed in ‘227.
The following is a side-by-side comparison of representative claim 1 of ‘227 and representative claim 1 of the instant application, with the similarities boldfaced for the Applicant’s convenience.
Claim 1 of ‘227:
1. A computer-implemented method of transforming a high-level program for mapping onto a reconfigurable data processor with an array of configurable units, the method including: 
partitioning a dataflow graph of the high-level program into memory allocations and execution fragments, wherein the memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph, and 
the execution fragments require operations on the data, including loading the data from allocated memory and computing with the data; 
designating the memory allocations to virtual memory units and designating the execution fragments to virtual compute units; 
partitioning the execution fragments into memory fragments that load the data from the allocated memory and compute fragments that compute with the data; 
assigning the memory fragments to the virtual memory units and assigning the compute fragments to the virtual compute units; allocating the virtual memory units to physical memory units and allocating the virtual compute units to physical compute units; placing the physical memory units and the physical compute units onto positions in the array of configurable units and routing data and control networks between the placed positions; and 
generating a bit file with configuration data for the placed positions and the routed data and control networks, 
wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph.

Claim 1 of instant application:
1. A computer-implemented method of transforming a high-level program for mapping onto a reconfigurable data processor with an array of configurable units, the method including: 
partitioning a dataflow graph of the high-level program into memory allocations and execution fragments, wherein the memory allocations represent creation of logical memory spaces in one or more memories for data to implement the dataflow graph, and 
the execution fragments represent operations on the data, including loading the data from allocated memory and computing with the data; 
partitioning the execution fragments into (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data; and 
generating a bit file with configuration data, based at least in part on the memory allocations, the memory fragments, and the compute fragments, 
wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph.

Claims 1 and 6 of ‘227:
1 and 6. A computer-implemented method of transforming a high-level program for mapping onto a reconfigurable data processor with an array of configurable units, the method including: 
partitioning a dataflow graph of the high-level program into memory allocations and execution fragments, wherein the memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph, and 
the execution fragments require operations on the data, including loading the data from allocated memory and computing with the data; 
designating the memory allocations to virtual memory units and designating the execution fragments to virtual compute units; 
partitioning the execution fragments into memory fragments that load the data from the allocated memory and compute fragments that compute with the data; 
assigning the memory fragments to the virtual memory units and assigning the compute fragments to the virtual compute units; allocating the virtual memory units to physical memory units and allocating the virtual compute units to physical compute units; placing the physical memory units and the physical compute units onto positions in the array of configurable units and routing data and control networks between the placed positions; and 
generating a bit file with configuration data for the placed positions and the routed data and control networks, 
wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph.

Claim 11 of instant application:
11. A computer-implemented method comprising: 
generating, from a dataflow graph of a high-level program, (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data; 
assigning the memory fragments to the virtual memory units and assigning the compute fragments to the virtual compute units; allocating the virtual memory units to physical memory units and allocating the virtual compute units to physical compute units; 
fusing at least two of the physical memory units into a single physical memory unit and/or fusing at least two of the physical compute units into a single physical compute unit (‘227, claim 6); and 
generating a bit file with configuration data, based at least in part of the fusing, wherein the bit file, 
when loaded onto an instance of an array of configurable units, causes the array of configurable units to implement the dataflow graph.

Similarly, independent claims 19 and 23 are obvious variations of claims 24 and 23 in ‘227, respectively. Therefore, they are rejected for the same reasons set forth in the rejection of claim 1.
Likewise, dependent claims 2-9 are obvious variations of claims 1-3 and 11, and claims 12-13, 15, and 17-19 are obvious variations of claims 1, 5, 6, 7, and 9 in ‘227, respectively. Therefore, they are rejected for the same reasons set forth in the rejection of claim 1.
This is a non-provisional nonstatutory obviousness-type double patenting rejection because the conflicting claims have been patented.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5, 8-12, 14, 16, 18, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over “Plasticine: A Reconfigurable Architecture For Parallel Patterns”, Prabhakar et al., ISCA ’17, June 24-28, 2017 (hereinafter “Prabhakar”), in view of US 2018/0189231 (hereinafter “Fleming”), and further in view of U.S. Pat. No. 10,331,836 (hereinafter “Hosangadi”).
In the following claim analysis, Applicant’s claim language is presented boldfaced and Examiner’s explanations are in square brackets.

As to claim 1, Prabhakar discloses a computer-implemented method of transforming a high-level program for mapping onto a reconfigurable data processor with an array of configurable units (Prabhakar, Figure 3, pg. 392, col. 1, Plasticine is a tiled architecture consisting of reconfigurable Pattern Compute Units (PCUs) and Pattern Memory Units (PMUs); pg. 392, col. 2, the PCU datapath is organized as a multi-stage, reconfigurable SIMD pipeline), the method including:
partitioning a dataflow graph of the high-level program into memory allocations and execution fragments (Prabhakar, pg. 390, col. 1, Plasticine is a two dimensional array of two kinds of coarse-grained reconfigurable units: Pattern Compute Units (PCUs) and Pattern Memory Units (PMUs); pg. 395, col. 2, To map DHDL to Plasticine, we first unroll outer pipelines using user-specified or auto-tuned parallelization factors. The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs … The computation in inner controllers is scheduled by linearizing the data flow graph and mapping the resulting list of operations to virtual stages and registers. Each local memory maps to a virtual PMU … map each virtual unit into a set of physical units by partitioning its stages. Virtual PCUs are partitioned into multiple PCUs, while PMUs become one PMU with zero or more supporting PCUs), wherein the memory allocations represent creation of logical memory spaces in one or more memories for data to implement the dataflow graph (Prabhakar, pg. 390, col. 2, The on-chip, banked scratchpads are configurable to support streaming and double buffered accesses. The off-chip memory controllers support both streaming (burst) patterns and scatter/gather accesses. Finally, the on-chip control logic is configurable to support nested patterns; pg. 395, col. 2, Pipelines in DHDL are either outer controllers which contain only other pipelines, or inner controllers which contain no other controllers, only dataflow graphs of compute and memory operations), and the execution fragments represent operations on the data (Prabhakar, Fig. 5, pg. 392, col. 2, Based on the application’s control and data dependencies, the control block can be configured to combine multiple control signals from both local FIFOs and global control inputs to trigger PCU execution; pg. 394, col. 1, Address calculation is performed on the PMU datapath, while the core computation is performed within the PCU); and
generating a bit file with configuration data, based at least in part on the memory allocations, the memory fragments, and the compute fragments (Prabhakar, pg. 395, col. 2, Given this placement and routing information, we then generate a … a static configuration “bitstream” for the architecture. The hierarchical architecture, coupled with the coarse granularity of buses between compute units, allows our entire compilation process to finish; pg. 395, col. 2, To map DHDL to Plasticine … The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs).
	Prabhakar discloses that the execution fragments represent operations on the data, but does not appear to explicitly disclose that the execution fragments represent operations on the data, including loading the data from allocated memory and computing with the data, or partitioning the execution fragments into (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data. However, in an analogous art to the claimed invention in the field of providing a reconfigurable architecture, Fleming teaches loading the data from allocated memory and computing with the data (Fleming, ¶ 84, a logical view of the dataflow graph is captured and committed into memory; Fig. 6, ¶ 103, A processing element (PE) may include an input buffer (e.g., buffer 606) and an output buffer (e.g., buffer 608); ¶ 237, A dataflow token may cause an output from a dataflow operator receiving the dataflow token to be sent to an input buffer of a particular processing element of the plurality of processing elements. The second operation may include a memory access [loading the data] and the plurality of processing elements); and partitioning the execution fragments into (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data (Fleming, Fig. 3A, ¶ 79, an accelerator (e.g., CSA) with a plurality of processing elements 301 configured to execute the dataflow graph of FIG. 3B … one or more of the processing elements in the array of processing elements 301 is to access memory through memory interface 302; ¶ 84, (in which a dataflow graph is loaded into the CSA), extraction (in which the state of an executing graph is moved to memory) … Conceptually, configuration may load the state of a dataflow graph into the interconnect and processing elements).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Prabhakar’s technology with Fleming’s configurable spatial accelerator, with a reasonable expectation of success, to include “the execution fragments represent operations on the data, including loading the data from allocated memory and computing with the data; partitioning the execution fragments into (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data”. The modification would have been obvious because one of ordinary skill in the art would have been motivated to continually improve reconfigurable architectures such as FPGAs and CGRAs for better performance and energy efficiency.
Prabhakar as modified does not appear to explicitly disclose wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph. However, in an analogous art to the claimed invention in the field of providing a reconfigurable architecture, Hosangadi teaches: wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph (Hosangadi, col. 1, lines 15-21, The DFG representation can be specified as a bit-level representation. The DFG defines the circuit architecture of the circuit design that is realized in physical circuitry; col. 5, lines 53-57, DFG can be specified as a word-level representation or as a bit-level representation).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with a bit-level representation of the dataflow graph of a high-level program as taught by Hosangadi, with a reasonable expectation of success, to include “wherein the bit file, when loaded onto an instance of the array of configurable units, causes the array of configurable units to implement the dataflow graph”. The modification would have been obvious because one of ordinary skill in the art would have been motivated to also convert the bit-level representation of the dataflow graph of a high-level program to word-level representations to achieve a more compact representation of the circuitry that requires less memory and increases execution performance for other non-bit-level processing (Hosangadi, col. 14, lines 25-34).

As to claim 2, the rejection of claim 1 is incorporated.  Prabhakar as modified further discloses: assigning the memory fragments to virtual memory units and assigning the compute fragments to virtual compute units (Prabhakar, pg. 395, col. 2, To map DHDL (Delite Hardware Definition Language) to Plasticine, we first unroll outer pipelines using user-specified or auto-tuned parallelization factors. The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs). 

As to claim 3, the rejection of claim 2 is incorporated. Prabhakar as modified further discloses: allocating the virtual memory units to physical memory units and allocating the virtual compute units to physical compute units (Prabhakar, pg. 395, col. 2, We then map each virtual unit into a set of physical units by partitioning its stages. Virtual PCUs are partitioned into multiple PCUs, while PMUs become one PMU with zero or more supporting PCUs).

As to claim 4, the rejection of claim 2 is incorporated. Prabhakar as modified further discloses: placing the physical memory units and the physical compute units onto positions in the array of configurable units and routing data and control networks between the placed positions (Prabhakar, pg. 390, col. 1, Pattern Compute Units (PCUs) and Pattern Memory Units (PMUs). Each PCU consists of a reconfigurable pipeline with multiple stages of SIMD functional units … PMUs are composed of a banked scratchpad memory and dedicated addressing logic and address decoders. These units communicate with each other through a pipelined static hybrid interconnect with separate bus-level and word-level data, and bit-level control networks; Figure 1, pg. 395, col. 2, the compiler selects a proposed partitioning where all PCUs and PMUs are physically realizable … We then perform hierarchical binding of virtual hardware nodes to physical hardware resources … Given this placement and routing information, we then generate a Plasticine configuration description).

As to claim 5, the rejection of claim 2 is incorporated. Prabhakar as modified further discloses: wherein the bit file includes the configuration data for the placed positions and the routed data and control networks (Prabhakar, pg. 395, col. 2, Given this placement and routing information, we then generate a … a static configuration “bitstream” for the architecture. The hierarchical architecture, coupled with the coarse granularity of buses between compute units, allows our entire compilation process to finish; pg. 390, col. 1, These units communicate with each other through a pipelined static hybrid interconnect with separate bus-level and word-level data, and bit-level control networks).

As to claim 8, the rejection of claim 1 is incorporated. Prabhakar as modified further discloses: designating the memory allocations to virtual memory units and designating the execution fragments to virtual compute units (Prabhakar, pg. 395, col. 2, To map DHDL (Delite Hardware Definition Language) to Plasticine, we first unroll outer pipelines using user-specified or auto-tuned parallelization factors. The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs).

As to claim 9, the rejection of claim 1 is incorporated. Prabhakar as modified further discloses: wherein the execution fragments are executable asynchronously (Fleming, ¶ 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) to include that the execution fragments are executable asynchronously, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to use a CSA that targets the direct execution of a dataflow graph to yield a computationally dense yet energy-efficient spatial microarchitecture which far exceeds conventional roadmap architectures, resulting in a highly scalable architecture with a distributed, asynchronous execution model (Fleming, ¶¶ 70 and 78).

As to claim 10, the rejection of claim 1 is incorporated. Prabhakar as modified further discloses: wherein a first execution fragment is fragmented into one or more corresponding memory fragments and exactly one corresponding compute fragment (Fleming, ¶ 97, each PE is responsible for a single operation in one embodiment, the register files and ports counts may be small, e.g., often only one).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) such that a first execution fragment is fragmented into one or more corresponding memory fragments and exactly one corresponding compute fragment, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to use a CSA that targets the direct execution of a dataflow graph to yield an energy per operation that is only a small percentage over the cost of the bare arithmetic circuitry. For example, in the case of an integer multiply, a CSA may consume no more than 25% more energy than the underlying multiplication circuit. Relative to one embodiment of a core, an integer operation in that CSA fabric consumes less than 1/30th of the energy per integer operation (Fleming, ¶¶ 70 and 97).
As to claim 11, Prabhakar as modified discloses a computer-implemented method (Fleming, Abstract, Systems, methods, and apparatuses relating to a configurable spatial accelerator) comprising: 
generating, from a dataflow graph of a high-level program, (i) memory fragments that load the data from the allocated memory and (ii) compute fragments that compute with the data (Fleming, Fig. 3A, ¶ 79, an accelerator (e.g., a configurable spatial accelerator (CSA)) with a plurality of processing elements 301 configured to execute the dataflow graph of FIG. 3B … one or more of the processing elements in the array of processing elements 301 is to access memory through memory interface 302; ¶ 84, (in which a dataflow graph is loaded into the CSA), extraction (in which the state of an executing graph is moved to memory) … Conceptually, configuration may load the state of a dataflow graph into the interconnect and processing elements); 
assigning the memory fragments to the virtual memory units and assigning the compute fragments to the virtual compute units (Prabhakar, pg. 395, col. 2, To map DHDL (Delite Hardware Definition Language) to Plasticine, we first unroll outer pipelines using user-specified or auto-tuned parallelization factors. The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs); 
allocating the virtual memory units to physical memory units and allocating the virtual compute units to physical compute units (Prabhakar, pg. 395, col. 2, We then map each virtual unit into a set of physical units by partitioning its stages. Virtual PCUs are partitioned into multiple PCUs, while PMUs become one PMU with zero or more supporting PCUs); 
fusing at least two of the physical memory units into a single physical memory unit (Fleming, Fig. 6, ¶ 102, PEs may be configured as dataflow operators … data may be streamed in from memory, through the fabric, and then back out to memory; ¶ 103, A processing element may include an input buffer (e.g., buffer 606) and an output buffer (e.g., buffer 608)) and/or fusing at least two of the physical compute units into a single physical compute unit; and generating a bit file with configuration data, based at least in part of the fusing (Prabhakar, pg. 395, col. 2, Given this placement and routing information, we then generate a … a static configuration “bitstream” for the architecture. The hierarchical architecture, coupled with the coarse granularity of buses between compute units, allows our entire compilation process to finish; pg. 395, col. 2, To map DHDL to Plasticine … The resulting unrolled representation is then used to allocate and schedule virtual PMUs and PCUs), wherein the bit file, when loaded onto an instance of an array of configurable units, causes the array of configurable units to implement the dataflow graph (Hosangadi, col. 1, lines 15-21, The DFG representation can be specified as a bit-level representation. The DFG defines the circuit architecture of the circuit design that is realized in physical circuitry; col. 5, lines 53-57, DFG can be specified as a word-level representation or as a bit-level representation).
The motivation to combine the Prabhakar, Fleming, and Hosangadi references is the same as articulated in the rejection of claim 1. 
As to claim 12, the rejection of claim 11 is incorporated. Prabhakar as modified further discloses: placing the physical memory units and the physical compute units, including the single physical memory unit and/or the single physical compute unit, onto positions in the array of configurable units (Fleming, Fig. 6, ¶ 102, PEs may be configured as dataflow operators … data may be streamed in from memory, through the fabric, and then back out to memory; ¶ 103, A processing element may include an input buffer (e.g., buffer 606) and an output buffer (e.g., buffer 608)) and routing data and control networks between the placed positions, wherein the bit file includes the configuration data for the placed positions and the routed data and control networks (Fleming, Figs. 6-7, ¶ 102, Spatial architectures, such as the one shown in FIG. 6, may be the composition of lightweight processing elements connected by an inter-PE network; ¶ 107, Network 700 includes a plurality of multiplexers (e.g., multiplexers 702, 704, 706) that may be configured (e.g., via their respective control signals) to connect one or more data paths (e.g., from PEs) together). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) to include placing the physical memory units and the physical compute units, including the single physical memory unit and/or the single physical compute unit, onto positions in the array of configurable units and routing data and control networks between the placed positions, wherein the bit file includes the configuration data for the placed positions and the routed data and control networks, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to optimize the CSA microarchitecture, in which a CSA is to provide a backward-flowing flow control path that is physically paired with the forward data path. The combination of the two microarchitectural paths may provide a low-latency, low-energy, low-area, point-to-point implementation of the latency-insensitive channel abstraction (Fleming, ¶ 117).
As to claim 14, the rejection of claim 11 is incorporated. Prabhakar as modified further discloses: the method of claim 11, wherein fusing at least two of the physical compute units into the single physical compute unit comprises: causing execution of multiple operations on the single physical compute unit that would otherwise execute on the at least two of the physical compute units, wherein fusing is a fusing in space (Fleming, Fig. 6, ¶ 104, A PE may obtain operands from input channels and write results to output channels). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) to include causing execution of multiple operations on the single physical compute unit that would otherwise execute on the at least two of the physical compute units, wherein fusing is a fusing in space, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to execute operations based on the availability of their inputs and the status of the PE. This style of PE may be extremely energy efficient; for example, rather than reading data from a complex, multi-ported register file, a PE reads the data from a register (Fleming, ¶ 104).

As to claim 16, the rejection of claim 11 is incorporated. Prabhakar as modified further discloses: the method of claim 11, wherein fusing at least two of the physical memory units into the single physical memory unit comprises: causing execution of multiple memory operations on the single physical memory unit that would otherwise execute on the at least two of the physical memory units, wherein fusing is a fusing in space (Fleming, Fig. 6, ¶ 104, A PE may obtain operands from input channels and write results to output channels … the integer PE. This PE consists of several I/O buffers, an ALU, a storage register, some instruction registers, and a scheduler. Each cycle, the scheduler may select an instruction for execution based on the availability of the input and output buffers and the status of the PE … rather than reading data from a complex, multi-ported register file, a PE reads the data from a register. Similarly, instructions may be stored directly in a register, rather than in a virtualized instruction cache). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) to include causing execution of multiple memory operations on the single physical memory unit that would otherwise execute on the at least two of the physical memory units, wherein fusing is a fusing in space, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to decompose a fused multiply add (FMA) into separate, but tightly coupled, floating multiply and floating add units to improve support for multiply-add-heavy workloads (Fleming, ¶ 106).

As to claim 18, the rejection of claim 11 is incorporated. Prabhakar as modified further discloses: the method of claim 11, wherein the fusing is based, at least in part, on a capacity of on-chip SRAM available in a physical memory unit (Fleming, Fig. 45, ¶ 358, a block diagram of a SoC 4500 in accordance with an embodiment of the present disclosure … In FIG. 45, an interconnect unit(s) 4502 is coupled to: an application processor 4510 which includes a set of one or more cores 202A-N and shared cache unit(s) 4106; a system agent unit 4110; a bus controller unit(s) 4116; an integrated memory controller unit(s) 4114; a set or one or more coprocessors 4520 which may include integrated graphics logic, an image processor, an audio processor, and a video processor; an static random access memory (SRAM) unit 4530), and a number of arithmetic logic unit (ALU) stages within the single physical compute unit (Fleming, ¶ 116, PE execution may proceed in a dataflow style … orchestrates the actual execution of the operation by a dataflow operator (e.g., on the ALU)). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Fleming’s configurable spatial accelerator (CSA) such that the fusing is based, at least in part, on a capacity of on-chip SRAM available in a physical memory unit, and a number of arithmetic logic unit (ALU) stages within the single physical compute unit, with a reasonable expectation of success. The modification would have been obvious because one of ordinary skill in the art would have been motivated to orchestrate a configuration including one or two control words which specify an opcode controlling the ALU, steer the various multiplexors within the PE, and actuate dataflow into and out of the PE channels (Fleming, ¶ 115).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over “Plasticine: A Reconfigurable Architecture For Parallel Patterns”, Prabhakar et al., ISCA ’17, June 24-28, 2017 (hereinafter “Prabhakar”), in view of US 2018/0189231 (hereinafter “Fleming”), in view of US Pat. 10,331,836 (hereinafter “Hosangadi”), and further in view of US 2019/0102338 (hereinafter “Tang”).

As to claim 13, the rejection of claim 11 is incorporated. Prabhakar as modified further discloses: further generating, from the dataflow graph of the high-level program, memory allocations that represent creation of logical memory spaces in one or more memories for data to implement the dataflow graph (Fleming, Fig. 6, ¶ 91, a spatial architecture schema, e.g., as exemplified in FIG. 6, is the composition of light-weight processing elements (PE) connected by an inter-PE network. Generally, PEs may comprise dataflow operators, e.g., where once all input operands arrive at the dataflow operator, some operation (e.g., micro-instruction or set of micro-instructions) is executed, and the results are forwarded to downstream operators. Control, scheduling, and data storage may therefore be distributed amongst the PEs).
Fleming does not appear to explicitly disclose creation of logical memory spaces. However, in an analogous art to the claimed invention in the field of providing a reconfigurable architecture, Tang teaches creation of logical memory spaces (Tang, Fig. 60, the logical representation of the logical memory operation). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the technology of Prabhakar as modified with Tang’s configurable spatial accelerator (CSA) to include generating, from the dataflow graph of the high-level program, memory allocations that represent creation of logical memory spaces in one or more memories for data to implement the dataflow graph, with a reasonable expectation of success. The modification would be obvious because one of ordinary skill in the art would be motivated to optimize the CSA microarchitecture to reduce design risk, e.g., such that the CSA and core are completely decoupled in manufacturing. In addition to allowing better component reuse, this may allow the design of components like the CSA Cache to consider only the CSA, e.g., rather than needing to incorporate the stricter latency requirements of the core (Tang, ¶ 143).

As to claim 23, the claim is a non-transitory computer readable storage medium claim corresponding to the method claim 1. Accordingly, it is rejected under the same rationale set forth in the rejection of the method claim.
Allowable Subject Matter
Claims 6, 7, 15, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim.
Claim 19 is allowable following an extensive search of various databases and internet web sites.
Claims 20-22 are considered allowable by virtue of their dependence on their allowable base claim.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 2019/0102179 teaches using a spatial array of processing elements (e.g., a CSA) to target the direct execution of a dataflow graph (or graphs) to yield a computationally dense yet energy-efficient spatial microarchitecture which far exceeds conventional roadmap architectures; and
US 2016/0019286 teaches a parameter resolution module 106 that receives a specification of the dataflow graphs 117 from the data storage system 116 and resolves parameters for the dataflow graphs 117 to prepare the dataflow graph(s) 117 for execution by the execution module 112. The execution module 112 receives the prepared dataflow graphs 117 from the parameter resolution module 106 and uses them to process data from a data source 102 and generate output data 114.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAXIN WU whose telephone number is (571)270-7721.  The examiner can normally be reached M-F (7 am - 11:30 am; 1:30 pm - 5 pm) and 7:30 am to 3:30 pm every other Friday.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Zhen, can be reached at (571) 272-3708.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAXIN WU/
Primary Examiner, Art Unit 2191