Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
2. 	This Office Action is taken in response to Applicants’ Amendments and Remarks filed on 9/10/2021 regarding application 16/433,698 filed on 6/6/2019.  
 	Claims 1-16 and 18-27 are pending for consideration.

3.				Response to Amendments and Remarks 
	Applicants’ amendments and remarks have been fully and carefully considered, with the Examiner’s response set forth below.
(1) Applicant further amends claim 1 to recite “at least two of the operations are performed in a same one of the plurality of computing devices in parallel, wherein each of the at least two operations comprises an operation in which at least some of the data is ordered, reordered, removed, or discarded,” and points to paragraphs [0016] and [0048] of the Specification of the current application as support for the limitation.
However, both paragraphs [0016] and [0048] merely recite “… because extended memory operations can be performed in parallel within a same computing tile or across two or more of the computing tiles that are in communication with each other …” Thus, the cited passages merely recite the portion “at least two of the operations are performed in a same one of the plurality of computing devices in parallel.”
Thus, the two portions are not recited in a manner that explicitly requires each of the at least two operations performed in parallel to be an operation in which at least some of the data is ordered, reordered, removed, or discarded. Rather, the amendment relies on the general description that the operations performed in the system include these operations.
Therefore, as far as prior art is concerned, so long as a prior art reference teaches performing operations in parallel and also teaches that the operations that may be performed in the system include the cited operations, it would be considered as reading on the amended limitation.
Similarly, none of the paragraphs in the Specification of the current application explicitly recites the limitation “the at least two operations performed in parallel are different type of operations.” Again, it has to rely on the general descriptions that a number of different operations are included in the system to reach that conclusion. Therefore, any prior art that teaches performing two operations in parallel and 
Applicant is advised that, if Applicant disagrees with this interpretation of the newly amended limitations, then 112 issues may be raised by the Examiner. 
	(2) In response to the amendments and remarks, an updated claim analysis has been made. Refer to the corresponding sections of the following Office Action for details.

4.					Examiner’s Note
(1) In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: “Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” Amendments not pointing to specific support in the disclosure may be deemed as not complying with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as “Applicants believe no new matter has been introduced” may be deemed insufficient.
(2) Examiner has cited particular columns/paragraph and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


5.	Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Sinclair (US Patent Application Publication 2011/0138100) in view of Fleming, Jr. et al. (US Patent Application Publication 2020/0004538, hereinafter Fleming).
	As to claim 1, Sinclair teaches An apparatus [as shown in figure 1; Fleming also teaches this limitation -- as shown in figures 1 and 105], comprising: a plurality of computing devices coupled to one another [the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” … The memory system includes a plurality of subarrays, each associated with a separate subarray controller, and a front end controller adapted to select and initiate concurrent operations in the subarrays (abstract); … The memory system also includes a plurality of subarray controllers, each of the plurality of subarray controllers configured to control data read or write operations in a respective one of the plurality of subarrays independent of read or write operations in any other of the plurality of subarrays. A front end controller in the memory system is in communication with the plurality of subarray controllers, where the front end controller is adapted to select at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for data received from a host, select at least one other subarray of the plurality of subarrays in which to execute a second operation, and initiate execution of the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (¶ 0005); The memory system of claim 10, wherein the plurality of subarray controllers comprises a plurality of processors, each processor associated with a different subarray and each processor positioned on a single die separate from any die comprising the subarrays (claim 23);
Fleming also teaches this limitation – a plurality of Processing Elements (P.E.), figure 1] and that each comprise: a processing unit configured to perform operations on a block of data in response to receipt of the block of data [as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051); A method of operating a memory system having an array of non-volatile memory cells, the method comprising the memory system: receiving data at a front end of the memory system from a host, wherein the data comprises a plurality of host logical block addresses (LBAs); selecting at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for the received data, wherein each of the plurality of subarrays is associated with a unique, fixed region of host LBA addresses and each of the plurality of subarrays is associated with a separate subarray controller; selecting at least one other subarray of the plurality of subarrays in which to execute a second operation on data already residing in the other subarray; and executing the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (claim 1);
Fleming also teaches this limitation – While embodiments of the disclosure will be described in which the vector friendly instruction format supports the following: a 64 byte vector operand length (or size) with 32 bit (4 byte) or 64 bit (8 byte) data element widths (or sizes) (and thus, a 64 byte vector consists of either 16 doubleword-size elements or alternatively, 8 quadword-size elements); a 64 byte vector operand length (or size) with 16 bit (2 byte) or 8 bit (1 byte) data element widths (or sizes); a 32 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); and a 16 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); alternative embodiments may support more, less and/or different vector operand sizes (e.g., 256 byte vector operands) with more, less, or different data element widths (e.g., 128 bit (16 byte) data element widths) (¶ 0634)], wherein: at least one of the operations specifies a single address and involves accessing an address space that corresponds to a size of a memory device couplable to the plurality of computing devices [as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. 
In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051); A method of operating a memory system having an array of non-volatile memory cells, the method comprising the memory system: receiving data at a front end of the memory system from a host, wherein the data comprises a plurality of host logical block addresses (LBAs); selecting at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for the received data, wherein each of the plurality of subarrays is associated with a unique, fixed region of host LBA addresses and each of the plurality of subarrays is associated with a separate subarray controller; selecting at least one other subarray of the plurality of subarrays in which to execute a second operation on data already residing in the other subarray; and executing the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (claim 1)]; and at least two of the operations are performed in a same one of the plurality of computing devices in parallel [A method and system for permitting host write operations in one part of a flash memory concurrently with another operation in a second part of the flash memory is disclosed. The method includes receiving data at a front end of a memory system, selecting at least one of a plurality of subarrays in the memory system for executing a host write operation, and selecting at least one other subarray in which to execute a second operation. The write operation and second operation are then executed substantially concurrently … (abstract); … The number of slices 706 may be set by the number of memory subarrays within a bank that should perform background operations concurrently with host data write operations to the bank (¶ 0040);
Fleming also teaches this limitation – Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178)], wherein each of the at least two operations comprises an operation in which at least some of the data is ordered, reordered, removed, or discarded [this limitation is taught by Fleming -- Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. 
In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581)]; and a memory array configured as a cache for the processing unit [flash memory, figure 1, 116; Fleming also teaches this limitation -- memory, figure 2, 202]; wherein: each of the plurality of computing devices is separately addressable across a contiguous address space [as shown in figures 7 and 8, where each of the controllers/computing devices (figure 8, 802) is assigned a separate address space associated with its memory bank (figure 7, 704; figure 8, 804); … One arrangement of physical subarrays 702 in a multi-bank memory 700 is illustrated in FIG. 7. 
Each bank 704 may have one or more subarrays 702 where a sequence of contiguous addresses in host LBA address space is high to maximize parallelism of writing sequential data to physical subarrays; second, that the length of the sequence of contiguous addresses in individual LBA subarrays mapped to a sequence of contiguous addresses in host LBA address space is high to maximize the writing of data with sequential addresses in physical subarrays … (¶ 0039-0040)]; and the plurality of computing devices are resident on a controller [the corresponding “first controller” is the “system controller,” figure 1, 118; the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8, further shows it comprises a plurality of “subarray controllers“ (802), which are the corresponding “computing devices;” … The memory system includes a plurality of subarrays, each associated with a separate subarray controller, and a front end controller adapted to select and initiate concurrent operations in the subarrays (abstract)]; a first communication subsystem within the apparatus and coupled to the plurality of computing devices and to the controller, wherein the first communication subsystem is configured to request the block of data [as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. 
A foreground operation may include garbage collection if garbage collection is necessary to carry out a host write command … (¶ 0051); A method of operating a memory system having an array of non-volatile memory cells, the method comprising the memory system: receiving data at a front end of the memory system from a host, wherein the data comprises a plurality of host logical block addresses (LBAs); selecting at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for the received data, wherein each of the plurality of subarrays is associated with a unique, fixed region of host LBA addresses and each of the plurality of subarrays is associated with a separate subarray controller; selecting at least one other subarray of the plurality of subarrays in which to execute a second operation on data already residing in the other subarray; and executing the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (claim 1);
Fleming also teaches this limitation – communication circuitry, figure 1, the upper one]; and a second communication subsystem within the apparatus and coupled to the plurality of computing devices and to the controller, wherein the second communication subsystem is configured to transfer, within the apparatus, the block of data from the controller to at least one of the plurality of computing devices [A method and system for permitting host write operations in one part of a flash memory concurrently with another operation in a second part of the flash memory is disclosed. The method includes receiving data at a front end of a memory system, selecting at least one of a plurality of subarrays in the memory system for executing a host write operation, and selecting at least one other subarray in which to execute a second operation. The write operation and second operation are then executed substantially concurrently … (abstract); as shown in figures 1 and 8; When writing data to a conventional flash data memory system, a host typically assigns unique logical addresses to sectors, clusters or other units of data within a continuous virtual address space of the memory system. The host writes data to, and reads data from, addresses within the logical address space of the memory system. The memory system then commonly maps data between the logical address space and the physical blocks or metablocks of the memory, where data is stored in fixed logical groups corresponding to ranges in the logical address space. Generally, each fixed logical group is stored in a separate physical block of the memory system. The memory system keeps track of how the logical address space is mapped into the physical memory but the host is unaware of this. The host keeps track of the addresses of its data files within the logical address space but the memory system operates without knowledge of this mapping (¶ 0002);
Data are transferred into and out of the planes 310 and 312 through respective data input/output circuits 334 and 336 that are connected with the data portion 304 of the 
Fleming also teaches this limitation – communication circuitry, figure 1, the lower one].
Regarding claim 1, Sinclair does not teach each of the at least two operations comprises an operation in which at least some of the data is ordered, reordered, removed, or discarded.
However, an operation in which at least some of the data is ordered, reordered, removed, or discarded is well known and commonly used in the art.
For example, Fleming specifically teaches an operation in which at least some of the data is ordered, reordered, removed, or discarded [Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581)].
Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have an operation in which at least some of the data is ordered, reordered, removed, or discarded, as demonstrated by Fleming, and to incorporate it into the existing scheme disclosed by Sinclair, in order to also support these operations.
As to claim 2, Sinclair in view of Fleming teaches The apparatus of claim 1, further comprising an additional controller, wherein the computing tiles, the first communication subsystem, and the second communication subsystem are coupled with the additional controller [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” Fleming – as shown in figure 1].
	As to claim 3, Sinclair in view of Fleming teaches The apparatus of claim 1, further comprising the controller coupled to the first communication subsystem and the second communication subsystem and comprising circuitry configured to send the block of data to the first communication subsystem [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” Fleming – as shown in figure 1; While embodiments of the disclosure will be described in which the vector friendly instruction format supports the following: a 64 byte vector operand length (or size) with 32 bit (4 byte) or 64 bit (8 byte) data element widths (or sizes) (and thus, a 64 byte vector consists of either 16 doubleword-size elements or alternatively, 8 quadword-size elements); a 64 byte vector operand length (or size) with 16 bit (2 byte) or 8 bit (1 byte) data element widths (or sizes); a 32 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); and a 16 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); alternative embodiments may support more, less and/or different vector operand sizes (e.g., 256 byte vector operands) with more, less, or different data element widths (e.g., 128 bit (16 byte) data element widths) (¶ 0634)].
As to claim 4, Sinclair in view of Fleming teaches The apparatus of claim 1, further comprising an additional controller configured to transfer commands associated with the block of data from a host to the first communication subsystem and the second communication subsystem [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” Fleming – as shown in figures 1 and 2; While embodiments of the disclosure will be described in which the vector friendly instruction format supports the following: a 64 byte vector operand length (or size) with 32 bit (4 byte) or 64 bit (8 byte) data element widths (or sizes) (and thus, a 64 byte vector consists of either 16 doubleword-size elements or alternatively, 8 quadword-size elements); a 64 byte vector operand length (or size) with 16 bit (2 byte) or 8 bit (1 byte) data element widths (or sizes); a 32 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); and a 16 byte vector operand length (or size) with 32 bit (4 byte), 64 bit (8 byte), 16 bit (2 byte), or 8 bit (1 byte) data element widths (or sizes); alternative embodiments may support more, less and/or different vector operand sizes (e.g., 256 byte vector operands) with more, less, or different data element widths (e.g., 128 bit (16 byte) data element widths) (¶ 0634)].
As to claim 5, Sinclair in view of Fleming teaches The apparatus of claim 4, further comprising logic coupled to the additional controller and configured to perform one or more additional operations on the block of data prior to an operation performed by one of the computing devices [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;”
Fleming -- Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581)].
As to claim 6, Sinclair in view of Fleming teaches The apparatus of claim 4, wherein at least one computing device of the plurality of computing devices comprises the additional controller [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” Fleming -- as shown in figures 1 and 2].

6.	Claims 7-9 and 18-23 are rejected under 35 U.S.C. 103 as being unpatentable over Sinclair in view of Fleming, and further in view of Surti et al. (US Patent Application Publication 2018/0286105, hereinafter Surti).
	Regarding claim 7, Sinclair in view of Fleming does not teach the communication subsystem comprises a network on a chip (NoC) or a crossbar (XBAR), or both.
	However, network on a chip (NoC) and crossbar (XBAR) are well known and widely used in the art.
	For example, Surti specifically teaches network on a chip (NoC) [… Within the parallel processing unit 202, the I/O unit 204 connects with a host interface 206 and a memory crossbar 216, where the host interface 206 receives commands directed to performing processing operations and the memory crossbar 216 receives commands directed to performing memory operations (¶ 0053); Embodiments are applicable for use with all types of semiconductor integrated circuit ("IC") chips. Examples of these IC network chips, systems on chip (SoCs), SSD /NAND controller ASICs, and the like … (¶ 0462)].
	Therefore, it would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have a network on a chip (NoC), as demonstrated by Surti, and to incorporate it into the existing scheme disclosed by Sinclair in view of Fleming, in order to take advantage of the high degree of integration offered by the network on a chip (NoC) and the flexible connectivity offered by the crossbar.
	As to claim 8, Sinclair in view of Fleming & Surti teaches The apparatus of claim 1, wherein the processing unit of each computing device is configured with a reduced instruction set architecture [Surti -- … In some embodiments, instruction set 1609 may facilitate Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC), or computing via a Very Long Instruction Word (VLIW) … (¶ 0277)].
	As to claim 9, Sinclair in view of Fleming & Surti teaches The apparatus of claim 1, wherein each of the at least two operations are different types of operation [Fleming -- Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581);
Surti -- … The de-noise logic reduces or removes data noise from video and image data … (¶ 0300); … Arithmetic operations on the texture data and the input geometry data compute pixel color data for each geometric fragment, or discards one or more pixels from further processing (¶ 0316)].
As to claim 18, Sinclair teaches A system [as shown in figure 1; Fleming also teaches this limitation -- as shown in figures 1 and 105], comprising:
a host [host system, figure 1, 100; Fleming also teaches this limitation – core, figure 2, 200; … First, a hardware entity, the local extraction controller (LEC) is utilized, for example, as in FIGS. 97-99. A LEC may accept commands from a host (for example, a processor core), e.g., extracting a stream of data from the spatial array, and writing this data back to virtual memory for inspection by the host … (¶ 0524)];
a memory device [flash memory, figure 1, 116; Fleming also teaches this limitation -- memory, figure 2, 202]; and 
a first controller coupled to the host and the memory device [the corresponding “first controller” is the “system controller,” figure 1, 118; The memory system 102 of FIG. 1 may include non-volatile memory, such as a multi-bank flash memory 116, and a system controller 118 that both interfaces with the host 100 to which the memory system 102 is connected for passing data back and forth and controls the memory 116 … (¶ 0027); Fleming also teaches this limitation -- accelerator, figure 2], wherein the first controller comprises: a first communication subsystem configured to send and receive, within the first controller, instructions to be executed [as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051);
Fleming also teaches this limitation – communication circuitry, figure 1, the upper one]; a second communication subsystem configured to transfer, within the first controller, data [A method and system for permitting host write operations in one part of a flash memory concurrently with another operation in a second part of the flash memory is disclosed. The method includes receiving data at a front end of a memory system, selecting at least one of a plurality of subarrays in the memory system for executing a host write operation, and selecting at least one other subarray in which to execute a second operation. The write operation and second operation are then executed substantially concurrently … (abstract); Data are transferred into and out of the planes 310 and 312 through respective data input/output circuits 334 and 336 that are connected with the data portion 304 of the system bus 302. The circuits 334 and 336 provide for both programming data into the memory cells and for reading data from the memory cells of their respective planes, through lines 338 and 340 connected to the planes through respective column control circuits 314 and 316 (¶ 0030);
Fleming also teaches this limitation – communication circuitry, figure 1, the lower one]; and 
a plurality of computing devices [the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows that it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices;” … The memory system includes a plurality of subarrays, each associated with a separate subarray controller, and a front end controller adapted to select and initiate concurrent operations in the subarrays (abstract); … The memory system also includes plurality of subarray controllers, each of the plurality of subarray controllers configured to control data read or write operations in a respective one of the plurality of subarrays independent of read or write operations in any other of the plurality of subarrays. A front end controller in the memory system is in communication with the plurality of subarray controllers, where the front end controller is adapted to select at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for data … the plurality of subarray controllers comprises a plurality of processors, each processor associated with a different subarray and each processor positioned on a single die separate from any die comprising the subarrays (claim 23);
Fleming also teaches this limitation – a plurality of Processing Elements (P.E.), figure 1], wherein: each of the plurality of computing devices is separately addressable across a contiguous address space [as shown in figures 7 and 8, where each of the controllers/computing devices (figure 8, 802) is assigned a separate address space associated with its memory bank (figure 7, 704; figure 8, 804); … One arrangement of physical subarrays 702 in a multi-bank memory 700 is illustrated in FIG. 7. Each bank 704 may have one or more subarrays 702 where each subarray 702 is associated with a respective LBA region 708 made up of a fixed address range of host LBA addresses that differs from every other LBA region 708 in the multi-bank memory 700 … first, that the number of LBA subarrays 702 mapped to a sequence of contiguous addresses in host LBA address space is high to maximize parallelism of writing sequential data to physical subarrays; second, that the length of the sequence of contiguous addresses in individual LBA subarrays mapped to a sequence of contiguous addresses in host LBA address space is high to maximize the writing of data with sequential addresses in physical subarrays … (¶ 0039-0040)], and the plurality of computing devices are resident on the first controller [the corresponding “first controller” is the “system controller,” figure 1, 118; … The memory system includes a plurality of subarrays, each associated with a separate subarray controller, and a front end controller adapted to select and initiate concurrent operations in the subarrays (abstract)];
wherein the first controller is configured to: send, via the first communication subsystem, an instruction from the host to at least one of the plurality of computing devices to perform operations on a block of data [as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051); A method of operating a memory system having an array of non-volatile memory cells, the method comprising the memory system: receiving data at a front end of the memory system from a host, wherein the data comprises a plurality of host logical block addresses (LBAs); selecting at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for the received data, wherein each of the plurality of subarrays is associated with a unique, fixed region of host LBA addresses and each of the plurality of subarrays is associated with a separate subarray controller; selecting at least one other subarray of the plurality of subarrays in which to execute a second operation on data already residing in the other subarray; and executing the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (claim 1);
Fleming also teaches this limitation – as shown in figures 1 and 2];
wherein: each of the operations specify a single address and involves accessing an address space that corresponds to a size of a memory device couplable to the plurality of computing devices [as shown in figures 1 and 8; When writing data to a conventional flash data memory system, a host typically assigns unique logical addresses to sectors, clusters or other units of data within a continuous virtual address space of the memory system. The host writes data to, and reads data from, addresses within the logical address space of the memory system. The memory system then commonly maps data between the logical address space and the physical blocks or metablocks of the memory, where data is stored in fixed logical groups corresponding to ranges in the logical address space. Generally, each fixed logical group is stored in a separate physical block of the memory system. The memory system keeps track of how the logical address space is mapped into the physical memory but the host is unaware of this. The host keeps track of the addresses of its data files within the logical address space but the memory system operates without knowledge of this mapping (¶ 0002); Data are transferred into and out of the planes 310 and 312 through respective data input/output circuits 334 and 336 that are connected with the data portion 304 of the system bus 302. The circuits 334 and 336 provide for both programming data into the memory cells and for reading data from the memory cells of their respective planes, through lines 338 and 340 connected to the planes through respective column control circuits 314 and 316. Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0030-0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051); as shown in figures 7 and 8, where each of the subarray controllers (figure 8, 802) is assigned a separate address space associated with its memory bank (figure 7, 704; figure 8, 804); … One arrangement of physical subarrays 702 in a multi-bank memory 700 is illustrated in FIG. 7. Each bank 704 may have one or more subarrays 702 where each subarray 702 is associated with a respective LBA region 708 made up of a fixed address range of host LBA addresses that differs from every other LBA region 708 in the multi-bank memory 700 … first, that the number of LBA subarrays 702 mapped to a sequence of contiguous addresses in host LBA address space is high to maximize parallelism of writing sequential data to physical subarrays; second, that the length of the sequence of contiguous addresses in individual LBA subarrays mapped to a sequence of contiguous addresses in host LBA address space is high to maximize the writing of data with sequential addresses in physical subarrays … (¶ 0039-0040)]; and at least two of the operations performed by at least two of the plurality of computing devices are performed in parallel [A method and system for permitting host write operations in one part of a flash memory concurrently with another operation in a second part of the flash memory is disclosed. The method includes receiving data at a front end of a memory system, selecting at least one of a plurality of subarrays in the memory system for executing a host write operation, and selecting at least one other subarray in which to execute a second operation. The write operation and second operation are then executed substantially concurrently … (abstract); … The number of slices 706 may be set by the number of memory subarrays within a bank that should perform background operations concurrently with host data write operations to the bank (¶ 0040);
Fleming also teaches this limitation – Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581)]; wherein each of the at least two of the operations comprise an operation to reduce a size of the block of data from a first size to a second size, a gather-scatter operation, or both [This limitation is taught by Surti -- … Some embodiments may provide better motion prediction information to a reprojection or time warp (TW) application so the TW overdraw size may be reduced … (¶ 0168); Turning now to FIGS. 8I to 8N, a graphics system 850 may include a visible frame 851 and an overdraw frame 852. In some embodiments, the size of the overdraw frame 852 may be dynamic and the system may make more efficient use of the overdraw frame 852 by adjusting the size of the overdraw frame 852 relative to the visible frame 851 based on the motion information (e.g. a motion vector 853) … (¶ 0208-0211)];
transfer, via the second communication subsystem, the block of data from the memory device to the at least one of the plurality of computing devices [as shown in figures 1 and 8; When writing data to a conventional flash data memory system, a host typically assigns unique logical addresses to sectors, clusters or other units of data within a continuous virtual address space of the memory system. The host writes data to, and reads data from, addresses within the logical address space of the memory system. The memory system then commonly maps data between the logical address space and the physical blocks or metablocks of the memory, where data is stored in fixed logical groups corresponding to ranges in the logical address space. Generally, each fixed logical group is stored in a separate physical block of the memory system. The memory system keeps track of how the logical address space is mapped into the physical memory but the host is unaware of this … (¶ 0002)].
Regarding claim 18, Sinclair in view of Fleming does not teach that each of the at least two of the operations comprise an operation to reduce a size of the block of data from a first size to a second size, a gather-scatter operation, or both.
However, an operation to reduce a size of the block of data from a first size to a second size is well known and commonly used in the art.
For example, Surti specifically teaches an operation to reduce a size of the block of data from a first size to a second size [… Some embodiments may provide better motion prediction information to a reprojection or time warp (TW) application so the TW overdraw size may be reduced … (¶ 0168); Turning now to FIGS. 8I to 8N, a graphics system 850 may include a visible frame 851 and an overdraw frame 852. In some embodiments, the size of the overdraw frame 852 may be dynamic and the system may make more efficient use of the overdraw frame 852 by adjusting the size of the overdraw frame 852 relative to the visible frame 851 based on the motion information (e.g. a motion vector 853) … (¶ 0208-0211)].
Therefore, it would have been obvious for one of ordinary skill in the art at the time prior to Applicant’s invention to have an operation to reduce a size of the block of data from a first size to a second size, as demonstrated by Surti, and to incorporate it into the existing scheme disclosed by Sinclair in view of Fleming, in order to also support these operations.
As to claim 19, Sinclair in view of Fleming & Surti teaches The system of claim 18, wherein at least one additional computing device of the plurality of computing devices comprises a second controller and the second controller transfers the instruction from a host to the first communication subsystem [Sinclair -- as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); the corresponding “computing devices” are the “subarray controllers” -- figure 1 shows that the system controller (118) includes a controller firmware module (124), and figure 8 further shows that it comprises a plurality of “subarray controllers” (802), which are the corresponding “computing devices”].
	As to claim 20, Sinclair in view of Fleming & Surti teaches The system of claim 19, wherein the second controller is configured to allocate and de-allocate computing resources to the plurality of computing devices to perform the operation on the block of data [Fleming -- The execution engine unit 11750 includes the rename/allocator unit 11752 coupled to a retirement unit 11754 and a set of one or more scheduler unit(s) 11756. The scheduler unit(s) 11756 represents any number of different schedulers, including reservations stations, central instruction window, etc. … (¶ 0704)].
As to claim 21, Sinclair in view of Fleming & Surti teaches The system of claim 18, wherein the first controller is further configured to transfer, via the second communication subsystem, the block of data having the reduced size associated therewith to the memory device [Surti -- … Some embodiments may provide better motion prediction information to a reprojection or time warp (TW) application so the TW overdraw size may be reduced … (¶ 0168)].
	As to claim 22, it recites substantially the same limitations as in claim 9, and is rejected for the same reasons set forth in the analysis of claim 9. Refer to "As to claim 9" presented earlier in this Office Action for details.
	As to claim 23, Sinclair in view of Fleming & Surti teaches The apparatus of claim 18, wherein the memory device comprises a NAND memory device or a 3D XPoint memory device, or combinations thereof [Surti -- … By way of example, and not limitation, the processor memories 401-402 and GPU memories 420-423 may be volatile memories such as dynamic random access memories (DRAMs) (including stacked DRAMs), Graphics DDR SDRAM (GDDR) (e.g., GDDR5, GDDR6), or High Bandwidth Memory (HBM) and/or may be non-volatile memories such as 3D XPoint or Nano-Ram … (¶ 0089); … Embodiments are applicable for use with all types of semiconductor integrated circuit ("IC") chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD /NAND controller ASICs, and the like … (¶ 0462)].
As to claim 24, it recites substantially the same limitations as in claims 1, 10, and 18, and is rejected for the same reasons set forth in the analysis of claims 1, 10, and 18. Refer to "As to claim 1;" "As to claim 10;" and "As to claim 18" presented earlier in this Office Action for details.
Further, Sinclair in view of Fleming & Surti teaches wherein: one of the at least two of the operations comprise an operation in which at least some of the data is ordered, reordered, removed, or discarded, a comma-separated value parsing operation, or both [Fleming -- Certain embodiments herein provide for performance increases from parallel execution within a (e.g., dense) spatial array of processing elements (e.g., CSA) where each PE and/or network dataflow endpoint circuit utilized may perform its operations simultaneously, e.g., if input data is available … An embodiment of a "Pick" dataflow operator is to select data (e.g., a token) from a plurality of input channels and provide that data as its (e.g., single) output according to control data. Control data for a Pick may include an input selector value. In one embodiment, the selected input channel is to have its data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation). In one embodiment, additionally, those non-selected input channels are also to have their data (e.g., token) removed (e.g., discarded), for example, to complete the performance of that dataflow operation (or its portion of a dataflow operation) … (¶ 0159-0164); … Although execution is serialized in this example, in principle all dataflow operations may execute in parallel … (¶ 0178); … The completion queue 10520, may therefore, be used to reorder data and operation flow (¶ 0559); The way the microarchitecture may perform this reordering is discussed with reference to FIGS. 111A-111B and 112A-112G … (¶ 0581)].
	Still further, Sinclair in view of Fleming & Surti teaches another of the at least two of the operations comprise an operation to reduce a size of the block of data from a first size to a second size, a gather-scatter operation, or both [Surti -- … Some embodiments may provide better motion prediction information to a reprojection or time warp (TW) application so the TW overdraw size may be reduced … (¶ 0168); the size of the overdraw frame 852 may be dynamic and the system may make more efficient use of the overdraw frame 852 by adjusting the size of the overdraw frame 852 relative to the visible frame 851 based on the motion information (e.g. a motion vector 853) … (¶ 0208-0211)].
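For illustration only, the behavior of the "Pick" dataflow operator quoted from Fleming above (select a token from one of several input channels according to control data, remove the selected token, and optionally also discard the tokens on the non-selected channels) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Fleming, Sinclair, or Surti; the function name `pick`, the `discard_unselected` flag, and the use of queues to model input channels are hypothetical.

```python
from collections import deque


def pick(channels, selector, discard_unselected=False):
    """Hypothetical model of a "Pick" operator.

    Removes and returns the head token of channels[selector]. When
    discard_unselected is True, the head token of every other non-empty
    channel is removed (discarded) as well.
    """
    token = channels[selector].popleft()
    if discard_unselected:
        for index, channel in enumerate(channels):
            if index != selector and channel:
                channel.popleft()
    return token


# Two input channels, each holding two tokens (made-up sample data).
inputs = [deque(["a0", "a1"]), deque(["b0", "b1"])]
out = pick(inputs, selector=1, discard_unselected=True)
print(out, list(inputs[0]), list(inputs[1]))
```

In this sketch the selected token is provided as the single output, and both the selected and the non-selected head tokens are removed from their channels, which corresponds to data being removed or discarded as recited.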
	As to claim 25, it recites substantially the same limitations as in claim 3, and is rejected for the same reasons set forth in the analysis of claim 3. Refer to "As to claim 3" presented earlier in this Office Action for details.
	As to claim 26, it recites substantially the same limitations as in claim 14, and is rejected for the same reasons set forth in the analysis of claim 14. Refer to "As to claim 14" presented earlier in this Office Action for details.
	As to claim 27, it recites substantially the same limitations as in claim 20, and is rejected for the same reasons set forth in the analysis of claim 20. Refer to "As to claim 20" presented earlier in this Office Action for details.
7.	Claims 10-16 are rejected under 35 U.S.C. 103 as being unpatentable over Sinclair in view of Fleming, and further in view of Das et al. (US Patent Application Publication 2018/0329644, hereinafter Das).
As to claim 10, it recites substantially the same limitations as in claim 1, and is rejected for the same reasons set forth in the analysis of claim 1. Refer to "As to claim 1" presented earlier in this Office Action for details.
	Regarding claim 10, Sinclair in view of Fleming does not teach wherein each of the at least two of the operations comprise a comma-separated value (CSV) parsing operation.

	For example, Das specifically teaches a comma-separated value (CSV) parsing operation [The ingestion circuitry 118 may further include parser circuitry and validation circuitry 214. The parser may extract the structure of the data streams received from the connectors 208, and relay the structure and data to the ingestion processor 216. The implementation of the parser circuitry varies according to the formats of the data stream that the DPA 114 is setup to handle. As one example, the parser circuitry may include a CSV (comma-separated values) reader. As another example, the parser circuitry may include voice recognition circuitry and text analysis circuitry. Executing the parser circuitry is optional, for instance, when the data stream will be stored as raw unprocessed data (¶ 0029)].
	Therefore, it would have been obvious for one of ordinary skill in the art at the time prior to Applicant’s invention to use a comma-separated value (CSV) parsing operation, as demonstrated by Das, and to incorporate it into the existing scheme disclosed by Sinclair in view of Fleming, in order to also support a comma-separated value (CSV) parsing operation.
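For illustration only, a CSV parsing operation of the kind referred to above, applied to two blocks of data concurrently, can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Das, Sinclair, or Fleming; the function name `parse_csv_block` and the sample data are hypothetical, and the thread pool merely stands in for computing devices that perform operations in parallel.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def parse_csv_block(block):
    """Parse one block of CSV text into rows of fields (hypothetical helper)."""
    return list(csv.reader(io.StringIO(block)))


# Two independent blocks of data, each handed to its own worker.
blocks = ["id,name\n1,alpha\n", "id,name\n2,beta\n"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in the same order as the input blocks.
    parsed = list(pool.map(parse_csv_block, blocks))

print(parsed[0])
print(parsed[1])
```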
	As to claim 11, it recites substantially the same limitations as in claim 19, and is rejected for the same reasons set forth in the analysis of claim 19. Refer to "As to claim 19" presented earlier in this Office Action for details.
	As to claim 12, it recites substantially the same limitations as in claim 21, and is rejected for the same reasons set forth in the analysis of claim 21. Refer to "As to claim 21" presented earlier in this Office Action for details.

	As to claim 14, Sinclair in view of Fleming & Das teaches The apparatus of claim 10, wherein the first controller is configured to perform copy, read, write, and error correction operations for a memory device coupled to the apparatus [Fleming -- FIG. 5 illustrates a program source (e.g., C code) 500 according to embodiments of the disclosure. According to the memory semantics of the C programming language, memory copy (memcpy) should be serialized. However, memcpy may be parallelized with an embodiment of the CSA if arrays A and B are known to be disjoint. FIG. 5 further illustrates the problem of program order. In general, compilers cannot prove that array A is different from array B, e.g., either for the same value of index or different values of index across loop bodies. This is known as pointer or memory aliasing. Since compilers are to generate statically correct code, they are usually forced to serialize memory accesses … (¶ 0180); … The result of the operation may then be written to either an output buffer or to a (e.g., local to the PE) register. Data written to an output buffer may be transported to a downstream PE for further processing. This style of PE may be extremely energy efficient, for example, rather than reading data from a complex, multi-ported register file, a PE reads the data from a register. Similarly, instructions may be stored directly in a register, rather than in a virtualized instruction cache … (¶ 0201); In a control flow graph G, suppose an operation in basic block A defines a virtual register x, and an operation in basic block B that uses x. 
Then a correct control-to-dataflow transformation can replace x with a latency-insensitive channel only if A and B are control equivalent … One correct algorithm for converting a CFG to dataflow is to have the compiler insert (1) switches to compensate for the mismatch in execution frequency for any values that flow between basic blocks which are not control equivalent, and (2) picks at the beginning of basic blocks to choose correctly from any incoming values to a basic block. Generating the appropriate control signals for these picks and switches may be the key part of dataflow conversion (¶ 0451)].
	As to claim 15, Sinclair in view of Fleming & Das teaches The apparatus of claim 10, wherein the first computing device and the second computing device are configured such that: the first computing device can access, through the first communication subsystem, an address space associated with the second computing device; and the second computing device can access, through the first communication subsystem, an address space associated with the first computing device [Sinclair – as shown in figures 1 and 8; Each memory chip in each bank 120 contains some controlling circuitry that executes commands from the controller 118 to perform such functions … (¶ 0031); … As used herein, a foreground operation refers to any activity that is a direct consequence of a host write command. A foreground operation may include garbage collection if garbage collection is necessary to carry out a pending write command. In contrast, a background operation refers to activities that take place that are not a direct consequence of a host write command … (¶ 0051); A method of operating a memory system having an array of non-volatile memory cells, the method comprising the memory system: receiving data at a front end of the memory system from a host, wherein the data comprises a plurality of host logical block addresses (LBAs); selecting at least one subarray of a plurality of subarrays in the array in which to execute a host write operation for the received data, wherein each of the plurality of subarrays is associated with a unique, fixed region of host LBA addresses and each of the plurality of subarrays is associated with a separate subarray controller; selecting at least one other subarray of the plurality of subarrays in which to execute a second operation on data already residing in the other subarray; and executing the host write operation and the second operation substantially concurrently in the at least one subarray and the at least one other subarray (claim 1); Fleming -- as shown 
in figures 1 and 2].
	As to claim 16, it recites substantially the same limitations as in claim 8, and is rejected for the same reasons set forth in the analysis of claim 8. Refer to "As to claim 8" presented earlier in this Office Action for details.
	
					Conclusion
8.	Claims 1-16, and 18-27 are rejected as presented above.
9. 	THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.
10.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHENG JEN TSAI whose telephone number is 571-272-4244.  The examiner can normally be reached on Monday-Friday, 9-6.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Charles Rones can be reached on 571-272-4085. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/SHENG JEN TSAI/
Primary Examiner, Art Unit 2136
September 16, 2020