DETAILED ACTION
Claims 1-31 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 5/9/2022.

	Withdrawn objections and rejections
The specification objection has been withdrawn.

New Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4, 6, 10, and 23-31 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Goel et al. (U.S. 2016/0055667).
As per claim 1:
Goel disclosed a system comprising: 
an array of processing nodes (Goel: Figure 2 elements 30, 34, and 38, paragraphs 72, 76, and 78), each processing node comprising: 
a vector-scalar processor (VSP), wherein the VSP can be re-configured to perform vector processing or scalar processing, the VSP comprising onboard memory (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The set of PEs of a given shader unit (i.e. VSP) perform vector or scalar processing from the instruction cache (i.e. onboard memory). The set of PEs can execute SIMD operations with a common program counter. The shader units are programmable in an embodiment.);
a Random Access Memory (RAM) unit coupled to the VSP (Goel: Figure 3 element 50, paragraph 98); 
a storage unit coupled to one of the processing nodes or the RAM unit (Goel: Figure 3 element 48, paragraphs 96-97); and 
a driver comprising code to control the array of processing nodes to perform an operation by distributing an execution of the operation across at least a portion of the array of processing nodes (Goel: Figure 8 elements 116 and 30, paragraphs 23, 92, 162-163, 169, and 178)(The GPU driver (i.e. driver) receives compiled shader programs and controls loading the GPU to execute the compiled shader programs.).
As per claim 4:
Goel disclosed the system of claim 1, wherein the array of processing nodes is embedded on a graphics card configured to couple to a host system (Goel: Figures 2-3 and 8 elements 30 and 38-40, paragraphs 72, 76, 78, and 170).
As per claim 6:
Goel disclosed the system of claim 1, wherein the VSP is configured to buffer data by loading the data into the RAM unit (Goel: Figure 3 element 50, paragraphs 90 and 98)(The local memory buffers data used for GPU processing.) and to store the buffered data in the storage unit (Goel: Figure 3 element 48, paragraphs 96-97 and 100)(The registers buffer data used for GPU processing.).
As per claim 10:
Goel disclosed the system of claim 9, wherein the storage unit is configured to provide access to a user while bypassing the host operating system (Goel: Figure 3 element 48, paragraphs 96-97)(Register access does not require OS intervention.).
As per claim 23:
Goel disclosed a method comprising: 
receiving data from a host system (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs from the host CPU for execution.);  
buffering the received data into memory (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory.);
performing, by a plurality of Graphics Processor Unit (GPU) nodes, at least one operation on the buffered data to provide processed data (Goel: Figure 3 elements 46A-H, paragraphs 92-96)(The shader unit (i.e. GPU node) executes the operations of the compiled shader program.), wherein each of the GPU nodes comprises a vector-scalar processor (VSP) re-configurable to perform vector processing or scalar processing (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The set of PEs of a given shader unit (i.e. VSP) perform vector or scalar processing. The set of PEs can execute SIMD operations with a common program counter. The shader units are programmable in an embodiment.); and 
storing the processed data (Goel: Figure 3 elements 48-50, paragraphs 96-98)(Execution results are stored in the registers and local memory.).
As per claim 24:
Goel disclosed the method of claim 23, wherein the data processed by each GPU node is stored in a Random Access Memory (RAM) unit or a storage unit (Goel: Figure 3 elements 48-50, paragraphs 96-98)(Execution results are stored in the registers and local memory (i.e. RAM).).
As per claim 25:
Goel disclosed the method of claim 23, further comprising receiving, from the host system, instructions specifying the at least one operation to perform (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs from the host CPU for execution.).
As per claim 26:
Goel disclosed the method of claim 23, wherein the host system is coupled to a distributed Graphics Processor Unit (GPU) drive that includes the GPU nodes (Goel: Figures 2-3 and 8 elements 30, 38, 40, and 104, paragraphs 41, 72, 78, 89, and 162-163).
As per claim 27:
Goel disclosed the method of claim 26, wherein the distributed GPU drive identifies at least one processing node to store the received data (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory. Storing data within the shader unit identifies a shader unit (i.e. processing node) to store the program data.).
As per claim 28:
Goel disclosed the method of claim 26, wherein the distributed GPU drive receives a message from the host system to process data stored in the distributed GPU drive (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs (i.e. message) from the host CPU for execution.).
As per claim 29:
Goel disclosed the method of claim 26, wherein the received data is stored as data blocks within one or more memory components of the distributed GPU drive (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory. The set of data stored in each reads upon a data block.).
As per claim 30:
Goel disclosed the method of claim 23, wherein the data is stored as each of the GPU nodes processes the buffered data (Goel: Figures 3 and 8 elements 30, 40, 44, 46A-H, and 50, paragraphs 90-96 and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory. This data set is stored during execution of the compiled shader program.).
As per claim 31:
The additional limitation(s) of claim 31 basically recite the additional limitation(s) of claim 1. Therefore, claim 31 is rejected for the same reason(s) as claim 1.

New Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667).
As per claim 15:
Goel disclosed the system of claim 1, wherein the RAM unit and the storage unit are integrated into a single unit (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98)(It would have been obvious to one of ordinary skill in the art to implement a combined local memory storing both data and instructions to reduce memory costs. In addition, under In re Larson, 144 USPQ 347 (CCPA 1965), making elements integral does not confer patentability over the prior art.).

Claims 2-3, 5, 7-8, 11, 13-14, 16-18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Official Notice.
As per claim 2:
Goel disclosed the system of claim 1, further comprising coherent fabric coupled to each processing node, the coherent fabric being configured to exchange data between each processing node to synchronize the execution of the operation (Goel: Figures 2-3 elements 38-40, paragraphs 72 and 89)(The broadest reasonable interpretation of coherent fabric is based on specification paragraph 26, which describes a coherent fabric that provides for data exchange between GPU nodes. Official notice is given that processor arrays include bus connections between processor nodes for the advantage of data communication and synchronization. Thus, it would have been obvious to one of ordinary skill in the art to implement bus connections between the shader units of the GPU for data communication and synchronized execution.).
As per claim 3:
Goel disclosed the system of claim 2, wherein the coherent fabric is configured to couple to a host processor (Goel: Figure 8 elements 30, 104, and 114, paragraph 159).
As per claim 5:
Goel disclosed the system of claim 1, wherein the RAM unit comprises Dynamic RAM (DRAM) (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98), and wherein the storage unit comprises a NAND- or NOR- based Flash memory or 3D Cross Point memory (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98)(Official notice is given that memory can be implemented as flash memory for the advantage of increased access speeds. Thus, it would have been obvious to one of ordinary skill in the art to implement either the instruction store or local memory as a flash memory.).
As per claim 7:
Goel disclosed the system of claim 1, wherein the VSP is configured to load index data of a data block in the RAM unit and store the data block in the storage unit (Goel: Figure 3 elements 48-50, paragraphs 96-98)(The registers can store blocks of data. Official notice is given that local memories can store index data for the advantage of implementing various data structures. Thus, it would have been obvious to one of ordinary skill in the art to implement index storage within the local memory.).
As per claim 8:
Goel disclosed the system of claim 7, wherein the array of processing nodes is configured to perform an in-drive database operation by accessing the data block using the index data (Goel: Figure 2 elements 30, 34, and 38, paragraphs 72, 76, and 78)(Official notice is given that GPUs can perform database operations using index data for the advantage of increased parallel performance of such operations. Thus, it would have been obvious to one of ordinary skill in the art to perform such database operations.).
As per claim 11:
Goel disclosed the system of claim 1, wherein the onboard memory is configured to buffer a data block received from the storage unit (Goel: Figure 3 elements 48 and 50, paragraphs 96-98 and 100)(Alternatively, the local memory reads upon the storage unit and the registers read upon the onboard memory. Official notice is given that register data can be loaded to registers from local memory for the advantage of quicker data accesses. Thus, it would have been obvious to one of ordinary skill in the art to implement loading input data to the registers from the local memory.), the VSP configured to perform byte-addressed operations on the data block (Goel: Figure 3 elements 46A-H and 48, paragraphs 93-96)(The PEs can perform SIMD operations. Official notice is given that SIMD data element sizes can be byte-sized for the advantage of processing more data elements. Thus, it would have been obvious to one of ordinary skill in the art to implement SIMD operations using byte data elements stored in the registers.).
As per claim 13:
Goel disclosed the system of claim 1, wherein the VSP is configured to implement at least a portion of a neural network (Goel: Figures 2-3 elements 30, 40, and 46A-H, paragraphs 72, 89, and 95)(Official notice is given that GPUs can be configured to perform neural network operations for the advantage of increased performance of such operations. Thus, it would have been obvious to one of ordinary skill in the art to implement execution of neural network operations on the GPU of Goel.).
As per claim 14:
Goel disclosed the system of claim 1, wherein, the RAM unit is coupled to the VSP via a first bus (Goel: Figure 3 elements 46A-H and 50, paragraphs 95 and 98)(Official notice is given that processor elements and memory are connected together via busses for the advantage of transmitting data between the elements. Thus, it would have been obvious to one of ordinary skill in the art to implement a first bus connecting the PEs and the local memory.), and wherein the storage unit is coupled to one of the VSP or the RAM unit via a second bus (Goel: Figure 3 elements 46A-H and 44, paragraphs 91 and 95)(Official notice is given that processor elements and memory are connected together via busses for the advantage of transmitting data between the elements. Thus, it would have been obvious to one of ordinary skill in the art to implement a second bus connecting the PEs and the instruction store.).
As per claim 16:
Goel disclosed a system comprising: 
coherent fiber (Goel: Figures 2-3 elements 38-40, paragraphs 72 and 89)(The broadest reasonable interpretation of coherent fiber is based on specification paragraph 26 describing a coherent fabric to provide for data exchange between GPU nodes. Official notice is given that processor arrays include bus connections between processor nodes for the advantage of data communication. Thus, it would have been obvious to one of ordinary skill in the art to implement bus connections between the shader units of the GPU.) coupled to a plurality of Vector-Scalar Processors (VSPs), each VSP comprising onboard cache  (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The set of PEs of a given shader unit (i.e. VSPs) include an instruction store cache.); and 
each VSP being configured to: 
perform both vector processing and scalar processing (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The set of PEs of a given shader unit (i.e. VSP) perform vector or scalar processing. The set of PEs can execute SIMD operations with a common program counter. The shader units are programmable in an embodiment.); 
couple to a respective Dynamic Random Access Memory (DRAM) unit (Goel: Figure 3 element 50, paragraph 98);
couple to a respective storage unit (Goel: Figure 3 element 48, paragraphs 96-97); and
transfer data from the DRAM unit to the storage unit (Goel: Figure 3 elements 48 and 50, paragraphs 96-98 and 100)(The control unit loads input registers with input data. Official notice is given that register data can be loaded to registers from local memory for the advantage of quicker data accesses. Thus, it would have been obvious to one of ordinary skill in the art to implement loading input data to the registers from the local memory.); 
wherein the plurality of VSPs implement a file system (Goel: Figures 2-3 elements 30, 40, and 108, paragraphs 72, 89, and 164-165)(Official notice is given that memory elements can be used to store a file system for the advantage of storing different types of data. Thus, it would have been obvious to one of ordinary skill in the art to implement a file system within the processing system of Goel.).
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 4. Therefore, claim 17 is rejected for the same reason(s) as claim 4.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 7. Therefore, claim 18 is rejected for the same reason(s) as claim 7.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 11. Therefore, claim 20 is rejected for the same reason(s) as claim 11.
As per claim 21:
Goel disclosed the system of claim 16, further comprising a driver configured to control the plurality of VSPs to perform a file system operation (Goel: Figure 8 elements 116 and 30, paragraphs 23, 92, 162-163, 169, and 178)(The GPU driver (i.e. driver) receives compiled shader programs and controls loading the GPU to execute the compiled shader programs. In view of the above official notice, loading the compiled shader program into the GPU is a file system operation.).
As per claim 22:
Goel disclosed the system of claim 16, further comprising a Solid-State Drive Unit (SSD) configured to couple to a host system (Goel: Figure 8 element 108, paragraphs 164-165)(Official notice is given that external memories can be implemented as solid-state devices for the advantage of increased memory access speeds. Thus, it would have been obvious to one of ordinary skill in the art to implement the external memory as an SSD.).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Ballow et al. (U.S. 2018/0315160).
As per claim 9:
Goel disclosed the system of claim 1.
Goel failed to teach wherein the operation comprises at least one of a memory allocation (malloc), a memory mapping (mmap) function, or a memory free function, wherein the operation originates at a host operating system.
However, Ballow combined with Goel disclosed wherein the operation comprises at least one of a memory allocation (malloc), a memory mapping (mmap) function, or a memory free function, wherein the operation originates at a host operating system (Ballow: Figure 1 elements 140 and 150, paragraph 15)(Goel: Figure 8 elements 30 and 108, paragraphs 160 and 165-166)(Ballow disclosed a GPU performing memory allocation operations in the main memory. The combination allows for the GPU of Goel to perform such operations. The OS originates execution of the compiled shader program and the memory allocation operation.).
The advantage of performing memory allocation operations is that software applications can have dedicated memory spaces for application processing. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the memory allocation operations of Ballow within Goel for the above advantage.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), further in view of Juffa et al. (U.S. 8,860,741).
As per claim 12:
Goel disclosed the system of claim 1.
Goel failed to teach wherein the VSP is configured to execute at least partly an operating system that supports a user application.
However, Juffa combined with Goel disclosed wherein the VSP is configured to execute at least partly an operating system that supports a user application (Juffa: Figure 1 element 122, column 2 lines 22-51)(Goel: Figure 2 element 30, paragraph 72)(Juffa disclosed a GPU that is configured to execute an operating system. The combination allows for the GPU of Goel to execute an operating system.).
The advantage of implementing operating system functions within the GPU is that the GPU can access shared memory space in the same way a generic CPU would (Juffa: Column 2 lines 22-37). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the operating system method of Juffa into Goel for the above advantage.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Official Notice, further in view of Ballow et al. (U.S. 2018/0315160).
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 9. Therefore, claim 19 is rejected for the same reason(s) as claim 9.

Response to Arguments
The arguments presented by Applicant in the response received on 5/9/2022 are not considered persuasive.
Applicant argues regarding claims 1, 16, and 23:
“The claims require that each processing node comprises a VSP. Further, the claims require that a VSP (thus, "each processing node") comprises a device "configured to perform vector processing and configured to perform scalar processing." That is, a given processing node must be capable of performing both vector processing and scalar processing and each such node in an array must be capable of performing both. 
In contrast to this requirement, Goel never describes such a heterogeneous type of node. Applicant directs the Examiner to cited paragraph 95, which describes "scalar ALU" and "vector ALU" implementations of the processing elements in Goel. Applicant acknowledges that Goel discusses vector and scalar processing. However, Goel critically never describes that a given "processing node" can perform both. Instead, Goel states that that a given processing node can be either a scalar ALU or a vector ALU.”  

This argument is not found to be persuasive for the following reason. An individual shader unit reads upon the claimed processing node. The set of PEs and the instruction store read upon the claimed VSP. The set of PEs in a given shader unit can perform scalar or vector processing. In addition, all of the PEs can perform SIMD processing using a common program counter (PC). Lastly, the shader units are programmable. Thus, Goel reads upon the newly claimed limitations.
Applicant argues regarding claims 1, 16, and 23:
“Claim 1 further recites that the each VSP comprises an "onboard memory." Claim 16 recites that the VSP comprises an "onboard cache." Goel fails to describe this architecture. As illustrated in FIG. 2 of the specification, and as explicitly recited, the onboard cache or memory is within the VSP itself (comprised within). By contrast, the processing elements cited by the Examiner in Goel include no such onboard memory or cache. Indeed, the Examiner appears to cited to the "local memory" or alternatively the instruction store which are shared among all processing elements. This is not what is claimed and, indeed, the claims explicitly recite a separate shared memory (RAM). As such, Goel fails to disclose or suggest the onboard memory or cache in each processing element as explicitly claimed. As a result, the cited prior art, alone or in combination, fails to disclose, teach, or suggest all elements of the claims, and thus Applicant respectfully requests the withdrawal of the rejections.”  

This argument is not found to be persuasive for the following reason. The applicant is correct that the instruction store is mapped to the claimed onboard memory. However, the set of PEs and the instruction store within a given shader unit together map to the claimed VSP, so the instruction store is memory comprised within that VSP. Thus, Goel reads upon the claimed limitation.

	Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571)272-5988.  The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/Primary Examiner, Art Unit 2183