DETAILED ACTION
Claims 1-31 are pending.
The office acknowledges the following papers:
Claims and remarks filed on 5/5/2021,
IDS filed on 7/15/2021.

Priority
No claim for priority has been made in this application.

Drawings
The Examiner contends that the drawings submitted on 4/22/2020 are acceptable for examination proceedings. 

Specification
The disclosure is objected to because of the following informalities:
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. The Applicant’s cooperation is requested in correcting any errors of which the Applicant may become aware.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
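In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.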
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 4, 6, 10, and 23-31 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Goel et al. (U.S. 2016/0055667).
As per claim 1:
Goel disclosed a system comprising: 
an array of processing nodes (Goel: Figure 2 elements 30, 34, and 38, paragraphs 72, 76, and 78), each processing node comprising: 
a vector-scalar processor (VSP) configured to perform vector processing and configured to perform scalar processing, the VSP comprising onboard memory (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96); 
a Random Access Memory (RAM) unit coupled to the VSP (Goel: Figure 3 element 50, paragraph 98); 
a storage unit coupled to one of the VSP or the RAM unit (Goel: Figure 3 element 48, paragraphs 96-97); and 

As per claim 4:
Goel disclosed the system of claim 1, wherein the array of processing nodes is embedded on a graphics card configured to couple to a host system (Goel: Figures 2-3 and 8 elements 30 and 38-40, paragraphs 72, 76, 78, and 170).
As per claim 6:
Goel disclosed the system of claim 1, wherein the VSP is configured to buffer data by loading the data into the RAM unit (Goel: Figure 3 element 50, paragraphs 90 and 98)(The local memory buffers data used for GPU processing.) and to store the buffered data in the storage unit (Goel: Figure 3 element 48, paragraphs 96-97 and 100)(The registers buffer data used for GPU processing.).
As per claim 10:
Goel disclosed the system of claim 1, wherein the storage unit is configured to provide access to a user while bypassing the host operating system (Goel: Figure 3 element 48, paragraphs 96-97)(Register access doesn’t require OS intervention.).
As per claim 23:
Goel disclosed a method comprising: 
receiving data from a host system (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs from the host CPU for execution.);
buffering the received data into memory (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory.);
performing, by a plurality of Graphics Processor Unit (GPU) nodes, at least one operation on the buffered data to provide processed data (Goel: Figure 3 elements 46A-H, paragraphs 92-96)(The processing elements of the shader unit (i.e. GPU node) execute the operations of the compiled shader program.); and 
storing the processed data (Goel: Figure 3 elements 48-50, paragraphs 96-98)(Execution results are stored in the registers and local memory.).
As per claim 24:
Goel disclosed the method of claim 23, wherein the data processed by each GPU node is stored in a Random Access Memory (RAM) unit or a storage unit (Goel: Figure 3 elements 48-50, paragraphs 96-98)(Execution results are stored in the registers and local memory (i.e. RAM).).
As per claim 25:
Goel disclosed the method of claim 23, further comprising receiving, from the host system, instructions specifying the at least one operation to perform (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs from the host CPU for execution.).
As per claim 26:
Goel disclosed the method of claim 23, wherein the host system is coupled to a distributed Graphics Processor Unit (GPU) drive that includes the GPU nodes (Goel: 
As per claim 27:
Goel disclosed the method of claim 26, wherein the distributed GPU drive identifies at least one processing node to store the received data (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory. Storing data within the shader unit identifies a shader unit (i.e. processing node) to store the program data.).
As per claim 28:
Goel disclosed the method of claim 26, wherein the distributed GPU drive receives a message from the host system to process data stored in the distributed GPU drive (Goel: Figure 8 elements 30 and 116, paragraphs 23, 41, 72, and 162-163)(The GPU receives compiled shader programs (i.e. message) from the host CPU for execution.).
As per claim 29:
Goel disclosed the method of claim 26, wherein the received data is stored as data blocks within one or more memory components of the distributed GPU drive (Goel: Figures 3 and 8 elements 30, 40, 44, and 50, paragraphs 90-91, 96, and 163)(The compiled shader program data is stored in the instruction store, registers, local memory, and external memory. The set of data stored in each reads upon a data block.).
As per claim 30:
Goel disclosed the method of claim 23, wherein the data is stored as each of the GPU nodes processes the buffered data (Goel: Figures 3 and 8 elements 30, 40, 44, 46A-H, and 50, paragraphs 90-96 and 163)(The compiled shader program data is stored 
As per claim 31:
The additional limitation(s) of claim 31 basically recite the additional limitation(s) of claim 1. Therefore, claim 31 is rejected for the same reason(s) as claim 1.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667).
As per claim 15:
Goel disclosed the system of claim 1, wherein the RAM unit and the storage unit are integrated into a single unit (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98)(It would have been obvious to one of ordinary skill in the art to implement a combined local memory storing data and instructions to reduce memory costs. In addition, according to “In re Larson” (144 USPQ 347 (CCPA 1965)), making elements integral doesn’t give patentability over prior art.).

Claims 2-3, 5, 7-8, 11, 13-14, 16-18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Official Notice.
As per claim 2:
Goel disclosed the system of claim 1, further comprising coherent fabric coupled to each processing node, the coherent fabric being configured to exchange data between each processing node to synchronize the execution of the operation (Goel: Figures 2-3 elements 38-40, paragraphs 72 and 89)(The broadest reasonable interpretation of the coherent fabric is based on specification paragraph 26, which describes a coherent fabric that provides for data exchange between GPU nodes. Official notice is given that processor arrays include bus connections between processor nodes for the advantage of data communication and synchronization. Thus, it would have been obvious to one of ordinary skill in the art to implement bus connections between the shader units of the GPU for data communication and synchronized execution.).
As per claim 3:
Goel disclosed the system of claim 2, wherein the coherent fabric is configured to couple to a host processor (Goel: Figure 8 elements 30, 104, 114, paragraph 159).
As per claim 5:
Goel disclosed the system of claim 1, wherein the (RAM) unit comprises Dynamic RAM (DRAM) (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98), and wherein the storage unit comprises a NAND- or NOR- based Flash memory or 3D Cross Point memory (Goel: Figure 3 elements 44 and 50, paragraphs 91 and 98)(Official notice is given that memory can be implemented as flash memory for the advantage of increased access speeds. Thus, it would have been obvious to one of ordinary skill in the art to implement either the instruction store or local memory as a flash memory.).
As per claim 7:

As per claim 8:
Goel disclosed the system of claim 1, wherein the array of processing nodes is configured to perform an in-drive database operation by accessing the data block using the index data (Goel: Figure 2 elements 30, 34, and 38, paragraphs 72, 76, and 78)(Official notice is given that GPUs can perform database operations using index data for the advantage of increased parallel performance of such operations. Thus, it would have been obvious to one of ordinary skill in the art to perform such database operations.).
As per claim 11:
Goel disclosed the system of claim 1, wherein the onboard memory is configured to buffer a data block received from the storage unit (Goel: Figure 3 elements 48 and 50, paragraphs 96-98 and 100)(Alternatively, the local memory reads upon the storage unit and the registers read upon the onboard memory. Official notice is given that register data can be loaded to registers from local memory for the advantage of quicker data accesses. Thus, it would have been obvious to one of ordinary skill in the art to implement loading input data to the registers from the local memory.), the VSP configured to perform byte-addressed operations on the data block (Goel: Figure 3 
As per claim 13:
Goel disclosed the system of claim 1, wherein the VSP is configured to implement at least a portion of a neural network (Goel: Figures 2-3 elements 30, 40, and 46A-H, paragraphs 72, 89, and 95)(Official notice is given that GPUs can be configured to perform neural network operations for the advantage of increased performance of such operations. Thus, it would have been obvious to one of ordinary skill in the art to implement execution of neural network operations on the GPU of Goel.).
As per claim 14:
Goel disclosed the system of claim 1, wherein, the RAM unit is coupled to the VSP via a first bus (Goel: Figure 3 elements 46A-H and 50, paragraphs 95 and 98)(Official notice is given that processor elements and memory are connected together via busses for the advantage of transmitting data between the elements. Thus, it would have been obvious to one of ordinary skill in the art to implement a first bus connecting the PEs and the local memory.), and wherein the storage unit is coupled to one of the VSP or the RAM unit via a second bus (Goel: Figure 3 elements 46A-H and 44, paragraphs 91 and 95)(Official notice is given that processor elements and memory are connected together via busses for the advantage of transmitting data between the elements. Thus, it would have been obvious to one of ordinary skill in the art to implement a second bus 
As per claim 16:
Goel disclosed a system comprising: 
coherent fiber (Goel: Figures 2-3 elements 38-40, paragraphs 72 and 89)(The broadest reasonable interpretation of coherent fiber is based on specification paragraph 26 describing a coherent fabric to provide for data exchange between GPU nodes. Official notice is given that processor arrays include bus connections between processor nodes for the advantage of data communication. Thus, it would have been obvious to one of ordinary skill in the art to implement bus connections between the shader units of the GPU.) coupled to a plurality of Vector-Scalar Processors (VSPs), each VSP comprising onboard cache  (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The shader units (i.e. VSPs) include an instruction store cache.); and 
each VSP being configured to: 
perform vector processing and configured to perform scalar processing (Goel: Figures 2-3 elements 38-40, 44, and 46A-H, paragraphs 78, 89, and 91-96)(The PEs of a shader unit perform vector and scalar processing.);
couple to a respective Dynamic Random Access Memory (DRAM) unit (Goel: Figure 3 element 50, paragraph 98);
couple to a respective storage unit (Goel: Figure 3 element 48, paragraphs 96-97); 
transfer data from the DRAM unit to the storage unit (Goel: Figure 3 elements 48 and 50, paragraphs 96-98 and 100)(The control unit loads input 
wherein the plurality of VSPs implement a file system (Goel: Figures 2-3 elements 30, 40, and 108, paragraphs 72, 89, and 164-165)(Official notice is given that memory elements can be used to store a file system for the advantage of storing different types of data. Thus, it would have been obvious to one of ordinary skill in the art to implement a file system within the processing system of Goel.).
As per claim 17:
The additional limitation(s) of claim 17 basically recite the additional limitation(s) of claim 4. Therefore, claim 17 is rejected for the same reason(s) as claim 4.
As per claim 18:
The additional limitation(s) of claim 18 basically recite the additional limitation(s) of claim 7. Therefore, claim 18 is rejected for the same reason(s) as claim 7.
As per claim 20:
The additional limitation(s) of claim 20 basically recite the additional limitation(s) of claim 11. Therefore, claim 20 is rejected for the same reason(s) as claim 11.
As per claim 21:
Goel disclosed the system of claim 16, further comprising a driver configured to control the plurality of VSPs to perform a file system operation (Goel: Figure 8 elements 116 and 30, paragraphs 23, 92, 162-163, 169, and 178)(The GPU driver (i.e. driver) receives compiled shader programs and controls loading the GPU to execute the 
As per claim 22:
Goel disclosed the system of claim 16, further comprising a Solid-State Drive Unit (SSD) configured to couple to a host system (Goel: Figure 8 element 108, paragraphs 164-165)(Official notice is given that external memories can be implemented as solid-state devices for the advantage of increased memory access speeds. Thus, it would have been obvious to one of ordinary skill in the art to implement the external memory as an SSD.).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Ballow et al. (U.S. 2018/0315160).
As per claim 9:
Goel disclosed the system of claim 1.
Goel failed to teach wherein the operation comprises at least one of a memory allocation (malloc), a memory mapping (mmap) function, or a memory free function, wherein the operation originates at a host operating system.
However, Ballow combined with Goel disclosed wherein the operation comprises at least one of a memory allocation (malloc), a memory mapping (mmap) function, or a memory free function, wherein the operation originates at a host operating system (Ballow: Figure 1 elements 140 and 150, paragraph 15)(Goel: Figure 8 elements 30 and 108, paragraphs 160 and 165-166)(Ballow disclosed a GPU performing memory allocation operations in the main memory. The combination allows for the GPU of Goel to 
The advantage of performing memory allocation operations is that software applications can have dedicated memory spaces for application processing. Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the memory allocation operations of Ballow within Goel for the above advantage.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Juffa et al. (U.S. 8,860,741).
As per claim 12:
Goel disclosed the system of claim 1.
Goel failed to teach wherein the VSP is configured to execute at least partly an operating system that supports a user application.
However, Juffa combined with Goel disclosed wherein the VSP is configured to execute at least partly an operating system that supports a user application (Juffa: Figure 1 element 122, column 2 lines 22-51)(Goel: Figure 2 element 30, paragraph 72)(Juffa disclosed a GPU that is configured to execute an operating system. The combination allows for the GPU of Goel to execute an operating system.).
The advantage of implementing operating system functions within the GPU is that the GPU can access shared memory space in the same way a generic CPU would (Juffa: Column 2 lines 22-37). Thus, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to implement the operating system method of Juffa within Goel for the above advantage.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Goel et al. (U.S. 2016/0055667), in view of Official Notice, further in view of Ballow et al. (U.S. 2018/0315160).
As per claim 19:
The additional limitation(s) of claim 19 basically recite the additional limitation(s) of claim 9. Therefore, claim 19 is rejected for the same reason(s) as claim 9.

Conclusion
The following is text cited from 37 CFR 1.111(c): In amending in reply to a rejection of claims in an application or patent under reexamination, the applicant or patent owner must clearly point out the patentable novelty which he or she thinks the claims present in view of the state of the art disclosed by the references cited or the objections made. The applicant or patent owner must also show how the amendments avoid such references or objections.
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.  
Chen et al. (U.S. 9,799,089), taught graphics processing with a shader core.
Ray et al. (U.S. 2021/0303481), taught a parallel processor with a memory interface.
Hassaan et al. (U.S. 2021/0294646), taught a GPU with compute units in a shader.
Jin (U.S. 2019/0034093), taught GPU compute units with scalar and SIMD units.
Gruber et al. (U.S. 2017/0243320), taught a GPU with scalar and vector processing.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACOB A. PETRANEK whose telephone number is (571) 272-5988. The examiner can normally be reached on M-F 8:00-4:30.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aimee Li can be reached on (571) 272-4169.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACOB PETRANEK/
Primary Examiner, Art Unit 2183