DETAILED ACTION
This office action is in response to an Amendment/Request for Reconsideration-After Non-Final Rejection filed 05/13/2022.
Claims 1, 3-4, 6-8, 11-13, and 15-16 have been amended.  No claims have been cancelled.  Claim 18 is new.  Thus claims 1-18 have been examined.
Acknowledgment is made of applicant’s claim for foreign priority based on an application filed in the Republic of Korea on 01/20/2020.  Examiner notes the priority documents to KR10-2020-0007202 have been received by the USPTO.
The objections and rejections from the prior correspondence that are not restated herein are withdrawn.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Allowable Subject Matter
Claims 3 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 3 and 15, the prior art does not teach ‘the GPU transfers offset information regarding an address of the data memory along with the read request to the CPU, the CPU transfer the read command with the offset information to the data storage device, and the data storage device stores the read data in the data shared memory at an address corresponding to the offset information’ when combined with the remaining limitations of claim 1.
Examiner is able to find art where a CPU and GPU process read and/or write requests and the GPU performs direct memory access (DMA) of data from a storage device into GPU memory without involvement of the CPU.  The claim limitation requires that the request begins at the GPU and is sent to the CPU (along with the offset information); the CPU then sends the request to the storage device along with the offset, and the storage device performs the read or write without the involvement of the CPU.  Thus, the CPU is setting up the DMA request on behalf of the GPU.  Examiner is able to find prior art where the CPU makes a request to a GPU, which in turn sets up a DMA request to the storage device to read data into the GPU memory without accessing the CPU.  See US 2017/0147516 A1 by De Arup.  However, with De Arup, the GPU is setting up the DMA request on behalf of the CPU, and the DMA transfer of the data is executed without the involvement of the CPU.
Examiner is able to find prior art where the GPU initiates the read command and performs a DMA access with support from the CPU, as described in Raindel et al., US 2015/0347349 A1.  However, Raindel does not disclose sending a read command to a CPU that sends the read command to the storage device.
Examiner is able to find art where the GPU initiates a need to perform a read and performs the read without involvement by the CPU, as described in Qureshi et al., US 2021/0117333 A1.
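For illustration only, the data flow recited in the quoted limitation of claims 3 and 15 can be sketched as below.  This is a minimal model under stated assumptions, not part of the claims or of any cited reference: the class names (StorageDevice, Cpu, Gpu), the dictionary of blocks, the 64-byte buffer, and the choice to model the shared address space as a single buffer are all hypothetical.

```python
# Illustrative model only. Every name here is hypothetical and is not drawn
# from the claims, the specification, or the cited references.
from dataclasses import dataclass, field


@dataclass
class StorageDevice:
    blocks: dict  # address -> bytes held in the data storage memory
    # Data shared memory; assumed to share an address space with the GPU data memory.
    shared: bytearray = field(default_factory=lambda: bytearray(64))

    def read(self, address: int, offset: int) -> None:
        """Store the read data in the shared memory at the given offset."""
        data = self.blocks[address]
        self.shared[offset:offset + len(data)] = data


@dataclass
class Cpu:
    device: StorageDevice
    bytes_seen: int = 0  # never incremented: the CPU relays the command, not the data

    def handle_read_request(self, address: int, offset: int) -> None:
        # The CPU turns the read request into a read command and forwards
        # the offset information unchanged to the data storage device.
        self.device.read(address, offset)


@dataclass
class Gpu:
    cpu: Cpu
    device: StorageDevice

    def read(self, address: int, offset: int, length: int) -> bytes:
        # The request starts at the GPU and goes to the CPU with the offset.
        self.cpu.handle_read_request(address, offset)
        # The GPU then finds the data at the offset it chose.
        return bytes(self.device.shared[offset:offset + length])


device = StorageDevice(blocks={7: b"abcd"})
cpu = Cpu(device)
gpu = Gpu(cpu, device)
result = gpu.read(address=7, offset=16, length=4)
```

In this sketch the CPU only relays the command and the offset; the read data moves from the storage device to the shared buffer without passing through the Cpu object, which is the distinction drawn above against the cited references.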

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-6, 8, 10, 12, and 18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lee (US 2019/0244140 A1).
 
Regarding claim 1, Lee teaches A system comprising: a graphics processing unit (GPU) (Lee [Abstract] the solution is directed to a system including a GPU with a GPU memory.) configured to generate a read request (Lee [Abstract] discloses the GPU writes a key value request to an input-output region of the GPU and in response the value corresponding to the key of the key value request is retrieved from the key value device; this is an example of a read request from the GPU to the storage device that returns the requested value) and including a data memory; (Lee [Abstract] discloses the GPU contains memory) a central processing unit (CPU) configured to generate a read command corresponding to the read request; (Lee Fig. 2 and supporting para [0041], which discloses that the key value access request originates from the host application on the CPU) and a data storage device including a data storage memory, (Lee Fig. 2 On Board SSD 205 and supporting para [0035]) wherein the data storage device transmits entire read data output from the data storage device according to the read command to the data memory of the GPU without passing the CPU.  (Lee [0035] discloses the system utilizes peer to peer (P2P) direct memory access (DMA) between the onboard key value SSD and the GPU 210 and gives the GPU 210 complete P2P DMA control.  See also Lee [0032], which discloses that the value corresponding to the key contained in the request is returned.  Thus the Onboard SSD transfers the entire data requested to the GPU memory without passing the CPU.)

Regarding claim 2, Lee teaches all of the limitations of claim 1 above.  Lee further teaches wherein the data storage device further comprises a data shared memory storing the read data, (Lee [0036] discloses that the input-output region of the GPU memory, being directly accessible by both the GPU and the key value SSD 205, may operate functionally as a shared memory.) and wherein the data shared memory includes an address space shared with the data memory. (Lee [0036] discloses input-output region that is shared between the GPU and the key value SSD 205, the system identifies the region based on a (PCI) base address registers (BAR) memory area, thus is a shared address space. )  

Regarding claim 4, Lee teaches all of the limitations of claim 2 above.  Lee further teaches wherein the GPU further comprises a first data input/output (IO) control circuit to transfer the read data received from the data storage device to the data memory.  (Lee Fig. 2 and supporting paras [0027] and [0032]-[0035], which show an On Board PCIe bus that transfers data between the On Board SSD and the GPU Memory.  See also Lee [0027], which discloses that Fig. 2 is a block diagram of a graphics card, where a graphics card is an example of circuitry.  Thus the solution is directed to a key value SSD device, which is an example of circuitry that contains an input/output (IO) control circuit such as the on board PCIe bus.)

Regarding claim 5, Lee teaches all of the limitations of claim 4 above.  Lee further teaches  wherein the data storage device further comprises a second data IO control circuit to transfer the read data stored in the data shared memory to the data memory.  (Lee Fig. 2 and supporting paras [0027] and [0032]-[0035] discloses Global PCIe that may transfer data from the shared data memory to the Graphics Card shown in Fig. 2 System Memory.)

Regarding claim 6, Lee teaches all of the limitations of claim 1 above.  Lee further teaches wherein the CPU comprises: a GPU shared memory configured to store the read request provided by the GPU (Lee Fig. 2, which discloses a Graphics Card that contains a CPU and GPU memory 125 which may be considered part of the CPU, and the GPU memory contains the data shared by the GPU) a request management circuit configured to monitor the GPU shared memory; and a command control circuit configured to generate the read command corresponding to the read request stored in the GPU shared memory.  (Lee Claim 19, which discloses a key value storage device, which is an example of a circuit, that receives one or more key value requests and in response to the key value request, sends a value to the graphics processing unit via the shared memory means.  See also Lee Title that discloses the solution is directed to a Note that Lee sends a value to the graphics processing unit by reading the value from the SSD, thus the system generates a read command to the SSD for the corresponding key in order to return the corresponding value for the key to the graphics processing unit.)

Regarding claim 8, Lee teaches all of the limitations of claim 6 above.  Lee further teaches wherein the GPU comprises: a request generating circuit configured to generate the read request; (Lee Claim 19, which discloses a graphics processing unit containing a key value storage device, which is an example of a circuit, that receives one or more key value requests and in response to the value request, sends a value to the graphics processing unit via the shared memory means.) and a shared memory control circuit configured to transfer the read request to the GPU and to monitor completion of processing of the read request.  (Lee Claim 19 discloses receiving one or more key value requests and, in response to the value request, sending a value to the graphics processing unit via the shared memory means.)

Regarding claim 10, Lee teaches all of the limitations of claim 6 above.  Lee further teaches wherein the data storage device further comprises a command processing circuit to control the data storage device so that the read data is output according to the read command.  (Lee Fig. 2 and supporting para [0027], which discloses that Figure 2 is a block diagram of a graphics card, which is an example of circuitry.  Thus when Lee Claim 19 discloses a graphics processing unit containing a key value storage device that receives one or more key value requests and in response to the value request, sends a value to the graphics processing unit via the shared memory means, it is a data storage device containing a command processing circuit to control output of the read data according to the read command.)

Regarding claim 12, Lee teaches all of the limitations of claim 1 above.  Lee further teaches further comprising a bus circuit coupling the GPU and the CPU, the CPU and the data storage device, or the data storage device and the GPU.  (Lee Fig. 2 and supporting para [0035], which discloses a GPU memory bus that connects the GPU to the GPU memory and then to a PCIe circuit coupling the GPU 210 to the CPU, the GPU 210 to the On Board SSD 205, or the On Board SSD 205 to the GPU 210.  Examiner notes that the coupling may be an indirect connection.)

Regarding claim 18, Lee teaches all of the limitations of claim 1 above.  Lee further teaches wherein the data storage device is a key-value based device, wherein the read request has a format for reading a value corresponding to a key and the read data includes the value corresponding to the key.  (Lee [0005], which discloses that the system is a graphics processing unit that processes key value requests where, in response to presenting the first key value request, the system returns the first value corresponding to the key of the first key value request (i.e. the target data from the SSD).)
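As an illustration of the key-value read format discussed for claims 1, 2, and 18, the exchange through a shared input-output region might look like the sketch below.  It is a simplified model based only on the characterization of Lee given in this action; the names (KeyValueSsd, io_region, gpu_read) and the dictionary layout of the region are hypothetical and are not taken from Lee.

```python
# Simplified, hypothetical model of a key-value read through a shared
# input-output region. Not an implementation of any cited reference.

class KeyValueSsd:
    def __init__(self, store):
        self.store = dict(store)  # key -> value held on the device

    def service(self, io_region):
        """Answer a pending request by writing the value into the shared region."""
        key = io_region["request"]
        if key is not None:
            io_region["value"] = self.store.get(key)
            io_region["request"] = None  # request consumed


def gpu_read(io_region, ssd, key):
    # The GPU writes the key value request into the input-output region ...
    io_region["request"] = key
    # ... the device services it and places the value in the same region ...
    ssd.service(io_region)
    # ... and the GPU reads the value back out of its own memory region.
    return io_region["value"]


io_region = {"request": None, "value": None}  # region visible to both GPU and SSD
ssd = KeyValueSsd({"weights/layer0": b"\x01\x02\x03"})
value = gpu_read(io_region, ssd, "weights/layer0")
missing = gpu_read(io_region, ssd, "no-such-key")
```

The read request here carries only a key, and the read data is the value stored under that key, which is the format the claim 18 limitation describes.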


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7, 9, 11, 13-14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (Lee et al., US 2019/0244140 A1) in view of Fujii (an article titled “Data Transfer Matters for GPU Computing” by Yusuke Fujii, Takuya Azumi, et al., presented at the 2013 International Conference on Parallel and Distributed Systems, available online at https://ieeexplore.ieee.org/document/6808184 and attached to this office action).

Regarding claim 7, Lee teaches all of the limitations of claim 6 above.  Lee further teaches wherein the request management circuit controls the command control circuit so that the read command corresponding to the read request is generated when the read request is stored in the GPU shared memory, (Lee Claim 19 that discloses a key value storage device, which is an example of a circuit, receives one or more key value requests and in response to the value request, sends a value to the graphics processing unit via the shared memory means.  Thus the key value storage device is monitoring key value requests to respond to.) 
However, Lee does not discuss how the completion status is sent to indicate the data is available.  Thus Lee does not explicitly disclose and records a flag representing completion of processing of the read command in the GPU shared memory when processing of the read command is completed. 
Fujii, of a similar field of endeavor, further teaches and records a flag representing completion of processing of the read command in the GPU shared memory when processing of the read command is completed.  (Fujii page 277, column 1, lines 32-47 discloses that CPU data transfers using standard DMA engines are either synchronous (i.e. polling driven) or asynchronous (i.e. interrupt driven).  Thus the solution of Lee may be polling driven, which is a process where the sender records a flag representing the completion of processing of the command and the receiver polls/reads this flag to determine if and when the process is completed.)
Lee and Fujii are in a similar field of endeavor as both relate to performing DMA access on a GPU device.  Thus it would have been obvious to a person of ordinary skill in the art before the time of the claimed invention to incorporate the polling method of Fujii into the DMA completion processing of Lee.  Such a solution would be obvious to try, as the designer is choosing from a finite number of identified, predictable solutions (the standard DMA engine completion notification methods of polling or interrupts) with a reasonable expectation of success (being able to identify when the completion event occurs using this standard method).  One would be motivated to do so in order to reduce CPU overhead.  As noted in Fujii, page 275, column 1, lines 28-31, a single NVIDIA GPU integrates thousands of processing cores on a single chip.  If each chip generated an interrupt for each read, the host would see a tremendous burden on the CPU, as each interrupt requires saving and storing the existing state of the system, including all registers associated with the CPU.
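The polling-driven completion scheme relied on above can be illustrated with the short sketch below.  It is a generic example of flag polling, not code from Lee or Fujii; the names (SharedMemory, device_transfer, poll_until_done), the polling interval, and the timeout are assumptions made for the illustration, and a Python thread stands in for the device.

```python
# Generic flag-polling sketch. Hypothetical names; a thread stands in for
# the device that performs the transfer.
import threading
import time


class SharedMemory:
    def __init__(self):
        self.data = None
        self.done = False  # completion flag recorded by the sender


def device_transfer(mem, payload):
    mem.data = payload  # the data is placed first ...
    mem.done = True     # ... then the completion flag is recorded


def poll_until_done(mem, interval=0.001, timeout=1.0):
    """Receiver side: read the flag repeatedly instead of taking an interrupt."""
    deadline = time.monotonic() + timeout
    while not mem.done:
        if time.monotonic() > deadline:
            raise TimeoutError("transfer did not complete in time")
        time.sleep(interval)
    return mem.data


mem = SharedMemory()
worker = threading.Thread(target=device_transfer, args=(mem, b"value"))
worker.start()
polled = poll_until_done(mem)
worker.join()
```

The receiver pays for repeated reads of the flag but no interrupt handler runs, which is the trade-off between the polling and interrupt options identified in the motivation statement above.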

Regarding claim 9, Lee teaches all of the limitations of claim 6 above.  Lee further teaches wherein the shared memory control circuit (Lee [0036] discloses that the input-output region of the GPU memory, being directly accessible by both the GPU and the key value SSD 205, may operate functionally as a shared memory.) 
However, Lee does not discuss how the completion status is sent to indicate the data is available.  Thus Lee does not explicitly disclose records a flag representing completion of processing when processing of the request is completed and the request generating circuit performs a subsequent operation by receiving the read data from the data memory.  
Fujii, of a similar field of endeavor, further discloses records a flag representing completion of processing when processing of the request is completed and the request generating circuit performs a subsequent operation by receiving the read data from the data memory. (Fujii page 277, column 1, lines 32-47 discloses that CPU data transfers using standard DMA engines are either synchronous (i.e. polling driven) or asynchronous (i.e. interrupt driven).  Thus the solution of Lee may be polling driven, which is a process where the sender records a flag representing the completion of processing of the command and the receiver polls/reads this flag to determine if and when the process is completed.)
The motivation to combine Fujii into Lee is the same as described in claim 7 above. 

Regarding claim 11, Lee teaches all of the limitations of claim 10 above.  However, Lee does not discuss how the completion status is sent to indicate the data is available.  Thus Lee does not explicitly disclose wherein the command processing circuit notifies completion of processing of the read command when the read data is transferred to the GPU. 
Fujii, of a similar field of endeavor, further discloses wherein the command processing circuit notifies completion of processing of the read command when the read data is transferred to the GPU.  (Fujii page 277, column 1, lines 32-47 discloses that CPU data transfers using standard DMA engines are either synchronous (i.e. polling driven) or asynchronous (i.e. interrupt driven).  Thus the solution of Lee may be polling driven, which is a process where the sender records a flag representing the completion of processing of the command and the receiver polls/reads this flag to determine if and when the process is completed.  Thus when Lee [0005] discloses that the DMA stores the data in the first memory connected to the first graphics processing unit, in view of Fujii, which discloses that the completion of writing the data is recorded using polling, it is suggesting that the command processing circuit that notifies the polling circuit using a flag does so when the read data is transferred to the GPU.)
The motivation to combine Fujii into Lee is the same as described in claim 7 above. 

Regarding claim 13, Lee teaches all of the limitations of claim 1 above.  Lee [0023] discloses that the solution of Lee sends one or more key value requests to the key value storage device, and further discloses reading data from the key value storage device.  However, Lee does not explicitly disclose writing data to the key value storage device.  Thus Lee does not explicitly disclose wherein the GPU further generates a write request and write data, the GPU stores the write data in the data memory, and the GPU transfers the write data to the data storage device without passing the CPU.
Fujii, of a similar field of endeavor, further teaches wherein the GPU further generates a write request and write data, the GPU stores the write data in the data memory, and the GPU transfers the write data to the data storage device without passing the CPU.  (Fujii, page 276, column 1, lines 8-15 and lines 41-54 discloses that data transfer methods for GPU computing include well-known read and write access which may be performed through commands such as cuMemcpyHtoD and cuMemcpyDtoH.  This suggests that the solution of Lee, which performs GPU read (via cuMemcpyDtoH commands), could provide GPU write (via cuMemcpyHtoD commands).)
Lee and Fujii are in a similar field of endeavor as both relate to performing DMA access on a GPU device.  Thus it would have been obvious to a person of ordinary skill in the art before the time of the claimed invention to incorporate the write processing of Fujii into the solution of Lee.  Such a solution would combine prior art elements (DMA read and write access on GPU devices) according to known methods (such as cuMemcpyHtoD and cuMemcpyDtoH provided by the CUDA application programming interface (API)) to yield predictable results (being able to record data on the storage).  One would be motivated to do so in order to (Fujii, page 275, column 1, lines 37-47 and page 275, column 2, lines 25-35) be able to write data for some of the main GPU applications such as plasma control, autonomous driving, and storage management, thus enabling a key feature of storage management (storing data).

Regarding claim 14, the combination of Lee and Fujii teaches all of the limitations of claim 13 above.   Lee further teaches wherein the data storage device further comprises a data shared memory to store the write data, (Lee [0036] discloses that the input-output region of the GPU memory, being directly accessible by both the GPU and the key value SSD 205, may operate functionally as a shared memory.)  and the data shared memory includes an address space shared with the data memory.  (Lee [0036] discloses input-output region that is shared between the GPU and the key value SSD 205, the system identifies the region based on a (PCI) base address registers (BAR) memory area, thus is a shared address space. )  

Regarding claim 16,  the combination of Lee and Fujii teaches all of the limitations of claim 14 above.   Lee further teaches wherein the GPU further comprises a first data IO control circuit to transfer the write data stored in the data memory to the data storage device.  (Lee Fig. 2 and supporting paras [0027] and [0032]-[0035] discloses Global PCIe that may transfer data from the shared data memory to the Graphics Card shown in Fig. 2 System Memory.   This PCIe circuit in the solution of Lee in view of Fujii would transfer write IO control data as well as read IO control data.)

Regarding claim 17, the combination of Lee and Fujii teaches all of the limitations of claim 16 above.  Lee further teaches wherein the data storage device further comprises a second data IO control circuit to transfer the write data to the data shared memory referring to the offset information.  (Lee Fig. 2 and supporting paras [0027] and [0032]-[0035] discloses Global PCIe that may transfer data from the shared data memory to the Graphics Card shown in Fig. 2 System Memory for a read operation.  Thus Lee in view of Fujii would transfer data from the Graphics Card to the shared data memory using the Global PCIe, which is an example of a second data IO control circuit that performs a write from the CPU (thus starts at the CPU).)





Response to Remarks
Examiner thanks applicant for the claim updates and remarks in their response of 05/12/2022.   They have been fully considered.
Applicant’s arguments with respect to claims 1-18 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANICE M. GIROUARD whose telephone number is (469)295-9131. The examiner can normally be reached M-F 9:30 - 7:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tim Vo, can be reached on 571-272-3642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/J.M.G./Examiner, Art Unit 2138
/William E. Baughman/Primary Examiner, Art Unit 2138