DETAILED ACTION

This Office action is in response to the amendment filed February 26, 2021.
Claims 1-20 are pending and have been examined.
Claims 1-4, 6, 8-11, 13, 15-18, and 20 have been amended.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (GPUnet: Networking Abstractions for GPU Programs), in view of Daoud (GPUrdma: GPU-side library for high performance networking from GPU kernels), and further in view of Simha (US 2017/0358279).

Regarding claim 1, Kim discloses:
wherein the processor is configured to: generate a network task within a kernel executing on a compute unit (see at least page 204, right column, last paragraph, “In-GPU networking eliminates the overheads of CPU-GPU data transfer and kernel invocation, which penalize short requests.”); 
store an indication of the network task in the cache (see at least page 205, left column, paragraph 2, “GPUnet stores network buffers in GPU memory, keeps track of active connections, and manages control flow for their associated network streams”);
detect, by the command processor, the indication of the network task in the cache; and process, by the command processor, the network task to generate a network message […] (see at least figures 2 and 4; page 206, left column, last two paragraphs, To bypass CPU memory, eliminate packet processing, and enable NIC sharing across different processors in the system, we leverage RDMA-capable high-performance NICs… The NIC can concurrently dispatch messages to multiple buffers and multiple applications, while placing source and destination buffers in both CPU and GPU memory. As a result, multiple CPU and GPU applications can share the NIC without coordinating their access to the hardware for every data transfer… GPUnet uses both a CPU and a GPU to interact with the NIC. It stores network buffers for GPU applications in GPU memory, and leaves the buffer memory management to the GPU socket layer. The per-connection receive and send queues are also managed by the GPU.)
However, Kim does not explicitly disclose, but Daoud discloses:
process, by the command processor, the network task to generate a network message without involving any external processor (see at least Abstract, “We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the network directly from GPU kernels. The library executes no code on CPU, directly accessing the Host Channel Adapter (HCA) Infiniband hardware for both control and data”; figure 1, showing GPU control path to the NIC without the use of the CPU; page 3, left column, paragraph 1, “GPUs in our system interact with the HCA directly without CPU involvement”; page 4, section 4.1, paragraph 2, GPU thread triggers the HCA; page 2, left column, paragraph 5, “GPUrdma differs from previous works in that the CPU is completely bypassed, executing no code relevant to GPU communications.”);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim by adapting the teachings of Daoud to include bypassing the CPU for GPU-NIC communication.  The combination “provides strong performance isolation of GPU communications from the CPU workloads” (Daoud page 2, left column, paragraph 5) and “GPU-side GPI prototype can achieve higher application performance than the traditional CPU-side GPI for these workloads” (Daoud page 8, right column, paragraph 1).
However, Kim and Daoud do not explicitly disclose, but Simha discloses:
a command processor; a plurality of compute units; and a cache (see at least figure 7 and paragraph 0026, the GPU 700 includes components similar to that of the Intel GPU 208, such as command processors, compute units, caches…These 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim and Daoud by including Simha’s teaching of the makeup of a graphics processor which would include a command processor, a plurality of compute units, and a cache.  The claimed invention is merely a combination of old elements, and in the combination each element would have performed the same function as it did separately.  One of ordinary skill in the art would have recognized that the results of the combination were predictable.

Regarding claim 2, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is further configured to convey the network message directly to a network interface unit by bypassing any external processor (see at least Daoud Abstract, “We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the network directly from GPU kernels. The library executes no code on CPU, directly accessing the Host Channel Adapter (HCA) Infiniband hardware for both control and data”; figure 1, showing GPU control path to the NIC without the use of the CPU; page 3, left column, paragraph 1, “GPUs in our system interact with the HCA directly without CPU involvement”; page 4, section 4.1, paragraph 2, GPU thread triggers the HCA; page 2, left column, paragraph 5, “GPUrdma differs 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 3, the rejection of claim 2 is incorporated, and Kim as modified further discloses:
wherein the command processor is further configured to process the network task by translating a first command of the network task to a second command of the network message by using a network stack (see at least Daoud Abstract; figure 1; page 3, left column, paragraph 1; section 4.1)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 4, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is configured to process the network task to generate the network message prior to the kernel completing execution (see at least page 208, right column, section 6.4, paragraph 2)

Regarding claim 5, the rejection of claim 1 is incorporated, and Kim further discloses:
wherein a thread of the kernel is configured to dynamically determine a target address of the network message (see at least page 208, last paragraph – page 209, 

Regarding claim 6, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is configured to attach a source address and a destination address to the network message based on a link layer of a network stack (see at least Daoud page 4, section 4.1, paragraph 2, GPU thread creates a Work Queue Entry (WQE) in the QP, with information about the source and the destination addresses)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 7, the rejection of claim 1 is incorporated, and Kim further discloses:
wherein the processor is a graphics processing unit (GPU) (see at least figures 2 and 4)


Kim further discloses:
a compute unit of a parallel processor and a general purpose processor (see at least figures 2 and 4, CPU and GPU; page 201, section 1, paragraph 2, GPU hardware architecture has matured to support general-purpose parallel workloads)

Regarding claims 9-14, the scope of the instant claims does not differ substantially from that of claims 2-7, and they are rejected for the same reasons, respectively.
Regarding claim 15, the scope of the instant claim does not differ substantially from that of claims 1 and 2, and it is rejected for the same reasons.
Regarding claims 16-20, the scope of the instant claims does not differ substantially from that of claims 2-6, and they are rejected for the same reasons, respectively.

Response to Arguments
Rejection of claims under §103:
Applicant’s arguments with respect to the claims have been fully considered but are moot in view of the new grounds of rejection.  


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Orr discloses GPU-initiated network messages.  LeBeane discloses GPU triggered networking.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIMBERLY L JORDAN whose telephone number is (571)270-5481.  The examiner can normally be reached on Monday-Friday 9:30am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KIMBERLY L JORDAN/
Examiner, Art Unit 2194

/DOON Y CHOW/
Supervisory Patent Examiner, Art Unit 2194