DETAILED ACTION

This Office action is in response to the amendment filed October 26, 2021.
Claims 1-20 are pending and have been examined.
Claims 1, 2, 8, 9, 15, and 16 have been amended.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/16/2021 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (GPUnet: Networking Abstractions for GPU Programs), in view of Daoud (GPUrdma: GPU-side library for high performance networking from GPU kernels), and further in view of Simha (US 2017/0358279).

Regarding claim 1, Kim discloses:
wherein the processor is configured to: generate a network task within a kernel executing on a compute unit (see at least page 204, right column, last paragraph, “In-GPU networking eliminates the overheads of CPU-GPU data transfer and kernel invocation, which penalize short requests.”); 
store an indication of the network task in the cache (see at least page 205, left column, paragraph 2, “GPUnet stores network buffers in GPU memory, keeps track of active connections, and manages control flow for their associated network streams”);
detect, by the command processor, the indication of the network task in the cache; and process, by the command processor, the network task to generate a network message […] (see at least figures 2 and 4; page 206, left column, last two paragraphs, To bypass CPU memory, eliminate packet processing, and enable NIC sharing across different processors in the system, we leverage RDMA-capable high-performance NICs… The NIC can concurrently dispatch messages to multiple buffers and multiple applications, while placing source and destination buffers in both CPU and GPU memory. As a result, multiple CPU and GPU 
However, Kim does not explicitly disclose, but Daoud discloses:
process, by the command processor, the network task to generate a network message without involving any external processor (see at least Abstract, “We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the network directly from GPU kernels. The library executes no code on CPU, directly accessing the Host Channel Adapter (HCA) Infiniband hardware for both control and data”; figure 1, showing GPU control path to the NIC without the use of the CPU; page 3, left column, paragraph 1, “GPUs in our system interact with the HCA directly without CPU involvement”; page 4, section 4.1, paragraph 2, GPU thread triggers the HCA; page 2, left column, paragraph 5, “GPUrdma differs from previous works in that the CPU is completely bypassed, executing no code relevant to GPU communications.”; page 5, left column, paragraph 4, “Doorbell registers reside across the PCIe bus in the HCA... Our naive implementation writes to the doorbell register for each message separately”);
convey the network message directly to a network interface unit by bypassing any external processor (see at least Abstract, “We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kim by adapting the teachings of Daoud to include bypassing the CPU for GPU-NIC communication.  The combination “provides strong performance isolation of GPU communications from the CPU workloads” (Daoud page 2, left column, paragraph 5) and “GPU-side GPI prototype can achieve higher application performance than the traditional CPU-side GPI for these workloads” (Daoud page 8, right column, paragraph 1).
However, Kim and Daoud do not explicitly disclose, but Simha discloses:
a command processor; a plurality of compute units; and a cache (see at least figure 7 and paragraph 0026, the GPU 700 includes components similar to that of the Intel GPU 208, such as command processors, compute units, caches…These units are not discussed here in detail but their operation is known to those skilled in the art)


Regarding claim 2, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is further configured to attach, to the network message, source and destination media access control (MAC) addresses, allowing the network message to be directed to a specific network interface on a network interface card (see at least Daoud page 3, section 3.3, paragraph 2, HCA is an Infiniband compliant device and the standard includes physical addresses for the data path; section 3.3, paragraph 4, In order to use the services of the Infiniband transport layer, a process asks the HCA to create a Queue Pair (QP) — a control object consisting of a pair of Work Queues: a Send Work Queue, which allows the process to be the initiator of transport operations, and a Receive Work Queue, which allows the process to be the target of a transport layer operation. A QP is identified by a QP number (analogous to a TCP port number).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 3, the rejection of claim 2 is incorporated, and Kim as modified further discloses:
wherein the command processor is further configured to process the network task by translating a first command of the network task to a second command of the network message by using a network stack (see at least Daoud Abstract; figure 1; page 3, left column, paragraph 1; section 4.1)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 4, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is configured to process the network task to generate the network message prior to the kernel completing execution (see at least page 208, right column, section 6.4, paragraph 2)

Regarding claim 5, the rejection of claim 1 is incorporated, and Kim further discloses:
wherein a thread of the kernel is configured to dynamically determine a target address of the network message (see at least page 208, last paragraph – page 209, first paragraph, “The multiplication kernel gets pointers to the input matrices and the socket for writing the results. The number of threads – a critical parameter defining how many GPU computational resources a kernel should use – is derived from the matrix dimensions as in the standard GPU implementation. When the 

Regarding claim 6, the rejection of claim 1 is incorporated, and Kim as modified further discloses:
wherein the command processor is configured to attach a source address and a destination address to the network message based on a link layer of a network stack (see at least Daoud page 4, section 4.1, paragraph 2, GPU thread creates a Work Queue Entry (WQE) in the QP, with information about the source and the destination addresses)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Kim, Daoud, and Simha for the reasons listed above.

Regarding claim 7, the rejection of claim 1 is incorporated, and Kim further discloses:
wherein the processor is a graphics processing unit (GPU) (see at least figures 2 and 4)

Regarding claim 8, the instant claim contains several limitations of the same scope as claim 1.  The corresponding limitations are rejected for the same reasons as seen in claim 1 above, in addition to the following limitations:
Kim further discloses:
a compute unit of a parallel processor and a general purpose processor (see at least figures 2 and 4, CPU and GPU; page 201, section 1, paragraph 2, GPU hardware architecture has matured to support general-purpose parallel workloads)

Regarding claims 9-14, the scope of the instant claims does not differ substantially from that of claims 2-7, and they are rejected for the same reasons, respectively.  Regarding claim 15, the scope of the instant claim does not differ substantially from that of claims 1 and 2, and it is rejected for the same reasons.  Regarding claims 16-20, the scope of the instant claims does not differ substantially from that of claims 2-6, and they are rejected for the same reasons, respectively.

Response to Arguments
Rejection of claims under §103:
Applicant’s arguments with respect to the claims have been fully considered but are not persuasive.  
Applicant asserts that the prior art does not disclose conveying the network message directly to a network interface unit by bypassing any external processor.  Applicant contends that Daoud discloses a GPU which stores work in the GPU memory rather than conveying the work request to the NIC.  Applicant states that the write operation triggers the NIC to process the work request stored in the GPU memory.  Examiner respectfully disagrees.  The claim requires that a network message be sent directly to a network interface unit by bypassing any external processor.  Daoud discloses that the GPU writes directly into a doorbell register in the HCA (page 5, left column, paragraph 4, Doorbell registers reside across the PCIe bus in the HCA... Our naive 
Applicant asserts the prior art does not disclose attaching a source address and a destination address to the network message based on a link layer of a network stack as seen in claim 6.  Applicant contends the Infiniband standard as seen in Daoud obviates the need for a network stack.  Examiner respectfully disagrees.  The claim does not require a network stack but requires source and destination addresses based on a link layer of a network stack and Daoud discloses a Work Queue Entry (WQE) in the QP, with information about the source and the destination addresses.  Therefore the prior art does disclose attaching a source address and a destination address to the network message based on a link layer of a network stack as seen in claim 6.
The remaining arguments are related to newly added limitations and are addressed in the rejections/action above.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIMBERLY L JORDAN whose telephone number is (571)270-5481.  The examiner can normally be reached on Monday-Friday 9:30am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dennis Chow, can be reached at (571) 272-7767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KIMBERLY L JORDAN/Examiner, Art Unit 2194