DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status
Claims 1, 4, 9-13 and 16-20 have been amended. Claims 1-20 remain pending and are ready for examination.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-5 and 9-13 are rejected under 35 U.S.C. 103 as being unpatentable over Rossetti (US Publication No. 2015/0039793 -- "Rossetti") in view of Bhadauria et al. (US Publication No. 2017/0212698 -- "Bhadauria") in further view of Waclawsky et al. (US Patent No. 6,628,610 -- "Waclawsky") and further in view of Parker et al. (US Publication No. 2018/0032435 -- "Parker").

Regarding claim 9, Rossetti teaches A graphics processing unit (GPU) compute node comprising: a GPU comprising a GPU cache; a network interface controller (NIC) configured to receive data for processing on the GPU; (Rossetti paragraphs [0026]-[0027], NIC network interface card being characterized in that it further comprises the following blocks: a transmission block, which comprises means suitable to receive data. NIC initially receives the data. For the GPU cache, see Rossetti paragraph [0036], the calculation application of the receiving node pre-allocates one or more buffers on the GPU memory of the receiving node. The GPU memory contains one or more buffers (caches) to store host data among other data. Also see Rossetti paragraph [0042], T1. The host communicates to said transmission block of the NIC metadata relevant to the data transmission, including the amount of data to be transmitted and the virtual address of the GPU memory buffer) the NIC further configured to send the data to a main memory of the GPU compute node; (Rossetti paragraph [0028], a reception block, which comprises means suitable to receive data from said reception network connection block and to provide them to the GPU memory through said bus. NIC sends received data to GPU main memory) the NIC further configured to send coherence information to a coherence directory of the GPU compute node based on the data, the coherence information indicating that a cache entry of the cache is invalid; (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. NIC sends metadata to the GPU, which receives the metadata (i.e., coherence information) in the memory management block (i.e., coherence directory) and uses the received metadata in order to control the reading and writing of data to maintain consistency) the GPU configured to receive the coherence information; (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. NIC sends metadata to the GPU, which receives the metadata (i.e., coherence information) in the memory management block (i.e., coherence directory) and uses the received metadata in order to control the reading and writing of data to maintain consistency. This data can be used to determine whether GPU cache entries have been invalidated; see Rossetti paragraph [0018], Moreover, the very point of splitting the computation into chunks introduces additional overhead as the whole calculation is effectively slowed down, e.g. as GPU memory caches are invalidated, some data structures are to be re-created, etc.) the GPU further configured to determine, based on the coherence information, whether the data includes a network header or a command packet; (Rossetti claim 7, T1. communicating metadata relevant to the data transmission, from the host to said transmission block of the NIC, the metadata including the amount of data to be transmitted and the virtual address of the GPU memory buffer where the data is stored. The metadata (coherence information) contains data size and buffer/cache address information to determine whether the data satisfies the small data heuristic) the GPU further configured to load the data into the GPU cache from the main memory of the GPU compute node responsive to the data being determined to include the network header or the command packet (Rossetti Figures 1 & 4; also see the above mapping for structure).
Rossetti does not teach the coherence information indicating that a cache entry of the cache is invalid; determine, based on the coherence information, whether the data includes a network header or a command packet; the GPU further configured to load the data into the GPU cache from the main memory of the GPU compute node responsive to the data being determined to include the network header or the command packet.
However, Bhadauria teaches the GPU further comprising on a condition that the data includes a network header or a command packet, to load the data into a cache of the GPU from the main memory (Bhadauria paragraph [0056], The cache possibility module 218 determines a caching possibility 231. The caching possibility 231 is a determination whether the data 101 should be written to the volatile memory 106 or not. For example, the cache possibility module 218 can determine the caching possibility 231 based on the data type 229. The decision on whether or not to cache the data depends on the data type satisfying requirements of the caching possibility module).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti with those of Bhadauria. Adding a cache to the memory system of Rossetti allows the system to operate with greater speed and efficiency, because prefetching data into a cache reduces the time required to fetch that data when it is requested. Bhadauria teaches prefetching the data to the cache when a condition is met, which allows the system to operate in a more organized manner, such as by batching prefetches, which extends the life of the system (Bhadauria paragraph [0018], By eliminating numerous individual small write operations from having multiple instances of the write request with the batch write request, the embodiments can extend the system utilization lifetime of the nonvolatile memory to store the data).
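For illustration only, the caching-possibility decision of Bhadauria paragraph [0056] and the batch writing of Bhadauria paragraph [0018] can be modeled as below. This is a minimal Python sketch under stated assumptions, not Bhadauria's implementation: the set of cacheable data types, the names caching_possibility and BatchWriter, and the batch size are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical data types deemed cacheable; Bhadauria does not enumerate these.
CACHEABLE_TYPES = frozenset({"network_header", "command_packet"})


def caching_possibility(data_type: str) -> bool:
    """Return True if data of this type should be written to volatile memory (the cache)."""
    return data_type in CACHEABLE_TYPES


@dataclass
class BatchWriter:
    """Accumulates small writes and emits them as a single batch write (hypothetical model)."""
    batch_size: int
    pending: list = field(default_factory=list)
    batches: list = field(default_factory=list)

    def write(self, item: bytes) -> None:
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One batch write replaces len(self.pending) individual small writes.
        if self.pending:
            self.batches.append(b"".join(self.pending))
            self.pending = []
```

With batch_size=3, seven small writes produce two batch writes, and one write remains pending until flush() is called.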

Rossetti in view of Bhadauria does not teach the coherence information indicating that a cache entry of the cache is invalid; determine, based on the coherence information, whether the data includes a network header or a command packet; load the data into the GPU cache from the main memory of the GPU compute node responsive to the data being determined to include the network header or the command packet.
However, Waclawsky teaches determine, based on the coherence information, whether the data includes a network header or a command packet; load the data into the GPU cache from the main memory of the GPU compute node responsive to the data being determined to include the network header or the command packet (Waclawsky column 1; lines 48-61, An example of a network that offers different types of services at different rates is a network that supports different Quality of Service (QoS) classes. Generally, in such a network, the header of each packet includes a Quality of Service (QoS) field that enables the network nodes (host computers and data communications devices) to classify that packet as belonging to one of the QoS classes (i.e., as containing one of a variety of data types). For example, packets of a video QoS class (i.e., packets carrying video data to provide video service) travel through the network at a high bandwidth, packets of an audio QoS class travel through the network at a relatively slower bandwidth, and packets of a general data QoS class travel through the network at an even slower bandwidth. The system thus determines whether the data includes a command packet or is part of a network header, based on coherence information, which in this case is considered to be the data type).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti and Bhadauria with those of Waclawsky. Waclawsky teaches determining whether the data contains a command packet or a network header based on coherence information. This can benefit the system by allowing it to improve performance, for example by determining which command packets (or associated data types) to expedite and which to drop from service entirely (i.e., prioritizing data processing/caching), among the other benefits described in the following passage (Waclawsky column 1; line 62 – column 2; line 20, To transfer packets having different types of data (e.g., packets of different QoS classes) at different rates in a network, the data communications devices typically allocate different amounts of network resources (e.g., processing time and buffer space) to different packet types. To accomplish this, the specialized packet management policies (e.g., QoS classification, scheduling and drop policies) within the data communications device control the manner in which the data communications device processes the packets. For example, in the above-described network that supports different QoS classes, each data communications device in the network may classify packets into a video QoS class, an audio QoS class, and a general data QoS class according to a QoS classification policy. Additionally, each device may schedule the packets according to a QoS scheduling policy into either a video queue having a high transmission rate, an audio queue having a relatively slower transmission rate, or a general data queue having an even slower transmission rate. Furthermore, under certain conditions (e.g., significantly high network traffic), some devices may drop packets of a particular QoS class (e.g., the general data QoS class) to reduce congestion and reduce resource contention for the non-dropped packets according to a QoS drop policy. Accordingly, the QoS field of each packet can be viewed essentially as a priority field that controls the transfer rate of that packet).
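For illustration only, the QoS classification, scheduling, and drop policies Waclawsky describes can be sketched as follows. The packet representation (a dict with a "qos" key), the function names, and strict-priority draining are assumptions chosen for brevity; Waclawsky describes per-class transmission rates rather than strict priority.

```python
from collections import deque

# Classes ordered from highest to lowest transmission rate, following Waclawsky's example.
QOS_CLASSES = ("video", "audio", "general")


def classify(packet: dict) -> str:
    """Classification policy: read the QoS field of a (hypothetical) packet header dict."""
    qos = packet.get("qos")
    return qos if qos in QOS_CLASSES else "general"


def schedule(packets, congested: bool = False):
    """Scheduling and drop policies: enqueue by class; drop general data under congestion."""
    queues = {name: deque() for name in QOS_CLASSES}
    dropped = []
    for pkt in packets:
        cls = classify(pkt)
        if congested and cls == "general":
            dropped.append(pkt)
        else:
            queues[cls].append(pkt)
    return queues, dropped


def transmit_order(queues):
    """Drain queues in strict priority order (a simplification of per-class rates)."""
    order = []
    for name in QOS_CLASSES:
        order.extend(queues[name])
    return order
```

A packet without a recognized QoS field falls into the general data class, so under congestion it is among the packets dropped.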

Rossetti in view of Bhadauria in further view of Waclawsky does not teach the coherence information indicating that a cache entry of the cache is invalid.
However, Parker teaches send coherence information to a coherence directory of the GPU compute node based on the data, the coherence information indicating that a cache entry of the cache is invalid (Parker paragraph [0087], The CPUs and GPU each have a local cache 120 and the interconnect 114 may include coherency control circuitry 130 for maintaining coherency between the data in the caches 120. A snoop filter 132 may be provided within the interconnect 114 to track which data is stored by each cache 120. When one of the processing elements initiates an access to a particular address, the snoop filter 132 can determine whether any of the other caches stores data for that address, and if so initiate snoop operations for checking the coherency status of the data in the other caches. Any known coherency protocol may be used to maintain coherency. Coherency information may be used to maintain cache control, which includes validating and invalidating certain cache entries and cache addresses, see Parker paragraph [0031] and paragraph [0088], When performing cache maintenance operations identified by virtual page address as discussed above, then the snoop filter 132 can be useful for reducing the amount of cache searching required. In general, when a cache maintenance operation is issued then this may be broadcast throughout the coherent fabric so that the data is cleaned or invalidated in any of the caches in which the data may be stored. However, often the page size may be relatively large and caches may be relatively small and so there is a reasonable probability that a certain cache may not store any data from the page specified in the instruction. To reduce the overhead of searching, the snoop filter 132 can be used to determine whether it is necessary to forward the cache maintenance commands to each cache, so that only the caches which are identified as storing data from the specified page are looked up. The coherency controller 130 may prevent transmission of cache maintenance commands to caches which are not indicated in the snoop filter 132 as storing data from that page, so that the bandwidth and control overhead associated with transmitting and tracking the commands, and the overhead of searching the cache to determine whether it holds the required data, can be reduced).
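For illustration only, the filtering of cache maintenance commands described in Parker paragraph [0088] can be modeled as below. This hypothetical Python sketch assumes a 4096-byte page, 64-byte lines, and page-granular tracking in the snoop filter; the class and function names are illustrative and are not taken from Parker.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


class SnoopFilter:
    """Tracks which caches may hold data from each page (cf. snoop filter 132)."""

    def __init__(self):
        self._holders = defaultdict(set)  # page number -> ids of caches holding lines from it

    def record_fill(self, cache_id, address):
        self._holders[address // PAGE_SIZE].add(cache_id)

    def take_holders(self, page):
        # Return and forget the holders, since their copies are about to be invalidated.
        return self._holders.pop(page, set())


class Cache:
    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.lines = {}   # line-aligned address -> data
        self.lookups = 0  # number of maintenance searches performed on this cache

    def fill(self, address, data, snoop_filter):
        self.lines[address - address % LINE_SIZE] = data
        snoop_filter.record_fill(self.cache_id, address)

    def invalidate_page(self, page):
        self.lookups += 1
        start = page * PAGE_SIZE
        for addr in [a for a in self.lines if start <= a < start + PAGE_SIZE]:
            del self.lines[addr]


def invalidate_by_page(page, caches, snoop_filter):
    """Forward a page invalidation only to caches the snoop filter lists (cf. controller 130)."""
    for cache_id in sorted(snoop_filter.take_holders(page)):
        caches[cache_id].invalidate_page(page)
```

In this model, a cache that holds no data from the specified page is never searched, which is the overhead reduction Parker describes.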

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria and Waclawsky with those of Parker. Parker teaches using coherency information in GPU cache management to indicate invalid cache entries, which allows the system to improve performance in a variety of ways, e.g., by reducing the overhead of certain cache searching operations and minimizing cache maintenance (Parker paragraph [0088], When performing cache maintenance operations identified by virtual page address as discussed above, then the snoop filter 132 can be useful for reducing the amount of cache searching required. In general, when a cache maintenance operation is issued then this may be broadcast throughout the coherent fabric so that the data is cleaned or invalidated in any of the caches in which the data may be stored. However, often the page size may be relatively large and caches may be relatively small and so there is a reasonable probability that a certain cache may not store any data from the page specified in the instruction. To reduce the overhead of searching, the snoop filter 132 can be used to determine whether it is necessary to forward the cache maintenance commands to each cache, so that only the caches which are identified as storing data from the specified page are looked up. The coherency controller 130 may prevent transmission of cache maintenance commands to caches which are not indicated in the snoop filter 132 as storing data from that page, so that the bandwidth and control overhead associated with transmitting and tracking the commands, and the overhead of searching the cache to determine whether it holds the required data, can be reduced).
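For illustration only, the sequence recited in claim 9, as mapped above, can be modeled as a simple message flow. The sketch below is a hypothetical Python model, not code from the application or any cited reference; the class names, the kind field carried in the coherence information, and the dictionary-based memories are assumptions.

```python
from dataclasses import dataclass, field

# Payload kinds that trigger a cache load, mirroring the claim language.
PROMOTE_KINDS = frozenset({"network_header", "command_packet"})


@dataclass
class CoherenceInfo:
    address: int
    kind: str             # assumed: data type carried with the coherence message
    invalid: bool = True  # any cached copy at this address is now stale


@dataclass
class GPU:
    cache: dict = field(default_factory=dict)

    def on_coherence(self, info: CoherenceInfo, main_memory: dict) -> None:
        if info.invalid:
            self.cache.pop(info.address, None)  # invalidate the stale cache entry
        if info.kind in PROMOTE_KINDS:
            # Load the new data from main memory into the GPU cache.
            self.cache[info.address] = main_memory[info.address]


@dataclass
class CoherenceDirectory:
    gpu: GPU

    def receive(self, info: CoherenceInfo, main_memory: dict) -> None:
        self.gpu.on_coherence(info, main_memory)  # forward the coherence information to the GPU


def nic_receive(address, payload, kind, main_memory, directory):
    main_memory[address] = payload  # models the NIC's DMA write to main memory
    directory.receive(CoherenceInfo(address, kind), main_memory)
```

In this model, an ordinary payload write only invalidates the stale entry, whereas a network header or command packet is also loaded into the GPU cache.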

Claim 1 is the method claim corresponding to system claim 9. It contains the same limitations and is rejected based on the same references and rationale.

Regarding claim 10, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker teaches The GPU of claim 9, wherein the NIC is configured to send the data to the main memory using a direct memory access (DMA) (Rossetti paragraph [0086], Feature a) relies on the RDMA technique, extending it by adding the capability to receive message directly on GPU memory, where step a is the following direct memory access operation/connection: Rossetti paragraph [0122], a. NIC_GPU_RX triggers the NIC_GPU_DMA, which manipulates the GPU to make it accessible the memory region from the bus. The data is transmitted via direct memory access technology to the GPU memory. Also see Rossetti claim 1, wherein realizing a direct exchange between the GPU memory and a network through the NIC).

Claim 2 is the method claim corresponding to system claim 10. It contains the same limitations and is rejected based on the same references and rationale.


Regarding claim 11, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker teaches The GPU of claim 9, wherein the GPU is configured to receive the coherence information from the NIC (Rossetti paragraph [0031], a direct transmission block for transmission from GPU, which includes means suitable to receive from the GPU both the data to be transmitted and the relevant metadata, and to route them towards said transmission network connection block, in such a way that the GPU starts an operation of transmission with respect to the NIC. The coherence information can be sent directly from the NIC to the GPU).

Claim 3 is the method claim corresponding to system claim 11. It contains the same limitations and is rejected based on the same references and rationale.

Regarding claim 12, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker teaches The GPU of claim 9, wherein the GPU is configured to receive the coherence information from the coherence directory (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. NIC sends metadata to the memory management block (i.e., coherence directory) which then sends it to the GPU which receives the metadata (i.e., coherence information) and uses the received metadata in order to control the reading and writing of data to maintain consistency).

Claim 4 is the method claim corresponding to system claim 12. It contains the same limitations and is rejected based on the same references and rationale.

Regarding claim 13, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker teaches The GPU of claim 9, wherein the GPU is configured to receive the coherence information transmitted from the NIC to the coherence directory (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. NIC sends metadata to the memory management block (i.e., coherence directory) which then sends it to the GPU which receives the metadata (i.e., coherence information) and uses the received metadata in order to control the reading and writing of data to maintain consistency).

Claim 5 is the method claim corresponding to system claim 13. It contains the same limitations and is rejected based on the same references and rationale.


Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker as applied to claims 1 and 9 above, and further in view of Mannava et al. (US Publication No. 2018/0225209 -- "Mannava").

Regarding claim 14, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker and further in view of Mannava teaches The GPU of claim 9, wherein the coherence information comprises an invalidating probe (Mannava paragraph [0017], interface circuitry to receive, from an interconnect for managing coherency of data in the cache, a snoop-with-overridable-invalidate request specifying a target address of target data and requesting that the target data is invalidated from the cache. Coherency information is gathered by requesting that the target data be checked for invalidity).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria, Waclawsky and Parker with those of Mannava. Mannava teaches using an invalidation probe for coherence, which helps manage coherency of data and remove unnecessary data from memory, improving the efficiency and storage of the system (Mannava paragraph [0036], The master device which reads that data will typically be aware of the nature of the data and so can use the read-with-overridable-invalidate transaction in situations where repeated use of the data is unlikely, to signal to other caches that this data can be invalidated. While the invalidation does not need to be performed in order to maintain coherency or functional correctness, by removing a cache line that is unlikely to be used again in future, this frees an unallocated cache entry which can be selected on a later cache allocation, avoiding an unnecessary eviction of a different cache line that would have benefited from remaining in the cache. Hence, by using the read-with-overridable-invalidate transaction instead of a read which does not provide an invalidate hint, greater efficiency in cache usage can be achieved and hence the performance of the system as a whole improved by reducing the chance that required data is not in the cache).
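For illustration only, the efficiency argument in Mannava paragraph [0036] can be reproduced with a small least-recently-used (LRU) cache. In this hypothetical Python sketch, the honor flag stands in for the recipient cache's freedom to override the invalidate hint; the names, the capacity, and the LRU replacement policy are assumptions rather than details taken from Mannava.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, least recently used first
        self.evictions = []         # addresses evicted to make room

    def allocate(self, address, data):
        if address in self.lines:
            self.lines.move_to_end(address)
        elif len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)  # evict the LRU line
            self.evictions.append(victim)
        self.lines[address] = data

    def snoop_overridable_invalidate(self, address, honor: bool = True):
        # The invalidate is only a hint: skipping it (honor=False) does not affect coherency.
        if honor:
            self.lines.pop(address, None)


def run(with_hint: bool) -> LRUCache:
    cache = LRUCache(capacity=2)
    cache.allocate(0xA, "reused")    # line expected to be used again
    cache.allocate(0xB, "streamed")  # line read once; reuse unlikely
    cache.snoop_overridable_invalidate(0xB, honor=with_hint)
    cache.allocate(0xC, "new")       # later allocation
    return cache
```

With the hint honored, the streamed line frees an entry and the reused line survives; without it, the later allocation evicts the reused line.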

Claim 6 is the method claim corresponding to system claim 14. It contains the same limitations and is rejected based on the same references and rationale.


Claims 7-8 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker as applied to claims 1 and 9 above, and further in view of Arnold et al. (US Publication No. 2003/0154216 -- "Arnold").

Regarding claim 15, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker and further in view of Arnold teaches The GPU of claim 9, wherein the coherence information includes an indication of a data type of the data (Arnold claim 10, a data coherency mechanism that maintains coherency of reflective columns in the database that are created by the data access mechanism and that contain the same data in different data types. The coherency information includes a database featuring multiple different types of data).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria, Waclawsky and Parker with those of Arnold. Arnold teaches using the data type for coherence, which helps the system determine which functions and actions to take, since these can vary with the data type (Arnold paragraph [0006], The way that data is stored in a database affects the performance of applications that access the data. If the data is stored as a particular data type, but an application requires a different data type, the data must typically be read, then converted to the desired data type). Additionally, the data type may be changed to better suit the system (Arnold paragraph [0008], and if the data in the database is stored in a less-than-optimal format for the application, the data type of one or more columns in the database is changed to a more optimal format for the application).
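For illustration only, the reflective columns of Arnold claim 10 (the same data held in different data types and kept coherent) can be modeled as follows. The text/integer pairing and the names ReflectiveColumn, update, and read are hypothetical; Arnold's coherency mechanism operates on database columns, not Python lists.

```python
class ReflectiveColumn:
    """Holds the same values as text and as integers and keeps both copies coherent."""

    def __init__(self, values):
        self.as_text = [str(v) for v in values]
        self.as_int = [int(v) for v in values]

    def update(self, row: int, value) -> None:
        # Every write updates both representations, so readers of either type see the same data.
        self.as_text[row] = str(value)
        self.as_int[row] = int(value)

    def read(self, row: int, data_type: type):
        # Serve the representation the application needs, avoiding a per-read conversion.
        return self.as_int[row] if data_type is int else self.as_text[row]
```

An application needing integers reads as_int directly instead of converting text on every access, which mirrors the conversion cost Arnold paragraph [0006] identifies.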

Claim 7 is the method claim corresponding to system claim 15. It contains the same limitations and is rejected based on the same references and rationale.

Regarding claim 16, Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker and further in view of Arnold teaches The GPU of claim 15, wherein the GPU is configured to determine whether the data satisfies a heuristic based on the data type indicated in the coherence information (Bhadauria paragraph [0056], The cache possibility module 218 determines a caching possibility 231. The caching possibility 231 is a determination whether the data 101 should be written to the volatile memory 106 or not. For example, the cache possibility module 218 can determine the caching possibility 231 based on the data type 229. The decision on whether or not to cache the data depends on the data type satisfying requirements of the caching possibility module).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti with those of Bhadauria, Waclawsky, Parker and Arnold. Bhadauria teaches basing the heuristic on the data type, which, as mentioned previously, can allow the system to use functions optimized for that particular data type (Arnold paragraph [0008], and if the data in the database is stored in a less-than-optimal format for the application, the data type of one or more columns in the database is changed to a more optimal format for the application).

Claim 8 is the method claim corresponding to system claim 16. It contains the same limitations and is rejected based on the same references and rationale.


Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rossetti in view of Bhadauria in further view of Waclawsky and further in view of Parker and further in view of Bernath (US Publication No. 2017/0180272 -- "Bernath").

Regarding claim 17, Rossetti teaches A method for inputting memory from a network interface controller (NIC) of a graphics processing unit (GPU) compute node to a cache of a GPU of the GPU compute node, the method comprising: receiving, by the NIC, data for processing on the GPU; (Rossetti paragraphs [0026]-[0027], NIC network interface card being characterized in that it further comprises the following blocks: a transmission block, which comprises means suitable to receive data. NIC initially receives the data) sending, by the NIC, the data to a main memory of the GPU compute node; (Rossetti paragraph [0028], a reception block, which comprises means suitable to receive data from said reception network connection block and to provide them to the GPU memory through said bus. NIC sends received data to GPU main memory) sending coherence information to a coherence directory of the compute node based on the data, (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. NIC sends metadata to the GPU, which receives the metadata (i.e., coherence information) in the memory management block (i.e., coherence directory) and uses the received metadata in order to control the reading and writing of data to maintain consistency) and responsive to the coherency information, writing the data into the cache from the main memory of the GPU compute node (Rossetti Figures 1 & 4; also see the above mapping for structure).
Rossetti does not teach determining, by the NIC, whether the data includes a network header or a command packet; responsive to the data being determined to include the network header or the command packet, the coherence information indicating that a cache entry of the cache is invalid; writing the data into the cache from the main memory of the GPU compute node.
However, Bhadauria teaches writing the data into the cache from the main memory of the GPU compute node (Bhadauria paragraph [0056], The cache possibility module 218 determines a caching possibility 231. The caching possibility 231 is a determination whether the data 101 should be written to the volatile memory 106 or not. For example, the cache possibility module 218 can determine the caching possibility 231 based on the data type 229. The decision on whether or not to cache the data depends on the data type satisfying requirements of the caching possibility module).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti with those of Bhadauria. Adding a cache to the memory system of Rossetti allows the system to operate with greater speed and efficiency, because prefetching data into a cache reduces the time required to fetch that data when it is requested. Bhadauria teaches prefetching the data to the cache when a condition is met, which allows the system to operate in a more organized manner, such as by batching prefetches, which extends the life of the system (Bhadauria paragraph [0018], By eliminating numerous individual small write operations from having multiple instances of the write request with the batch write request, the embodiments can extend the system utilization lifetime of the nonvolatile memory to store the data).

Rossetti in view of Bhadauria does not teach determining, by the NIC, whether the data includes a network header or a command packet; responsive to the data being determined to include the network header or the command packet, the coherence information indicating that a cache entry of the cache is invalid.
However, Waclawsky teaches determining, by the NIC, whether the data includes a network header or a command packet; responsive to the data being determined to include the network header or the command packet, (Waclawsky column 1; lines 48-61, An example of a network that offers different types of services at different rates is a network that supports different Quality of Service (QoS) classes. Generally, in such a network, the header of each packet includes a Quality of Service (QoS) field that enables the network nodes (host computers and data communications devices) to classify that packet as belonging to one of the QoS classes (i.e., as containing one of a variety of data types). For example, packets of a video QoS class (i.e., packets carrying video data to provide video service) travel through the network at a high bandwidth, packets of an audio QoS class travel through the network at a relatively slower bandwidth, and packets of a general data QoS class travel through the network at an even slower bandwidth. The system thus determines whether the data includes a command packet or is part of a network header, based on coherence information, which in this case is considered to be the data type).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti and Bhadauria with those of Waclawsky. Waclawsky teaches determining whether the data contains a command packet or a network header based on coherence information. This can benefit the system by allowing it to improve performance, for example by determining which command packets (or associated data types) to expedite and which to drop from service entirely (i.e., prioritizing data processing/caching), among the other benefits described in the following passage (Waclawsky column 1; line 62 – column 2; line 20, To transfer packets having different types of data (e.g., packets of different QoS classes) at different rates in a network, the data communications devices typically allocate different amounts of network resources (e.g., processing time and buffer space) to different packet types. To accomplish this, the specialized packet management policies (e.g., QoS classification, scheduling and drop policies) within the data communications device control the manner in which the data communications device processes the packets. For example, in the above-described network that supports different QoS classes, each data communications device in the network may classify packets into a video QoS class, an audio QoS class, and a general data QoS class according to a QoS classification policy. Additionally, each device may schedule the packets according to a QoS scheduling policy into either a video queue having a high transmission rate, an audio queue having a relatively slower transmission rate, or a general data queue having an even slower transmission rate. Furthermore, under certain conditions (e.g., significantly high network traffic), some devices may drop packets of a particular QoS class (e.g., the general data QoS class) to reduce congestion and reduce resource contention for the non-dropped packets according to a QoS drop policy. Accordingly, the QoS field of each packet can be viewed essentially as a priority field that controls the transfer rate of that packet).
Rossetti in view of Bhadauria in further view of Waclawsky does not teach the coherence information indicating that a cache entry of the cache is invalid.
However, Parker teaches the coherence information indicating that a cache entry of the cache is invalid (Parker paragraph [0087], The CPUs and GPU each have a local cache 120 and the interconnect 114 may include coherency control circuitry 130 for maintaining coherency between the data in the caches 120. A snoop filter 132 may be provided within the interconnect 114 to track which data is stored by each cache 120. When one of the processing elements initiates an access to a particular address, the snoop filter 132 can determine whether any of the other caches stores data for that address, and if so initiate snoop operations for checking the coherency status of the data in the other caches. Any known coherency protocol may be used to maintain coherency. Coherency information may be used for cache control, including validating and invalidating particular cache entries and cache addresses; see Parker paragraph [0031] and paragraph [0088], When performing cache maintenance operations identified by virtual page address as discussed above, then the snoop filter 132 can be useful for reducing the amount of cache searching required. In general, when a cache maintenance operation is issued then this may be broadcast throughout the coherent fabric so that the data is cleaned or invalidated in any of the caches in which the data may be stored. However, often the page size may be relatively large and caches may be relatively small and so there is a reasonable probability that a certain cache may not store any data from the page specified in the instruction. To reduce the overhead of searching, the snoop filter 132 can be used to determine whether it is necessary to forward the cache maintenance commands to each cache, so that only the caches which are identified as storing data from the specified page are looked up. The coherency controller 130 may prevent transmission of cache maintenance commands to caches which are not indicated in the snoop filter 132 as storing data from that page, so that the bandwidth and control overhead associated with transmitting and tracking the commands, and the overhead of searching the cache to determine whether it holds the required data, can be reduced).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria and Waclawsky with those of Parker. Parker teaches using coherency information in GPU cache management to indicate invalid cache entries, which allows the system to improve performance in a variety of ways, e.g., by reducing the overhead of certain cache searching operations and minimizing cache maintenance (Parker paragraph [0088], When performing cache maintenance operations identified by virtual page address as discussed above, then the snoop filter 132 can be useful for reducing the amount of cache searching required. In general, when a cache maintenance operation is issued then this may be broadcast throughout the coherent fabric so that the data is cleaned or invalidated in any of the caches in which the data may be stored. However, often the page size may be relatively large and caches may be relatively small and so there is a reasonable probability that a certain cache may not store any data from the page specified in the instruction. To reduce the overhead of searching, the snoop filter 132 can be used to determine whether it is necessary to forward the cache maintenance commands to each cache, so that only the caches which are identified as storing data from the specified page are looked up. The coherency controller 130 may prevent transmission of cache maintenance commands to caches which are not indicated in the snoop filter 132 as storing data from that page, so that the bandwidth and control overhead associated with transmitting and tracking the commands, and the overhead of searching the cache to determine whether it holds the required data, can be reduced).
Rossetti in view of Bhadauria in further view of Waclawsky and Parker does not teach determining, by the NIC, whether the data satisfies a heuristic; on a condition that the data satisfies the heuristic, loading the data into the cache from the NIC.
However, Bernath teaches determining, by the NIC, whether the data satisfies a heuristic; (Bernath paragraph [0039], The NIC 102 determines a type of incoming data and filters incoming data into buffers 109 based on data type. The NIC itself determines whether the data satisfies a condition) on a condition that the data satisfies the heuristic, loading the data into the cache from the NIC (Bernath paragraph [0046], MSI or DMA allows a piece of hardware such as the NIC 102 to have access to memory within the GPU 104 independently of a CPU and store the data in the buffers 109 in memory of the GPU. The NIC stores the data directly into the cache/buffer).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria, Waclawsky and Parker with those of Bernath. Bernath teaches the NIC determining the type of the data and loading the data directly into the cache, which allows the GPU to receive network data without involving the CPU (see Bernath, The GPU 104 may receive all network interface information from the NIC 102 without requiring communication from the CPU 106. This may be accomplished by configuring the GPU 104 (e.g., by the CPU 106 or other processing device) to allow access to ranges of memory for the NIC 102 to use as buffers and having the NIC 102 set indexes for inbound and outbound packets).

Regarding claim 18, Rossetti in view of Bhadauria in further view of Waclawsky and Parker in further view of Bernath teaches The method of claim 17, wherein the data is written by the NIC to the main memory of the GPU using a direct memory access (DMA) (Bernath paragraph [0046], As the packets come into the NIC 102, they are inserted into buffers 109 in GPU memory by the NIC 102 using direct memory access (DMA). The NIC uses DMA to store the data in the buffer/cache).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti, Bhadauria, Waclawsky and Parker with those of Bernath. Bernath allows the NIC to operate independently of the CPU, which can improve system functioning and reliability (Bernath paragraph [0046], As the packets come into the NIC 102, they are inserted into buffers 109 in GPU memory by the NIC 102 using direct memory access (DMA) via message signaled interrupts (MSI), e.g., MSI-X. MSI or DMA allows a piece of hardware such as the NIC 102 to have access to memory within the GPU 104 independently of a CPU and store the data in the buffers 109 in memory of the GPU. The NIC 102 writes or transmits the interrupts, for example, to interrupt address locations in the GPU 104).

Regarding claim 19, Rossetti in view of Bhadauria in further view of Waclawsky and Parker in further view of Bernath teaches The method of claim 17, wherein the coherence information indicates a data type (Waclawsky column 1, lines 48-61, An example of a network that offers different types of services at different rates is a network that supports different Quality of Service (QoS) classes. Generally, in such a network, the header of each packet includes a Quality of Service (QoS) field that enables the network nodes (host computers and data communications devices) to classify that packet as belonging to one of the QoS classes (i.e., as containing one of a variety of data types). For example, packets of a video QoS class (i.e., packets carrying video data to provide video service) travel through the network at a high bandwidth, packets of an audio QoS class travel through the network at a relatively slower bandwidth, and packets of a general data QoS class travel through the network at an even slower bandwidth. The system determines whether the data includes a command packet or is part of a network header, and it does so based on coherence information, which in this case is considered to be the data type) of the data (Rossetti paragraph [0029], a GPU memory management block, which comprises means suitable to send metadata to the GPU to control the reading or writing of data from/into the memory of the same GPU, on the basis of metadata received respectively from said reception block or said transmission block. The NIC sends metadata (i.e., coherence information) to the memory management block (i.e., coherence directory)).

Regarding claim 20, Rossetti in view of Bhadauria in further view of Waclawsky and Parker in further view of Bernath teaches The method of claim 19, wherein the NIC determines (Bernath paragraph [0039], The NIC 102 determines a type of incoming data and filters incoming data into buffers 109 based on data type. The NIC itself determines whether the data satisfies a condition) whether the data satisfies a heuristic based on a data type of the data (Waclawsky column 1, lines 48-61, An example of a network that offers different types of services at different rates is a network that supports different Quality of Service (QoS) classes. Generally, in such a network, the header of each packet includes a Quality of Service (QoS) field that enables the network nodes (host computers and data communications devices) to classify that packet as belonging to one of the QoS classes (i.e., as containing one of a variety of data types). For example, packets of a video QoS class (i.e., packets carrying video data to provide video service) travel through the network at a high bandwidth, packets of an audio QoS class travel through the network at a relatively slower bandwidth, and packets of a general data QoS class travel through the network at an even slower bandwidth. The system determines whether the data includes a command packet or is part of a network header, and it does so based on coherence information, which in this case is considered to be the data type).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Rossetti with those of Bhadauria, Waclawsky, Parker and Bernath, as described previously with respect to claims 16 and 17.


Response to Arguments
Applicant’s arguments, see pages 1-4 (numbered pages 7-10), filed November 18, 2020, with respect to the rejection(s) of claim(s) 1-20 under 35 U.S.C. 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made over Rossetti in view of Bhadauria in further view of Waclawsky, as applied to the original independent claims, and in further view of Parker et al. (US Publication No. 2018/0032435 -- "Parker").

The teachings of the Parker reference are added to address the limitation newly added to the independent claims. Specifically, Parker teaches a process by which coherency information can be sent, acquired, and stored so as to provide relevant cache information. Parker specifically describes using the coherency information to determine the validity or invalidity of given cache addresses/entries. The examiner asserts that the Parker reference sufficiently discloses the newly added limitations of independent claims 1, 9 and 17.




Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONAH C KRIEGER whose telephone number is (571)272-3627.  The examiner can normally be reached on Monday - Friday 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/J.C.K./           Examiner, Art Unit 2136         

/CHARLES RONES/           Supervisory Patent Examiner, Art Unit 2136