DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to the amendment filed on 04/12/2021. Claim 25 was previously canceled. Claims 1-24 and 26 are pending and have been examined.
Response to Arguments
Applicant's arguments filed 04/12/2021 have been fully considered but they are not persuasive.
Applicant argues, page 9 of the remarks, “the combination does not at least describe ‘wherein the predictor circuitry is to enable the load operation when a number of the accesses to the cache line made by the second processor via executions of the snapshot instruction exceeds a first threshold greater or equal to 1.’” Then Applicant states, in the same paragraph in the last sentence, “Bartik, which appears to focus only on non-coherent requests, does not overcome this deficiency.”
The Examiner respectfully disagrees. The Examiner submits that Bartik teaches the aforementioned claim limitation. Bartik teaches that cache lines may be frequently referenced by multiple processors while one or more cores may be updating the state of the cache line, and that the processors referencing the cache lines may not need the most recent copies of the cache lines, para 0021 of Bartik. There are many cases in a computer system where values such as counters or tokens are read or fetched at a high frequency but are actually updated at a low frequency. These cases do not require data to be perfectly accurate or coherent. As such, a non-coherent value which is close enough to the current value is sufficient, para 0035 of Bartik. …
… wherein the predictor circuitry is to enable the load operation when a number of the accesses to the cache line made by the second processor via execution of the snapshot instructions exceeds a first threshold greater than or equal to 1” as required by now amended independent claim 1, and similarly required by amended independent claims 10 and 19.
… the number of accesses made via snapshot instructions because they track only access made to local cache (see McMinn, FIGS. 3-4: prefetch hit count is incremented at 64 where there is a hit in prefetch way at 52; FIGS. 1-2, prefetch way 32 is part of local cache 18).”
The Examiner respectfully submits that Applicant is interpreting McMinn in isolation without any consideration of the combination. Claims 2-6 were rejected over McMinn in combination with Steely and Bartik. Both Steely and Bartik have teachings analogous to the claimed “snapshot instruction”. Steely teaches an uncached fill of a cache line from an owner processor without changing a cache coherence state of the cache line, and the cache line remains with the owner processor, see para 0037 of Steely. Bartik also teaches a “Dirty-Read Fetch Instruction”, which is a non-coherent read of a cache line from an owner processor to a requesting processor without changing a cache coherence state in the owner processor, para 0049 and FIG. 4 of Bartik. McMinn is merely relied upon to teach tracking, using a counter, the number of accesses. For example, McMinn teaches that upon access to a prefetch cache line which is data, the corresponding counter may be incremented, col 4 lines 56-57 of McMinn. Thus, Steely and Bartik, modified in a motivated combination by McMinn, teach a counter to track the number of accesses made via snapshot instructions.
Applicant argues regarding the McMinn reference, pages 9-10 of the remarks, “in McMinn, when a counter reaches a threshold, the corresponding cache line is moved from one area (prefetch way) of the local cache to another area (non-prefetch way) of same cache. This is different from the load operation claimed herein which copies a cache line from a remote cache to the local cache”.
The Examiner respectfully submits that Applicant is again interpreting McMinn in isolation without any consideration of the combination. Claims 2-6 were rejected over McMinn in combination with Steely and Bartik. Both Steely and Bartik teach a requesting processor and an owner processor of a cache line. Steely teaches an owner processor 18 with a cache 32 containing a plurality of cache lines, see paras 0019-0020 and FIG. 1 of Steely. Steely also teaches a requesting processor 16 with a cache 30 containing a plurality of cache lines, see paras 0019-0020 and FIG. 1 of Steely. The owner 106 (e.g., processor 18) replies by providing a coherent fill of the requested cache line to the source processor 102 (requesting processor 16), para 0038 and FIG. 3 of Steely. Thus, Steely teaches copying a cache line from an owner processor to a requesting processor.
Moreover, Bartik also teaches copying of a cache line between an owner processor and a requesting processor. FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045 of Bartik. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4 of Bartik. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4 of Bartik. Core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4 of Bartik.
McMinn was merely relied upon to teach transferring of a cache line when a counter tracking accesses to the cache line exceeds a threshold. For example, McMinn teaches that if the counter exceeds a predetermined threshold value, then the data cache line is eligible for transfer to one of ways 30A-30N, col 4 lines 57-59 of McMinn.
Thus, Steely and Bartik, modified in a motivated combination by McMinn, disclose copying of a cache line between two processors when a counter tracking accesses to the cache line exceeds a predetermined threshold.
In view of the foregoing remarks, independent claims 1, 10, and 19 are not in a condition for allowance. Claims depending therefrom, either directly or indirectly, are also not in a condition for allowance.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9-10, 18-19, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Steely et al. US 2005/0154836 (“Steely”) in view of Bartik et al. US 2019/0108125 (“Bartik”).
As per independent claim 1, Steely teaches a system (FIG. 1 illustrates a multiprocessor system, paras 0009, 0018) comprising: 
a main memory (Memory 22, para 0018 and FIG. 1),
a first processor (Processor 18, para 0019 and FIG. 1) communicatively coupled to the main memory (Processor 18 and memory 22 define nodes in the system that can communicate with each other via requests and corresponding responses through a system interconnect 24, para 0019 and FIG. 1) and comprises a first cache to store a cache line (Processor 18 includes a cache 32 containing a plurality of cache lines, para 0020 and FIG. 1), wherein the cache line is associated with a cache coherence state to indicate that the first cache has sole ownership of the cache line (Referring to FIG. 3, in the illustrated example, a cache line is initially exclusive to an owner node (e.g., processor 18) such that the owner node is in an exclusive state, para 0037 and FIG. 3),
a second processor (Processor 16, para 0019 and FIG. 1) communicatively coupled to the main memory (Processor 16 and memory 22 define nodes in the system that can communicate with each other via requests and corresponding responses through a system interconnect 24, para 0019 and FIG. 1) and comprises:
a second cache (Processor 16 includes a cache 30 containing a plurality of cache lines, para 0020 and FIG. 1),
predictor circuitry (Pre-fetch buffer 104, para 0037) to track accesses to the cache line by monitoring executions of the snapshot instruction by the execution unit of the second processor (A given pre-fetch buffer determines one or more blocks of data that are likely to be needed by its associated processor based upon the current activity of the processor. For example, where a processor 16 has failed to find a desired data block in its cache, its associated pre-fetch buffer can obtain one or more additional …); the predictor circuitry is further to control enablement of a load operation based on the tracked accesses (The source processor 102 (e.g., processor 16) executes the provided speculative fill but also requests a coherent copy of the cache line, para 0038 and FIG. 3), wherein an enablement of the load operation is to cause 1) a copy of the cache line to be stored into the second cache (The owner 106 (e.g., processor 18) replies by providing a coherent fill of the requested cache line to the source processor 102, para 0038 and FIG. 3. See FIG. 3, where the cache line is placed in the source processor’s cache in an “S” (shared) state) and 2) the cache coherence state of the cache line in the first cache to be changed to shared (The cache line then assumes a shared state as the owner node 106 no longer has an exclusive copy of the cache line, para 0038 and FIG. 3).
Although Steely teaches an uncached fill of a cache line from an owner processor without changing a cache coherence state of the cache line, and the cache line remains with the owner processor, see para 0037 of Steely, Steely does not explicitly teach “a decoder to decode a snapshot instruction specifying a memory address of the cache line to generate a decoded snapshot instruction” and “an execution unit to execute the decoded snapshot instruction and to access the first cache in the first processor to read data in the cache line without changing the cache coherence state associated with the cache line, the cache line to remain in the first cache after the read” and “and wherein the predictor circuitry is to enable the load operation when a number of the accesses to the cache line made by the second processor via execution of the snapshot instructions exceeds a first threshold greater than or equal to 1”.
However, in an analogous art in the same field of endeavor, Bartik teaches a decoder to decode a snapshot instruction specifying a memory address of the cache line to generate a decoded snapshot instruction (FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4. Core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4. Since cores 1 and 2 execute the Dirty-Read Fetch instructions, it is inherent that the instructions are decoded by a decoder to generate a decoded instruction prior to execution of the instructions by cores 1 and 2);
an execution unit to execute the decoded snapshot instruction and to access the first cache in the first processor to read data in the cache line without changing the cache coherence state associated with the cache line, the cache line to remain in the first cache after the read (To facilitate multiple processors' attempting to read to a single cache line without interrupting existing operations of core 0, the Dirty-Read Fetch instruction is configured to allow the 
and wherein the predictor circuitry is to enable the load operation when a number of the accesses to the cache line made by the second processor via execution of the snapshot instructions exceeds a first threshold greater than or equal to 1 (Cache lines may be frequently referenced by multiple processors, while one or more cores may be updating the state of the cache line. The processors referencing the cache lines may not need the most recent copies of the cache lines, para 0021. There are many cases in a computer system where values such as counters or tokens are read or fetched at a high frequency but are actually updated at a low frequency. These cases do not require data to be perfectly accurate or coherent. As such, a non-coherent value which is close enough to the current value is sufficient. Given the frequency at which these data are read, the non-coherent value (which is a stale copy) would be close enough to the current value (i.e., the actual value), para 0035. FIG. 5 is a flowchart of a method to make a non-coherent request for a cache line (e.g., enabling a load operation of a cache line), para 0052. At 502, a first processor core (e.g., core 1 of the multiprocessor cores 16) sends a non-coherent fetch (e.g., Dirty-Read Fetch instruction) to a shared cache 32. At 504, in response to a second processor core having exclusive ownership of the cache line in the shared cache 32, the first processor 
Given the teaching of Bartik, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely with “a decoder to decode a snapshot instruction specifying a memory address of the cache line to generate a decoded snapshot instruction” and “an execution unit to execute the decoded snapshot instruction and to access the first cache in the first processor to read data in the cache line without changing the cache coherence state associated with the cache line, the cache line to remain in the first cache after the read” and “and wherein the predictor circuitry is to enable the load operation when a number of the accesses to the cache line made by the second processor via execution of the snapshot instructions exceeds a first threshold greater than or equal to 1”. The motivation would be that the dirty-read fetch instruction eliminates the need for cross-invalidation and serialization overhead, para 0050 of Bartik.
As per dependent claim 9, Steely in combination with Bartik discloses the system of claim 1. Steely teaches wherein the enablement of the load operation is to cause issuance of a prefetch instruction, the prefetch instruction specifying a cache level into which the copy of the cache line is to be stored, wherein an execution of the prefetch instruction is to cause the copy of the cache line to be stored into the specified cache level (The source processor provides a speculative fill request to the pre-fetch buffer 104 in response to a cache miss on the cache line. The pre-fetch buffer provides the buffered copy of the cache line to the processor 102 as a speculative fill. The source processor 102 (e.g., processor 16) executes the provided speculative fill but also requests a coherent copy of the cache line. The owner 106 (e.g., processor 18) replies by providing a coherent fill of the requested cache line to the source processor 102, para 0038 and FIG. 3).
As per independent claim 10, this claim is rejected based on arguments provided above for similar rejected independent claim 1.
As per dependent claim 18, this claim is rejected based on arguments provided above for similar rejected dependent claim 9.
As per independent claim 19, this claim is rejected based on arguments provided above for similar rejected independent claim 1.
As per dependent claim 26, Steely in combination with Bartik discloses the system of claim 1. Steely teaches wherein the execution unit is further configured to store the data read from the cache line into a register of the second processor and not into the second cache (An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system. The uncached fill occurs in the pre-fetch buffer 104, para 0037 and FIG. 3).
Claims 2-6, 8, 11-15, 17, and 20-24 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn US 6,219,760 (“McMinn”).
As per dependent claim 2, Steely in combination with Bartik discloses the system of claim 1. Steely and Bartik may not explicitly disclose, but McMinn teaches further comprising a table to track accesses to the cache line, wherein a table entry corresponding the cache line is to store a count of the number of the accesses to the cache line made by the second processor via executions of the snapshot instruction by the second processor (Prefetch way 32 may include storage for a counter corresponding to each prefetch cache line, col 4 lines 54-56 and FIGS. 1-2).
Given the teaching of McMinn, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely and Bartik with “further comprising a table to track accesses to the cache line, wherein a table entry corresponding the cache line is to store a count of the number of the accesses to the cache line made by the second processor via executions of the snapshot instruction by the second processor”. The motivation would be that cache pollution would be minimized, col 2 lines 40-42 of McMinn.
As per dependent claim 3, Steely in combination with Bartik and McMinn discloses the system of claim 2. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein the count is incremented each time the cache line is accessed by the second processor via execution of the snapshot instruction by the second processor (Upon access to a prefetch cache line which is data, the corresponding counter may be incremented, col 4 lines 56-57).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 2 is equally applicable to claim 3.
As per dependent claim 4, Steely in combination with Bartik and McMinn discloses the system of claim 3. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein the predictor circuitry is also to enable the load operation when a total number of snapshot instructions executed by the second processor within a time period exceeds a total threshold (If the counter exceeds a predetermined threshold value, then the data cache line is eligible for transfer to one of ways 30A-30N, col 4 lines 57-59. Upon access to a prefetch cache line which is data, the corresponding counter may be incremented, col 4 lines 56-57. Clearly, the counter is incremented over a period of time as accesses are made to the cache).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 3 is equally applicable to claim 4.
As per dependent claim 5, Steely in combination with Bartik and McMinn discloses the system of claim 4. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein the count is decremented each time a predetermined amount of time has passed (Prefetch cache lines which are not referenced may be replaced with other prefetch cache lines, col 9 lines 60-62).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 4 is equally applicable to claim 5.
As per dependent claim 6, Steely in combination with Bartik and McMinn discloses the system of claim 5. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein the predictor circuitry is configured to delete the table entry corresponding to the cache line when the count falls below a second threshold (Prefetch cache lines which are not referenced may be replaced with other prefetch cache lines, col 9 lines 60-62).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 5 is equally applicable to claim 6.
As per dependent claim 8, Steely in combination with Bartik discloses the system of claim 1. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein the enablement of the load operation is to cause the copy of the cache line to be stored into the second cache as a least recently used (LRU) cache line so that the copy of the cache line is more likely to be evicted from the second cache than other cache lines in the second cache (Control unit 20 may employ a least recently used (LRU) replacement strategy for allocating a cache line for replacement, col 5 lines 56-58).
Given the teaching of McMinn, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely and Bartik with “wherein the enablement of the load operation is to cause the copy of the cache line to be stored into the second cache as a least recently used (LRU) cache line so that the copy of the cache line is more likely to be evicted from the second cache than other cache lines in the second cache”. The motivation would be that cache pollution would be minimized, col 2 lines 40-42 of McMinn.
As per claims 11-15 and 17, these claims are respectively rejected based on arguments provided above for similar rejected claims 2-6 and 8.
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn and in further view of Reed US 2016/0357664 (“Reed”).
As per dependent claim 7, Steely in combination with Bartik and McMinn discloses the system of claim 2. Steely, McMinn, and Bartik may not explicitly disclose, but Reed teaches wherein the table is a hash table and the memory address of the cache line is hashed to determine the corresponding table entry for the cache line (Selected bits of a memory address are hashed to obtain a hash value, para 0053).
Given the teaching of Reed, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely, McMinn, and Bartik with “wherein the table is a hash table and the memory address of the cache line is hashed to determine the corresponding table entry for the cache line”. The motivation would be that hashing the memory address would require a smaller memory footprint.
As per dependent claim 16, this claim is rejected based on arguments provided above for similar rejected claim 7.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUBAIR AHMED whose telephone number is (571)272-1655.  The examiner can normally be reached on 7:30AM - 5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID X YI, can be reached on (571) 270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 






/ZUBAIR AHMED/Examiner, Art Unit 2132                                                                                                                                                                                                        
/DAVID YI/Supervisory Patent Examiner, Art Unit 2132