DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to the amendment filed on 08/26/2021. Claims 1-25 are pending in this application and have been examined.
Response to Arguments
Applicant's arguments filed 08/26/2021 have been fully considered but they are not persuasive.
Applicant argues, on page 9 of the remarks, that the combination (i.e., Steely and Bartik) does not describe at least the claim limitation “snapshot read tracking circuitry to track the execution of the plurality of decoded snapshot read instructions and the plurality of non-snapshot read instructions by the execution circuitry and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instructions but not of the plurality of non-snapshot read instructions”.
As a basis for this argument, Applicant reproduces the Examiner’s rejection on page 10 of the Office Action of 04/26/2021 which is further reproduced below for the convenience of the Applicant.
There are many cases in a computer system where values such as counters or tokens are read or fetched at a high frequency but are actually updated at a low frequency. These cases do not require data to be perfectly accurate or coherent. As such, a non-coherent value which is close enough to the current value is sufficient. Given the frequency at which these data are read, the non-coherent value (which is a stale copy) would be close enough to the current value (i.e., the actual value), para 0035 of Bartik.

FIG. 4 of the Bartik reference depicts an example scenario using the new Dirty-Read Fetch instruction. A cache controller 206 is not illustrated in FIG. 4, para 0045 of Bartik.
Referring to FIG. 4 of Bartik, at time T0 a core 0 executes an exclusive fetch (different from the new Dirty-Read Fetch instruction) to cache line A in a shared cache 32 because of a miss in the local caches of core 0, para 0046 and FIG. 4 of Bartik.
At time T1 (FIG. 4 of Bartik), core 1 executes a Dirty-Read Fetch instruction to the cache line A in the shared cache 32 because there was a miss in the local caches of core 1. Because core 1 now executes a Dirty-Read Fetch instruction, no cross-invalidate (XI) is sent to core 0. The Dirty-Read Fetch instruction executed by core 1 allows core 1 access to a copy of the cache line A for a one-time reference/use with a snapshot of the current state/value which could be stale/old. Core 1 stores and marks this stale copy in its local caches as do not copy state (one time use only), para 0047 and FIG. 4 of Bartik.

For the convenience of the Applicant, the above claim limitation with clear mappings is shown below in view of Bartik.
snapshot read tracking circuitry (Cache controller 206, para 0045 of Bartik) to track the execution of the plurality of decoded snapshot read instructions (Referring to FIG. 4, at time T1, core 1 executes a Dirty-Read Fetch instruction for a cache line A, para 0047 of Bartik. At time T2, core 2 executes a Dirty-Read Fetch instruction for the cache line A, para 0048 and FIG. 4 of Bartik) and the plurality of non-snapshot read instructions (Referring to FIG. 4, at time T0, core 0 executes an exclusive fetch (different from the Dirty-Read Fetch instruction) for the cache line A, para 0046 and FIG. 4 of Bartik) by the execution circuitry (Core 0, para 0046. Core 1, para 0047. And Core 2, para 0048 and FIG. 4) and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instructions but not of the plurality of non-snapshot read instructions (To facilitate multiple processors' attempting to read to a single cache line without interrupting existing operations of core 0, the Dirty-Read Fetch instruction is configured to allow the requesting processor (e.g., core 1) access to a copy of the cache line A for a one-time reference/use with a snapshot of the current state/value (which could be stale/old)).


The Examiner respectfully submits that the Rafacz reference maintains a table of prefetched cache lines. For each prefetched cache line, an associated access bit indicates whether the prefetched cache line was accessed, para 0023 of Rafacz. The state of the access bits therefore collectively indicates a ratio of accessed prefetched lines to non-accessed prefetched lines, para 0024 of Rafacz. “The prefetch monitor 219 monitors the prefetcher 107 and the cache 104 to determine when data has been prefetched to the cache 104, and also monitors the cache 104 to determine when prefetched data has been evicted from the cache 104. Based on this information, the prefetch monitor 219 updates the prefetch accuracy table 220 to reflect the amount of prefetched data, in cache lines, stored at the cache 104.” Paragraph [0029] and FIG. 2 of Rafacz (emphasis added). Thus, the prefetch monitor 219 monitors the cache to determine when prefetched data have been evicted and, based on this monitoring, updates the prefetch accuracy table 220. Accordingly, based on the monitoring of the prefetched data that have been evicted from the cache, a ratio may be calculated between the prefetched cache lines that have been accessed and the prefetched cache lines that have been evicted.
In view of the foregoing remarks, independent claims 1, 10, and 19 are not in condition for allowance. Claims depending therefrom, either directly or indirectly, are also not in condition for allowance.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 10-14, and 19-23 are rejected under 35 U.S.C. 103 as being unpatentable over Steely et al. US 2005/0154836 (“Steely”) in view of Bartik et al. US 2019/0108125 (“Bartik”).
As per independent claim 1, Steely teaches An apparatus (FIG. 1 depicts a system 10, para 0018) comprising:
whereas responsive to executing a plurality of decoded non-snapshot read instructions, the execution circuitry is to read data and cause a change in the location and/or cache coherence associated with one or more cache lines (Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time, para 0004);
snapshot read tracking circuitry (Based on the instant filed specification, a “snapshot read is a specialized data movement operation that moves a copy of the requested data from the source to the destination without altering the coherence and/or directory state of the requested data anywhere within the cache-coherent domain.” Para 0022 of the instant filed specification. Thus, Steely’s read and/or pre-fetch of a coherent cache line from an owner node without changing the coherency state of the cache line in the owner node meets the definition of a snapshot read. Steely describes that a pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3) to track the executions of the plurality of decoded snapshot read instructions and the plurality of non-snapshot read instructions by the execution circuitry (The pre-fetch buffer 104 generates a pre-fetch request that requests an uncached fill of the cache line for a source processor 102 from the owner node 106 of the cache line. The pre-fetch buffer 104 can contain a plurality of pre-fetched cache lines for use by the source processor 102, para 0037 and FIG. 3. Furthermore, the background section of Steely describes a conventional coherence protocol: “Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time”. Para 0004 of Steely. Because, as described in para 0004, only one processor is able to modify any part of the data at any one time, sharing data between two or more processors requires a change in the coherency state of the data);
snapshot prefetch issuing circuitry (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3) to issue (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3), based on the detected snapshot read access stream (The pre-fetch , one or more snapshot prefetch requests, including a first snapshot prefetch request to prefetch data from a first cache line stored in, and owned exclusively by, a first storage location outside the first processor core (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3), the snapshot prefetch issuing circuitry further to store the prefetched data in a second storage location within the first processor core and not in the coherent memory hierarchy (An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system. The uncached fill occurs in the pre-fetch buffer 104, para 0037 and FIG. 3), wherein after the prefetch, exclusive ownership of the first cache line is to remain with the first storage location (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3), and wherein the prefetched data stored in the second storage location is usable to fill a subsequent snapshot read instruction but not a subsequent non-snapshot read instruction executed by the execution circuitry (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of the cache line from the owner node. An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system, such that data is retrieved without changing the state associated with the data, para 0037).

However, in an analogous art in the same field of endeavor, Bartik teaches a decoder to decode a plurality of snapshot read instructions and a plurality of non-snapshot read instructions (The shared cache 32 (and the memory 28 in general) can operate under the MESI protocol as understood by one skilled in the art. The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols which support write-back caches, para 0036. FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4);
execution circuitry to execute the plurality of decoded snapshot read instructions and the plurality of decoded non-snapshot read instructions to read data from one or more cache lines in a coherent memory hierarchy (The shared cache 32 (and the memory 28 in general) can operate under the MESI protocol as understood by one skilled in the art. The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols which support write-back caches, para 0036. FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4. A core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4);
wherein responsive to executing the plurality of decoded snapshot read instructions, the execution circuitry is to read data without changing a location and cache coherence associated with the one or more cache lines (To facilitate multiple processors' attempting to read to a single cache line without interrupting existing operations of core 0, the Dirty-Read Fetch instruction is configured to allow the requesting processor (e.g., core 1) access to a copy of the cache line A for a one-time reference/use with a snapshot of the current state/value, para 0047 and FIG. 4. The core 2 receives the copy of the cache line A using the Dirty-Read Fetch instruction, para 0048 and FIG. 4. The core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4);
snapshot read tracking circuitry (Cache controller 206, para 0045 of Bartik) to track the execution of the plurality of decoded snapshot read instructions (Referring to FIG. 4, at time T1, core 1 executes a Dirty-Read Fetch instruction for a cache line A, para 0047 of Bartik. At time T2, core 2 executes a Dirty-Read Fetch instruction for the cache line A, para 0048 and FIG. 4 of Bartik) and the plurality of non-snapshot read instructions (Referring to FIG. 4, at time T0, core 0 executes an exclusive fetch (different from the Dirty-Read Fetch instruction) for the cache line A, para 0046 and FIG. 4 of Bartik) by the execution circuitry (Core 0, para 0046. Core 1, para 0047. And Core 2, para 0048 and FIG. 4) and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instructions but not of the plurality of non-snapshot read instructions (To facilitate multiple processors' attempting to read to a single cache line without interrupting existing operations of core 0, the Dirty-Read Fetch instruction is configured to allow the requesting processor (e.g., core 1) access to a copy of the cache line A for a one-time reference/use with a snapshot of the current state/value (which could be stale/old), para 0047 and FIG. 4 of Bartik).
Given the teaching of Bartik, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely with “a decoder to decode a plurality of snapshot read instructions and a plurality of non-snapshot read instructions” and “execution circuitry to execute the plurality of decoded snapshot read instructions and the plurality of decoded non-snapshot read instructions to read data from one or more cache lines in a coherent memory hierarchy” and “wherein responsive to executing the plurality of decoded snapshot read instructions, the execution circuitry is to read data without changing a location and cache coherence associated with the one or more cache lines” and “and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instructions but not of the plurality of non-snapshot read instructions”. The motivation would be that the Dirty-Read Fetch instruction eliminates the need for cross-invalidation and serialization overhead, para 0050 of Bartik.
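For illustration only, the FIG. 4 sequence of Bartik relied on above (an exclusive fetch by core 0 at T0, then Dirty-Read Fetch instructions by cores 1 and 2 at T1 and T2) can be modeled as the following minimal sketch. The class and method names (SharedCacheLine, exclusive_fetch, dirty_read_fetch) and the simplified single-line state model are hypothetical assumptions made for this example; the code is not disclosed in Bartik or Steely and does not form part of the grounds of rejection. It shows the property at issue: a Dirty-Read Fetch returns a copy without changing the owner or coherence state of cache line A and without sending a cross-invalidate (XI).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SharedCacheLine:
    """Hypothetical model of a single line (e.g., cache line A) in a shared cache."""

    value: int
    owner: Optional[int] = None  # core holding exclusive ownership, if any
    state: str = "I"  # simplified MESI state tracked for the line
    xi_sent: List[int] = field(default_factory=list)  # cores sent a cross-invalidate (XI)

    def exclusive_fetch(self, core: int) -> int:
        # Non-snapshot read: ownership moves to the requester, and any other
        # current owner is cross-invalidated.
        if self.owner is not None and self.owner != core:
            self.xi_sent.append(self.owner)
        self.owner = core
        self.state = "E"
        return self.value

    def dirty_read_fetch(self, core: int) -> Tuple[int, str]:
        # Snapshot read: the requesting core (unused in this simplified model)
        # receives a copy tagged for one-time use; owner, state, and the XI log
        # are deliberately left unchanged.
        return (self.value, "one-time-use")


# FIG. 4-style sequence (times T0, T1, T2), modeled under the assumptions above.
line_a = SharedCacheLine(value=7)
line_a.exclusive_fetch(core=0)            # T0: core 0 obtains exclusive ownership
copy_1 = line_a.dirty_read_fetch(core=1)  # T1: core 1 snapshot read, no XI
copy_2 = line_a.dirty_read_fetch(core=2)  # T2: core 2 snapshot read, no XI
print(line_a.owner, line_a.state, line_a.xi_sent, copy_1)  # 0 E [] (7, 'one-time-use')
```

By contrast, a second exclusive fetch by another core in this model would move ownership and record an XI to core 0, which is the overhead Bartik describes the Dirty-Read Fetch instruction as avoiding.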
As per dependent claim 2, Steely in combination with Bartik discloses the apparatus of claim 1. Steely teaches wherein each of the plurality of cache lines is associated with a respective cache coherence state (The cache line is initially exclusive to the owner node 106, para 0037 and FIG. 3).
As per dependent claim 3, Steely in combination with Bartik discloses the apparatus of claim 2. Steely teaches wherein each of the snapshot read requests made by the first processor core, when performed, is to cause a reading of data stored in a target cache line of the plurality of cache lines, wherein the cache coherence state associated with the target cache line and a location in which the target cache line is stored are to remain the same after the read (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3).
As per dependent claim 4, Steely in combination with Bartik discloses the apparatus of claim 1. Steely teaches wherein the first storage location is a cache of a second processor core (Cache line of the owner node 106, para 0037 and FIG. 3).
As per dependent claim 5, Steely in combination with Bartik discloses the apparatus of claim 1. Steely teaches wherein the first storage location is a system memory (Paragraph 0018 and FIGS. 1 and 3).
As per claims 10-14, these claims are respectively rejected based on arguments provided above for similar rejected claims 1-5.
As per claims 19-23, these claims are respectively rejected based on arguments provided above for similar rejected claims 1-5. For the processor and memory, see FIG. 1 of Steely.
Claims 6-8, 15-17, and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn US 6,219,760 (“McMinn”). 
As per dependent claim 6, Steely in combination with Bartik discloses the apparatus of claim 1. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein each entry in second storage location comprises an indicator to indicate whether data stored in the entry is snapshot prefetched data (A second state of a 
Given the teaching of McMinn, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely and Bartik with “wherein each entry in second storage location comprises an indicator to indicate whether data stored in the entry is snapshot prefetched data”. The motivation would be that cache pollution would be minimized, col 2 lines 40-42 of McMinn.
As per dependent claim 7, Steely in combination with Bartik and McMinn discloses the apparatus of claim 6. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein snapshot prefetched data can only be used to fill snapshot read requests issued by the first processor core (A cache storage comprises a plurality of ways and at least one prefetch way 32. Prefetch way 32 is used to store prefetch cache lines, col 4 line 32).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 6 is equally applicable to claim 7.
As per dependent claim 8, Steely in combination with Bartik and McMinn discloses the apparatus of claim 7. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein snapshot prefetched data is removed from the second storage location when the snapshot prefetched data has been accessed by a subsequent snapshot read request, when the snapshot prefetched data is evicted in accordance to an eviction policy associated with the second storage location (When a prefetch cache line is presented to the cache for storage, the prefetch cache-line may displace another prefetch cache line, col 2 lines 54-56), or when the snapshot prefetched data has timed out.
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 7 is equally applicable to claim 8. 
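For illustration only, the McMinn arrangement cited for claims 7 and 8 (prefetch cache lines held in a separate prefetch way, transferred into the regular ways only when actually accessed, and displaced by later prefetch cache lines) can be modeled as the minimal sketch below. The class name PrefetchWay, the capacity parameter, and the oldest-first displacement order are hypothetical assumptions for this example and are not details taken from McMinn.

```python
from collections import OrderedDict
from typing import Dict, Optional


class PrefetchWay:
    """Hypothetical model: prefetched lines held apart from the regular cache ways."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.prefetched: "OrderedDict[int, int]" = OrderedDict()  # address -> data, oldest first
        self.regular: Dict[int, int] = {}  # address -> data (analogue of ways 30A-30N)

    def insert_prefetch(self, addr: int, data: int) -> None:
        # A new prefetch line may displace an older prefetch line (oldest-first
        # order is an assumption) but never a line in the regular ways.
        if addr in self.prefetched:
            self.prefetched.move_to_end(addr)
        elif len(self.prefetched) >= self.capacity:
            self.prefetched.popitem(last=False)
        self.prefetched[addr] = data

    def access(self, addr: int) -> Optional[int]:
        # On a hit in the prefetch way, the line is removed from the prefetch
        # way and transferred into the regular ways.
        if addr in self.prefetched:
            self.regular[addr] = self.prefetched.pop(addr)
        return self.regular.get(addr)


way = PrefetchWay(capacity=2)
for addr, data in [(0x100, 1), (0x140, 2), (0x180, 3)]:
    way.insert_prefetch(addr, data)  # 0x100 is displaced when 0x180 arrives
print(way.access(0x140), way.access(0x100))  # 2 None
```

In this model, prefetched data that is never accessed leaves only by displacement, so the regular ways are not polluted, which corresponds to the motivation cited from col 2 lines 40-42 of McMinn.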
As per dependent claims 15-17, these claims are respectively rejected based on arguments provided above for similar rejected dependent claims 6-8.
As per dependent claims 24-25, these claims are respectively rejected based on arguments provided above for similar rejected dependent claims 6-7.
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn and in further view of Rafacz et al. US 2014/0108740 (“Rafacz”).
As per dependent claim 9, Steely in combination with Bartik and McMinn discloses the apparatus of claim 8. Steely and Bartik disclose all of the claimed limitations above, but do not explicitly teach “further comprising snapshot prefetch throttling circuitry to track a ratio between a number of snapshot prefetched data in the second storage that have been accessed by subsequent snapshot read requests and a number of snapshot prefetched data that have been evicted from the second storage location or have timed out, the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio”.
However, McMinn teaches further comprising snapshot prefetch throttling circuitry to track a ratio between a number of snapshot prefetched data in the second storage that have been accessed by subsequent snapshot read requests and a number of snapshot prefetched data that have been evicted from the second storage location or have timed out (By placing prefetch cache lines into prefetch way 32 and transferring the prefetch cache lines into ways 30A-30N if the prefetch cache lines are actually accessed, control unit 20 may advantageously prevent pollution of the cached data (in ways 30A-30N) with prefetch data. Cache lines stored in ways 30A-30N have been accessed by microprocessor 12, and are not displaced by prefetch cache lines until the prefetch cache lines are accessed, col 4 line 62 to col 5 line 2).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 8 is equally applicable to claim 9.
Steely in combination with Bartik and McMinn may not explicitly disclose, but in an analogous art in the same field of endeavor, Rafacz teaches the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio (A prefetch throttle 105 maintains a table whereby each entry of the table stores a memory address associated with a prefetched cache line and an access bit to indicate whether a cache line associated with the memory address was accessed. The state of the access bits therefore collectively indicates a ratio of the accessed prefetch lines to non-accessed prefetch lines. The ratio can be used by a prefetcher 107 as a measure of the prefetch accuracy, para 0023. The prefetch throttle 105 compares an available memory bandwidth and the prefetch accuracy to corresponding threshold amounts and, based on the comparison, sends control signaling to the prefetcher 107 to … “The prefetch monitor 219 monitors the prefetcher 107 and the cache 104 to determine when data has been prefetched to the cache 104, and also monitors the cache 104 to determine when prefetched data has been evicted from the cache 104. Based on this information, the prefetch monitor 219 updates the prefetch accuracy table 220 to reflect the amount of prefetched data, in cache lines, stored at the cache 104.” Paragraph [0029] and FIG. 2 of Rafacz (emphasis added). Thus, the prefetch monitor 219 monitors the cache to determine when prefetched data have been evicted and, based on this monitoring, updates the prefetch accuracy table 220. Accordingly, based on the monitoring of the prefetched data that have been evicted from the cache, a ratio may be calculated between the prefetched cache lines that have been accessed and the prefetched cache lines that have been evicted).
Given the teaching of Rafacz, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely, Bartik, and McMinn with “the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio”. The motivation would be that the techniques improve processing efficiency by throttling the prefetching of data to a cache, para 0009 of Rafacz.
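For illustration only, the combination relied on for claim 9 (a table of prefetched addresses with access bits, counts of lines accessed versus lines evicted before use, and a prefetch rate adjusted by comparing accuracy and available memory bandwidth to thresholds) can be modeled as the minimal sketch below. The class and method names, the threshold values, the halving and increment steps, and the exact decision policy are hypothetical assumptions for this example; Rafacz describes comparing accuracy and bandwidth to thresholds but the specific policy shown here is not taken from Rafacz.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PrefetchThrottle:
    """Hypothetical ratio-based throttle; thresholds and step sizes are assumptions."""

    accuracy_threshold: float = 0.5    # assumed minimum acceptable used fraction
    bandwidth_threshold: float = 0.25  # assumed minimum available-bandwidth fraction
    rate: int = 4                      # prefetch requests issued per trigger (assumed unit)
    max_rate: int = 8
    table: Dict[int, bool] = field(default_factory=dict)  # address -> access bit
    accessed: int = 0
    evicted_unused: int = 0

    def on_prefetch(self, addr: int) -> None:
        self.table[addr] = False  # new entry, not yet accessed

    def on_access(self, addr: int) -> None:
        if self.table.get(addr) is False:  # count each prefetched line at most once
            self.table[addr] = True
            self.accessed += 1

    def on_evict(self, addr: int) -> None:
        # An entry evicted (or timed out) before any access counts against accuracy.
        if self.table.pop(addr, None) is False:
            self.evicted_unused += 1

    def used_fraction(self) -> float:
        # accessed / (accessed + evicted_unused): a monotonic function of the
        # accessed-to-evicted ratio that stays defined when nothing was evicted.
        resolved = self.accessed + self.evicted_unused
        return self.accessed / resolved if resolved else 1.0

    def adjust(self, available_bandwidth: float) -> int:
        accuracy = self.used_fraction()
        if accuracy < self.accuracy_threshold and available_bandwidth < self.bandwidth_threshold:
            self.rate = max(0, self.rate // 2)  # inaccurate and bandwidth-starved: back off
        elif accuracy >= self.accuracy_threshold:
            self.rate = min(self.max_rate, self.rate + 1)  # accurate: ramp up
        return self.rate  # otherwise the rate is left unchanged


throttle = PrefetchThrottle()
for addr in range(4):
    throttle.on_prefetch(addr)
throttle.on_access(0)
for addr in (1, 2, 3):
    throttle.on_evict(addr)  # three prefetched lines evicted without use
print(throttle.used_fraction(), throttle.adjust(available_bandwidth=0.1))  # 0.25 2
```

The used fraction is reported instead of a raw accessed-to-evicted quotient only to avoid division by zero; either form orders prefetch accuracy the same way.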
As per dependent claim 18, this claim is rejected based on arguments provided above for similar rejected dependent claim 9.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUBAIR AHMED whose telephone number is (571)272-1655.  The examiner can normally be reached on 7:30AM - 5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID X YI can be reached on (571) 270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-

/ZUBAIR AHMED/Examiner, Art Unit 2132
/DAVID YI/Supervisory Patent Examiner, Art Unit 2132