DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to RCE filed on 03/01/2021. Claims 1-25 have been examined and are pending in this application.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/01/2021 has been entered.
Response to Arguments
Applicant's arguments filed 03/01/2021 have been fully considered but they are not persuasive.
A new reference (Rafacz et al. US 2014/0108740) is cited with respect to dependent claims 9 and 18. As such, Applicant’s arguments with respect to dependent claims 9 and 18 are moot.
Applicant argues, page 9 of the remarks, “the combination does not at least describe ‘snapshot read tracking circuitry to track the executions of the plurality of decoded snapshot read instructions and the plurality of non-snapshot read instructions by the execution circuitry and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions.’”
The Examiner respectfully disagrees. The Examiner submits that the combination of Steely and Bartik teaches the above claim limitation. 
Steely teaches that a pre-fetch buffer 104 generates a pre-fetch request that requests an uncached fill of the cache line for a source processor 102 from the owner node 106 of the cache line. The pre-fetch buffer 104 can contain a plurality of pre-fetched cache lines for use by the source processor 102, para 0037 and FIG. 3 of Steely. Furthermore, in the background section of Steely, a conventional coherence protocol is described: “Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time”, para 0004 of Steely. Since, as described in para 0004, only one processor is able to modify any part of the data at any one time, sharing data between two or more processors requires a change in the coherency state of the data. Thus, Steely teaches “snapshot read tracking circuitry to track the executions of the plurality of decoded snapshot read instructions and the plurality of non-snapshot read instructions by the execution circuitry” as required by independent claim 1 and similarly required by other independent claims 10 and 19.
Bartik teaches that there are many cases in a computer system where values such as counters or tokens are read or fetched at a high frequency but are actually updated at a low frequency. These cases do not require data to be perfectly accurate or coherent. As such, a non-coherent value which is close enough to the current value is sufficient, para 0035 of Bartik. Thus, Bartik teaches “and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions” as required by independent claim 1 and similarly required by other independent claims 10 and 19.
Applicant argues, page 9 of the remarks, “the combination does not at least describe ‘wherein the prefetched data stored in the second storage location is usable to fill a subsequent snapshot read instruction but not a subsequent non-snapshot read instruction executed by the execution circuitry.’”
The Examiner respectfully disagrees. Steely teaches that the pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of the cache line from the owner node. An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system, such that data is retrieved without changing the state associated with the data, para 0037 of Steely. Thus, at the time the pre-fetch buffer provides the buffered copy of the cache line to the source processor 102, the state of the cache line is non-coherent. 
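For illustration only, the distinction relied upon above between a coherent read and an uncached fill can be sketched in Python. This is a simplified model under stated assumptions, not Steely's implementation: the `Node` class, the `coherent_read` and `uncached_fill` functions, and the MESI-style state names are hypothetical. A coherent (non-snapshot) read demotes the owner's exclusive copy to Shared, whereas an uncached fill places a copy in the requester's pre-fetch buffer and leaves the owner's state and location untouched.

```python
from enum import Enum


class State(Enum):
    """MESI-style coherence states (assumed for illustration)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Node:
    """Hypothetical node: a coherent cache plus a non-coherent pre-fetch buffer."""

    def __init__(self, name):
        self.name = name
        self.cache = {}            # addr -> (State, data); governed by coherence
        self.prefetch_buffer = {}  # addr -> data; outside the coherence protocol


def coherent_read(requester, owner, addr):
    """Non-snapshot read: both copies end up Shared, so the owner's state changes.

    Simplified: a write-back of Modified data is not modeled.
    """
    _state, data = owner.cache[addr]
    owner.cache[addr] = (State.SHARED, data)
    requester.cache[addr] = (State.SHARED, data)
    return data


def uncached_fill(requester, owner, addr):
    """Snapshot-style read: copy data into the requester's pre-fetch buffer only.

    The owner's entry (state and location) is left exactly as it was.
    """
    _state, data = owner.cache[addr]
    requester.prefetch_buffer[addr] = data
    return data
```

In this sketch, after `uncached_fill` the owner's line is still `State.EXCLUSIVE`, which corresponds to the statement in para 0037 of Steely that the cache line remains in an exclusive state with the owner node 106.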
In view of the foregoing remarks and the new reference, independent claims 1, 10, and 19 are not in a condition for allowance. Claims depending therefrom, either directly or indirectly, are also not in a condition for allowance.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 10-14, and 19-23 are rejected under 35 U.S.C. 103 as being unpatentable over Steely et al. US 2005/0154836 (“Steely”) in view of Bartik et al. US 2019/0108125 (“Bartik”).
As per independent claim 1, Steely teaches an apparatus (FIG. 1 depicts a system 10, para 0018) comprising:
whereas responsive to executing a plurality of decoded non-snapshot read instructions, the execution circuitry is to read data and cause a change in the location and/or cache coherence associated with one or more cache lines (Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time, para 0004);
snapshot read tracking circuitry (Based on the instant filed specification, a “snapshot read is a specialized data movement operation that moves a copy of the requested data from the source to the destination without altering the coherence and/or directory state of the requested data anywhere within the cache-coherent domain.” Para 0022 of the instant filed specification. Thus, Steely’s read and/or pre-fetch of a coherent cache line from an owner node without changing the coherency state of the cache line in the owner node meets the definition of a snapshot read. Steely describes that a pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3) to track the executions of the plurality of decoded snapshot read instructions and the plurality of non-snapshot read instructions by the execution circuitry (The pre-fetch buffer 104 generates a pre-fetch request that requests an uncached fill of the cache line for a source processor 102 from the owner node 106 of the cache line. The pre-fetch buffer 104 can contain a plurality of pre-fetched cache lines for use by the source processor 102, para 0037 and FIG. 3. Furthermore, in the background section of Steely, a conventional coherence protocol is described: “Coherency protocols have been developed to ensure that whenever a processor reads or writes to a memory location it receives the correct or true data. Additionally, coherency protocols help ensure that the system state remains deterministic by providing rules to enable only one processor to modify any part of the data at any one time”, para 0004 of Steely. Since, as described in para 0004, only one processor is able to modify any part of the data at any one time, sharing data between two or more processors requires a change in the coherency state of the data);
snapshot prefetch issuing circuitry (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3) to issue (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of a cache line from an owner node, para 0037 and FIG. 3), based on the detected snapshot read access stream (The pre-fetch , one or more snapshot prefetch requests, including a first snapshot prefetch request to prefetch data from a first cache line stored in, and owned exclusively by, a first storage location outside the first processor core (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3), the snapshot prefetch issuing circuitry further to store the prefetched data in a second storage location within the first processor core and not in the coherent memory hierarchy (An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system. The uncached fill occurs in the pre-fetch buffer 104, para 0037 and FIG. 3), wherein after the prefetch, exclusive ownership of the first cache line is to remain with the first storage location (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3), and wherein the prefetched data stored in the second storage location is usable to fill a subsequent snapshot read instruction but not a subsequent non-snapshot read instruction executed by the execution circuitry (The pre-fetch buffer 104 can generate a pre-fetch request that requests an uncached fill of the cache line from the owner node. An uncached fill is the retrieval of a copy of a particular item of data outside of the cache coherency protocol of the system, such that data is retrieved without changing the state associated with the data, para 0037).
Steely may not explicitly disclose “a decoder to decode a plurality of snapshot read instructions and a plurality of non-snapshot read instructions” and “execution circuitry to execute the plurality of decoded snapshot read instructions and the plurality of decoded non-snapshot read instructions to read data from one or more cache lines in a coherent memory hierarchy” and “wherein responsive to executing the plurality of decoded snapshot read instructions, the execution circuitry is to read data without changing a location and cache coherence associated with the one or more cache lines” and “and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions”.
However, in an analogous art in the same field of endeavor, Bartik teaches a decoder to decode a plurality of snapshot read instructions and a plurality of non-snapshot read instructions (The shared cache 32 (and the memory 28 in general) can operate under the MESI protocol as understood by one skilled in the art. The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols which support write-back caches, para 0036. FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4);
execution circuitry to execute the plurality of decoded snapshot read instructions and the plurality of decoded non-snapshot read instructions to read data from one or more cache lines in a coherent memory hierarchy (The shared cache 32 (and the memory 28 in general) can operate under the MESI protocol as understood by one skilled in the art. The MESI protocol is an Invalidate-based cache coherence protocol, and is one of the most common protocols which support write-back caches, para 0036. FIG. 4 depicts a scenario using the MESI protocol and a new Dirty-Read Fetch instruction, para 0045. At time T1, core 1 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 1, and there is a cache hit in the shared cache 32 (L3), para 0047 and FIG. 4. At time T2, core 2 executes a Dirty-Read Fetch instruction to cache line A in the shared cache 32 because there was a cache miss in the local caches L1, L2 of the core 2, and there is a cache hit in the shared cache 32 (L3), para 0048 and FIG. 4. Core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4);
wherein responsive to executing the plurality of decoded snapshot read instructions, the execution circuitry is to read data without changing a location and cache coherence associated with the one or more cache lines (To facilitate multiple processors' attempting to read to a single cache line without interrupting existing operations of core 0, the Dirty-Read Fetch instruction is configured to allow the requesting processor (e.g., core 1) access to a copy of the cache line A for a one-time reference/use with a snapshot of the current state/value, para 0047 and FIG. 4. The core 2 receives the copy of the cache line A using the Dirty-Read Fetch instruction, para 0048 and FIG. 4. The core 0 continues to have exclusive ownership of the cache line A of the shared cache 32 even after the Dirty-Read Fetch instructions have been requested for cache line A by cores 1 and 2, para 0049 and FIG. 4);
and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions (There are many cases in a computer system where values such as counters or tokens are read or fetched at a high frequency but are actually updated at a low frequency. These cases do not require data to be perfectly accurate or coherent. As such, a non-coherent value which is close enough to the current value is sufficient. Given the frequency at which these data are read, the non-coherent value (which is a stale copy) would be close enough to the current value (i.e., the actual value), para 0035).
Given the teaching of Bartik, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely with “a decoder to decode a plurality of snapshot read instructions and a plurality of non-snapshot read instructions” and “execution circuitry to execute the plurality of decoded snapshot read instructions and the plurality of decoded non-snapshot read instructions to read data from one or more cache lines in a coherent memory hierarchy” and “wherein responsive to executing the plurality of decoded snapshot read instructions, the execution circuitry is to read data without changing a location and cache coherence associated with the one or more cache lines” and “and to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions”. The motivation would be that the Dirty-Read Fetch instruction eliminates the need for cross-invalidation and serialization overhead, para 0050 of Bartik.
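As an illustration of the claim limitation “to detect a snapshot read access stream based on the tracked executions of the plurality of snapshot read instruction but not of the plurality of non-snapshot read instructions,” the following hypothetical Python sketch shows one way such tracking could work. The class name, the stride-matching rule, and the `threshold` parameter are assumptions introduced here; the sketch is not drawn from Steely or Bartik.

```python
class SnapshotStreamDetector:
    """Hypothetical tracker: observes every executed read but trains only on
    snapshot reads. A stream is reported once `threshold` consecutive strides
    between snapshot-read addresses are equal and non-zero."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_addr = None   # address of the most recent snapshot read
        self.stride = None      # last observed stride between snapshot reads
        self.confirmations = 0  # consecutive strides equal to self.stride

    def observe(self, addr, is_snapshot):
        """Record one executed read; return True if a snapshot stream is detected."""
        if not is_snapshot:
            return False  # non-snapshot reads are seen but never train the detector
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                self.confirmations += 1
            else:
                self.stride = stride
                self.confirmations = 1 if stride != 0 else 0
        self.last_addr = addr
        return self.confirmations >= self.threshold
```

Because non-snapshot reads return before any state is updated, an interleaved non-snapshot read to an unrelated address neither breaks nor contributes to a detected stream in this sketch.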
As per dependent claim 2, Steely in combination with Bartik discloses the apparatus of claim 1. Steely teaches wherein each of the plurality of cache lines is associated with a respective cache coherence state (The cache line is initially exclusive to the owner node 106, para 0037 and FIG. 3).
As per dependent claim 3, Steely in combination with Bartik discloses the apparatus of claim 2. Steely teaches wherein each of the snapshot read requests made by the first processor core, when performed, is to cause a reading of data stored in a target cache line of the plurality of cache lines, wherein the cache coherence state associated with the target cache line and a location in which the target cache line is stored are to remain the same after the read (The owner node 106 returns the requested uncached fill, but the cache line remains in an exclusive state with the owner node 106, para 0037 and FIG. 3).
As per dependent claim 4, Steely teaches wherein the first storage location is a cache of a second processor core (Cache line of the owner node 106, para 0037 and FIG. 3).
As per dependent claim 5, Steely in combination with Bartik discloses the apparatus of claim 1. Steely teaches wherein the first storage location is a system memory (Paragraph 0018 and FIGS. 1 and 3).
As per claims 10-14, these claims are respectively rejected based on arguments provided above for similar rejected claims 1-5.
As per claims 19-23, these claims are respectively rejected based on arguments provided above for similar rejected claims 1-5. For processor and memory see FIG. 1 of Steely.
Claims 6-8, 15-17, and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn US 6,219,760 (“McMinn”). 
As per dependent claim 6, Steely in combination with Bartik discloses the apparatus of claim 1. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein each entry in second storage location comprises an indicator to indicate whether data stored in the entry is snapshot prefetched data (A second state of a cache line indicates that the cache line was prefetched and has not yet been requested, col 8 lines 49-51. Referring to FIG. 7, referenced field 92 is a single bit which indicates the second state when cleared, col 9 lines 47-49).
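Given the teaching of McMinn, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely and Bartik with “wherein each entry in second storage location comprises an indicator to indicate whether data stored in the entry is snapshot prefetched data”.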
As per dependent claim 7, Steely in combination with Bartik and McMinn discloses the apparatus of claim 6. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein snapshot prefetched data can only be used to fill snapshot read requests issued by the first processor core (A cache storage comprises a plurality of ways and at least one prefetch way 32. Prefetch way 32 is used to store prefetch cache lines, col 4 line 32).
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 6 is equally applicable to claim 7.
As per dependent claim 8, Steely in combination with Bartik and McMinn discloses the apparatus of claim 7. Steely and Bartik may not explicitly disclose, but McMinn teaches wherein snapshot prefetched data is removed from the second storage location when the snapshot prefetched data has been accessed by a subsequent snapshot read request, when the snapshot prefetched data is evicted in accordance to an eviction policy associated with the second storage location (When a prefetch cache line is presented to the cache for storage, the prefetch cache-line may displace another prefetch cache line, col 2 lines 54-56), or when the snapshot prefetched data has timed out.
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 7 is equally applicable to claim 8. 
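The limitations of claims 6-8 can be illustrated with a hypothetical Python sketch of the second storage location. The `SnapshotPrefetchBuffer` class, its FIFO displacement rule (loosely analogous to McMinn's prefetch cache line displacing another prefetch cache line, col 2 lines 54-56), the per-entry `is_snapshot` indicator, and the `timeout` value are assumptions introduced for illustration only.

```python
from collections import OrderedDict


class SnapshotPrefetchBuffer:
    """Hypothetical second storage location. Each entry carries an is_snapshot
    indicator; snapshot-prefetched entries fill only snapshot reads and are
    removed when consumed, when displaced (FIFO), or when they time out."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity
        self.timeout = timeout
        self.entries = OrderedDict()  # addr -> (data, is_snapshot, insert_time)
        self.used = 0                 # entries consumed by a snapshot read
        self.dropped = 0              # entries displaced or timed out unused

    def insert(self, addr, data, now, is_snapshot=True):
        if addr in self.entries:
            del self.entries[addr]            # refresh an existing entry
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # displace the oldest entry
            self.dropped += 1
        self.entries[addr] = (data, is_snapshot, now)

    def lookup(self, addr, now, is_snapshot_read):
        """Return buffered data for the read, or None on a miss."""
        self._expire(now)
        entry = self.entries.get(addr)
        if entry is None:
            return None
        data, is_snap, _inserted = entry
        if is_snap and not is_snapshot_read:
            return None  # snapshot data must not fill a non-snapshot read
        if is_snap:
            del self.entries[addr]  # consumed by a subsequent snapshot read
            self.used += 1
        return data

    def _expire(self, now):
        stale = [a for a, (_d, _s, t) in self.entries.items() if now - t > self.timeout]
        for a in stale:
            del self.entries[a]
            self.dropped += 1
```

The `used` and `dropped` counters are included only to show where the quantities compared in claim 9 could come from.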

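As per dependent claims 15-17, these claims are respectively rejected based on arguments provided above for similar rejected dependent claims 6-8.
As per dependent claims 24-25, these claims are respectively rejected based on arguments provided above for similar rejected dependent claims 6-7.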
Claims 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Steely in view of Bartik and in further view of McMinn and in further view of Rafacz et al. US 2014/0108740 (“Rafacz”).
As per dependent claim 9, Steely in combination with Bartik and McMinn discloses the apparatus of claim 8. Steely and Bartik disclose all of the claimed limitations above, but do not explicitly teach “further comprising snapshot prefetch throttling circuitry to track a ratio between a number of snapshot prefetched data in the second storage that have been accessed by subsequent snapshot read requests and a number of snapshot prefetched data that have been evicted from the second storage location or have timed out, the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio”.
However, McMinn teaches further comprising snapshot prefetch throttling circuitry to track a ratio between a number of snapshot prefetched data in the second storage that have been accessed by subsequent snapshot read requests and a number of snapshot prefetched data that have been evicted from the second storage location or have timed out (By placing prefetch cache lines into prefetch way 32 and transferring the prefetch cache lines into ways 30A-30N if the prefetch cache lines are actually accessed, control unit 20 may advantageously prevent 
The same motivation that was utilized for combining Steely and McMinn as set forth in claim 8 is equally applicable to claim 9.
Steely in combination with Bartik and McMinn may not explicitly disclose, but in an analogous art in the same field of endeavor, Rafacz teaches the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio (A prefetch throttle 105 maintains a table whereby each entry of the table stores a memory address associated with a prefetched cache line and an access bit to indicate whether a cache line associated with the memory address was accessed. The state of the access bits therefore collectively indicates a ratio of the accessed prefetch lines to non-accessed prefetch lines. The ratio can be used by a prefetcher 107 as a measure of the prefetch accuracy, para 0023. The prefetch throttle 105 compares an available memory bandwidth and the prefetch accuracy to corresponding threshold amounts and, based on the comparison, sends control signaling to the prefetcher 107 to throttle prefetching, para 0024).
Given the teaching of Rafacz, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to further modify the scope of the invention of Steely, Bartik, and McMinn with “the snapshot prefetch throttling circuitry further to adjust a rate at which the one or more snapshot prefetch requests are issued by the snapshot prefetch circuitry based on the ratio”.
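A minimal Python sketch of the throttling scheme described in paras 0023-0024 of Rafacz is provided for illustration. The threshold values, the halve-or-increment adjustment of the prefetch degree, and the rule of throttling only when both the accessed-to-non-accessed ratio and the available bandwidth are below their thresholds are assumptions; Rafacz is relied upon only for tracking an access bit per prefetched address and comparing accuracy and bandwidth to thresholds.

```python
class PrefetchThrottle:
    """Hypothetical sketch: a table of prefetched addresses with access bits,
    whose ratio of accessed to non-accessed entries drives the prefetch rate."""

    def __init__(self, ratio_threshold=1.0, bandwidth_threshold=0.25, max_degree=8):
        self.table = {}                 # addr -> access bit (True once accessed)
        self.ratio_threshold = ratio_threshold
        self.bandwidth_threshold = bandwidth_threshold
        self.max_degree = max_degree
        self.degree = max_degree        # prefetch requests issued per trigger

    def record_prefetch(self, addr):
        self.table[addr] = False

    def record_access(self, addr):
        if addr in self.table:
            self.table[addr] = True

    def ratio(self):
        """Accessed prefetch lines divided by non-accessed prefetch lines."""
        accessed = sum(self.table.values())
        not_accessed = len(self.table) - accessed
        return accessed / not_accessed if not_accessed else float("inf")

    def adjust(self, available_bandwidth):
        """Halve the rate when accuracy is poor and bandwidth is scarce; else raise it."""
        if (self.ratio() < self.ratio_threshold
                and available_bandwidth < self.bandwidth_threshold):
            self.degree = max(1, self.degree // 2)
        else:
            self.degree = min(self.max_degree, self.degree + 1)
        return self.degree
```

For example, with four prefetched lines of which only one has been accessed, the ratio is 1/3; with available bandwidth of 0.1 the sketch halves the degree from 8 to 4.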
As per dependent claim 18, this claim is rejected based on arguments provided above for similar rejected dependent claim 9.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZUBAIR AHMED whose telephone number is (571)272-1655.  The examiner can normally be reached on 7:30AM - 5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DAVID X YI can be reached on (571) 270-7519.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 






/ZUBAIR AHMED/Examiner, Art Unit 2132
/DAVID YI/Supervisory Patent Examiner, Art Unit 2132