Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
This office action has been issued in response to the response filed 11/11/20.  Claims 1-14, 17-19, 21-23 are pending in this application. Applicant's arguments have been carefully considered, but are not persuasive in view of the “response to arguments” section below.  The examiner appreciates Applicant's effort to distinguish over the cited prior art by presenting arguments in an attempt to distinguish or clarify the claimed invention, however, upon further consideration and/or search, the claims remain unpatentable over the cited prior art for the reasons articulated in the “response to arguments” section below.  All claims pending in the instant application remain rejected and clarification and/or elaboration regarding why the claims are not in condition for allowance will hereafter be provided in order to efficiently further prosecution.  Accordingly, this action is made FINAL.

Claim Rejections - 35 USC § 103
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1.	Determining the scope and contents of the prior art.
2.	Ascertaining the differences between the prior art and the claims at issue.
3.	Resolving the level of ordinary skill in the pertinent art.
4.	Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-14, 17-19, 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Baartmans et al. (US PGPUB # 20170286260) in view of Arimilli et al (US PGPUB # 20040139305) further in view of Chen (US PGPUB # 20170213028) and further in view of the non-patent literature reference ECMON.
	With respect to independent claim 1, 12, 19 Baartmans/Arimilli/Chen/ECMON discloses: A computing device, comprising: 
              a plurality of processing units [plurality of cores 102a-n – Baartmans fig 1] [multicore processing system – ECMON abstract]; 
              a plurality of (N-1)-level caches [shared caches: As known in the art, when multiple cores share common resources such as a common cache or memory (not shown in this view), then data coherency may be tracked to ensure that stale data is not incorrectly used by the cores - Baartmans 0031] [exemplary plurality caches depicted as 104a-n but could include L1/L2/L3 cache hierarchy - Baartmans fig 1; cores 102a-n may have one or more caches and/or other local memory devices, representatively illustrated as caches 104a-m. Cores 102a-n, and where applicable, caches 104a-m may be coupled to one another and to main memory 108 through a system bus or interconnect 106 - Baartmans 0020]; 
              an N-level cache that is associated with two or more of the plurality of (N-1)-level caches, and that is configured as a backing store for the two or more (N-1)-level caches [shared cache hierarchy – Arimilli 0040-0042, fig 3-4] [shared main memory 108 is analogous to shared cache since a shared main memory corresponds to one level of shared memory in a memory hierarchy just as a shared cache is one level of shared memory in a memory hierarchy (understanding evidenced by 0069 of DIESTELHORST US PGPUB # 20170351517) and functions as backing store for the higher level cache memories within the memory hierarchy - Baartmans fig 1; shared cache(s) at least implicitly taught in 0031 of Baartmans]; and 
              control logic [tracing mechanisms - Baartmans 0002] that configures the computing device to record execution of an entity that executes parallel threads across two or more of the plurality of processing units [ECMon optimizes the detection of miss-speculation “we use this simple support to speculate past active barriers and achieve a speedup of 12% for the set of parallel programs considered” - ECMON abstract; “Record-replay based debugging [12, 17, 22, 35, 40] is a technique that helps ensure software reliability by recording program execution, so as to enable replay to help in debugging. In a multicore setting, ISMDs need to be recorded in addition to recording other non-deterministic events. Let us consider the example shown in Fig. 1(b) which shows the parallel execution of two functions (no speculation involved) fun1 and fun2. As we can see, fun2 reads the value written by fun1 and this RAW dependency must be recorded to ensure that ld2 gets the correct value (from st1) during replay”, in other words, record execution of parallel threads across two or more processing units - ECMON fig 3 on page 353 in view of introduction, right col on page 349 & ECMON p352, left col, section 2.1, first paragraph teaches “ECMon is complete, in that it is guaranteed to expose all ISMDs. ISMDs consisting of RAW, WAW and WAR dependences can be exercised via two modes: through the cache coherence system or through the memory. By exposing all invalidate events, we make sure that we expose all WAR dependences exercised through the coherence system. Similarly, by exposing all data value reply events, we make sure that we expose all RAW and WAW dependences exercised through coherence. However, not all dependences are exercised through cache coherence system; some are exercised through the main memory due to cache block replacements”], based on performing at least the following: 
              based at least on detecting an influx of memory data from a particular memory location in system memory to a particular cache location in the N-level cache that results from execution of a first thread of the entity by a first of the plurality of processing units [first thread executed by first processor – ECMON page 357 section 4.2.1, left col, (Thread switches); cache-miss (influx) event is exposed for implementing cache coherence in software [11] and implementing software controlled multithreading - ECMON page 350, right col, 2nd paragraph] [wherein the first transaction comprises a cache access (cache access will be hit/miss) - Baartmans 0037], cause the influx of memory data to the particular cache location in the N-level cache to be logged into a replayable trace of the entity [time-deterministic replay to reproduce the execution of a program including its precise timing – Chen abstract, 0048, 0062, 0072-0073; deterministic replay, i.e., to ask the remote machine to record all nondeterministic inputs (such as messages, interrupts, etc.) along with the precise point in the program where each of them occurred, and to inject these inputs at the same points during replay - Chen 0043 in view of 0048 teaching Memory: Different memory accesses during play and replay and/or different memory layouts can increase or decrease the number of cache misses at all levels, and/or affect their timing & [0072] To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. 
This not only involves CPU state, but also memory contents, stable storage, and key devices] [in addition to latencies, other tracing information can also be obtained at trace points, such as tracing information related to responses from the agents for transaction 200 (e.g., whether transaction 200 resulted in a retry, cache states of a cache line accessed, such as : dirty, exclusive, shared among multiple processors, etc. as known in the art). Furthermore, identifying and tagging transactions at the various trace points can also assist in deriving metrics such as cache hit /miss rates which comprise a cache access, (e.g., by studying the number of transactions which pass through trace point A 110a for transactions comprising to cache requests originating from core 102a to cache 104a, and of those, the number of transactions which miss in cache 104a to pass through trace point B 110b). In some cases, using the tagging mechanisms at trace points (e.g., as shown and described with reference to trace point A 110a in FIG. 2A), information regarding localities on the system fabric to which transactions are directed can also be identified. For example, for the multiple caches 104a-m in processing system 100 of FIG. 1, trace points A-D 110a-d may be configured to assist in identifying proximity of data to a consuming device, or locality of a consuming device of transaction 200, in processing system 100, by tagging transactions according to exemplary aspects described above (e.g., which one of caches 104a-m may comprise data requested from cores 102a-n can be identified based on transaction 200 identified at the various trace points). Determining such proximity can assist software or operating systems to organize data (e.g., in caches 104a-m) to be located in close proximity to corresponding consuming or requesting devices (e.g., cores 102a-n) - Baartmans 0029]; and 
              based at least on having caused the influx of memory data to the particular cache location in the N-level cache to be logged into the replayable trace of the entity, subsequently cause one or more (N-1)-level cache coherence protocol (CCP) transitions between the two or more (N-1)-level caches to be logged into the replayable trace of the entity, the (N-1)-level CCP transitions resulting from the particular cache location being accessed by a second of the plurality of processing units based on execution of a second thread of the entity by the second of the plurality of processing units [speculatively execute threads past active barriers [8, 14] and use ECMon to detect miss-speculation - ECMON page 350, right col, 3rd paragraph; Recording the execution of multithreaded programs involves the recording of ISMDs, since their order is non-deterministic … ECMon support can be used to record these dependences - ECMON page 354, right col, section 4, 1st paragraph] [shared/dirty line is the result of access by plurality of processing units, states of cache lines are traced/logged - Baartmans 0029 in view of Chen teaching time-deterministic replay to reproduce the execution of a program including its precise timing – Chen abstract, 0048, 0062, 0072-0073; deterministic replay, i.e., to ask the remote machine to record all nondeterministic inputs (such as messages, interrupts, etc.) along with the precise point in the program where each of them occurred, and to inject these inputs at the same points during replay - Chen 0043 in view of 0048 teaching Memory: Different memory accesses during play and replay and/or different memory layouts can increase or decrease the number of cache misses at all levels, and/or affect their timing].
              Baartmans does not explicitly disclose a shared cache hierarchy, although Baartmans’s disclosure as cited above in the rejection rationale appears to implicitly teach and suggest the instant limitation.
Nevertheless, in the same field of endeavor Arimilli teaches hardware enabled instruction tracing in a multiprocessing system employing a cache hierarchy (Arimilli title, abstract, fig 3-4) further teaching a cache hierarchy including a large (shared) level two (L2) cache shared by multiple processor cores 108 (Arimilli 0040-0041).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement a shared cache hierarchy in the invention of Baartmans as taught by Arimilli because this would be advantageous for optimizing memory access and reducing access latency (Arimilli 0040-0042).
Baartmans/Arimilli does not explicitly disclose a replayable trace of the entity.
Nevertheless, in the same field of endeavor Chen teaches time-deterministic replay (TDR) that can reproduce the execution of a program, including its precise timing and memory access operations … To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices.  - Chen abstract, 0048, 0062, 0072-0073. 
It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement a replayable trace of the entity in the invention of Baartmans/Arimilli as taught by Chen because this would be advantageous for testing and ensuring normal execution of a program (Chen 0196-0197).
Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units.
Nevertheless, in the same field of endeavor ECMON teaches exposing cache events for monitoring wherein parallel threads operating in a multicore environment may be monitored, tracked and recorded so that cache events may be replayed –ECMON page 349, left col, first two paragraphs & page 354, right col, section 4, 1st paragraph. 
It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement parallel threads across multiple processing units and the ability to replay cache transactions performed by the parallel threads in the invention of Baartmans/Arimilli/Chen as taught by ECMON because this would be advantageous for optimizing multi-core and multi-threaded processing (ECMON page 349, right col, Introduction, 1st paragraph).
	With respect to dependent claim 2, 13 Baartmans/Arimilli/Chen discloses the computing device also comprises a plurality of (N-2)-level caches, and in which each (N-1)-level cache is associated with two or more of the plurality of (N-2)-level caches [Cache hierarchy 110 may include, for example, separate bifurcated level one (L1) instruction and data caches for each processor core 108 (two or more N-2 level caches) and a large shared level two (L2) cache ((N-1)-level cache is associated with two or more of the plurality of (N-2)-level caches) by multiple processor cores 108 – Arimilli fig 4 in view of 0040-0041; Memories 104 of all of processing units 102 collectively form the lowest level of volatile memory (often called "system memory") within server computer system 100, which is generally accessible to all processing units 102 (and their constituent cache memories) - Arimilli fig 4 in view of 0041] [shared caches: As known in the art, when multiple cores share common resources such as a common cache or memory (not shown in this view), then data coherency may be tracked to ensure that stale data is not incorrectly used by the cores - Baartmans 0031], and the (N-1)-level cache is configured as a backing store for the two or more (N-2)-level caches [shared L2 cache functional as backing store for L1 caches which share the L2 cache and main memory functional as backing store for all cache/memory levels higher in system memory hierarchy – Arimilli 0040-0041; in other words, each shared relatively slower level cache is functional as a backing store for caches higher in the cache/memory hierarchy, i.e. shared L2 cache/memory is backing store for two or more L1 caches - Baartmans 0031]; and the stored control logic also configures the computing device to cause one or more (N-2)-level CCP transitions between the two or more (N-2)-level caches to be logged into the replayable trace of the entity [cache states are logged when a transaction is traced - Baartmans 0029 in view of Chen 0048, 0062, 0072-0073], the (N-2)-level CCP transitions resulting from the particular cache location being accessed by the two or more of the plurality of processing units [shared/dirty line is the result of access by plurality of processing units, states of cache lines are traced/logged - Baartmans 0029]. 
	With respect to dependent claim 3 Baartmans/Arimilli/Chen discloses wherein the influx of memory data results from one or more cache misses on the N-level cache as a result of execution of the entity [cache access will propagate/miss through hierarchy until it hits in an ultimate backing store, in other words, whether transaction 200 resulted in a retry, cache states of a cache line accessed, such as: dirty, exclusive, shared among multiple processors, etc. as known in the art - Baartmans 0029 in view of Chen 0048, 0062, 0072-0073].
	With respect to dependent claim 4 Baartmans/Arimilli/Chen discloses wherein the one or more (N-1)-level CCP transitions comprise one or more points of transition among periods of stores and periods of loads [load/store or read/write operations/transactions to cache/memory will influence cache coherence states and transitions therebetween - Baartmans 0031]. 
	With respect to dependent claim 5 Baartmans/Arimilli/Chen discloses wherein causing the one or more (N-1)-level CCP transitions to be logged comprises logging a value stored in the particular cache location at one or more of the points of transition [shared/dirty transition is logged with respect to a cache line & is the result of access, states of cache lines are traced/logged - Baartmans 0029 – other tracing information can also be obtained at trace points, such as tracing information related to responses from the agents for transaction 200 (e.g., whether transaction 200 resulted in a retry, cache states of a cache line accessed, such as: dirty, exclusive, shared among multiple processors, etc. as known in the art; trace data useable for debugging comprises logging values stored in locations - Baartmans 0002-0004].
	With respect to dependent claim 6 Baartmans/Arimilli/Chen discloses the computing device also comprises a buffer; and the control logic also configures the computing device to perform deferred logging into the replayable trace of the entity based on one or more of: storing the influx of memory data from the particular memory location to the buffer, or storing the one or more (N-1)-level CCP transitions between the two or more (N-1)-level caches to the buffer [performance counter may function as buffer within trace point A performing deferred logging of an event such as an influx/miss which may later be used to analyze cache access - Baartmans fig 2A in view of 0029 further in view of Chen 0048, 0062, 0072-0073; Furthermore, identifying and tagging transactions at the various trace points can also assist in deriving metrics such as cache hit/miss rates which comprise a cache access, (e.g., by studying the number of transactions which pass through trace point A 110a for transactions comprising to cache requests originating from core 102a to cache 104a, and of those, the number of transactions which miss in cache 104a to pass through trace point B 110b). In some cases, using the tagging mechanisms at trace points (e.g., as shown and described with reference to trace point A 110a in FIG. 2A), information regarding localities on the system fabric to which transactions are directed can also be identified - Baartmans 0029 in view of teaching in Arimilli fig 5 paragraph 0055-0056 that buffering may facilitate deferred operations]. 
	With respect to dependent claim 7 Baartmans/Arimilli/Chen discloses wherein the computing device also comprises one or more logging control bits that control whether one or more of the plurality of processing units participate in logging [declaration/initiation of trace point A  is understood to read on setting one or more control bits which control whether core 102a will participate in logging - Baartmans fig 1; Trigger 211 is a control signal which triggers or causes trace tagging logic 208 to tag transaction 200. By tagging transaction 200 in this manner, transaction 200 is identified as a transaction to be monitored - Baartmans 0023].
	With respect to dependent claim 8 Baartmans/Arimilli/Chen discloses wherein the control logic comprises one or more of circuitry or stored microcode [Baartmans 0018].
	With respect to dependent claim 9, 17 Baartmans/Arimilli/Chen discloses wherein at least the N-level cache includes one or more accounting bits that identify one or more of (i) whether a value at the particular cache location has been logged, or (ii) whether the value at the particular cache location should be logged [accounting bits indicating cache states of a cache line accessed, such as: dirty, exclusive, shared among multiple processors, etc – Baartmans 0029; a cache access transaction will make use of a trace tag identifier (additional accounting bits) which will identify whether values at cache locations are being logged or should be logged - Baartmans 0030, 0037; Trigger 211 is a control signal which triggers or causes trace tagging logic 208 to tag (cache access) transaction 200. By tagging transaction 200 in this manner, transaction 200 is identified as a transaction to be monitored (indicating (ii) whether the value at the particular location should be logged) - Baartmans 0023].
With respect to dependent claim 10 Baartmans/Arimilli/Chen discloses wherein the control logic also configures the computing device to set at least one accounting bit in the N-level cache based on a communication from one or more of the (N-1)-level caches [setting an accounting bit in a slower cache based on a communication such as an access request from a faster cache in a cache hierarchy that results in a line being marked dirty/shared or otherwise indicates a cache access transaction, in other words, accounting bits indicating cache states of a cache line accessed, such as: dirty, exclusive, shared among multiple processors, etc may be set according to different cache access operations – Baartmans 0029; a cache access transaction will make use of a trace tag identifier (additional accounting bits) which will identify whether values at cache locations are being logged or should be logged - Baartmans 0030, 0037; Trigger 211 is a control signal which triggers or causes trace tagging logic 208 to tag (cache access) transaction 200. By tagging transaction 200 in this manner, transaction 200 is identified as a transaction to be monitored (indicating (ii) whether the value at the particular location should be logged) - Baartmans 0023]. 
With respect to dependent claim 11, 18 Baartmans/Arimilli/Chen discloses wherein the control logic also configures the computing device to use the one or more accounting bits to refrain from logging influxes of memory data resulting from speculative execution of an instruction [untagged transactions and the caches they may access will not be logged - Baartmans 0021, 0023].
With respect to dependent claim 14 Baartmans/Arimilli/Chen discloses causing one or more CCP transitions based on activity of a single processing unit to be logged [a CCP transition into exclusive state is based on the activity of a single processing unit – Baartmans 0029]. 
With respect to dependent claim 21 Baartmans/Arimilli/Chen/ECMON discloses wherein causing the one or more (N-1)-level cache coherence protocol (CCP) transitions between the two or more (N-1)-level caches to be logged into the replayable trace of the entity comprises causing an identity of a point of each (N-1)-level CCP transition to be logged into the replayable trace of the entity, including identifying the point of each (N-1)-level CCP transition as corresponding to one of (i) a first transition from a load CCP state of the particular cache location to a store CCP state of the particular cache location, or (ii) a second transition from the store CCP state of the particular cache location to the load CCP state of the particular cache location [loads are understood to be analogous to reads and stores to writes, in view of this understanding ECMON p352, left col, section 2.1, first paragraph teaches “ECMon is complete, in that it is guaranteed to expose all ISMDs. ISMDs consisting of RAW, WAW and WAR dependences can be exercised via two modes: through the cache coherence system or through the memory. By exposing all invalidate events, we make sure that we expose all WAR dependences exercised through the coherence system. Similarly, by exposing all data value reply events, we make sure that we expose all RAW and WAW dependences exercised through coherence. However, not all dependences are exercised through cache coherence system; some are exercised through the main memory due to cache block replacements”].
With respect to dependent claim 22 Baartmans/Arimilli/Chen/ECMON discloses wherein the load CCP state comprises at least one of an owned CCP state or a shared CCP state, and wherein the store CCP state comprises at least one of modified CCP state or an exclusive CCP state [ECMon can be used to record shared memory dependences on multicores using no specialized hardware support  - ECMON abstract; “The status tells if the current word is shared across threads or exclusive to one thread, while the lockset indicates the set of locks used to access that memory location” – ECMON p. 350 table 1; Furthermore, when a processor proc i writes back its owned block to the directory, the directory continues to store the id of the processor in the owned field, in the record for the block. Recall that the above steps helped us to deal with cache block replacements. The architectural parameters for our implementation are presented in Table 3 – ECMON p357, section 5.1].
With respect to dependent claim 23 Baartmans/Arimilli/Chen/ECMON discloses wherein the first transition corresponds to a transition from a first period of one or more loads to a second period of one or more stores, and the second transition corresponds to a transition from a third period of one or more stores to a fourth period of one or more loads [loads are understood to be analogous to reads and stores to writes, in view of this understanding ECMON p352, left col, section 2.1, first paragraph teaches “ECMon is complete, in that it is guaranteed to expose all ISMDs. ISMDs consisting of RAW, WAW and WAR dependences can be exercised via two modes: through the cache coherence system or through the memory. By exposing all invalidate events, we make sure that we expose all WAR dependences exercised through the coherence system. Similarly, by exposing all data value reply events, we make sure that we expose all RAW and WAW dependences exercised through coherence. However, not all dependences are exercised through cache coherence system; some are exercised through the main memory due to cache block replacements”].


Response to Arguments
Applicant's arguments have been fully considered but are not persuasive in view of the prior art. All claims pending in the instant application remain rejected. Please note that any rejections/objection not maintained from the previous Office Action have been rectified either by applicant's amendment and/or persuasive argument(s). 
Regarding applicant’s argument on page 10 that “Notably, Baartmans tagging is generally limited to counting the number of times a transaction occurs, and its latencies, and does not anticipate, disclose, or suggest any form of logging memory data (influxes) as part of a trace of a multi-threaded entity executing across a plurality of processing units, including (i) recording an influx of memory data in an N-level cache to be logged due to activity by a first processing unit, and then (ii) recording a related CCP transition between two or more (N-1)-level caches due to activity by a second processing unit, as generally claimed by claim 1.”		
[The examiner respectfully submits that Baartmans does not explicitly disclose a shared cache hierarchy, although Baartmans’s disclosure as cited above in the rejection rationale appears to implicitly teach and suggest the instant limitation. Nevertheless, in the same field of endeavor Arimilli teaches hardware enabled instruction tracing in a multiprocessing system employing a cache hierarchy (Arimilli title, abstract, fig 3-4) further teaching a cache hierarchy including a large (shared) level two (L2) cache shared by multiple processor cores 108 (Arimilli 0040-0041).  Further, Baartmans/Arimilli does not explicitly disclose a replayable trace of the entity. Nevertheless, in the same field of endeavor Chen teaches time-deterministic replay (TDR) that can reproduce the execution of a program, including its precise timing and memory access operations … To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices.  - Chen abstract, 0048, 0062, 0072-0073.  Still further, Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units. Nevertheless, in the same field of endeavor ECMON teaches exposing cache events for monitoring wherein parallel threads operating in a multicore environment may be monitored, tracked and recorded so that cache events may be replayed –ECMON page 349, left col, first two paragraphs & page 354, right col, section 4, 1st paragraph.  As such the combination of references renders the combination of claim limitations obvious in the context of a USC 103 analysis.]
Regarding applicant’s argument on page 10 that “While Arimilli can log an instruction trace, Arimilli also does not anticipate, disclose, or suggest logging memory data (influxes) as part of a trace of a multi-threaded entity executing across a plurality of processing units, including recording memory influxes in an N-level cache and a related CCP transition (N-1)-level caches, as generally claimed by claim 1.”
[The examiner respectfully submits that Baartmans/Arimilli does not explicitly disclose a replayable trace of the entity. Nevertheless, in the same field of endeavor Chen teaches time-deterministic replay (TDR) that can reproduce the execution of a program, including its precise timing and memory access operations … To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices.  - Chen abstract, 0048, 0062, 0072-0073.  Still further, Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units. Nevertheless, in the same field of endeavor ECMON teaches exposing cache events for monitoring wherein parallel threads operating in a multicore environment may be monitored, tracked and recorded so that cache events may be replayed –ECMON page 349, left col, first two paragraphs & page 354, right col, section 4, 1st paragraph.  As such the combination of references renders the combination of claim limitations obvious in the context of a USC 103 analysis.]
Regarding applicant’s argument on page 10-11 that “Chen cannot be relevant to tracing of a multi-threaded entity executing across a plurality of processing units, including recording memory 
[The examiner respectfully submits that Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units. Nevertheless, in the same field of endeavor ECMON teaches exposing cache events for monitoring wherein parallel threads operating in a multicore environment may be monitored, tracked and recorded so that cache events may be replayed –ECMON page 349, left col, first two paragraphs & page 354, right col, section 4, 1st paragraph.  As such the combination of references renders the combination of claim limitations obvious in the context of a USC 103 analysis.]
Regarding applicant’s argument on page 11 that “ECMON was cited in an apparent attempt to overcome the deficiencies of Chen (i.e., that Chen can only trace single- threaded execution, and is thus nonanaloagous to tracing of a multi-threaded entity executing across a plurality of processing units). However, Applicant notes that any combination of Chen/ECMON would be inoperable for exporting CCP data relating to multithreaded execution, since Chen can only execute code at a single processor. Since Chen executes code at a single processor, there would be no interprocessor shared memory dependence within Chen's environment; thus, Chen would be incapable of exporting ECMON's ISMD information-including any form of a CCP transition among lower-level caches, as is required by claim 1.”
[The examiner respectfully submits that Chen is not nonanalogous to tracing of a multi-threaded entity, at least because Chen teaches @0063 “To reduce the number of events that must be recorded, we implement a simple form of deterministic multithreading [38] in Sanity: threads are scheduled round-robin, and each runnable thread is given a deterministic condition that triggers the thread to yield control of processing resources to another thread” and further @0082 that “we implemented our prototype as a Linux kernel module with two threads. The TC thread runs on one core with interrupts and the NMI disabled; the SC thread runs on a different core and interacts with the TC as discussed in Section 3.4”. As such, Chen would be operable with ECMON as combined in the rejection rationale.]
Regarding applicant’s argument on page 11 that “Additionally, while the cited art references may touch on the topics of multi-tier caches (e.g., Baartmans and Arimilli), the recording of execution trace data in single-threaded environments (e.g., Chen), and the exportation of ISMD information from a processor (e.g., ECMON), there is simply no combination of these references that can teach the interrelated limitations of (i) first causing the data of an influx to a higher level cache to be record to a trace (i.e., "based at least on detecting an influx of memory data from a particular memory location in system memory to a particular cache location in the N-level cache that results from execution of a first thread of the entity by a first of the plurality of processing units, cause the influx of memory data to the particular cache location in the N-level cache to be logged into a replayable trace of the entity"), and (ii) 
[The examiner respectfully submits that the cited art, as combined with ECMON in the 35 U.S.C. 103 obviousness rejection, renders the argued claim limitations obvious, with emphasis on the following sections of ECMON:
@ p350, left col, first paragraph teaches “In addition to speculation and record-replay based debugging, the detection of ISMDs is crucial to a variety of other monitoring applications, as illustrated in Table 1. Each of the monitoring applications listed in Table 1 associates meta data with original memory locations in corresponding shadow memory locations [23]. When run on multicores, races between original memory accesses and meta data accesses can result in inconsistent meta data values [3, 19, 20, 23]. Knowledge of ISMDs between original memory operations enables us to enforce the corresponding meta data dependences and thus maintain consistent meta data [20]”, further 
@ p350, right col, 2nd-3rd paragraphs teaches that, although cache-miss events are exposed for implementing cache coherence in software [11] and for implementing software controlled multithreading [18], this event is not adequate for exposing the ISMDs needed for detecting misspeculation during software speculation and for enabling replay for debugging, and further
@ p352, left col, section 2.1, first paragraph teaches “ECMon is complete, in that it is guaranteed to expose all ISMDs. ISMDs consisting of RAW, WAW and WAR dependences can be exercised via two modes: through the cache coherence system or through the memory. By exposing all invalidate events, we make sure that we expose all WAR dependences exercised through the coherence system. Similarly, by exposing all data value reply events, we make sure that we expose all RAW and WAW dependences exercised through coherence. However, not all dependences are exercised through cache coherence system; some are exercised through the main memory due to cache block replacements
Regarding applicant’s argument pertaining to new claims 21-23: [New/amended grounds of rejection necessitated by the amendments to the claims have rendered the instant remarks moot.]
All remarks are understood to have been addressed herein and by the amended grounds of rejection.  If any issues remain which may be clarified by the examiner, the applicant is invited to contact the examiner to set up a telephone interview.
When responding to the office action, any new claims and/or limitations should be accompanied by a reference as to where the new claims and/or limitations are supported in the original disclosure.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARWAN AYASH at (571)270-1179.  The examiner may be reached via email at marwan.ayash@uspto.gov – provided that applicant files form PTO/SB/439 to authorize internet communication, found online at http://www.uspto.gov/sites/default/files/documents/sb0439.pdf   
The examiner can normally be reached 9:00 a.m. to 7:30 p.m., Monday through Thursday.  Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jared Rutz, can be reached at 571-272-5535.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Marwan  Ayash/ - Examiner - Art Unit 2133

/JARED I RUTZ/ - Supervisory Patent Examiner - Art Unit 2133