United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov

BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/898,372
Filing Date: 16 Feb 2018
Appellant(s): MOLA, Jordi



__________________
Kirk Coombs Reg.#63249
For Appellant


EXAMINER’S ANSWER

This is in response to the appeal brief filed 06/11/2021.

(1) Grounds of Rejection to be Reviewed on Appeal
Every ground of rejection set forth in the Office action dated 01/21/21 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”
(2) Response to Argument
Appellant’s arguments are presented below, and [Examiner’s responses are bracketed and italicized].
A1       (page 13) “However, Baartmans does not anticipate, disclose, or suggest any form of logging a replayable trace of an executing entity—including recording an influx of memory data to a particular cache location in an N-level cache that results from execution of a first thread of an entity by a first of a plurality of processing units into a replayable trace of the entity. Further, Baartmans also does not anticipate, disclose, or suggest, based at least on having caused the foregoing influx of memory data to be logged into the replayable trace of the entity, subsequently causing one or more (N-1)-level cache coherence protocol (CCP) transitions between the two or more (N-1)-level caches to also be logged into the replayable trace of the entity based on execution of a second thread of the entity by a second of the plurality of processing units. Notably, in rejecting the above-quoted limitation, the Office references paragraph [0029] of Baartmans. (FOA, p. 5.) However, this paragraph merely discloses an ability to obtain “other” tracing information at trace points, such as tracing information related to responses from the agents for a transaction (e.g., whether the transaction resulted in a retry, cache states of a cache line accessed, such as: dirty, exclusive, shared among multiple processors, etc.). Appellant respectfully notes that recording influxes of memory data, and recording related CCP transitions, into a replayable trace of an entity is distinct from obtaining cache states of a cache line as part of tracking a transaction.” 
[The examiner respectfully submits that Chen is relied upon primarily for teaching the limitation of caus(ing) the influx of memory data to the particular cache location in the N-level cache to be logged into a replayable trace of the entity. Chen teaches “time-deterministic replay to reproduce the execution of a program including its precise timing” (Chen abstract, 0048, 0062, 0072-0073) and “deterministic replay, i.e., to ask the remote machine to record all nondeterministic inputs (such as messages, interrupts, etc.) along with the precise point in the program where each of them occurred, and to inject these inputs at the same points during replay” (Chen 0043). Chen 0043 is read in view of Chen 0048, which teaches “Memory: Different memory accesses during play and replay and/or different memory layouts can increase or decrease the number of cache misses at all levels, and/or affect their timing,” and Chen 0072, which teaches “To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices.” Baartmans 0029 is cited to show that Baartmans suggests the possibility of obtaining ‘other’ tracing information. Moreover, Baartmans 0030 teaches “Accordingly, metrics such as … address locations targeted, cache states …for particular transactions can be obtained from trace points A-D 110a-d. Since desired information for particular transactions can be gathered in a targeted manner from the various trace points…”. Baartmans, when combined with Chen, therefore appears to disclose the instant limitation under its broadest reasonable interpretation.]

A2       (page 14) “Turning to Arimilli, it is noted that while Arimilli refers to computing environments that include a bifurcated L1/L2 cache hierarchy and a CCP, Arimilli is generally directed to logging an instruction trace. An instruction traces is far less rigorous than the claimed replayable trace, since an instruction trace lacks a record of data consumed by those instructions. In particular, while Arimilli discloses recording an instruction trace log 260 of instructions dispatched by an ISU 270 to execution units 282-290, Arimilli lacks any disclosure or suggestion of recording cache influxes or related CCP messages into its trace. The claimed replayable trace, on the other hand, records influxes of memory data and CCP transitions related thereto, enabling traced instructions to be replayed.”
[The examiner respectfully submits that Arimilli is not relied upon for teaching the limitation in question. Rather, Arimilli is combined with Baartmans because Baartmans does not explicitly disclose a shared cache hierarchy, whereas Arimilli teaches hardware-enabled instruction tracing in a multiprocessing system employing a cache hierarchy (Arimilli title, abstract, figs. 3-4), including a large (shared) level two (L2) cache shared by multiple processor cores 108 (Arimilli 0040-0041). The feature of a rigorous replayable trace is disclosed by the cited combination of references, in both ECMon and Chen. ECMon teaches: “Record-replay based debugging [12, 17, 22, 35, 40] is a technique that helps ensure software reliability by recording program execution, so as to enable replay to help in debugging. In a multicore setting, ISMDs need to be recorded in addition to recording other non-deterministic events. Let us consider the example shown in Fig. 1(b) which shows the parallel execution of two functions (no speculation involved) fun1 and fun2. As we can see, fun2 reads the value written by fun1 and this RAW dependency must be recorded to ensure that ld2 gets the correct value (from st1) during replay” (ECMon fig. 3 on page 353, in view of the Introduction, right col., page 349). In other words, ECMon records execution of parallel threads across two or more processing units. ECMon page 352, left col., section 2.1, first paragraph further teaches: “ECMon is complete, in that it is guaranteed to expose all ISMDs. ISMDs consisting of RAW, WAW and WAR dependences can be exercised via two modes: through the cache coherence system or through the memory. By exposing all invalidate events, we make sure that we expose all WAR dependences exercised through the coherence system. Similarly, by exposing all data value reply events, we make sure that we expose all RAW and WAW dependences exercised through coherence.
However, not all dependences are exercised through cache coherence system; some are exercised through the main memory due to cache block replacements”. In Chen, the following disclosures are pertinent to a rigorous trace: “time-deterministic replay to reproduce the execution of a program including its precise timing” (Chen abstract, 0048, 0062, 0072-0073); and “deterministic replay, i.e., to ask the remote machine to record all nondeterministic inputs (such as messages, interrupts, etc.) along with the precise point in the program where each of them occurred, and to inject these inputs at the same points during replay” (Chen 0043), read in view of Chen 0048, which teaches “Memory: Different memory accesses during play and replay and/or different memory layouts can increase or decrease the number of cache misses at all levels, and/or affect their timing,” and Chen 0072, which teaches “To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices.”]

A3       (page 14) “Chen’s TDR mechanism (Sanity) can reproduce the execution of a program, including its precise timing. However, in order to preserve this precise timing, Chen’s TDR system is implemented at a single-core JVM to avoid dealing with asynchronous events, such as hardware interrupts, which can strike at any point during execution directly on hardware (e.g., as would be the case for x86-based implementations). Specifically, Sanity implements a timed core (TC) that executes the JVM itself (i.e., as a TC thread that runs on one core with interrupts and the NMI disabled). While Sanity does utilize plural cores—the TC and a supporting core (SC)—it is the TC on which the JVM exclusively executes (as the TC thread). Thus, all threads of any traced application being executed via the JVM are executed exclusively on the TC; the SC is only there to handle interrupts and I/O on the TC's behalf. This means that all threads of multi-threaded application whose execution is being traced through Chen’s Sanity JVM are scheduled to execute one thread at a time on this TC, by interleaving execution of those threads in a round-robin manner on the TC. As such, given the configuration of Sanity, it would be impossible for Sanity to be able to both (a) log a cache influx based on execution of a first thread of an entity by a first of a plurality of processing units, and also (b) log a CCP transition based on execution of a second thread of the entity by a second of the plurality of processing units, as is required by claim 1. For example, there is no instance in which Sanity would execute a first thread on the TC and a second thread on the SC. Additionally, it is noted that, by limiting Sanity JVM execution to a single core (i.e., the TC), all cache influxes cause by the JVM’s execution on that core would affect the same cache levels. 
Thus, Sanity can record influxes at a single cache level to fully capture influxes consumed by any program executing on the JVM (but, as a result of this single-cored execution, cripples performance of any multi-threaded program). Chen would, therefore, provide no motivation to also record CCP transitions at a lower cache level to capture cache activity caused by other cores (e.g., the SC).”
[The examiner respectfully submits that the fact that Chen does not teach multi-threaded execution does not mean that it would be impossible to use the teachings of Chen in a multi-threaded environment. Chen would simply need to be modified, consistent with obviousness practice under 35 U.S.C. 103, as is done here by combining Chen with ECMon such that the combination teaches multi-threaded functionality, as noted in the rejection rationale. The 103 rejection relies on the combination of references to teach multi-threaded functionality as follows: the combination of Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units. Nevertheless, in the same field of endeavor, ECMon teaches exposing cache events for monitoring, wherein parallel threads operating in a multicore environment may be monitored, tracked, and recorded so that cache events may be replayed (ECMon page 349, left col., first two paragraphs; page 354, right col., section 4, 1st paragraph). It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement parallel threads across multiple processing units, and the ability to replay cache transactions performed by the parallel threads, in the invention of Baartmans/Arimilli/Chen as taught by ECMon, because this would be advantageous for optimizing multi-core and multi-threaded processing (ECMon page 349, right col., Introduction, 1st paragraph).]

A4       (page 14-15) “ECMon discloses exposing cache events to software via handler function. However, as noted, ECMon only discloses exposing the following cache events: (i) a processor sending/receiving an invalidate message, (ii) a processor sending/receiving a data value reply, (iii) a processor experiencing a read miss for a block uncached in any processor, and (iv) a processor about to write back a block. None of these cache events are “one or more (N-1)-level cache coherence protocol (CCP) transitions between the two or more (N-1)-level caches,” as required by claim 1. Notably, even assuming arguendo that ECMon could be interpreted to disclose exposing CCP transitions between the two or more (N-1)-level caches, there would be no motivation to combine ECMon with any of Baartmans, Arimilli, and/or Chen to arrive at the limitation of: 	“based at least on having caused the influx of memory data to the particular cache location in the N-level cache to be logged into the replayable trace of the entity, subsequently cause one or more (N-1)-level cache coherence protocol (CCP) transitions between the two or more (N-1)-level caches to be logged into the replayable trace of the entity, the (N-1)-level CCP transitions resulting from the particular cache location being accessed by a second of the plurality of processing units based on execution of a second thread of the entity by the second of the plurality of processing units.”
[The examiner respectfully submits that the claimed “CCP transitions” are disclosed as cache events (cache states are logged when a transaction is traced; Baartmans 0029 in view of Chen 0048, 0062, 0072-0073), such that the cache events disclosed by ECMon read on the claimed limitation under its broadest reasonable interpretation. The motivation to combine is set forth in the rejection as follows: Baartmans and Arimilli do not explicitly disclose a replayable trace of the entity.
Nevertheless, in the same field of endeavor, Chen teaches time-deterministic replay (TDR) that can reproduce the execution of a program, including its precise timing and memory access operations … To maximize the similarity between play and replay timing, Sanity must ensure that the machine is in the same state when the execution begins. This not only involves CPU state, but also memory contents, stable storage, and key devices (Chen abstract, 0048, 0062, 0072-0073).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement a replayable trace of the entity in the invention of Baartmans/Arimilli as taught by Chen, because this would be advantageous for testing and ensuring normal execution of a program (Chen 0196-0197).
The combination of Baartmans/Arimilli/Chen does not explicitly disclose executing parallel threads across multiple processing units.
Nevertheless, in the same field of endeavor, ECMon teaches exposing cache events for monitoring, wherein parallel threads operating in a multicore environment may be monitored, tracked, and recorded so that cache events may be replayed (ECMon page 349, left col., first two paragraphs; page 354, right col., section 4, 1st paragraph). It would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to implement parallel threads across multiple processing units, and the ability to replay cache transactions performed by the parallel threads, in the invention of Baartmans/Arimilli/Chen as taught by ECMon, because this would be advantageous for optimizing multi-core and multi-threaded processing (ECMon page 349, right col., Introduction, 1st paragraph).]

A5       (page 15) “With respect to Baartmans, Baartmans’ disclosure is generally limited to identifying and tagging transactions when they pass through trace points, and to using those trace points to track the flow of a transaction through a processing system (including latencies). Since Baartmans is merely concerned with transaction tracking via tagging, there would be no need or motivation for Baartmans to expose CCP transitions as part of that tagging. With respect to Arimilli, Arimilli’s disclosure is directed to logging an instruction trace that logs instructions dispatched by an ISU to execution units. However, since an instruction trace records executed instructions only (and not data values accessed), there would be no need or motivation for Arimilli to include any CCP transition information in an instruction trace. With respect to Chen, Chen discloses that its JVM is intentionally designed to only trace code execution at a single core (processing unit)—the TC; thus, Chen would be incapable of exposing CCP transitions that occur as a result of execution of a multithreaded entity at a plurality of processing units, and as such there would be no need or motivation to export CCP data as it relates to cache activity caused by the TC.”
[The examiner respectfully submits that the fact that Chen does not teach multi-threaded execution does not mean that one of ordinary skill would be incapable of using the teachings of Chen in a multi-threaded environment. Chen would simply need to be modified, consistent with obviousness practice under 35 U.S.C. 103, as is done here by combining Chen with ECMon such that the combination teaches multi-threaded functionality, as noted in the rejection rationale. The examiner further submits that all of the cited references are within the same field of endeavor and are analogous art. The statements above regarding why one of ordinary skill in the art before the effective filing date of the invention would have been motivated to combine the references are incorporated herein by reference.]

A6       (page 15-16) “In summary, the cited references touch on the topics of transaction tracing (Baartmans), instruction tracing (Arimilli), the recording of execution trace data in single-threaded environments (Chen), and the exportation of ISMD information from a processor (ECMon). However, there is simply no combination of these references (nor motivation to combine those references) that can be used to teach or suggest (i) first causing the data of an influx to a higher-level cache to be recorded to a trace due to execution activity at a first processing unit, and (ii) later recording a related CCP transition among lower-level caches due to related execution activity at a second processing unit.”
[The examiner respectfully submits that the instant arguments are substantially identical to arguments A1-A5 and are therefore unpersuasive for the same reasons provided in response to A1-A5 above.]



The remaining arguments on page 16, pertaining to independent claims 12 and 19 and to the dependent claims, rely on the arguments addressed above and are unpersuasive for the same reasons.

For the above reasons, it is believed that the rejections should be sustained.

Respectfully submitted,

/Marwan Ayash/
Examiner, Art Unit 2133

Conferees:
/JARED I RUTZ/
Supervisory Patent Examiner, Art Unit 2133

/DANIEL KINSAUL/
Quality Assurance Specialist, TC 2100


Requirement to pay appeal forwarding fee.  In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.