Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the application filed on 12/27/2018. Claims 1-25 are currently pending. Claims 1, 11, 21 and 25 are independent.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/14/2019 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Objections
In claim 1 and the other independent claims, the acronym “HPC” should be spelled out in full (hardware performance counter) at its first appearance in each claim. Appropriate correction is required.
The terms “side channel” and “side-channel” should be used consistently throughout the claims; that is, every occurrence should take a single form, either “side channel” or “side-channel”. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
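
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
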
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-25 are rejected under 35 U.S.C. 103 as being unpatentable over Khorrami et al. (US 20190340392 A1), hereinafter Khorrami, in view of KR20160008509A (“Unsupervised anomaly-based malware detection using hardware features,” 2014), hereinafter D1.

	
Regarding claims 1, 11, 21 and 25, Khorrami teaches a side channel attack detection system comprising (Khorrami para. 0012, side channel attacks):
processor circuitry (Khorrami para. 0013, HPCs are integrated into all modern processors);
a plurality of HPCs coupled to the processor circuitry (Khorrami para. 0013, HPCs are integrated into all modern processors);
collect information representative of a side-channel attack dataset for each respective one of a plurality of HPCs as the processor circuitry executes at least one side channel attack detection instruction set (Khorrami para. 0013, HPCs are processor dependent and provide information on instructions executed, branches that were taken, hardware interrupts, memory loads and stores, cache misses and accesses, etc. FIG. 1 shows how one can characterize code execution by the total occurrences of hardware events as well as by temporal patterns and relationships among events);
detect whether each of the plurality of HPCs demonstrates anomalous activity as the processor circuitry executes the at least one side-channel attack instruction set (Khorrami para. 0014 and 0041, HPCs have been used to detect malicious modifications in applications [14], to detect rootkits [15,16], and to detect firmware modifications [17,18]. This proposal extends prior approaches [14-20] See recent survey article [12] “The Cybersecurity Landscape in Industrial Control Systems,” Proceedings of the IEEE, May 2016 (incorporated herein by reference), and “perspective” article [13] on cyber-security techniques for CPS, “Cybersecurity for Control System: A Process Aware Perspective,” IEEE Design and Test Magazine, September 2016 (incorporated herein by reference)); and
 select the at least one HPC for inclusion in a side-channel attack detection HPC sub-set based on the demonstrated anomalous activity of the respective at least one HPC (Khorrami para. 0058, HPCs are measured separately for each of the threads in the multi-threaded process and the anomaly detection addresses the multidimensional measurement stream comprising of all HPCs separately measured for each of the threads in the process. For monitoring a target process, there are multiple ways to acquire HPC measurements from the process).
	Although Khorrami teaches to collect information representative of a side-channel attack dataset for each respective one of a plurality of HPCs as the processor circuitry executes at least one side channel attack detection instruction set; to detect whether each of the plurality of HPCs demonstrates anomalous activity as the processor circuitry executes the at least one side-channel attack instruction set; and to select the at least one HPC for inclusion in a side-channel attack detection HPC sub-set based on the demonstrated anomalous activity of the respective at least one HPC (as shown above), Khorrami does not explicitly disclose that these operations are performed by hardware performance anomaly detection circuitry. In other words, Khorrami does not explicitly mention that dedicated hardware circuitry is used to perform these operations for detecting anomalous activity.
However, hardware performance anomaly detection circuitry is old and well known in the art of computer security as illustrated by D1 (paragraph 0054). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the hardware performance anomaly detection circuitry illustrated by D1 in the malicious activity detection system of Khorrami, in order to achieve the predictable result of detecting malware and malicious activities using dedicated hardware circuitry.

Regarding claim 2, the combination of Khorrami and D1 teaches all of the limitations of claim 1 as described above. Khorrami in view of D1 further teaches the hardware performance anomaly detection circuitry to further: collect information representative of a baseline dataset for each respective one of a plurality of HPCs as the processor circuitry executes at least one application instruction set (Khorrami para. 0063, a time series of HPC measurements are collected for the target process running on the embedded device under known good conditions to establish a baseline. When monitoring a device, the observed code execution characteristics are probabilistically matched against expected (baseline) nominal characteristics to detect anomalies).

Regarding claim 3, the combination of Khorrami and D1 teaches all of the limitations of claim 2 as described above. Khorrami in view of D1 further teaches the hardware performance anomaly detection circuitry to further: collect information representative of a side-channel attack dataset for each respective one of a plurality of HPCs as the processor circuitry contemporaneously executes at least one side channel attack detection instruction set and the at least one application instruction set (Khorrami para. 0063 and 0064, HPC measurements are collected to a file, which is then transferred to an analysis system on a separate computational device (e.g., a workstation computer), or can be streamed on-line to the analysis computer. Since the processor (e.g., ARM) in the embedded device is often distinct from the deployment/analysis computer, the lightweight measurer to collect the HPC measurements is cross-compiled to a native binary (for the target embedded device) and then transferred. On the embedded device, the light-weight measurer can use multiple methods to read HPC measurements for the target process including low-level register access, perf_events or perfctr interfaces in the Linux kernel, high-level PAPI (Performance Application Programming Interface) library, Intel PCM (Performance Counter Monitor) for Windows and Linux. In the implementation of the system, a PLC is considered as a representative embedded device and the PAPI library (See, e.g., PAPI (Performance Application Programming Interface). http://icl.utk.edu/papi (incorporated herein by reference).) is used to implement the measurer).

Regarding claim 4, the combination of Khorrami and D1 teaches all of the limitations of claim 3 as described above. Khorrami in view of D1 further teaches wherein the hardware performance anomaly detection circuitry includes data collection circuitry to: collect information representative of the side-channel attack dataset for each respective one of a plurality of HPCs as the processor circuitry contemporaneously executes the at least one side channel attack detection instruction set and the at least one application instruction set (Khorrami para. 0058-0061, Depending on the device type and application context, HPC-based code monitoring can be defined at various levels of granularity. The “code blocks” being considered can range in granularity from functions (e.g., some crucial functions in system libraries) to individual processes to the set of all kernel/user-space processes running on the device. To address these levels of granularity, HPCs measurements can be acquired for the entire device, for specific processes therein, for individual threads in a process, or for function libraries (such as system calls) or other application-specific static and dynamic libraries. While the approach can scale to these levels of granularity, we consider monitoring of a specific process (e.g., a crucial process on the target device such as the control logic process on a PLC), which is a particularly relevant application in the context of embedded devices in CPS. The target process will, in general, be multi-threaded, as is typical in real time control logic processes on embedded controllers such as PLCs. HPCs are measured separately for each of the threads in the multi-threaded process and the anomaly detection addresses the multidimensional measurement stream comprising of all HPCs separately measured for each of the threads in the process. For monitoring a target process, there are multiple ways to acquire HPC measurements from the process. 
These methods include: i) In-process, by a priori instrumenting the code of the target process. ii) Connecting from an external monitoring program according to a fixed sampling rate. iii) Hooking into specific parts of the monitored code (e.g., particular functions) by dynamic instrumentation to invoke the code); and
 collect information representative of a baseline dataset for each respective one of a plurality of HPCs as the processor circuitry executes at least one application instruction set (Khorrami para. 0063, 0071 and 0084, sequence of measurements forms the baseline data set. During run-time monitoring, the time series of measurements is the test data set and the problem addressed here is the development of a robust matching approach to decide if the test data set matches the characteristics of the baseline data set or is anomalous. For this purpose, feature extraction algorithms are utilized to extract low-dimensional feature representations from the HPC measurements over time windows. The same feature extraction algorithms are used for both the baseline and the test data sets. A machine learning approach is used to learn a model of feature patterns from the baseline data set. Thereafter, the trained machine learning based system is used to classify the test data set as baseline or anomalous. A primary motivation in the development of the proposed approach and indeed a central characteristic of embedded CPS devices, which enables the proposed approach to provide robust anomaly detection, is that the typical code structures in such devices have well-defined and typically periodic patterns. As illustrated in FIG. 8, typical implementations of control logic processes in embedded devices are essentially comprised of periodically repeated iterations of sensor reading, control algorithm computations, and actuator writing steps. Hence, the HPC measurement time series for these processes tends to have approximately periodically repeated patterns although with significant stochastic variations due to various non-determinacy effects as discussed above in Section 4.2.2.1.2, which essentially create stochastic “noise” in HPC readings).

Regarding claim 5, the combination of Khorrami and D1 teaches all of the limitations of claim 4 as described above. Khorrami in view of D1 further teaches wherein the hardware performance anomaly detection circuitry includes time series feature extraction circuitry to:
convert the baseline dataset from each of the plurality of HPCs from a time domain to a time/frequency domain (Khorrami para. 0073-0081 and 0125-0129, the utilization of multiple temporal lengths provides a multi-resolution approach that facilitates learning of temporal patterns that are apparent over different time scales. The possible values of γ are picked to be a discrete set Γ depending on the typical time scales of the time series signals in the specific application (e.g., depending on the typical control loop sampling periods when monitoring a control logic process, time scales of local features in the time series signals, etc.). Over each considered time window of the signal, features of multiple types can be extracted from the measurement sequence in that time window including: i) Basic statistics such as max, min, mean, root mean square, variance, skewness, and kurtosis of the measurement data points (HPC measurement samples fnt) within the time window. These statistics are extracted over sliding time window segments (in general, of different lengths and with overlaps of successive window segments). These statistics are extracted separately for the different threads and for the different HPC modalities. Statistics such as mean and root mean square characterize levels of activity (within the time window segments and in terms of the different HPC measurement modalities such as number of instructions and number of branches). [0075] ii) Inter-sample rates of changes-based features. Statistics of inter-sample changes include, for example, the means of absolute values of pair-wise differences of HPC measurements between successive sampling times. The computation of the mean of absolute values of point-wise derivatives of the time series signal uses three or more successive points for numerical robustness. Statistics of inter-sample changes characterize patterns of time variations of activity (i.e., derivatives of the activity patterns). 
iii) Histogram based methods (e.g., percentage of samples over the mean, percentage of samples in highest 25%, etc.). iv) Frequency domain methods such as Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (Discrete Wavelet Transform), e.g., frequencies (or the mean of these frequencies) corresponding to highest few peaks in the DFT. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization approach to encapsulate time windows of HPC measurements as low dimensional feature vectors. v) Autocorrelation methods, e.g., lag for which highest autocorrelation is achieved (i.e., time shift other than 0 of the sample window segment for which highest autocorrelation is achieved). This feature extracts periodicity characteristics of the time series signal. vi) Cross-correlation across threads and across HPC measurement modalities. These features extract characteristics of temporal relationships between activity patterns in different threads and different types of activity patterns. vii) Polynomial-based methods, e.g., coefficients of a polynomial representation (e.g., cubic splines and Chebyshev polynomials) computed as the closest fit for the time series signal window segment. viii) Compressibility based methods, i.e., a measure of the compressibility (or equivalently information content) of the signal window segment, e.g., number of bits of most compact representation (to within some approximation threshold). This feature can be computed separately for each thread and/or each HPC modality or can be computed as a combined metric for the multidimensional measurement sequence comprising of HPC measurements from all threads. The Haar wavelet is a sequence of rescaled “square-shaped” functions that together form a wavelet family or basis. Thus, the generic wavelet transform taught by Khorrami encompasses the Haar wavelet transform); and
 convert the side-channel attack dataset from each of the plurality of HPCs from the time domain to the time/frequency domain (Khorrami para. 0082 and 0125-0129, [0127] The HPC and stack trace measurements over sliding windows of time are used to form time domain and frequency-domain feature characteristics using transform techniques and kernel methods. While TRACE measures HPCs as numerical values (e.g., numbers of instructions and branches over a time interval), one can represent the stack traces using discrete labels. The most frequently appearing stack traces for a code block are labeled as labels 1, . . . , N. The less often occurring stack traces are categorized using a catch-all label N+1 (This is analogous to the “background” tag in semantic segmentation in image processing applications). For time-domain signal aggregation over sliding time windows (in general, of different lengths and with overlaps of successive windows), features are extracted using multiple techniques [25-31] including basic statistics (such as max, min, mean, root mean square, and statistics of inter-sample changes), histograms, autocorrelations (e.g., lags for autocorrelation peaks), and kernel methods such as the kernel principal component analysis. Combinations of low-dimensional feature extractors provide semantic hashes comprising of low-dimensional feature representations of the measurements over time windows. TRACE extracts the frequency-domain features using Fourier and wavelet transform techniques according to the empirically observed signal characteristics. These features include frequencies (in sorted order) of a few of the highest peaks in the Fourier transform. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization framework to encapsulate time windows of HPC and stack trace measurements in low-dimensional feature vectors. 
The Haar wavelet is a sequence of rescaled “square-shaped” functions that together form a wavelet family or basis. Thus, the generic wavelet transform taught by Khorrami encompasses the Haar wavelet transform).
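For illustration only, the per-window time-domain features recited in the Khorrami passages quoted above (basic statistics, mean absolute inter-sample change, and the lag of the highest autocorrelation) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Khorrami's implementation or the applicant's: the function names `sliding_windows`, `window_features`, and `autocorr_peak_lag`, the window length, and the step size are all hypothetical, and only the Python standard library is used.

```python
import math
from statistics import mean, pstdev


def sliding_windows(series, length, step):
    """Yield overlapping windows of `length` samples, advancing by `step` samples."""
    for start in range(0, len(series) - length + 1, step):
        yield series[start:start + length]


def autocorr_peak_lag(window):
    """Return the nonzero lag with the highest (unnormalized) autocorrelation.

    Lags up to half the window length are searched; this bound is an assumption.
    """
    mu = mean(window)
    centered = [x - mu for x in window]
    best_lag, best_val = None, -math.inf
    for lag in range(1, len(window) // 2 + 1):
        val = sum(a * b for a, b in zip(centered, centered[lag:]))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def window_features(window):
    """Build a low-dimensional feature vector for one window of HPC samples.

    The window must contain at least two samples.
    """
    # Absolute differences between successive sampling times.
    diffs = [abs(b - a) for a, b in zip(window, window[1:])]
    return {
        "max": max(window),
        "min": min(window),
        "mean": mean(window),
        "rms": math.sqrt(mean(x * x for x in window)),
        "std": pstdev(window),
        "mean_abs_diff": mean(diffs),
        "autocorr_lag": autocorr_peak_lag(window),
    }
```

For example, a strictly alternating count sequence such as `[1, 5] * 5`, split into windows of 8 samples with a step of 2, yields two windows whose autocorrelation peak falls at lag 2, reflecting the period of the signal.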

Regarding claim 6, the combination of Khorrami and D1 teaches all of the limitations of claim 3 as described above. Khorrami in view of D1 further teaches wherein the time feature extraction circuitry comprises Haar wavelet transform circuitry (Khorrami para. 0073-0081 and 0129, Frequency domain methods such as Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (Discrete Wavelet Transform), e.g., frequencies (or the mean of these frequencies) corresponding to highest few peaks in the DFT. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization approach to encapsulate time windows of HPC measurements as low dimensional feature vectors. v) Autocorrelation methods, e.g., lag for which highest autocorrelation is achieved (i.e., time shift other than 0 of the sample window segment for which highest autocorrelation is achieved). This feature extracts periodicity characteristics of the time series signal. vi) Cross-correlation across threads and across HPC measurement modalities. These features extract characteristics of temporal relationships between activity patterns in different threads and different types of activity patterns. vii) Polynomial-based methods, e.g., coefficients of a polynomial representation (e.g., cubic splines and Chebyshev polynomials) computed as the closest fit for the time series signal window segment. viii) Compressibility based methods, i.e., a measure of the compressibility (or equivalently information content) of the signal window segment, e.g., number of bits of most compact representation (to within some approximation threshold). This feature can be computed separately for each thread and/or each HPC modality or can be computed as a combined metric for the multidimensional measurement sequence comprising of HPC measurements from all threads).
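As an illustration of the Haar wavelet transform discussed above, a minimal multi-level orthonormal Haar discrete wavelet transform over a window of HPC samples might look like the following. The names `haar_step` and `haar_dwt` are hypothetical, the input length is assumed to be divisible by 2 at every level, and the coefficient ordering simply mirrors the common [cA_n, cD_n, ..., cD_1] convention.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    evens, odds = signal[0::2], signal[1::2]
    # Scaled pairwise sums capture the local average ("square-shaped" basis).
    approx = [(a + b) * s for a, b in zip(evens, odds)]
    # Scaled pairwise differences capture local changes between samples.
    detail = [(a - b) * s for a, b in zip(evens, odds)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT returning [cA_n, cD_n, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)  # coarser detail levels go toward the front
    coeffs.insert(0, approx)
    return coeffs
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input samples, and a piecewise-constant input such as `[4, 4, 2, 2]` produces zero first-level detail coefficients.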

Regarding claim 7, the combination of Khorrami and D1 teaches all of the limitations of claim 3 as described above. Khorrami in view of D1 further teaches wherein the hardware performance anomaly detection circuitry includes anomaly detection circuitry to: detect whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0109, 0131 and 0133, from the time series of measurements, various types of low-dimensional features are extracted by TRACE over sliding time windows as described above. Examples of TRACE feature extraction are shown in FIG. 6. Using these extracted features, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to anomalies as deviations from the baseline. TRACE uses a machine learning approach to model the empirically observed probability distributions of time series of feature vectors over time windows and to detect deviations from expected baseline behavior. For example, from a time series {f_1, ..., f_j} of feature vectors over a time interval, TRACE machine learning-based classifier determines P(ζ | {f_1, ..., f_j}) where ζ denoted different possible hypotheses of the device state. For example, in the simplest case, ζ could denote the hypotheses of baseline versus anomalous for the device).
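The deviation-based detection recited in claim 7 can be pictured with the simplified Python sketch below. It deliberately substitutes a per-HPC z-score test for the one-class SVM and RNN models that Khorrami actually describes, so it illustrates only the idea of measuring each HPC's deviation between a baseline dataset and a side-channel attack dataset. The function names, the dictionary layout keyed by HPC name, and the default threshold of 3.0 are assumptions.

```python
from statistics import mean, pstdev


def deviation_score(baseline, attack):
    """Standardized shift of the attack-run mean relative to the baseline run.

    This z-score is a simplified stand-in, not the classifier used by Khorrami.
    """
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        # A constant baseline: any change in the mean is treated as unbounded.
        return 0.0 if mean(attack) == mu else float("inf")
    return abs(mean(attack) - mu) / sigma


def detect_anomalous_hpcs(baseline_by_hpc, attack_by_hpc, threshold=3.0):
    """Map each HPC name to True if its deviation score exceeds `threshold`."""
    return {
        hpc: deviation_score(baseline_by_hpc[hpc], attack_by_hpc[hpc]) > threshold
        for hpc in baseline_by_hpc
    }
```

For example, a hypothetical cache-miss counter whose mean jumps from about 100 to 155 is flagged, while a branch counter that stays near its baseline mean is not.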

Regarding claim 8, the combination of Khorrami and D1 teaches all of the limitations of claim 5 as described above. Khorrami in view of D1 further teaches wherein the anomaly detection circuitry comprises: one-class support vector anomaly detection circuitry (Khorrami para. 0131, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to anomalies as deviations from the baseline).

Regarding claim 9, the combination of Khorrami and D1 teaches all of the limitations of claim 5 as described above. Khorrami in view of D1 further teaches wherein the hardware performance anomaly detection circuitry includes hardware performance counter identification circuitry to:
detect, for each of the plurality of HPCs, whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs; and select the at least one HPC for inclusion in a side-channel attack detection HPC sub-set based on the demonstrated anomalous activity of the respective at least one HPC (Khorrami para. 0058, HPCs are measured separately for each of the threads in the multi-threaded process and the anomaly detection addresses the multidimensional measurement stream comprising of all HPCs separately measured for each of the threads in the process. For monitoring a target process, there are multiple ways to acquire HPC measurements from the process).
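The selection step recited in claim 9 (forming the side-channel attack detection HPC sub-set) might be sketched as follows, assuming a per-HPC anomaly score has already been computed by some detector. The function name `select_hpc_subset`, the score threshold, the optional size cap, and the counter names in the usage example are hypothetical.

```python
def select_hpc_subset(scores, threshold, max_size=None):
    """Return HPC names whose anomaly score exceeds `threshold`.

    Names are ordered from most to least anomalous and optionally capped
    at `max_size` entries (the cap is an illustrative assumption).
    """
    flagged = [(name, s) for name, s in scores.items() if s > threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    names = [name for name, _ in flagged]
    return names if max_size is None else names[:max_size]
```

For example, `select_hpc_subset({"cache_misses": 12.5, "branch_misses": 4.1, "instructions": 0.3}, threshold=3.0)` returns `["cache_misses", "branch_misses"]`; ranking by score lets a size cap keep only the most informative counters when the number of simultaneously programmable HPCs is limited.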

Regarding claim 10, the combination of Khorrami and D1 teaches all of the limitations of claim 1 as described above. Khorrami in view of D1 further teaches the system further comprising: input/output (I/O) interface circuitry; wherein the processor circuitry to: 
generate an output signal that includes data indicative of the side-channel attack detection HPC sub-set; and communicate the output signal to one or more external processor-based devices (Khorrami para. 0031, anomaly detection over sliding time windows using the proposed approach without the majority voting over sequences of time windows. The first row corresponds to anomaly detection in a test data set from baseline operation and the second row corresponds to a test data set corresponding to the malware/modification A5. In each plot, values of 1 and −1 indicated that the classifier generated an estimate of non-anomalous (baseline) or anomalous, respectively, when given a sliding time window of data ending at that time instant. Hence, in the first row, points which are at −1 indicate misclassifications while, in the second row, points which are at 1 indicate misclassifications. The right-side figures in each row show a zoomed-in view over a smaller time interval to visualize the (sparse) misclassification errors).
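Khorrami's description of per-window classifier outputs of 1 (baseline) and −1 (anomalous), together with its reference to majority voting over sequences of time windows, can be illustrated with the short sketch below. It is a hypothetical illustration, not Khorrami's algorithm: the function name `majority_vote`, the voting span, and the choice to report ties as anomalous are assumptions.

```python
def majority_vote(labels, span):
    """Smooth per-window labels (+1 baseline, -1 anomalous) by majority vote.

    Each output covers `span` consecutive windows ending at that position.
    Ties (possible only for an even span) are reported as anomalous (-1),
    a conservative choice made for this sketch.
    """
    smoothed = []
    for end in range(span, len(labels) + 1):
        total = sum(labels[end - span:end])
        smoothed.append(1 if total > 0 else -1)
    return smoothed
```

With `span=3`, the label sequence `[1, 1, -1, 1, 1, 1, -1, -1, -1, -1]` becomes `[1, 1, 1, 1, 1, -1, -1, -1]`: the isolated −1 at the third window is voted away, while the sustained run of −1 values at the end survives.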

Regarding claim 12, the combination of Khorrami and D1 teaches all of the limitations of claim 11 as described above. Khorrami in view of D1 further teaches the method further comprising:
executing, by the processor circuitry, at least one application instruction set; and
 collecting, via the hardware performance anomaly detection circuitry, information representative of a baseline dataset for each respective one of a plurality of HPCs (Khorrami para. 0063 and 0071, a time series of HPC measurements are collected for the target process running on the embedded device under known good conditions to establish a baseline. When monitoring a device, the observed code execution characteristics are probabilistically matched against expected (baseline) nominal characteristics to detect anomalies).

Regarding claim 13, the combination of Khorrami and D1 teaches all of the limitations of claim 12 as described above. Khorrami in view of D1 further teaches wherein executing at least one side channel attack instruction set further comprises:
executing, by the processor circuitry, the at least one side channel attack instruction set contemporaneous with executing the at least one application instruction set (Khorrami para. 0107, HPC measurements of numbers of instructions and numbers of branches for these malware/modifications are shown in FIGS. 13 and 14. It is to be noted that these modifications are extremely small (e.g., just one additional line of code in each of attacks A1 and A6). Hence, the HPC measurement time series for the baseline and for the malware/modifications listed above are very similar in their macroscopic aspects. Note that the intermittent spikes in HPC measurements are due to various non-deterministic effects as was discussed in § 4.2.2.1.2 and cannot reliably be used to distinguish between baseline and anomalous data sets. Instead, a robust and accurate classification of baseline vs. anomalous has to rely upon the subtle temporal patterns in the time series of the HPC measurements. For this purpose, sliding time windows are considered as discussed in § 4.2.2.1.6 and feature vectors are extracted, which are then utilized for SVM-based classification of baseline vs. anomalous); and 
wherein collecting information representative of a side-channel attack dataset for each respective one of a plurality of HPCs further comprises: collecting, by the hardware performance anomaly detection circuitry, information representative of a side-channel attack dataset for each respective one of a plurality of HPCs as the processor circuitry contemporaneously executes the at least one side channel attack instruction set and the at least one application instruction set (Khorrami para. 0058-0061, Depending on the device type and application context, HPC-based code monitoring can be defined at various levels of granularity. The “code blocks” being considered can range in granularity from functions (e.g., some crucial functions in system libraries) to individual processes to the set of all kernel/user-space processes running on the device. To address these levels of granularity, HPCs measurements can be acquired for the entire device, for specific processes therein, for individual threads in a process, or for function libraries (such as system calls) or other application-specific static and dynamic libraries. While the approach can scale to these levels of granularity, we consider monitoring of a specific process (e.g., a crucial process on the target device such as the control logic process on a PLC), which is a particularly relevant application in the context of embedded devices in CPS. The target process will, in general, be multi-threaded, as is typical in real time control logic processes on embedded controllers such as PLCs. HPCs are measured separately for each of the threads in the multi-threaded process and the anomaly detection addresses the multidimensional measurement stream comprising of all HPCs separately measured for each of the threads in the process. For monitoring a target process, there are multiple ways to acquire HPC measurements from the process. These methods include: i) In-process, by a priori instrumenting the code of the target process. 
ii) Connecting from an external monitoring program according to a fixed sampling rate. iii) Hooking into specific parts of the monitored code (e.g., particular functions) by dynamic instrumentation to invoke the code).

Regarding claim 14, the combination of Khorrami and D1 teaches all of the limitations of claim 13 as described above. Khorrami in view of D1 further teaches wherein detecting whether each of the plurality of HPCs demonstrates anomalous activity as the processor circuitry executes the at least one side-channel attack instruction set further comprises: detecting, by the hardware performance anomaly detection circuitry, whether each of the plurality of HPCs demonstrates anomalous activity based on a deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0109, 0131 and 0133, from the time series of measurements, various types of low-dimensional features are extracted by TRACE over sliding time windows as described above. Examples of TRACE feature extraction are shown in FIG. 6. Using these extracted features, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to anomalies as deviations from the baseline. TRACE uses a machine learning approach to model the empirically observed probability distributions of time series of feature vectors over time windows and to detect deviations from expected baseline behavior. For example, from a time series {f_1, ..., f_j} of feature vectors over a time interval, TRACE machine learning-based classifier determines P(ζ | {f_1, ..., f_j}) where ζ denoted different possible hypotheses of the device state. For example, in the simplest case, ζ could denote the hypotheses of baseline versus anomalous for the device).

Regarding claim 15, the combination of Khorrami and D1 teaches all of the limitations of claim 12 as described above. Khorrami in view of D1 further teaches wherein collecting information representative of the baseline dataset for each respective one of the plurality of HPCs comprises:
collecting, via data collection circuitry disposed in the hardware performance anomaly detection circuitry, the information representative of the baseline dataset for each respective one of the plurality of HPCs (Khorrami para. 0109, 0131 and 0133, from the time series of measurements, various types of low-dimensional features are extracted by TRACE over sliding time windows as described above. Examples of TRACE feature extraction are shown in FIG. 6. Using these extracted features, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to anomalies as deviations from the baseline. TRACE uses a machine learning approach to model the empirically observed probability distributions of time series of feature vectors over time windows and to detect deviations from expected baseline behavior. For example, from a time series {f.sub.1, . . . , f.sub.j} of feature vectors over a time interval, TRACE machine learning-based classifier determines P (ζ|{f.sub.1, . . . , f.sub.j}) where ζ denoted different possible hypotheses of the device state. For example, in the simplest case, ζ could denote the hypotheses of baseline versus anomalous for the device); and
wherein collecting information representative of the side-channel attack dataset for each respective one of the plurality of HPCs comprises: collecting, by the data collection circuitry, information representative of the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0013, HPCs are processor dependent and provide information on instructions executed, branches that were taken, hardware interrupts, memory loads and stores, cache misses and accesses, etc. FIG. 1 shows how one can characterize code execution by the total occurrences of hardware events as well as by temporal patterns and relationships among events).

Regarding claim 16, the combination of Khorrami and D1 teaches all of the limitations of claim 15 as described above. Khorrami in view of D1 further teaches wherein detecting whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs further comprises:
causing, by time series feature extraction circuitry disposed in the hardware performance anomaly detection circuitry, a conversion of the baseline dataset for each respective one of the plurality of HPCs from a time domain to a time/frequency domain (Khorrami para. 0073-0081 and 0125-0129, the utilization of multiple temporal lengths provides a multi-resolution approach that facilitates learning of temporal patterns that are apparent over different time scales. The possible values of γ are picked to be a discrete set Γ depending on the typical time scales of the time series signals in the specific application (e.g., depending on the typical control loop sampling periods when monitoring a control logic process, time scales of local features in the time series signals, etc.). Over each considered time window of the signal, features of multiple types can be extracted from the measurement sequence in that time window including: i) Basic statistics such as max, min, mean, root mean square, variance, skewness, and kurtosis of the measurement data points (HPC measurement samples fnt) within the time window. These statistics are extracted over sliding time window segments (in general, of different lengths and with overlaps of successive window segments). These statistics are extracted separately for the different threads and for the different HPC modalities. Statistics such as mean and root mean square characterize levels of activity (within the time window segments and in terms of the different HPC measurement modalities such as number of instructions and number of branches). [0075] ii) Inter-sample rates of changes-based features. Statistics of inter-sample changes include, for example, the means of absolute values of pair-wise differences of HPC measurements between successive sampling times. The computation of the mean of absolute values of point-wise derivatives of the time series signal uses three or more successive points for numerical robustness. 
Statistics of inter-sample changes characterize patterns of time variations of activity (i.e., derivatives of the activity patterns). iii) Histogram based methods (e.g., percentage of samples over the mean, percentage of samples in highest 25%, etc.). iv) Frequency domain methods such as Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (DWT), e.g., frequencies (or the mean of these frequencies) corresponding to highest few peaks in the DFT. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization approach to encapsulate time windows of HPC measurements as low dimensional feature vectors. v) Autocorrelation methods, e.g., lag for which highest autocorrelation is achieved (i.e., time shift other than 0 of the sample window segment for which highest autocorrelation is achieved). This feature extracts periodicity characteristics of the time series signal. vi) Cross-correlation across threads and across HPC measurement modalities. These features extract characteristics of temporal relationships between activity patterns in different threads and different types of activity patterns. vii) Polynomial-based methods, e.g., coefficients of a polynomial representation (e.g., cubic splines and Chebyshev polynomials) computed as the closest fit for the time series signal window segment. viii) Compressibility based methods, i.e., a measure of the compressibility (or equivalently information content) of the signal window segment, e.g., number of bits of most compact representation (to within some approximation threshold). This feature can be computed separately for each thread and/or each HPC modality or can be computed as a combined metric for the multidimensional measurement sequence comprising of HPC measurements from all threads. The Haar wavelet is a sequence of rescaled "square-shaped" functions that together form a wavelet family or basis; thus, the Discrete Wavelet Transform disclosed by Khorrami encompasses the Haar wavelet transform); and
 causing, by the time series feature extraction circuitry, a conversion of the side-channel attack dataset for each respective one of the plurality of HPCs from the time domain to the time/frequency domain (Khorrami para. 0082 and 0125-0129, [0127] The HPC and stack trace measurements over sliding windows of time are used to form time domain and frequency-domain feature characteristics using transform techniques and kernel methods. While TRACE measures HPCs as numerical values (e.g., numbers of instructions and branches over a time interval), one can represent the stack traces using discrete labels. The most frequently appearing stack traces for a code block are labeled as labels 1, . . . , N. The less often occurring stack traces are categorized using a catch-all label N+1 (This is analogous to the “background” tag in semantic segmentation in image processing applications). For time-domain signal aggregation over sliding time windows (in general, of different lengths and with overlaps of successive windows), features are extracted using multiple techniques [25-31] including basic statistics (such as max, min, mean, root mean square, and statistics of inter-sample changes), histograms, autocorrelations (e.g., lags for autocorrelation peaks), and kernel methods such as the kernel principal component analysis. Combinations of low-dimensional feature extractors provide semantic hashes comprising of low-dimensional feature representations of the measurements over time windows. TRACE extracts the frequency-domain features using Fourier and wavelet transform techniques according to the empirically observed signal characteristics. These features include frequencies (in sorted order) of a few of the highest peaks in the Fourier transform. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization framework to encapsulate time windows of HPC and stack trace measurements in low-dimensional feature vectors. 
The Haar wavelet is a sequence of rescaled "square-shaped" functions that together form a wavelet family or basis; thus, the wavelet transform techniques disclosed by Khorrami encompass the Haar wavelet transform).
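For illustration only, the sliding-window feature extraction quoted from Khorrami can be sketched as follows. This is a minimal sketch, assuming a single HPC's samples are given as a list of numbers; it computes a subset of the basic statistics Khorrami lists (max, min, mean, root mean square, variance) together with the mean of absolute pair-wise differences between successive samples. The function name window_features and its width and step parameters are hypothetical.

```python
import math
from statistics import mean, pvariance


def window_features(samples, width, step):
    """Extract simple per-window features from one HPC's time series.

    For each sliding window of `width` samples, advancing by `step` samples
    (so windows overlap when step < width), compute max, min, mean, RMS,
    population variance, and the mean absolute difference between
    successive samples within the window.
    """
    if width < 2:
        raise ValueError("width must be at least 2 to compute differences")
    features = []
    for start in range(0, len(samples) - width + 1, step):
        w = samples[start:start + width]
        features.append({
            "max": max(w),
            "min": min(w),
            "mean": mean(w),
            "rms": math.sqrt(sum(x * x for x in w) / width),
            "variance": pvariance(w),
            # Mean of |x[t+1] - x[t]| over successive samples in the window.
            "mean_abs_diff": mean(abs(b - a) for a, b in zip(w, w[1:])),
        })
    return features


# Two overlapping windows of width 4 over a short synthetic series.
for f in window_features([1, 3, 2, 4, 3, 5], width=4, step=2):
    print(f)
```

Each window is reduced to a small, fixed-length feature vector, which is the low-dimensional representation that a downstream anomaly detector would then compare against baseline windows.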

Regarding claim 17, the combination of Khorrami and D1 teaches all of the limitations of claim 16 as described above. Khorrami in view of D1 further teaches wherein causing the conversion of the baseline dataset for each respective one of the plurality of HPCs from the time domain to the time/frequency domain, further comprises: causing, by Haar wavelet transform circuitry included in the time series feature extraction circuitry disposed in the hardware performance anomaly detection circuitry, the conversion of the baseline dataset for each respective one of the plurality of HPCs from the time domain to the time/frequency domain (Khorrami para. 0073-0081 and 0129, Frequency domain methods such as Discrete Fourier Transform (DFT) and Discrete Wavelet Transform (DWT), e.g., frequencies (or the mean of these frequencies) corresponding to highest few peaks in the DFT. The time-domain and frequency-domain dimensionality reduction methods provide an information quantization approach to encapsulate time windows of HPC measurements as low dimensional feature vectors. v) Autocorrelation methods, e.g., lag for which highest autocorrelation is achieved (i.e., time shift other than 0 of the sample window segment for which highest autocorrelation is achieved). This feature extracts periodicity characteristics of the time series signal. vi) Cross-correlation across threads and across HPC measurement modalities. These features extract characteristics of temporal relationships between activity patterns in different threads and different types of activity patterns. vii) Polynomial-based methods, e.g., coefficients of a polynomial representation (e.g., cubic splines and Chebyshev polynomials) computed as the closest fit for the time series signal window segment. 
viii) Compressibility based methods, i.e., a measure of the compressibility (or equivalently information content) of the signal window segment, e.g., number of bits of most compact representation (to within some approximation threshold). This feature can be computed separately for each thread and/or each HPC modality or can be computed as a combined metric for the multidimensional measurement sequence comprising of HPC measurements from all threads. The Haar wavelet is a sequence of rescaled "square-shaped" functions that together form a wavelet family or basis; thus, the Discrete Wavelet Transform disclosed by Khorrami encompasses the Haar wavelet transform).
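Claim 17 recites a Haar wavelet transform for the conversion from the time domain to the time/frequency domain. For illustration only, a minimal pure-Python sketch of a multi-level orthonormal Haar discrete wavelet transform is shown below, assuming the input length is divisible by 2 to the power of the number of levels. The function name haar_dwt and its levels parameter are hypothetical and are not drawn from Khorrami, D1 or the present application.

```python
import math


def haar_dwt(signal, levels=1):
    """Multi-level orthonormal Haar discrete wavelet transform.

    Returns (approximation, [detail_level_1, ..., detail_level_n]).
    At each level, adjacent pairs (a, b) are replaced by a scaled average
    (a + b) / sqrt(2) (low-frequency trend) and a scaled difference
    (a - b) / sqrt(2) (high-frequency change), so each coefficient is
    localized in both time and frequency.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    scale = 1 / math.sqrt(2)
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * scale for a, b in pairs])
        approx = [(a + b) * scale for a, b in pairs]
    return approx, details


# A short synthetic window of HPC counts, decomposed over two levels.
approx, details = haar_dwt([4, 4, 6, 2], levels=2)
print(approx)   # overall trend of the window
print(details)  # level-1 and level-2 detail coefficients
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared input samples, so no signal energy is lost in moving to the time/frequency representation.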

Regarding claim 18, the combination of Khorrami and D1 teaches all of the limitations of claim 16 as described above. Khorrami in view of D1 further teaches wherein detecting whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs further comprises: detecting, by anomaly detection circuitry disposed in the hardware performance anomaly detection circuitry, whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0058, HPCs are measured separately for each of the threads in the multi-threaded process and the anomaly detection addresses the multidimensional measurement stream comprising of all HPCs separately measured for each of the threads in the process. For monitoring a target process, there are multiple ways to acquire HPC measurements from the process).

Regarding claim 19, the combination of Khorrami and D1 teaches all of the limitations of claim 18 as described above. Khorrami in view of D1 further teaches wherein detecting whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs further comprises: detecting, by one-class support vector anomaly detection circuitry disposed in the hardware performance anomaly detection circuitry, whether each of the plurality of HPCs demonstrates anomalous activity based on the deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0131, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to [detect] anomalies as deviations from the baseline).

Regarding claim 20, the combination of Khorrami and D1 teaches all of the limitations of claim 11 as described above. Khorrami in view of D1 further teaches:
generating, by the processor circuitry, an output signal that includes data indicative of the side-channel attack detection HPC sub-set; and communicating, via input/output circuitry coupled to the processor circuitry, the output signal to one or more external processor-based devices (Khorrami para. 0031, anomaly detection over sliding time windows using the proposed approach (without the majority voting over sequences of time windows). The first row corresponds to anomaly detection in a test data set from baseline operation and the second row corresponds to a test data set corresponding to the malware/modification A_5. In each plot, values of 1 and −1 indicated that the classifier generated an estimate of non-anomalous (baseline) or anomalous, respectively, when given a sliding time window of data ending at that time instant. Hence, in the first row, points which are at −1 indicate misclassifications while, in the second row, points which are at 1 indicate misclassifications. The right-side figures in each row show a zoomed-in view over a smaller time interval to visualize the (sparse) misclassification errors).

Regarding claim 22, the combination of Khorrami and D1 teaches all of the limitations of claim 21 as described above. Khorrami in view of D1 further teaches wherein the machine-readable instruction set further causes the hardware performance anomaly detection circuitry to: collect information representative of a baseline dataset for each respective one of the plurality of HPCs responsive to execution of at least one application instruction set by the processor circuitry (Khorrami para. 0063, a time series of HPC measurements are collected for the target process running on the embedded device under known good conditions to establish a baseline. When monitoring a device, the observed code execution characteristics are probabilistically matched against expected (baseline) nominal characteristics to detect anomalies).

Regarding claim 23, the combination of Khorrami and D1 teaches all of the limitations of claim 22 as described above. Khorrami in view of D1 further teaches wherein the instructions that cause the hardware performance anomaly detection circuitry to collect information representative of a side-channel attack dataset for each respective one of a plurality of HPCs responsive to execution of at least one side channel attack instruction set by the processor circuitry further cause the hardware performance anomaly detection circuitry to: collect information representative of a side-channel attack dataset for each respective one of a plurality of HPCs responsive to a contemporaneous execution of the at least one side channel attack instruction set and the at least one application instruction set by the processor circuitry (Khorrami para. 0063 and 0064, HPC measurements are collected to a file, which is then transferred to an analysis system on a separate computational device (e.g., a workstation computer), or can be streamed on-line to the analysis computer. Since the processor (e.g., ARM) in the embedded device is often distinct from the deployment/analysis computer, the lightweight measurer to collect the HPC measurements is cross-compiled to a native binary (for the target embedded device) and then transferred. On the embedded device, the light-weight measurer can use multiple methods to read HPC measurements for the target process including low-level register access, perf_events or perfctr interfaces in the Linux kernel, high-level PAPI (Performance Application Programming Interface) library, Intel PCM (Performance Counter Monitor) for Windows and Linux. In the implementation of the system, a PLC is considered as a representative embedded device and the PAPI library (See, e.g., PAPI (Performance Application Programming Interface). http://icl.utk.edu/papi (incorporated herein by reference).) is used to implement the measurer).

Regarding claim 24, the combination of Khorrami and D1 teaches all of the limitations of claim 23 as described above. Khorrami in view of D1 further teaches wherein the instructions that cause the hardware performance anomaly detection circuitry to detect whether each of the plurality of HPCs demonstrates anomalous activity as the processor circuitry executes the at least one side-channel attack instruction set further cause the hardware performance anomaly detection circuitry to: detect whether each of the plurality of HPCs demonstrates anomalous activity based on a deviation between the baseline dataset and the side-channel attack dataset for each respective one of the plurality of HPCs (Khorrami para. 0109, 0131 and 0133, from the time series of measurements, various types of low-dimensional features are extracted by TRACE over sliding time windows as described above. Examples of TRACE feature extraction are shown in FIG. 6. Using these extracted features, TRACE uses algorithms based on machine learning approaches such as one-class Support Vector Machine (SVM) and Recurrent Neural Network (RNN) based probability distribution modeling to [detect] anomalies as deviations from the baseline. TRACE uses a machine learning approach to model the empirically observed probability distributions of time series of feature vectors over time windows and to detect deviations from expected baseline behavior. For example, from a time series {f_1, ..., f_j} of feature vectors over a time interval, TRACE machine learning-based classifier determines P(ζ | {f_1, ..., f_j}) where ζ denoted different possible hypotheses of the device state. For example, in the simplest case, ζ could denote the hypotheses of baseline versus anomalous for the device).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHU CHUN GAO whose telephone number is (571)270-5999. The examiner can normally be reached on Monday-Thursday, 6:00-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KRISTINE KINCAID can be reached on 571-272-4063. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHU CHUN GAO/Examiner, Art Unit 2437 

/ALI S ABYANEH/Primary Examiner, Art Unit 2437