DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to communication filed on 12/03/2019.
Status of claims in the instant application:
Claims 1-8 are pending.
Information Disclosure Statement
Information Disclosure Statements (IDS) filed on 12/03/2019 have been considered, and signed copies of the IDS forms have been attached to this office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 6, 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Pub. No.: US 2012/0131674 A1 to Wittenschlaeger (hereinafter “Wittenschlaeger”) in view of “NPL: Anomaly Detection and Characterization in Spatial Time Series Data: A Cluster-Centric Approach” to Izakian et al. (hereinafter “Izakian”), “IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER 2014”.
Regarding Claim 1. Wittenschlaeger discloses a computer implemented method of detecting anomalous behavior in a set of computer systems communicating (Wittenschlaeger, Abstract, FIG. 3: Methods of detecting anomalous behaviors associated with a fabric are presented. A network fabric can comprise many fungible networking nodes, preferably hybrid-fabric apparatus capable of routing general purpose packet data and executing distributed applications …), the method comprising:
generating a baseline set of vector representations for each of a plurality of network communications occurring in the computer network during a baseline time period when the plurality of network communications are absent malicious intervention (Wittenschlaeger, Abstract, FIG. 2, Para [0012, 0026, 0028]: A nominal behavior can be established for the fabric and represented by a baseline vector of behavior metrics. Anomaly detection criteria can be derived as a function of a variation from the baseline vector based on measured vectors of behavior metrics … One aspect of the inventive subject matter includes a method of detecting anomalous behavior within a network fabric. A nominal behavior of the fabric can be characterized as a baseline vector comprising a plurality of behavior metrics … FIG. 3 provides an overview of a method 300 for detecting an anomalous behavior within a fabric based on one or more measured vectors of behavior metrics as compared to vectors representing nominal behavior … the baseline vector is more than a mere list of metrics having values. The vector can include a representation of how the metrics behave with time, how the metrics correlate with each other, or reflect other properties representing a dynamic behavior of a fabric … ), each vector representation being derived from a neural network trained using training data defined from the plurality of network communications (Wittenschlaeger, Para [0028]: … In some embodiments, the baseline vector is derived by monitoring the fabric over a period of time and establishing the baseline vector by looking for correlated metrics. For example, multi-variate analysis can be performed with respect to metrics to determine if one metric is correlated with another. Other algorithms could also be used to establish baseline vectors or anomalous conditions including genetic algorithms, neural networks, bloom filters, or other known AI techniques …);
However, Wittenschlaeger does not explicitly teach, but Izakian from same or similar field of endeavor teaches:
“iteratively generating a runtime set of vector representations for each of a plurality of network communications occurring in the computer network during subsequent runtime time periods (Izakian, Abstract, Page 1615: … Our objective is to detect and characterize any unexpected changes in a subsequence of a set of spatially neighboring time series. Fig. 2 shows the overall scheme of the proposed method by presenting a bird’s eye view at the introduced approach. Let us briefly highlight the main processing realized here. At the first step, a sliding window moves across the time coordinate of data. Since there are N spatial time series data, the time window at each step includes N subsequences. By considering the spatial information and the generated subsequences, we form a set of spatiotemporal subsequences W1, W2, . . . , Wk. In fact, the sliding window allows us to look at the data at different time intervals. At the second step, the available structure in each set of spatiotemporal subsequences Wi, i = 1, 2, . . . k is revealed using a spatiotemporal clustering approach proposed in [3]. The result of this step is a collection of partition matrices U1, U2, . . . , Uk, each describing a set of clusters existing within the spatiotemporal subsequences. In the next step, the revealed partition matrices are exploited in two different ways. The first one is to assign an anomaly score to the revealed clusters in each time window, and the second way is to construct a fuzzy relation between clusters present in successive time steps to visualize the behavior of data in time. For the sake of completeness, we briefly recall the method in [3] for clustering spatiotemporal data. Let us consider N spatiotemporal data x1, x2, . . . , xN. FCM aims to describe these N data by the use of c information granules (cluster centers) v1, v2, . . . , vc and a partition matrix U = [uik], i = 1, 2, . . . , c, and k = 1, 2, . . . , N. In clustering spatiotemporal data, the challenging problem is to control the effect of each part of data (spatial and temporal components) in the clustering process. To deal with this problem, the following composite (aggregate) distance function was proposed where ‖·‖ denotes Euclidean distance, xk(s) is the spatial part of kth data, and xk(t) stands for the time series part of kth data (or its representation). The parameter λ standing in the above distance function helps us strike a sound balance between the impact of the spatial and temporal components of the data in the clustering process. By assigning λ = 0, we remove the effect of temporal part of data and consider only a spatial part when forming clusters. Assigning higher value to λ increases the impact of the temporal part in the clustering process. By considering the distance (9), one calculates the cluster centers and the partition matrix in an iterative fashion …);
(Izakian, Abstract, Page 1615: … In this paper, we consider fuzzy c-means (FCM) as a conceptual and algorithmic setting to deal with the problem of anomaly detection. Using a sliding window, the time series are divided into a number of subsequences, and the available spatiotemporal structure within each time window is discovered using the FCM method. In the sequel, an anomaly score is assigned to each cluster, and using a fuzzy relation formed between revealed structures, a propagation of anomalies occurring in consecutive time intervals is visualized … Our objective is to detect and characterize any unexpected changes in a subsequence of a set of spatially neighboring time series. Fig. 2 shows the overall scheme of the proposed method by presenting a bird’s eye view at the introduced approach. Let us briefly highlight the main processing realized here. At the first step, a sliding window moves across the time coordinate of data. Since there are N spatial time series data, the time window at each step includes N subsequences. By considering the spatial information and the generated subsequences, we form a set of spatiotemporal subsequences W1, W2, . . . , Wk. In fact, the sliding window allows us to look at the data at different time intervals. At the second step, the available structure in each set of spatiotemporal subsequences Wi, i = 1, 2, . . . k is revealed using a spatiotemporal clustering approach proposed in [3]. The result of this step is a collection of partition matrices U1, U2, . . . , Uk, each describing a set of clusters existing within the spatiotemporal subsequences. In the next step, the revealed partition matrices are exploited in two different ways. The first one is to assign an anomaly score to the revealed clusters in each time window, and the second way is to construct a fuzzy relation between clusters present in successive time steps to visualize the behavior of data in time …);
determining a level of activity of each computer system during the baseline time periods, the level of activity being determined according to a sum of differences of vector representations for the computer system between sub-periods of the baseline time period (Izakian, Page 1613, Left Column; Page 1616, Left Column: … a sliding window to generate a set of subsequences over the spatial time series data has been considered and FCM is employed to visualize the structure available within the resulting spatiotemporal subsequences. For each single temporal subsequence, an anomaly score is determined by taking into account its historical behavior, and the estimated anomaly scores are aggregated within each cluster … Clustering spatiotemporal subsequences within different time windows leads to revealing a set of structures within data present at different time steps. Here, for each single subsequence inside a time window, an anomaly score is estimated based on its historical behavior, and then, the estimated anomaly scores are aggregated to determine an anomaly score for each cluster inside each time window … Let us consider the jth time window Wj. Since there are N spatial time series in dataset, Wj contains N subsequences. To assign an anomaly score for each subsequence, there are a number of methods proposed in the literature and some of them were reviewed in Section II. The strategy to estimate an anomaly score for a subsequence depends on the nature of data and the application purpose. In this paper, the anomaly score of a subsequence is considered as the average squared Euclidean distance to its previous subsequences. Formally, considering xkj to be a subsequence of spatial time series xk falling within the window Wj, its anomaly score is expressed as follows: eq 14 … After computing an anomaly score for each single subsequence inside Wj, the anomaly scores are aggregated to estimate an anomaly score for each cluster inside Wj. Assuming that U is the partition matrix resulting from clustering of spatiotemporal data corresponding to the time window Wj, the anomaly scores for the clusters located in Wj, sj = {si, i = 1, 2, . . . , cj} can be estimated using – eq 15 …);”
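For illustration only, the sliding-window scoring described in the cited Izakian passage can be sketched as follows. This is a minimal sketch and not a reproduction of Izakian's eq 14 or eq 15, which are not quoted in this action. The function names, the zero score assigned to the first window, and the membership-weighted averaging used for the per-cluster score are all assumptions made for the example.

```python
def sliding_windows(series, width, step):
    """Split one time series into subsequences with a sliding window."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length subsequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anomaly_scores(subsequences):
    """Score each subsequence by its average squared Euclidean distance
    to all earlier subsequences of the same series.

    Assumption: the first subsequence has no history, so it scores 0.0.
    """
    scores = []
    for j, current in enumerate(subsequences):
        previous = subsequences[:j]
        if not previous:
            scores.append(0.0)
            continue
        total = sum(squared_distance(current, p) for p in previous)
        scores.append(total / len(previous))
    return scores


def cluster_scores(memberships, scores):
    """Aggregate per-series scores into per-cluster scores for one window.

    memberships[i][k] is the membership degree of series k in cluster i.
    Assumption: a membership-weighted mean stands in for the aggregation.
    """
    result = []
    for row in memberships:
        weight = sum(row)
        weighted = sum(u * s for u, s in zip(row, scores))
        result.append(weighted / weight if weight else 0.0)
    return result


windows = sliding_windows([1, 1, 1, 1, 1, 1, 9, 9], width=2, step=2)
print(anomaly_scores(windows))
```

In this toy series only the last window departs from its history, so it is the only window with a non-zero score.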
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Izakian into the teachings of Wittenschlaeger, because it discloses that, “Detecting anomalies in spatial time series data using spatiotemporal clustering is a novel idea proposed in this study. The method introduced here visualizes structures present in different time windows to make them understandable to the end-user. Moreover, using the fuzzy relation-based model of relationships, the revealed clusters in spatiotemporal subsequences can be tracked from the structure identified in the past, leading to a thorough temporal analysis of propagation of anomalies (Izakian: Page 1613)”.
Wittenschlaeger further discloses:
“for each of the computer systems, determining anomalous behavior (Wittenschlaeger, Para [0043]: Step 370  includes detecting satisfaction of the anomaly detection criteria as a function of the anomaly criterion statuses, where satisfaction appears to indicate an anomalous behavior is present …) responsive to any of:
an evaluation of a measure of difference between the baseline and runtime vector representations for the computer system exceeding a threshold level of difference (Wittenschlaeger, Para [0031]: … Step 320 includes establishing anomaly detection criteria as a function of a variation from the baseline vector where the criteria represent an aspect of a possible anomalous behavior (see FIG. 4). The criteria can be constructed to represent various conditions reflecting an anomalous behavior. The detection criteria can reflect that (a) an anomalous behavior has occurred, (b) an anomalous behavior is about to occur, or (c) an anomalous behavior is likely to occur. One should note the criteria depend on a variation from the baseline vector as opposed to solely based on deviations of one or more metrics. The variation can be calculated based on a variation function applied to measured behavior metric having the same member elements as the baseline vector. One example variation function can include a Chi-Square fit to members of the measure vector compared to the baseline vector. If the Chi-Square value exceeds a threshold, an anomalous behavior might be present. It should be appreciated that temporal properties can be incorporated in the measurement of the variation. All vector-based variation functions are contemplated …),
an evaluation of a change of cluster membership of a vector representation for the computer system between the baseline and runtime sets of clusters,

an evaluation of a difference in a level of activity of the computer system between the baseline and the runtime time periods;” and
responsive to the determination of anomalous behavior, implementing one or more protective measures for the computer network (Wittenschlaeger, [0048-0049]: … Step 380 can include notifying a manager of the anomalous behavior. The manager can be a node operation as a fabric manager, a human being, or other entity having responsibility for managing the fabric. Notifications can be sent via an electronic communication channel (e.g., SMS, email, network management application, SNMP, etc.) as desired … A notification of the anomalous behavior can include one or more instructions on how to respond to the anomalous behavior. In some embodiments, step 383 can include migrating anomalous traffic to a monitored data channel. For example, if an intrusion is detection, the fabric can automatically reconfigure a routing topology used by the intruder so that the intruder's packets are routed toward a network operations center, a data sink, a honey pot, or the location so the anomalous traffic can be further analyzed in real time …).”
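As an illustration of the threshold test described in the cited Wittenschlaeger Para [0031], the following minimal sketch compares a measured vector against a baseline vector using a Pearson-style Chi-Square statistic. The function names, the sample metric values, and the threshold of 10.0 are hypothetical. The reference does not specify the exact form of its Chi-Square fit, so the formula used here is an assumption.

```python
def chi_square_variation(measured, baseline):
    """Pearson-style Chi-Square statistic of a measured vector against a baseline.

    Assumption: every baseline member is strictly positive.
    """
    if len(measured) != len(baseline):
        raise ValueError("vectors must have the same member elements")
    if any(b <= 0 for b in baseline):
        raise ValueError("baseline members must be positive")
    return sum((m - b) ** 2 / b for m, b in zip(measured, baseline))


def is_anomalous(measured, baseline, threshold):
    """Flag possible anomalous behavior when the variation exceeds the threshold."""
    return chi_square_variation(measured, baseline) > threshold


baseline = [100, 50, 10]  # hypothetical nominal behavior metrics
print(is_anomalous([102, 49, 11], baseline, threshold=10.0))  # small variation
print(is_anomalous([110, 45, 30], baseline, threshold=10.0))  # large variation
```

The second measured vector differs mainly in its third metric, which dominates the statistic because that metric has a small baseline value.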
Regarding Claim 3. The combination of Wittenschlaeger-Izakian discloses the method of claim 1, Wittenschlaeger further discloses, “wherein the protective measures include one or more of:
deploying or configuring a firewall at one or more computer systems connected via the computer network;

deploying or configuring an antivirus facility at one or more computer systems connected via the computer network;
adjusting a sensitivity or a level of monitoring of a security facility in one or more computer systems connected via the computer network; or
selectively disconnecting one or more computer systems from the computer network (Wittenschlaeger, Para [0049-0050]: … A notification of the anomalous behavior can include one or more instructions on how to respond to the anomalous behavior. In some embodiments, step 383 can include migrating anomalous traffic to a monitored data channel … When the anomaly type is known, as suggested by step 385, the notification could also include instructions on what actions should be taken to respond to the anomalous behavior based on anomaly type. Actions can include storing historical data within the black box memory, migrating black box data to another node's black box, reconfiguring a routing topology, locking one or more data channels or connected devices, or other actions …; Examiner’s Note: when devices on the network are locked, they are effectively disconnected from the network for any communication).”
Regarding Claim 6. The combination of Wittenschlaeger-Izakian discloses the method of claim 1, Izakian further discloses, “wherein a difference between vector representations is evaluated by applying a vector similarity function (Izakian, Pages 1613-1614: … One straightforward method for anomaly detection in time series data is to assign an anomaly score to each time series according to its similarity to the other time series existing in dataset. A suitable distance function or resemblance measure can be considered to be a similarity/dissimilarity measure. In [14], an anomaly detection technique has been proposed for light curves in catalogues of periodic variable stars. By considering N time series x1, x2, . . . , xN present in the dataset, the anomaly score of a certain time series xi was expressed … Clustering is another method used for anomaly detection in time series data. In this method, time series are clustered using an appropriate clustering technique and the revealed cluster centers are exploited to assign an anomaly score to each time series. In [15], an FCM clustering was used to cluster a set of time series data, and a reconstruction criterion [4] was employed to reconstruct time series with the aid of the revealed cluster centers. Finally, a reconstruction error was used to assign an anomaly score to each time series. In [17], a set of training sequences was clustered using a k-medoids clustering, and for each test sequence, its inverse similarity to its closest medoid was considered as the anomaly score … The AR model assumes that for a value of the time series in time t, xt can be approximated using the values of its p values present in the previous time instants … The regions of the map that the AR process was expected to move are identified and the anomalous changes of the AR process has been detected. The method was applied to a real-world industrial process. In [5], multivariate time series are modeled using a weighted graph representation, where each node of the graph corresponds to a data point or a subsequence in a time series and each edge was weighted through a similarity measure between nodes. Considering that p is the number of variables in multivariate time series, the similarity between timestamps i and j in time series was calculated with the aid of the RBF function …).”
The motivation to further combine Izakian remains same as in claim 1.
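For reference, two common vector similarity functions of the kind discussed in the cited passage can be sketched as below. This is a minimal sketch: cosine similarity is included only as a familiar example and is not named in the cited text, and the `sigma` bandwidth parameter of the RBF function is an assumed default rather than a value taken from Izakian.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rbf_similarity(a, b, sigma=1.0):
    """Radial basis function similarity: 1.0 for identical vectors,
    decaying toward 0.0 as the Euclidean distance grows."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared / (2.0 * sigma ** 2))


print(cosine_similarity([1, 0], [0, 1]))
print(rbf_similarity([0, 0], [1, 0]), rbf_similarity([0, 0], [3, 4]))
```

Either function maps a pair of vectors to a single score, so a difference between vector representations can be evaluated by comparing that score against a chosen level.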
Regarding Claim 7. This is a system claim corresponding to the method claim 1 containing all the same or similar limitations as claim 1, hence similarly rejected as claim 1.
Wittenschlaeger also discloses processor and memory storing computer program code performing the method (Wittenschlaeger, Para [0018]: … It should be noted that while the following description is drawn to fabric networking nodes, some operating as a fabric manager, various alternative configurations are also deemed suitable and may employ various computing devices including routers, switches, interfaces, systems, databases, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclose apparatus …).
Regarding Claim 8. This is a system claim corresponding to the method claim 1 containing all the same or similar limitations as claim 1, hence similarly rejected as claim 1.
Wittenschlaeger also discloses processor and memory storing computer program code performing the method (Wittenschlaeger, Para [0018]: … It should be noted that while the following description is drawn to fabric networking nodes, some operating as a fabric manager, various alternative configurations are also deemed suitable and may employ various computing devices including routers, switches, interfaces, systems, databases, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclose apparatus …).
Claims 2, 4 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Pub. No.: US 2012/0131674 A1 to Wittenschlaeger (hereinafter “Wittenschlaeger”) in view of “NPL: Anomaly Detection and Characterization in Spatial Time Series Data: A Cluster-Centric Approach” to Izakian et al. (hereinafter “Izakian”), “IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 6, DECEMBER 2014”, as applied to claim 1 above, and further in view of Pub. No.: US 2020/0067969 A1 to ABBASZADEH et al. (hereinafter “ABBASZADEH”).
Regarding Claim 2. The combination of Wittenschlaeger-Izakian discloses the method of claim 1, Wittenschlaeger further discloses, “wherein generating each of the baseline and runtime vector representations comprises:
(Wittenschlaeger, Para [0022, 0029]: … Each node 130 preferably comprises a networking switch operating as a hybrid-fabric apparatus capable of transporting data across fabric 100 from one networking node 130 to another while also providing a distributed application engine. Distributed applications can be deployed on nodes 130 and executed as software instructions. Each node 130 can include processors, memory, ports, or other apparatus components that can be individually assigned to data transport operations, an application's execution, or other role or responsibility … Each member of a vector can be constructed to represent various aspects of the fabric including the fabric as a whole, an apparatus, or a component as desired. Still further, a vector can also comprise member elements that reflect non-fabric elements possibly including remote devices, remote addresses, weather conditions, sensor data, geo-locations, news events, sporting events, stock prices, or other attributes associated with non-fabric information. Vectors can also comprise other vectors as a member element …; Examiner’s note: port, address/location are considered attributes of communication);
However, the combination of Wittenschlaeger-Izakian does not explicitly teach, but ABBASZADEH from same or similar field of endeavor teaches:
“generating, for each of at least a subset of the data records, a training data item for a neural network, the training data item being derived from at least a portion of the attributes of the record and the neural network having input units and output units corresponding to items in a corpus of attribute values for communications occurring via (ABBASZADEH, Para [0074, 0122]: … Under normal operation, features may be extracted from overlapping batches of time series data. The process may be continued over each overlapping batch resulting in a new time series of feature evolution in the feature space. Then, the feature time series may be used for performing system identification (i.e., dynamic modeling) to model the time evolution of features. A selected subset of the features may be used for dynamic modeling using state space system identification methods. The dynamic models may be in state space format. The dynamic modeler may use a multivariate Vector Auto-Regressive (“VAR”) model or regression models for fitting dynamic models into feature time series data at different time scales … As the number of lags in the VAR model increase, the model fits better into the training data set but there are more parameters n of the model to be estimated. The order of the VAR model, p, may selected automatically using Bayesian Information Criterion (“BIC”) or Akaike Information Criterion (“AIC”). Note that BIC may provide a good balance between the model fitness and complexity (e.g., to avoid over-fitting). The system may use a weighted average of features to compute the BIC per different lag numbers. In computing the weighted average BIC, the BIC value of each feature might be weighted by the magnitude of the feature so that the features with higher magnitudes are weighted more, and as such fitting a better model to those features becomes more important. The number of lags in the model, p, is then selected based on the value of p, that minimize the weighted averaged BIC. The identified VAR(p) model may then be converted into standard state space structure. This process may be done separately for each monitoring node, which may be the result of different values of p for each monitoring node …); and
training the neural network using the training data items so as to define a vector representation for each attribute value in the corpus based on weights in the neural network for an input unit corresponding to the attribute value (ABBASZADEH, Para [0110-0112, 0122]: … As the number of lags in the VAR model increase, the model fits better into the training data set but there are more parameters n of the model to be estimated. The order of the VAR model, p, may selected automatically using Bayesian Information Criterion (“BIC”) or Akaike Information Criterion (“AIC”). Note that BIC may provide a good balance between the model fitness and complexity (e.g., to avoid over-fitting). The system may use a weighted average of features to compute the BIC per different lag numbers. In computing the weighted average BIC, the BIC value of each feature might be weighted by the magnitude of the feature so that the features with higher magnitudes are weighted more, and as such fitting a better model to those features becomes more important. The number of lags in the model, p, is then selected based on the value of p, that minimize the weighted averaged BIC. The identified VAR(p) model may then be converted into standard state space structure. This process may be done separately for each monitoring node, which may be the result of different values of p for each monitoring node …).”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of ABBASZADEH into the combined teachings of Wittenschlaeger-Izakian, because it discloses that, “Each decision boundary is computed by training a classifier, such as an Extreme Learning Machine as a binary classifier in a supervised training framework. An Extreme Learning Machine (“ELM”) is a special type of feed-forward neural networks that has been recently introduced. ELM was originally developed for the Single-hidden Layer Feedforward Neural-networks (“SLFNs”) and was later extended to the generalized SLFNs where the hidden layer need not be neuron alike. Unlike traditional feed-forward neural networks, where training the network involves finding all connection weights and bias, in ELM connections between input and hidden neurons are randomly generated and fixed. That is, they do not need to be trained. Thus, training an ELM becomes finding connections between hidden and output neurons only, which is simply a linear least squares problem whose solution can be directly generated by the generalized inverse of the hidden layer output matrix … (ABBASZADEH: Para [0110])”.
Regarding Claim 4. The combination of Wittenschlaeger-Izakian-ABBASZADEH discloses the method of claim 2, Wittenschlaeger further discloses, “wherein the attributes of a communication include one or more of:
an address of a source of the communication (Wittenschlaeger, Para [0027, 0029] At step 310, a nominal behavior can be characterized as a baseline vector. The vector can comprise behavior metrics related to the fabric where the behavior metrics can be associated with internal aspects of the fabric or external elements (e.g., remote devices, remote addresses, etc.) beyond the edge of the fabric, as least to the extent visible to the node measuring the metrics … a vector can also comprise member elements that reflect non-fabric elements possibly including remote devices, remote addresses, weather conditions, sensor data, geo-locations, news events, sporting events, stock prices, or other attributes associated with non-fabric information. Vectors can also comprise other vectors as a member element …);
an address of a destination of the communication (Wittenschlaeger, Para [0027, 0029] At step 310, a nominal behavior can be characterized as a baseline vector. The vector can comprise behavior metrics related to the fabric where the behavior metrics can be associated with internal aspects of the fabric or external elements (e.g., remote devices, remote addresses, etc.) beyond the edge of the fabric, as least to the extent visible to the node measuring the metrics … a vector can also comprise member elements that reflect non-fabric elements possibly including remote devices, remote addresses, weather conditions, sensor data, geo-locations, news events, sporting events, stock prices, or other attributes associated with non-fabric information. Vectors can also comprise other vectors as a member element …);
an identification of a communications port at a source of the communication;
an identification of a communications port at a destination of the communication; an identifier of a protocol of the communication;
a size of the communication; a number of packets of the communication;
a set of network protocol flags used in the communication; a timestamp of the communication; or
a duration of the communication.”
Regarding Claim 5. The combination of Wittenschlaeger-Izakian discloses the method of claim 1, however it does not explicitly teach, but ABBASZADEH from same or similar field of endeavor teaches, “wherein the neural network has a single layer of hidden units logically arranged between the input units and the output units (ABBASZADEH, Fig. 19, Para [0110-0111]: … Because of the special design of the network, ELM training becomes very fast. The structure of a one-output ELM network 1900 is depicted in FIG. 19, including an input layer 1910, a hidden layer 1920, and the single node output layer 1930 …).”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of ABBASZADEH into the combined teachings of Wittenschlaeger-Izakian, because it discloses that, “Each decision boundary is computed by training a classifier, such as an Extreme Learning Machine as a binary classifier in a supervised training framework. An Extreme Learning Machine (“ELM”) is a special type of feed-forward neural networks that has been recently introduced. ELM was originally developed for the Single-hidden Layer Feedforward Neural-networks (“SLFNs”) and was later extended to the generalized SLFNs where the hidden layer need not be neuron alike. Unlike traditional feed-forward neural networks, where training the network involves finding all connection weights and bias, in ELM connections between input and hidden neurons are randomly generated and fixed. That is, they do not need to be trained. Thus, training an ELM becomes finding connections between hidden and output neurons only, which is simply a linear least squares problem whose solution can be directly generated by the generalized inverse of the hidden layer output matrix … (ABBASZADEH: Para [0110])”.
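To make the quoted ELM description concrete, the following minimal sketch builds a one-output network with a single hidden layer whose input-to-hidden weights are drawn at random and left fixed, and whose hidden-to-output weights are found by linear least squares. It is an illustration under stated assumptions, not ABBASZADEH's implementation: the function names, the tanh activation, the weight range, and the XOR training data are hypothetical, and a small ridge term with Gaussian elimination on the normal equations stands in for the generalized inverse named in the reference.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_elm(xs, ys, hidden=8, ridge=1e-6, seed=0):
    """Train a one-output ELM and return its prediction function."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Input-to-hidden weights and biases are random and never trained.
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(hidden)]

    def hidden_output(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(weights, biases)]

    h = [hidden_output(x) for x in xs]
    # Least squares for the output weights: (H^T H + ridge I) beta = H^T y.
    hth = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0)
            for j in range(hidden)] for i in range(hidden)]
    hty = [sum(r[i] * y for r, y in zip(h, ys)) for i in range(hidden)]
    beta = solve_linear(hth, hty)

    def predict(x):
        return sum(b * v for b, v in zip(beta, hidden_output(x)))

    return predict


xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0.0, 1.0, 1.0, 0.0]  # XOR labels, not linearly separable
predict = train_elm(xs, ys, hidden=16)
print([round(predict(x), 3) for x in xs])
```

Only the output weights are solved for, which is why training reduces to a single linear solve rather than an iterative weight update.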
Pertinent Prior Art: The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
PAT US 10949534 B2, Martin et al.: Martin discloses a method for predicting and characterizing cyberattacks that includes: receiving, from a sensor implementing deep packet inspection to detect anomalous behaviors on the network, a first signal specifying a first anomalous behavior of a first asset on the network at a first time; representing the first signal in a first vector representing frequencies of anomalous behaviors—in a set of behavior types—of the first asset within a first time window; calculating a first malicious score representing proximity of the first vector to malicious vectors defining sets of behaviors representative of security threats; calculating a first benign score representing proximity of the first vector to a benign vector representing an innocuous set of behaviors; and in response to the first malicious score exceeding the first benign score and a malicious threshold score, issuing a first alert to investigate the network for a security threat.
Calculating a first malicious score based on proximity of the first vector to a set of malicious vectors defining sets of behaviors representative of network security threats; calculating a first benign score proportional to proximity of the first vector to a benign vector representing an innocuous set of behaviors; in response to the first malicious score exceeding the first benign score and a malicious threshold score, issuing a first alert to investigate the network for a network security threat; in response to the first benign score and the malicious threshold score exceeding the first malicious vector; and, in response to the first malicious score differing from the first benign score, by less than a threshold difference, issuing a prompt to investigate the first asset.
Aggregating the first set of signals into the first data structure comprises generating a first multi-dimensional vector representing frequencies of behaviors of each behavior type, in the predefined set of behavior types, of the first asset within a preset duration of two weeks terminating at a current time; wherein calculating the first magnitude of deviation comprises: accessing the corpus of historical data structures, each data structure in the corpus of historical data structures comprising a multi-dimensional vector representing a frequency of behaviors of an asset, in the set of assets, within a duration of two weeks prior to the current time; training a replicator neural network on the corpus of historical data structures; passing the first multi-dimensional vector through the replicator neural network to generate a first output vector; and calculating an outlier score based on a difference between the first multidimensional vector and the first output vector; and wherein generating the first alert to investigate the first asset comprises generating the first alert to investigate the first asset in response to the outlier score exceeding the deviation threshold.
PGPUB US 20120072983 A1, McCUSKER et al.: McCUSKER discloses a method of determining, within a deployed environment over a data communication network, network threats and their associated behaviors. The method includes the steps of acquiring sensor data that identifies a specific contact, normalizing the acquired sensor data to generate transformed sensor data, deriving, for the specific contact from the transformed sensor data, a contact behavior feature vector for each of a plurality of
Embodiments described herein relate generally to network monitoring and network forensics, more specifically to a detection system that monitors network activity, comparing current network behaviors to historical and pre-stored behaviors that are used to identify suspicious network activity.
NPL: Anomaly Detection for Discrete Sequences: A Survey, Varun Chandola: Chandola discloses a comprehensive and structured overview of the existing research for the problem of detecting anomalies in discrete/symbolic sequences. The objective is to provide a global understanding of the sequence anomaly detection problem and how existing techniques relate to each other. The key contribution of this survey is the classification of the existing research into three distinct categories, based on the problem formulation that they are trying to solve. These problem formulations are: 1) identifying anomalous sequences with respect to a database of normal sequences; 2) identifying an anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence whose frequency of occurrence is anomalous. We show how each of these problem formulations is characteristically distinct from each other and discuss their relevance in various application domains. We review techniques from many disparate and disconnected application domains that 
Chandola also discloses Window-based techniques to extract fixed-length overlapping windows from a test sequence. Each window is assigned an anomaly score. The anomaly scores of all windows within a test sequence are aggregated to obtain an anomaly score for the entire test sequence. These techniques are particularly useful when the cause of anomaly can be localized to one or more shorter substring within the actual sequence [16]. If the entire sequence is analyzed as a whole, the anomaly signal might not be distinguishable (as in similarity-based techniques) from the inherent variation that exists across sequences. By analyzing a short window at a time, window based techniques try to localize the cause of anomaly within one or a few windows. The standard technique to obtain short windows from a sequence is to slide a fixed-length window, one symbol at a time, along the sequence.
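For illustration only, the window-based technique summarized above can be sketched as follows. The sketch follows the three steps stated in the summary: slide a fixed-length window one symbol at a time, assign each window an anomaly score, and aggregate the window scores into a sequence score. The per-window score used here (1.0 if the window never occurs in the normal sequences, otherwise 0.0) and the aggregation by mean are assumptions chosen for simplicity, and all function names are hypothetical; they are not taken from the reference.

```python
from typing import Hashable, Iterable, List, Sequence, Set, Tuple

Window = Tuple[Hashable, ...]


def sliding_windows(seq: Sequence[Hashable], k: int) -> List[Window]:
    """Return all fixed-length overlapping windows, sliding one symbol at a time."""
    if k <= 0:
        raise ValueError("window length k must be positive")
    # A sequence shorter than k yields no windows.
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_normal_windows(normal_seqs: Iterable[Sequence[Hashable]], k: int) -> Set[Window]:
    """Collect every length-k window observed in the normal (training) sequences."""
    normal: Set[Window] = set()
    for seq in normal_seqs:
        normal.update(sliding_windows(seq, k))
    return normal


def window_scores(test_seq: Sequence[Hashable], normal: Set[Window], k: int) -> List[float]:
    """Assumed scoring rule: 1.0 for a window never seen in normal data, else 0.0."""
    return [0.0 if w in normal else 1.0 for w in sliding_windows(test_seq, k)]


def sequence_score(test_seq: Sequence[Hashable], normal: Set[Window], k: int) -> float:
    """Assumed aggregation: mean of the window scores (0.0 if there are no windows)."""
    scores = window_scores(test_seq, normal, k)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    normal = build_normal_windows(["abcabcabc"], k=3)
    print(window_scores("abcxbc", normal, k=3))   # per-window scores localize the anomaly
    print(sequence_score("abcxbc", normal, k=3))  # aggregated score for the whole sequence
```

Because each window is scored separately, a single unexpected symbol raises the score only of the windows that contain it, which is how the anomaly is localized within the test sequence.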
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAHABUB S AHMED whose telephone number is (571)272-0364.  The examiner can normally be reached on 9AM-5PM EST M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kambiz Zand can be reached on (571)272-3811.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MAHABUB S AHMED/Examiner, Art Unit 2434