DETAILED ACTION
This is the response to applicant’s amendment action regarding application number 15/843,949, filed December 15, 2017.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed September 22, 2021 has been entered. Examiner acknowledges receipt of Amendments to Application 15/843,949, which include: Amendments to the Specification p.2, Amendments to the Claims pp.3-9, Appendix of amended Drawings (6 pages), and Remarks pp.10-14 (containing applicant’s amendments). 
Regarding applicant’s Remarks on p.10, examiner has acknowledged Claims 1, 5-7, 10, and 17-20 have been amended. Examiner has acknowledged Claims 2-4, 11, 13, and 15 have been canceled. Claims 1, 5-10, 12, 14, and 16-20 remain pending in the application. 
Regarding applicant’s Remarks on p.10, examiner has acknowledged applicant’s Amendments to the Specification and Amendments to the Claims have overcome each and every respective specification and claim objection previously set forth in the Non-Final Office Action mailed June 30, 2021. However, examiner has noted that the amendments have introduced new claim objections for amended Claim 1 and original Claim 12, which are indicated in the sections listed below.
Regarding applicant’s Remarks on p.10, examiner has acknowledged applicant’s Appendix of amended Drawings as replacement sheets to the original drawings, and while the majority of the drawing objections have been resolved and have been confirmed to not introduce new matter, examiner has identified one drawing objection previously set forth in the Non-Final Office Action mailed June 30, 2021 that has not been resolved, with this drawing objection identified in the section listed below. 
Regarding applicant’s Remarks on p.11, examiner acknowledges applicant’s Amendments to the Claims have resolved the indefiniteness/lack of antecedent issues identified in Claims 10-11, 13, 15, and 18-19, and therefore the respective §112(b) rejections previously set forth in the Non-Final Office Action mailed June 30, 2021 for Claims 10-11, 13, 15, and 18-19 are withdrawn. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 15/843,949, which include: Remarks pp.10-14 (containing applicant’s arguments). 
Regarding applicant’s Remarks on pp.10-11 for Claims 1 and 17-20 under 35 U.S.C. 101, examiner acknowledges applicant’s Amendments to the Claims regarding the independent Claims 1, 17, and 20 now incorporate features from dependent Claims 2-4 which were not rejected under 35 U.S.C. 101, and as such, the earlier §101 rejections previously set forth in the Non-Final Office Action mailed June 30, 2021 for Claims 1 and 17-20 are withdrawn. 
Applicant’s arguments regarding examiner’s 35 U.S.C. §102(a)(1) and 35 U.S.C. §103 rejections have been fully considered but they are not persuasive. 
Regarding applicant’s Remarks on pp.12-13:
“The independent claims, as amended, features, inter alia: 
classifying the collected, non-stationary data according to a non-Markovian, stateful classification, based on an inference model, wherein the inference model is a trained, unsupervised machine learning model, implemented as an auto-encoder by a neural network, and wherein classifying the collected, no-stationary data comprises: 
forming data points from the collected, non-stationary data; and 
for each data point of the formed data points: 
feeding the auto-encoder with said each data point for the auto-encoder to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder; 
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores; 
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree; and 
feeding the selected outputs into a supervised, machine learning model, for it to further classify the selected outputs, whereby said anomalies are detected based on outputs from the supervised model; and 
Applicant submits that the cited references do not disclose, teach, or fairly suggest the above emphasized claimed features. Herein, the data points are scored according to the auto-encoder reconstruction error. The scoring of the degree of anomaly is what dictates the feeding of the outputs of the unsupervised stage to the supervised stage. Williams, on the other hand, simply uses the auto-encoder to reduce the dimensionality before any other stage. Additionally, Williams does not contemplate using reconstruction errors to score an anomaly before feeding the data into the next stage. While, An, references using reconstruction errors to score anomalies, this is not analogous to the two-step process featured in the present claims using unsupervised auto-encoding and a supervised model to further classify anomalies. For these reasons, Williams and An do not disclose, teach, or fairly suggest each and every claimed feature. The other references do not cure these above mentioned deficiencies. For at least these reasons the independent claims are allowable and the subsequent dependent claims are likewise allowable by virtue of their dependency from otherwise allowable independent claims. Applicant respectfully requests withdrawal of the present claim rejections.”
Examiner has considered this argument, and has found the argument to be not persuasive. Examiner has noted that the applicant’s above arguments are directed to the amended independent claims which now incorporate features from the now-cancelled dependent Claims 2-4, as well as introducing an additional claim limitation further describing the selection of outputs from the classification as being performed based on the degree of anomaly (“selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree”), which requires further analysis and re-examination of the amended and related original claims. The additional rejections and updated claim mappings according to the applicant’s amended claims are provided in the sections indicated below.
Regarding applicant’s argument that the present claims feature a two-step process that is different from what is taught in the Williams and An references, applicant is vague as to where in the claims the applicant is defining this two-step process. Examiner assumes that the applicant is referring to the two-step encoding and decoding process performed within the auto-encoder. Williams teaches a combination model that includes an auto-encoder and a nearest-neighbor classifier to perform anomaly detection (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”). Applicant points out that the auto-encoder in the Williams reference is primarily directed to performing dimensionality reduction, but examiner also notes that the applicant’s specification indicates that the auto-encoder in the claimed invention also performs dimensionality reduction (paragraphs [0071]-[0072]: “… Yet, an under-complete auto-encoder it typically used, for the purpose of dimensionality reduction. I.e., an under-complete auto-encoder is an auto-encoder whose code dimension is lower than the input dimension (the number of nodes in the hidden layer is less than the input layer). Thus, an under-complete auto-encoder as used herein constrains the code to have smaller dimension than the input data point, which, in turn, forces the auto-encoder to capture the most prominent features of the input data. … In some implementations, the auto-encoder takes, for each data point, n = 27 features (i.e., 27 characteristics of parsed data) in input, reduces the dimensionality internally, and then revert to 27 features.”). 
As known in the art, an auto-encoder is an unsupervised learning model and is defined to have both encoding and decoding functionalities, and one of the primary functions of any auto-encoder is to perform a dimensionality reduction as part of the encoding phase (interpreted here as being the “first step” mentioned by the applicant). While Williams does not explain in detail the functionality of the auto-encoder in its decoding phase (interpreted as being the “second step” mentioned by the applicant), the An reference teaches the decoding part of an auto-encoder, with respect to the reconstruction error as indicated in the original dependent Claim 3, in which this decoding phase is used to identify and detect anomalies. As indicated earlier, Williams teaches a combination of an unsupervised learning model (auto-encoder in a first stage) followed by a supervised learning model (k-nearest neighbors in a second stage). It would have been obvious for a person having ordinary skill in the art to combine the teachings of Williams and An to teach the functionality of the auto-encoder (where the input data passes through the two-step process of an encoding phase that performs dimensionality reduction as taught by Williams, followed by a decoding phase to perform reconstruction error calculation and anomaly detection as taught by An) to produce output data that is provided as input into a separate supervised learning model (also taught in Williams) to further identify, detect, and classify the anomalies identified from the output data of the auto-encoder. Hence, the combined teachings from the Williams and An references are sufficient and relevant to satisfy the claim limitations of performing this two-step process. 
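For illustration only, the combined arrangement discussed above (an auto-encoder stage whose reconstruction error selects outputs, followed by a nearest-neighbor stage that further classifies the selected outputs) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Williams, An, or the applicant’s specification: the function names, the toy “auto-encoder” (which merely reconstructs each point as the mean of its coordinates), the threshold value, and the labels “type_A” and “type_B” are all hypothetical.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


def toy_reconstruct(x: Point) -> Point:
    """Hypothetical stand-in for a trained auto-encoder.

    Encodes a point to a single number (the mean of its coordinates) and
    decodes by repeating that number, so points near the diagonal
    reconstruct well and points far from it reconstruct poorly.
    """
    m = sum(x) / len(x)
    return tuple(m for _ in x)


def anomaly_score(x: Point, reconstruct: Callable[[Point], Point]) -> float:
    """Reconstruction error: Euclidean distance between x and its reconstruction."""
    return math.dist(x, reconstruct(x))


def select_outputs(points: List[Point],
                   reconstruct: Callable[[Point], Point],
                   threshold: float) -> List[Point]:
    """First stage: keep only points whose anomaly score exceeds the threshold."""
    return [x for x in points if anomaly_score(x, reconstruct) > threshold]


def nearest_neighbor_label(x: Point, labeled: List[Tuple[Point, str]]) -> str:
    """Second stage: 1-nearest-neighbor classification against labeled examples."""
    _point, label = min(labeled, key=lambda pair: math.dist(x, pair[0]))
    return label


# Hypothetical data points and labeled examples, for illustration only.
points = [(1.0, 1.0), (2.0, 2.1), (3.0, -3.0), (-2.0, 2.5)]
labeled = [((5.0, -5.0), "type_A"), ((-4.0, 4.0), "type_B")]

selected = select_outputs(points, toy_reconstruct, threshold=1.0)
detected = [(x, nearest_neighbor_label(x, labeled)) for x in selected]
```

In this sketch only the two points that reconstruct poorly reach the second stage, which mirrors the claimed ordering in which the degree of anomaly determines what is fed to the supervised model.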

Drawings
The drawings are objected to due to the following informality: Figure 3: It is unclear why there are two arrows associated with the term “SELECTION OF ANOMALY SCORES & DATA POINTS” terminating into both blocks S31 QUERY NEAREST NEIGHBORS and S29 SAMPLE. Both arrows are shown as originating from “SELECTION OF ANOMALY SCORES & DATA POINTS”. However, corresponding Figure 2 only shows one arrow labelled “SELECTION OF ANOMALY SCORES & DATA POINTS”. Hence, it is not clear whether the second arrow shown in Figure 3 is an extraneous arrow that should not be there, or whether it indicates that both anomaly scores and data points information are for both blocks S31 and S29, or whether this information is actually split into two partitions going to two different termination points (i.e., anomaly scores are for block S31 and data points are for block S29, or vice versa). Appropriate correction is required.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
The amended claims are objected to because of the following informality: 
Claim 1: A typographical error in the following claim limitation: “wherein the inference model is a trained, unsupervised machine learning model, implemented as an auto-encoder by a neural network, and wherein classifying the collected, no[n]-stationary data comprises: …”. Appropriate correction is required.
Claim 12: This claim indicates it is a dependent claim of Claim 11, which is now cancelled, and hence needs to be re-connected to the amended claim tree. Given that the now-cancelled Claim 11 was a dependent claim of Claim 10, it appears that Claim 12 should be re-connected to the claim tree as a dependent claim of Claim 10 (i.e., “The computer-implemented method according to Claim [[11]]10, wherein detecting anomalies further comprises: …”). Appropriate correction is required.

Claim Interpretation
Applicant has provided the following definitions in the specification, which will be used as part of the examination:
Non-Markovian, stateful classification: 
According to paragraphs [0046] and [0047] in the specification: “… non-Markovian processes involved herein keep track of prior states of the non-stationary data collected. Moreover, the stateful (also called memoryful) processes involved herein track information about the sender and/or the receiver of the non-stationary data collected 510. This can be achieved by forming data points (e.g., in the form of vectors of n features each), where data points are formed by aggregating data related to data flows from respective sources and for given time periods.”
Unsupervised model: 
According to paragraph [0008], an inference model is “a trained, unsupervised machine learning model… This model can be implemented as an auto-encoder by a neural network. … Still, the unsupervised model may be a multi-layer perceptron model, yet implemented in a form of an auto-encoder by the neural network.” Hence the term “unsupervised model” will be interpreted as “an inference model, implemented as an auto-encoder by a neural network”.
Supervised model: 
According to paragraph [0016], a supervised model “is configured as a nearest-neighbor classifier”. Hence the term “supervised model” will be interpreted as “a nearest-neighbor classifier”.
Cognitive algorithm: 
According to paragraph [0063], the terms “cognitive algorithm”, “cognitive model”, “machine learning model” or the like are used interchangeably. Hence the term “cognitive algorithm” will be interpreted as an algorithm or machine learning model where applicable.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 5, 16-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/025455, published 9/10/2015 [hereafter referred to as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred to as An].
Regarding amended Claim 1, Williams teaches
(Currently Amended) A computer-implemented method for detecting anomalies in non-stationary data in a network of computing entities, the method comprising: 
collecting non-stationary data in the network (Williams Figure 5: examiner’s note: Williams teaches classifying and detecting anomalies in data using a plurality of network computers (Williams paragraph [0083]: “FIGS. 5-9 represent the generalized operation for classifying data using machine learning that may be incrementally refined based on expert input in accordance with at least one of the various embodiments. … processes 500, 600, 700, 800, and 900 described in conjunction with FIGS. 5-9 may be implemented by and/or executed on a single network computer, … and/or executed on a plurality of network computers … and/or executed on one or more virtualized computers, such as, those in a cloud-based environment”). Referring to the logical flow diagram in Williams Figure 5, Williams teaches at step 502 data is collected for classification and incrementally refined for further analysis (Williams paragraph [0084]: “FIG. 5 illustrates a logical diagram of process 500 that may be arranged to classify data using machine learning that may be incrementally refined based on expert input … In step 502, data may be collected for submission to the system. The term data is used broadly to describe information requiring analysis.” and Williams paragraph [0026]: “…the data provided for classification may be real-time network information, captured/buffered network information, or the like. Also, in at least one of the various embodiments, a sensor computer may be employed to monitor and buffer some or all of the data, such as, network information in real-time”, where captured/buffered real-time network information is interpreted as non-stationary data.).); and 
while collecting the non-stationary data: 
classifying the collected, non-stationary data according to a non-Markovian, stateful classification (Williams paragraphs [0129]-[0130]: “…a sensor computer may be configured to buffer a particular amount/type of network information depending on the type of information the machine learning model may be used to classify. … one sensor computer may be arranged to buffer web server traffic information, while another sensor computer may be employed to monitor network traffic that may be associated with one or more particular users and/or user groups … the sensor computers may group the captured network information into time buckets, such that each window include the network information that was captured over a defined time interval. The duration of the time interval may be defined using configuration. For example, in at least one of the various embodiments, a time interval may be defined to be, 1 second, 10 seconds, 1 minute, 1 hour, 4 hours, 1 day, 1 week, and so on.”, where capturing monitored network traffic is done over a period of time intervals, and “associating the traffic with one or more particular users and/or user groups” represents (under broadest reasonable interpretation) identifying and capturing the data according to flows based on certain network state information provided in network packet headers such as source/destination IP addresses and corresponding source/destination ports, all of which collectively represents the definition of a non-Markovian, stateful classification of data.), 
based on an inference model, wherein the inference model is a trained, unsupervised machine learning model, implemented as an auto-encoder by a neural network (Williams Figure 5, elements 512, 516: examiner’s note: Referring to Williams Figure 5, Williams teaches the network computer/sensor computer system includes a training process using a model structure that defines the machine learning model used for classification and analysis in the system (Williams paragraph [0092]), which can include unsupervised and supervised models (Williams paragraph [0095]) as well as including a combination of models (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”), where the auto-encoder represents an inference model performing the initial data classification.), and 
wherein classifying the collected, no[n]-stationary data comprises:
forming data points from the collected, non-stationary data (Williams Figure 5, element 504: examiner’s note: Williams teaches capturing the monitored network traffic over a period of time intervals (Williams paragraphs [0026], [0129]-[0130], [0145]), where the grouping/ordering of data and performing feature extraction during data ingestion step 504 according to the data characteristics are interpreted as steps for “forming data points from the collected, non-stationary data” (Williams paragraphs [0088]-[0089]: “In step 504, data may be ingested into the system and prepared for processing. Data preparation may include a number of processes that may be required to ensure the system can interpret and handle data from various sources. The configuration of a data ingestion process depends upon the system needs and data characteristics. In at least one of the various embodiments, it may include high-level feature extraction where the output of the process is a collection of numeric values that represent all of the data upon which the system performs a classification decision.”).); and 
for each data point of the formed data points: 
feeding the auto-encoder with said each data point for the auto-encoder (Williams Figure 5, elements 504, 512, 516, 518; paragraphs [0090], [0092], [0097]: Referring to Williams Figure 5, Williams teaches the network computer/sensor computer system includes a training process using a model structure that defines the machine learning model used for classification and analysis, where a combination of models is specified (i.e., auto-encoder and k-nearest-neighbor model (Williams paragraph [0097])), and where the training process 512 (Williams paragraph [0092]) receives training data (separated from test data; Williams paragraph [0090]) from the data ingestion step 504, corresponding to “feeding the auto-encoder with said each data point for the auto-encoder” (Williams paragraph [0097]: “During the Training Process 512, the training data is processed through a training algorithm and computes the biases, weights, and transfer functions which are stored in Model(s) 518.”).) … 
… feeding the selected outputs into a supervised, machine learning model, for it to further classify the selected outputs, whereby said anomalies are detected based on outputs from the supervised model (Williams Figure 5, element 516: examiner’s note: Referring to Williams Figure 5, Williams teaches the network computer/sensor computer system includes a training process and a model structure process that defines the machine learning model used for classification and analysis (Williams paragraph [0092]), where the model structure defines the structure of each model implemented in the system, where the system can include unsupervised and supervised models (Williams paragraph [0095]), as well as including a combination of models (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”), where the auto-encoder represents an inference model, and the outputs of the auto-encoder represent the selected outputs resulting from the inference model, and the k-nearest neighbor model represents a supervised machine learning model ); and 
detecting anomalies in the classified data (Examiner’s note: Williams teaches the system performing detection of new entities as anomalies (Williams paragraphs [0147]-[0149]: “At decision block 910, … if the network information for the detected entity is buffered, control may flow to block 912; otherwise, control may flow to decision block 914. … newly detected entities may initially be marked and/or tagged as new entities. … threshold values may be defined in configuration to indicate the amount of network information that must be captured for a given class and/or entity.  … At block 912, in at least one of the various embodiments, anomalies and/or classifications associated with the detected entity may now be included in the report information.”), where the new detected entities are described as network traffic captured by a sensor computer, the new network entities described as including (Williams paragraph [0141]: “the detection of an previously unknown/unseen instance of an application, such as, a web server, database, domain name server, user applications (e.g., games, office applications, and so on), file sharing applications, or the like.”), which requires the detection of flows through parsing of packet header information such as source/destination IP addresses and corresponding source/destination ports.).  
While Williams teaches the encoding phase of an auto-encoder, Williams does not explicitly teach
… for each data point of the formed data points: 
 … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder;
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores;
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree; …
An teaches
… for each data point of the formed data points: 
[feeding the auto-encoder] … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder (An p.4 Algorithm 2 Autoencoder based anomaly detection algorithm: examiner’s note: Referring to Algorithm 2, An teaches 𝛉 and 𝛟 representing auto-encoder parameters that are initialized and trained (learned) by the auto-encoder during classification and determination of reconstruction errors (An p.4 2nd paragraph: “Autoencoder based anomaly detection is a deviation based anomaly detection method using semi-supervised learning. It uses the reconstruction error as the anomaly score. Data points with high reconstruction are considered to be anomalies. Only data with normal instances are used to train the autoencoder. After training, the autoencoder will reconstruct normal data very well, while failing to do so with anomaly data which the autoencoder has not encountered. Algorithm 2 shows the anomaly detection algorithm using reconstruction errors of autoencoders.”).);
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores (An p.4 Algorithm 2 Autoencoder based anomaly detection algorithm: examiner’s note: An teaches obtaining anomaly scores by calculating the auto-encoder reconstruction error (An p.3 2nd paragraph: “Deviation based anomaly detection is mainly based on spectral anomaly detection, which uses reconstruction errors as anomaly scores. The first step is to reconstruct the data using dimension reduction methods such as principal components analysis or autoencoders. Reconstructing the input using k-most significant principal components and measuring the difference between its original data point and the reconstruction leads to the reconstruction error which can be used as an anomaly score. Data points with high reconstruction error are defined as anomalies.” and An p.4 Algorithm 2: within the for loop of the algorithm (for each input data point), calculate a reconstruction error using: “reconstruction error(i) = ∥x^(i) − g_θ(f_ϕ(x^(i)))∥”), where the reconstruction error corresponds to an anomaly score, with the calculated distance between the input data point and its reconstruction through the encoding and decoding phases of the auto-encoder corresponding to the degree of the anomaly.);
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree (An p.4 Algorithm 2: examiner’s note: Referring to Algorithm 2, An teaches within the for loop of the algorithm the calculation of a reconstruction error, and selecting the identified input data point as either being anomalous or not according to the reconstruction error (representing the degree of anomaly) being above a threshold α, thus corresponding to “selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold” (An p.4 Algorithm 2: within the for loop of the algorithm (for each data point): “if reconstruction error(i) > α then x^(i) is an anomaly else x^(i) is not an anomaly end if”).); …

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process of Williams and incorporate the algorithm steps of calculating an auto-encoder reconstruction error of An as a way to generate anomaly scores. The motivation to combine is taught by An, as auto-encoders already reduce the original data to lower dimensional embeddings, which separates anomalies and normal data by taking out noise and other unimportant features. Thus, using lower-dimensional embeddings not only makes it easier to detect anomalies, but also using the reconstruction error as an anomaly score provides a built-in scoring process for identifying more anomalous data, resulting in improved anomaly detection behavior of the system (An p.1 last paragraph – p.2 1st paragraph: “Among many anomaly detection methods, spectral anomaly detection techniques try to find the lower dimensional embeddings of the original data where anomalies and normal data are expected to be separated from each other. After finding those lower dimensional embeddings, they are brought back to the original data space which is called the reconstruction of the original data. By reconstructing the data with the low dimension representations, we expect to obtain the true nature of the data, without uninteresting features and noise. Reconstruction error of a data point, which is the error between the original data point and its low dimensional reconstruction, is used as an anomaly score to detect anomalies. … With the advent of deep learning, autoencoders are also used to perform dimension reduction by stacking up layers to form deep autoencoders. By reducing the number of units in the hidden layer, it is expected that the hidden units will extract features that well represent the data. 
Moreover, by stacking autoencoders we can apply dimension reduction in a hierarchical manner, obtaining more abstract features in higher hidden layers leading to a better reconstruction of the data.”).
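For illustration only, the per-data-point test quoted from An's Algorithm 2 (reconstruction error compared against a threshold α) can be sketched in Python. This is a minimal sketch under stated assumptions, not code from An or Williams: the function names, the Euclidean distance, and the `toy_reconstruct` stand-in for a trained auto-encoder are hypothetical and chosen only to make the example runnable.

```python
import math

def reconstruction_error(x, x_hat):
    """Euclidean distance between a data point and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))

def select_anomalies(points, reconstruct, alpha):
    """Return (point, score) pairs whose reconstruction error exceeds alpha."""
    selected = []
    for x in points:
        score = reconstruction_error(x, reconstruct(x))  # anomaly score
        if score > alpha:  # "if reconstruction error(i) > alpha then anomaly"
            selected.append((x, score))
    return selected

# Hypothetical stand-in for a trained auto-encoder: it keeps only the
# first coordinate, so any energy in the other coordinates is lost.
def toy_reconstruct(x):
    return [x[0]] + [0.0] * (len(x) - 1)

points = [[1.0, 0.0], [2.0, 0.1], [1.5, 3.0]]
flagged = select_anomalies(points, toy_reconstruct, alpha=1.0)
```

In this toy setting only the third point is reconstructed poorly, so it is the only output selected as having a degree of anomaly above the threshold.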
Regarding amended Claim 5, Williams in view of An teaches
(Currently Amended) The computer-implemented method according to claim 1, 
wherein: the unsupervised model is implemented as an under-complete auto-encoder by the neural network (Examiner’s note: An teaches an auto-encoder that has a reduced number of units in the hidden layer, which is by definition an ‘under-complete’ auto-encoder (An p.2 2nd paragraph: “With the advent of deep learning, autoencoders are also used to perform dimension reduction by stacking up layers to form deep autoencoders. By reducing the number of units in the hidden layer, it is expected that the hidden units will extract features that well represent the data.”).), and 
wherein classifying the collected data further comprises, performing a dimension reduction, based on said each data point (Examiner’s note: An teaches an auto-encoder that finds lower dimensional embeddings for the original input data, which is a form of dimension reduction (An p.1 last paragraph – p.2 1st paragraph: “Among many anomaly detection methods, spectral anomaly detection techniques try to find the lower dimensional embeddings of the original data where anomalies and normal data are expected to be separated from each other. After finding those lower dimensional embeddings, they are brought back to the original data space which is called the reconstruction of the original data. By reconstructing the data with the low dimension representations, we expect to obtain the true nature of the data, without uninteresting features and noise. Reconstruction error of a data point, which is the error between the original data point and its low dimensional reconstruction, is used as an anomaly score to detect anomalies. …”).).  
Regarding original Claim 16, Williams in view of An teaches
(Original) The computer-implemented method according to claim 1, 
wherein the method further comprising: 
while collecting the non-stationary data and classifying the collected non-stationary data, training a cognitive algorithm corresponding to said inference model, based on non-stationary data collected from the network, to obtain a trained model (Examiner’s note: Williams teaches training a fast learning model (representing the cognitive algorithm) in parallel with a deep learning neural network model (representing the inference model) (Williams paragraphs [0173]-[0174]: “… the DLNN model is combined with a machine learning model that can be trained quickly to recognize new sets of data, or a Fast Learning Model. … A Fast Learning Model is a machine learning model which may be less accurate than a DLNN but can be trained more quickly based on the characteristics of the algorithm, or because a subset of training data and recent feedback is presented for training … Some examples of a Fast Learning Model include, but are not limited to decision trees, and random forests. Fast Learning Models may be incrementally trained very quickly upon the introduction of new data. Those skilled in the art will appreciate that the specific algorithm utilized is chosen according to the characteristics of the data and goals of the system”).); 
substituting the inference model, as currently used to classify the non-stationary data, with the trained model (Examiner’s note: Williams teaches both DLNN and the fast learning model run the logical flow of classifying and detecting new anomalies as shown in Williams Figure 5, with a combination function process that performs analysis of the outputs between the two models, such that Williams paragraph [0175]: “the multiple output classifications are handled by Combination Function 520 responsible for assigning the best class to the data. Combination Function 520 analyzes the scores predicted by both models in combination with the confidence and performance of each model, finally selecting the class representing the highest probability of accuracy”, where this selection of the class representing the Williams paragraph [0176]: “Whenever a user, such as a Domain Expert, adjusts the predicted output of the DLNN, that data element may be submitted to the training process of the Fast Learning Model, quickly modifying and improving future output of the Fast Learning Model. Subsequent runtime scoring of the Fast Learning Model may have a higher accuracy and confidence (compared to the DLNN) for data similar to the type that have been submitted through Fast Learning Model training process. Conversely, the DLNN may have lower accuracy and confidence for the same data, but a high degree of accuracy and confidence for data that has not been submitted to the Fast Learning Model. The Combination Function 520 chooses as output whichever class or classes represent the higher accuracy and confidence.”).); and 
further classifying non-stationary data collected according to a non-Markovian, stateful classification, based on the substituted model, so as to be able to detect new anomalies in further classified data (Williams paragraph [0176]: “Whenever a user, such as a Domain Expert, adjusts the predicted output of the DLNN, that data element may be submitted to the training process of the Fast Learning Model, quickly modifying and improving future output of the Fast Learning Model. Subsequent runtime scoring of the Fast Learning Model may have a higher accuracy and confidence (compared to the DLNN) for data similar to the type that have been submitted through Fast Learning Model training process. Conversely, the DLNN may have lower accuracy and confidence for the same data, but a high degree of accuracy and confidence for data that has not been submitted to the Fast Learning Model. The Combination Function 520 chooses as output whichever class or classes represent the higher accuracy and confidence.”, where it follows that after switching over to the fast learning model, the flow in Figure 5 proceeds with using the fast learning model (“the substituted model”) to detect new anomalies until the DLNN model has been re-trained to increase its accuracy and performance (Williams paragraph [0177]-[0178]).).  
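For illustration only, the selection behavior attributed to Williams' Combination Function 520 (choosing whichever class represents the higher accuracy and confidence) can be sketched as follows. This is a hedged sketch, not Williams' implementation: the `Prediction` record, the `combine` function, and the rule of multiplying confidence by model accuracy are assumptions introduced here, since the cited paragraphs do not specify how the two quantities are weighed.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    model: str         # which model produced the output (hypothetical label)
    label: str         # predicted class
    confidence: float  # model's confidence in this prediction, in [0, 1]
    accuracy: float    # model's measured accuracy on similar data, in [0, 1]

def combine(predictions):
    """Pick the prediction with the highest confidence-times-accuracy score.

    The product is an assumed scoring rule used only for illustration.
    """
    return max(predictions, key=lambda p: p.confidence * p.accuracy)

dlnn = Prediction("DLNN", "normal", confidence=0.55, accuracy=0.90)
fast = Prediction("FastLearningModel", "anomaly", confidence=0.95, accuracy=0.80)
chosen = combine([dlnn, fast])
```

Here the fast learning model's output is selected because its combined score is higher for this data element, which mirrors the situation described for data recently submitted through the Fast Learning Model training process.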
Regarding amended Claim 17, Williams teaches
(Currently Amended) A computerized system adapted to interact with a network of computing entities for detecting anomalies in non-stationary data, 
wherein the system (Williams Figure 3, elements 300, 302, 326; paragraphs [0066], [0067], [0070]-[0072]: examiner’s note: Williams teaches a network computer (representing a sensor computer; Williams paragraph [0066]) that contains a processor (Williams paragraph [0067]) and memory, storing computer readable instructions and program modules such as a classifier application and machine learning engine, thus corresponding to a computerized system.) is configured for: 
collecting non-stationary data in the network (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
while collecting said non-stationary data: 
classifying the collected, non-stationary data according to a non-Markovian, stateful classification (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
based on an inference model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
wherein the inference model is a trained, unsupervised machine learning model, implemented as an auto-encoder by a neural network (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), and 
wherein classifying the collected, non-stationary data comprises:
forming data points from the collected, non-stationary data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
for each data point of the formed data points: 
feeding the auto-encoder with said each data point for the auto-encoder (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) … 
… feeding the selected outputs into a supervised, machine learning model, for it to further classify the selected outputs, whereby said anomalies are detected based on outputs from the supervised model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
detecting anomalies in the classified data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).  
While Williams teaches the encoding phase of an auto-encoder, Williams does not explicitly teach
… for each data point of the formed data points: 
[feeding the auto-encoder] … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder;
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores;
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree; …
An teaches
… for each data point of the formed data points: 
[feeding the auto-encoder] … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); …
Both Williams and An are analogous art since they both teach using auto-encoders for anomaly classification and detection.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process of Williams and incorporate the algorithm steps of calculating an auto-encoder reconstruction error of An as a way to generate anomaly scores. The motivation to combine is taught by An, as auto-encoders already reduce the original data to lower dimensional embeddings, which separates anomalies and normal data by taking out noise and other unimportant features. Thus, using lower-dimensional embeddings not only makes it easier to detect anomalies, but also using the reconstruction error as an anomaly score provides a built-in scoring process for identifying more anomalous data, resulting in improved anomaly detection behavior of the system (An p.1 last paragraph – p.2 1st paragraph: “Among many anomaly detection methods, spectral anomaly detection techniques try to find the lower dimensional embeddings of the original data where anomalies and normal data are expected to be separated from each other. After finding those lower dimensional embeddings, they are brought back to the original data space which is called the reconstruction of the original data. By reconstructing the data with the low dimension representations, we expect to obtain the true nature of the data, without uninteresting features and noise. Reconstruction error of a data point, which is the error between the original data point and its low dimensional reconstruction, is used as an anomaly score to detect anomalies. … With the advent of deep learning, autoencoders are also used to perform dimension reduction by stacking up layers to form deep autoencoders. By reducing the number of units in the hidden layer, it is expected that the hidden units will extract features that well represent the data. Moreover, by stacking autoencoders we can apply dimension reduction in a hierarchical manner, obtaining more abstract features in higher hidden layers leading to a better reconstruction of the data.”).
Regarding amended Claim 20, Williams teaches
(Currently Amended) A computer program product for detecting anomalies in non-stationary data in a network of computing entities, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by one or more processors (Williams Figure 3, elements 300, 302, 326; paragraphs [0066]-[0067], [0070]-[0072]: a network computer (representing a sensor computer; Williams paragraph [0066]) that contains a processor (Williams paragraph [0067]) and memory, storing computer readable instructions and program modules such as a classifier application and machine learning engine.), 
to cause to: 
collect non-stationary data in the network (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
while collecting said non-stationary data: 
classify the collected, non-stationary data according to a non-Markovian, stateful classification (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
based on an inference model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), 
wherein: the inference model is a trained, unsupervised machine learning model, implemented as an auto-encoder by a neural network (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.), and 
wherein classifying the collected, non-stationary data comprises:
forming data points from the collected, non-stationary data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
for each data point of the formed data points: 
feeding the auto-encoder with said each data point for the auto-encoder (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) … 
… feeding the selected outputs into a supervised, machine learning model, for it to further classify the selected outputs, whereby said anomalies are detected based on outputs from the supervised model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); and 
detect anomalies in the non-stationary data collected, according to the classified data (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).  
While Williams teaches the encoding phase of an auto-encoder, Williams does not explicitly teach
… for each data point of the formed data points: 
[feeding the auto-encoder] … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder;
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores; 
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree; …
An teaches
… for each data point of the formed data points: 
[feeding the auto-encoder] … to reconstruct said each data point according to one or more parameters learned by a cognitive algorithm of the auto-encoder (This claim limitation is similar in scope to a corresponding claim limitation in Claims 1 and 17, and hence is rejected under similar rationale.); …
scoring a degree of anomaly of said each data point, according to a reconstruction error in reconstructing said each data point, to obtain anomaly scores (This claim limitation is similar in scope to a corresponding claim limitation in Claims 1 and 17, and hence is rejected under similar rationale.);
selecting outputs from the classification performed based on the degree of anomaly, wherein the outputs selected have a degree of anomaly above a threshold degree (This claim limitation is similar in scope to a corresponding claim limitation in Claims 1 and 17, and hence is rejected under similar rationale.); …
Both Williams and An are analogous art since they both teach using auto-encoders for anomaly classification and detection.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process of Williams and incorporate the algorithm steps of calculating an auto-encoder reconstruction error of An as a way to generate anomaly scores. The motivation to combine is taught by An, as auto-encoders already reduce the original data to lower dimensional embeddings, which separates anomalies and normal data by taking out noise and other unimportant features. Thus, using lower-dimensional embeddings not only makes it easier to detect anomalies, but also using the reconstruction error as an anomaly score provides a built-in scoring process for identifying more anomalous data, resulting in improved anomaly detection behavior of the system (An p.1 last paragraph – p.2 1st paragraph: “Among many anomaly detection methods, spectral anomaly detection techniques try to find the lower dimensional embeddings of the original data where anomalies and normal data are expected to be separated from each other. After finding those lower dimensional embeddings, they are brought back to the original data space which is called the reconstruction of the original data. By reconstructing the data with the low dimension representations, we expect to obtain the true nature of the data, without uninteresting features and noise. Reconstruction error of a data point, which is the error between the original data point and its low dimensional reconstruction, is used as an anomaly score to detect anomalies. … With the advent of deep learning, autoencoders are also used to perform dimension reduction by stacking up layers to form deep autoencoders. By reducing the number of units in the hidden layer, it is expected that the hidden units will extract features that well represent the data. 
Moreover, by stacking autoencoders we can apply dimension reduction in a hierarchical manner, obtaining more abstract features in higher hidden layers leading to a better reconstruction of the data.”).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/025455, published 9/10/2015 [hereafter referred as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred as An] as applied to Claim 1; in further view of Zhou et al., Distributed Anomaly Detection by Model Sharing, 2009 International Conference on Apperceiving Computing and Intelligence Analysis, IEEE 2009, pp.297-300 [hereafter referred as Zhou].
Regarding amended Claim 6, Williams in view of An as applied to Claim 1 teaches
(Currently Amended) The computer-implemented method according to claim 1.
However, Williams in view of An does not teach 
wherein classifying the collected data further comprises: sorting the data points according to their corresponding anomaly scores.
Zhou teaches
wherein classifying the collected data further comprises: sorting the data points according to their corresponding anomaly scores ([Zhou p.297 col.2 Section 2 Methodology, 1st paragraph: “In this paper, we propose a novel framework for anomaly detection from distributed data sources (or sites)…”] [Zhou p.299 col.1 last paragraph; p.300 Table 1: examiner’s note: Zhou teaches using an auto-encoder neural network as an anomaly detection model.] [Zhou p.299 col.1 3rd paragraph (Section 3.2 Description of combining methods): examiner’s note: Zhou teaches anomaly detection models Mj from different sites (“network entities/computers”) computing anomaly scores ASi for each data record (“data points”) (Zhou p.298.col.1 Section 3.1 General framework for distributed anomaly detection, 1st – 2nd paragraph), and performing a combining method for the scores ASi, where one of the combining methods is an averaging method that sorts anomaly scores (“Average anomaly score. This method takes anomaly score vectors AS(j), j = 1, …, n from all the anomaly detection models Mj that are built at distributed sites and then computes an average anomaly score vector ASF. … Alternatively, we can sort anomaly score vector ASF and thus rank all new test data records from being most anomalous to less anomalous. The higher value of anomaly score means the higher probability that the new test data record is anomalous one.”).]).
Both Williams in view of An and Zhou are analogous art since they both teach using auto-encoders for anomaly classification and detection in a plurality of network entities, and calculating anomaly scores for the classified outputs.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process of Williams in view of An and incorporate the sorting step of Zhou as a way to create a sorted list of classified outputs according to their anomaly scores. The motivation to combine is taught by Zhou, as a way to aggregate outputs with associated anomaly scores collected from different network computers. Identifying those outputs/data points that are more anomalous through sorting will aid the system to identify and flag candidate data points that contain anomalous state information for further analysis and alerting, resulting in overall improvement of the anomaly detection performance in the system (Zhou p.298 col.2 1st paragraph: “Our objective is to achieve the best possible detection performance that is comparable to the performance of anomaly detection model applied when all data sets are merged together.” and Zhou p.298 col.2 Section 3.2 Description of combining methods, 1st paragraph: “The major goal of combining local anomaly detection models built at distributed sites is to improve the quality, robustness and prediction performance of the ensemble of the models.”).
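For illustration only, the "average anomaly score" combining method and the subsequent sorting described in the Zhou passage cited for Claim 6 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the sample score vectors are hypothetical, and the sketch assumes every site scores the same ordered set of test data records.

```python
def average_scores(score_vectors):
    """Element-wise average of per-site anomaly score vectors AS(j)."""
    n = len(score_vectors)
    return [sum(column) / n for column in zip(*score_vectors)]

def rank_by_score(scores):
    """Indices of data records ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical anomaly score vectors from two distributed sites,
# each scoring the same three test data records.
as_site1 = [0.1, 0.9, 0.4]
as_site2 = [0.3, 0.7, 0.2]

as_f = average_scores([as_site1, as_site2])  # averaged vector ASF
ranking = rank_by_score(as_f)                # record indices, most anomalous first
```

With these sample values the second record receives the highest averaged score and is therefore ranked first.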
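Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/025455, published 9/10/2015 [hereafter referred as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred as An], in further view of Zhou et al., Distributed Anomaly Detection by Model Sharing, 2009 International Conference on Apperceiving Computing and Intelligence Analysis, IEEE 2009, pp.297-300 [hereafter referred as Zhou] as applied to Claim 6; in even further view of Tuor et al., Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams, arXiv:1710.00811v1, October 2, 2017, 9 pages [hereafter referred as Tuor].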
Regarding amended Claim 7, Williams in view of An, in further view of Zhou as applied to Claim 6 teaches
(Currently Amended) The computer-implemented method according to claim 6.
However, Williams in view of An, in further view of Zhou does not teach 
wherein classifying the collected data further comprises: normalizing the anomaly scores to obtain normalized anomaly scores.  
Tuor teaches
wherein classifying the collected data further comprises: normalizing the anomaly scores to obtain normalized anomaly scores (Examiner’s note: Tuor teaches using an auto-encoder to perform anomaly detection (Tuor p.4 col.1 last paragraph – p.4 col.2 1st paragraph) and computing a weighted moving average estimate of the mean and variance for anomaly scores and standardizing each score, where the standardization of scores is interpreted as a form of normalization (Tuor p.4 col.2 Detecting Insider Threat, 1st-2nd paragraphs: “We assume the following conditions: our model produces anomaly scores … Because our model is trained in an online fashion, the anomaly scores start out quite large (when the model knows nothing about normal behavior) and trend lower over time (as normal behavior patterns are learned). To place the anomaly score for user u at time t in the proper context, we compute an exponentially weighted moving average estimate of the mean and variance of these anomaly scores and standardize each score as it arrives.”).).  
Williams in view of An, in further view of Zhou and Tuor are analogous art since they both teach using auto-encoders for anomaly detection and calculating anomaly scores for the classified outputs.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process (producing sorted anomaly scores associated with the classified outputs) of Williams in view of An, in further view of Zhou and incorporate the normalization step of Tuor as a way to normalize the sorted anomaly scores for the classified outputs. The motivation to combine is taught by Tuor, as a way to standardize the anomaly scores from data received in an online, real-time system (i.e., non-stationary data) through calculation of a weighted moving average of mean and variance. Giving the anomaly scores the appropriate context and scale relative to other surrounding data occurring within the same time interval will make it easier to perform comparisons and analysis against recent occurring data, thus improving the anomaly detection reliability and accuracy of the system (Tuor p.4 col.2 Detecting Insider Threat, 1st-2nd paragraphs: “We assume the following conditions: our model produces anomaly scores, … Because our model is trained in an online fashion, the anomaly scores start out quite large (when the model knows nothing about normal behavior) and trend lower over time (as normal behavior patterns are learned). To place the anomaly score for user u at time t in the proper context, we compute an exponentially weighted moving average estimate of the mean and variance of these anomaly scores and standardize each score as it arrives.”).
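For illustration only, standardizing each anomaly score as it arrives against an exponentially weighted moving average estimate of the mean and variance, as described in the Tuor passage cited for Claim 7, can be sketched as follows. This is a hedged sketch, not Tuor's implementation: the class name, the decay value, the initial variance of 1.0, and the specific update equations are assumptions, since the quoted passage does not give them.

```python
import math

class EwmaStandardizer:
    """Streaming standardization of anomaly scores (illustrative only)."""

    def __init__(self, decay=0.9, initial_var=1.0, eps=1e-8):
        self.decay = decay        # assumed smoothing factor
        self.var = initial_var    # assumed prior variance
        self.eps = eps            # guards against division by zero
        self.mean = None          # set from the first score seen

    def update(self, score):
        """Return the standardized score, then fold it into the estimates."""
        if self.mean is None:
            self.mean = score
            return 0.0
        # Standardize against statistics of previously seen scores.
        z = (score - self.mean) / math.sqrt(self.var + self.eps)
        # Exponentially weighted updates of mean and variance.
        delta = score - self.mean
        self.mean += (1 - self.decay) * delta
        self.var = self.decay * (self.var + (1 - self.decay) * delta ** 2)
        return z

standardizer = EwmaStandardizer()
zs = [standardizer.update(s) for s in [5.0, 5.0, 5.0, 9.0]]
```

In this toy stream the first three scores standardize to zero and the final spike produces a clearly positive standardized score, which is the kind of context the cited passage describes.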
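Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/025455, published 9/10/2015 [hereafter referred as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred as An], in further view of Zhou et al., Distributed Anomaly Detection by Model Sharing, 2009 International Conference on Apperceiving Computing and Intelligence Analysis, IEEE 2009, pp.297-300 [hereafter referred as Zhou], in even further view of Tuor et al., Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams, arXiv:1710.00811v1, October 2, 2017, 9 pages [hereafter referred as Tuor] as applied to Claim 7; in even further view of Elovici et al., WO2018/037411, filed 8/23/2017 [hereafter referred as Elovici].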
Regarding original Claim 8, Williams in view of An, in further view of Zhou, in even further view of Tuor as applied to Claim 7 teaches
(Original) The computer-implemented method according to claim 7.
However, Williams in view of An, in further view of Zhou, in even further view of Tuor does not teach
wherein classifying the collected data further comprises: thresholding the normalized anomaly scores to obtain a selection of anomaly scores and a corresponding selection of data points.  
Elovici teaches
wherein classifying the collected data further comprises: thresholding the normalized anomaly scores to obtain a selection of anomaly scores and a corresponding selection of data points (Elovici Figure 5, elements 245, 255, 265: examiner’s note: Elovici teaches performing comparison of normalized anomaly scores against thresholds to determine whether a sequence ‘p’ (“data point”) is considered anomalous (Elovici paragraph [0042]), where data sequences ‘p’ consist of temporal series of multi-valued events (Elovici Summary paragraph [0002]), thus generating a selection of data points (Elovici paragraph [0050]: “If at step 245, the gain is greater than the test threshold, typically zero, then "p" may have sufficient affinity to be considered "normal". This is determined by a calculation at a step 255. An anomaly score, AS, for "p" is calculated, as described above, and the score is compared with a threshold. Typically the score is normalized within a range of 0 to 1. If the AS if greater than the threshold, then "p" is considered anomalous. … At a step 265, following either step 255 or 260, the result of the anomaly test is output to a system that will apply the test, typically a classification system in one of the domains described above (e.g., machine testing, computer operations, behavior analysis, etc.).”).).  
Both Williams in view of An, in further view of Zhou, in even further view of Tuor and Elovici are analogous art since they both teach anomaly detection of data sequences and calculating anomaly scores for the classified outputs.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the scoring process (producing sorted, normalized scores associated with the classified outputs) of Williams in view of An, in further view of Zhou, in even further view of Tuor and incorporate the thresholding step of Elovici as a way to obtain a final selection of anomaly scores with associated classified outputs. The motivation to combine is taught by Elovici, as providing thresholds allows a user the ability to specify additional constraints on a set of data sequences (“data points”) in order to focus on a set of detected patterns that may be deemed as being more anomalous than others, thereby improving the anomaly detection reliability and accuracy of the system (Elovici Summary paragraph [0004]: “… generating the second support level may further include determining a second set of patterns in the second set of interaction sequences, patterns of each set may satisfy one or more pre-defined constraints, and the first and second support levels are indicative of the incidence of the first and second sets of patterns in the respective first and second sets of interactive sequences. … In some embodiments, the interaction sequences are temporally ordered, and the one or more pre-defined constraints include a sustainability constraint that a pattern shall appear as a common motif in interaction sequences generated within a predefined period of time. Alternatively or additionally, the one or more pre-defined constraints may include a frequency constraint that a pattern shall appear as a common motif within a minimum number of interaction sequences. 
Alternatively or additionally, the one or more pre-defined constraints include a recognition constraint that an aggregate affinity measure of the pattern shall exceed a pre-defined threshold, include the aggregate affinity measure is an aggregation of all of the affinities represented by the pattern.”).
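For illustration only, the thresholding step discussed above (normalizing anomaly scores to a range of 0 to 1 and comparing each score against a threshold to obtain a selection of data points) can be sketched as follows. This is a minimal sketch and not code from Elovici or from the application: the function names, the min-max normalization, and the threshold value of 0.5 are assumptions chosen for the example.

```python
def normalize_scores(scores):
    """Min-max normalize raw anomaly scores into the range [0, 1] (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: nothing stands out, so map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_anomalous(points, scores, threshold=0.5):
    """Keep the data points whose normalized score is greater than the threshold."""
    normalized = normalize_scores(scores)
    return [(p, s) for p, s in zip(points, normalized) if s > threshold]


# Hypothetical data points and raw scores.
points = ["p1", "p2", "p3", "p4"]
raw_scores = [2.0, 10.0, 4.0, 8.0]
selected = select_anomalous(points, raw_scores, threshold=0.5)
```

In this sketch the result is a selection of anomaly scores together with the corresponding selection of data points, which is the form of output the claim language recites.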
Claims 9-10, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/0254555, published 9/10/2015 [hereafter referred to as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred to as An], in further view of Zhou et al., Distributed Anomaly Detection by Model Sharing, 2009 International Conference on Apperceiving Computing and Intelligence Analysis, IEEE 2009, pp.297-300 [hereafter referred to as Zhou], in even further view of Tuor et al., Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams, arXiv:1710.00811v1, October 2, 2017, 9 pages [hereafter referred to as Tuor], in even further view of Elovici et al., WO2018/037411, filed 8/23/2017 [hereafter referred to as Elovici] as applied to Claim 8; in even further view of Casas et al., UNADA: Unsupervised Network Anomaly Detection Using Sub-space Outliers Ranking, In Networking 2011, Part I, LNCS 6640, 2011 IFIP International Federation for Information Processing, pp.40-51 [hereafter referred to as Casas].
Regarding original Claim 9, Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici as applied to Claim 8 teaches
(Original) The computer-implemented method according to claim 8, 
wherein classifying the collected non-stationary data further comprises: 
feeding the selection of data points into a supervised, machine learning model, for it to further classify the selection of data points (Williams Figure 5, elements 512, 516: examiner’s note: Referring to Williams Figure 5, Williams teaches the network computer/sensor computer system (Williams Figure 3, elements 300, 302, 326; paragraphs [0070]-[0072]) includes a training process using a model structure that defines the machine learning model used for classification and analysis in the system (Williams paragraph [0092]), which can include unsupervised and supervised models (Williams paragraph [0095]) as well as including a combination of models (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”), where the auto-encoder represents an inference model performing the initial data classification, the outputs of the auto-encoder represent the selected outputs resulting from the inference model (“selecting outputs from the classification performed thanks to the inference model”), and the k-nearest neighbor model represents a supervised machine learning model classifier for detecting anomalies (where under the broadest reasonable interpretation, the detection of anomalies is considered a form of “further classification”). Given the specified model combination of the auto-encoder, followed by a k-nearest-neighbor model, it logically follows that the outputs from the auto-encoder will be selected to be used as inputs into the k-nearest-neighbor-model.) …  
However, Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici does not teach
[feeding the selection of data points] … for [the supervised, machine learning model] to further classify the selection of data points, whereby said anomalies are detected based on outputs from the supervised model.
Casas teaches
[feeding the selection of data points] … for [the supervised, machine learning model] to further classify the selection of data points, whereby said anomalies are detected based on outputs from the supervised model (Examiner’s note: Casas teaches applying the DBSCAN clustering algorithm (which is a form of a nearest-neighbors algorithm) by performing a query on a subset of selected data points Xi of lower dimension (Casas p.45 1st paragraph) provided into the DBSCAN algorithm (where the data points Xi were selected through a set of constraints for a particular set of k features out of possible m attributes), and getting a set of clusters Pi (each of which represents a class label for those data points within each cluster) and an associated set of q(i) outliers (“said anomalies are detected based on outputs from the supervised model”), where the DBSCAN clustering algorithm represents a “supervised, machine learning model” as it is performing further classification of the data points into clusters (Casas p.44 last paragraph – p.45 1st paragraph (Section 4.1 Clustering Ensemble and Sub-space Clustering): “Each of the N sub-spaces Xi ⊂ X is obtained by selecting k features from the complete set of m attributes. … Each partition Pi is obtained by applying DBSCAN [13] to sub-space Xi. DBSCAN is a powerful clustering algorithm that discovers clusters of arbitrary shapes and sizes [7], relying on a density-based notion of clusters: clusters are high-density regions of the space, separated by low-density areas. This algorithm perfectly fits our unsupervised traffic analysis, because it is not necessary to specify a-priori difficult to set parameters such as the number of clusters to identify. Results provided by applying DBSCAN to sub-space Xi are twofold: a set of p(i) clusters {$C_1^i$, $C_2^i$, .., $C_{p(i)}^i$} and a set of q(i) outliers {$o_1^i$, $o_2^i$, .., $o_{q(i)}^i$}.”).).
Both Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici and Casas are analogous art since they both teach anomaly detection using machine learning techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the sorted, normalized anomaly scores associated with the classified outputs of Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici and use them as inputs into the DBSCAN clustering algorithm of Casas as a way to further classify and detect anomalies from the selection of data points. The motivation to combine is taught by Casas, since the DBSCAN clustering algorithm is ideal for clustering data points in lower dimensions, which leads to improved anomaly detection (Casas p.45 1st paragraph: “Using small values for k provides several advantages: firstly, doing clustering in low-dimensional spaces is more efficient and faster than clustering in bigger dimensions. Secondly, density-based clustering algorithms such as DBSCAN provide better results in low-dimensional spaces [7], because high-dimensional spaces are usually sparse, making it difficult to distinguish between high and low density regions.”).
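For illustration only, the sub-space clustering relied upon from Casas Section 4.1 (selecting k features out of m attributes and applying DBSCAN to the resulting sub-space to obtain clusters and outliers) can be sketched as follows. This is a minimal, textbook-style DBSCAN written for the example and is not the implementation of Casas or of the application: the helper names, the `eps` and `min_pts` values, and the sample flows are assumptions.

```python
import math


def project(flows, features):
    """Build a sub-space by keeping only the selected feature indices."""
    return [tuple(flow[f] for f in features) for flow in flows]


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for an outlier."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical flows with m = 3 attributes; the sub-space keeps k = 2 of them.
flows = [(0, 0, 5), (0, 1, 9), (1, 0, 2), (1, 1, 7),
         (10, 10, 1), (10, 11, 4), (11, 10, 8), (11, 11, 3),
         (5, 50, 6)]
subspace = project(flows, features=(0, 1))
labels = dbscan(subspace, eps=1.5, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == -1]
```

In this sketch the cluster labels play the role of the clusters of partition Pi, and the points labeled -1 play the role of the q(i) outliers.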
Regarding amended Claim 10, Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici, in even further view of Casas teaches
(Currently Amended) The computer-implemented method according to claim 9, wherein: 
the supervised model is configured as a nearest-neighbor classifier (Examiner’s note: Casas teaches applying the DBSCAN clustering algorithm (which is a form of a nearest-neighbors algorithm) by performing a query on a subset of selected data points Xi of lower dimension (Casas p.45 1st paragraph) provided into the DBSCAN algorithm (where the data points Xi were selected through a set of constraints for a particular set of k features out of possible m attributes), and getting a set of clusters Pi (each of which represents a class label for those data points within each cluster) and an associated set of q(i) outliers (“said anomalies are detected based on outputs from the supervised model”), where the DBSCAN clustering algorithm represents a “supervised, machine learning model” as it is performing further classification of the data points into clusters (Casas p.44 last paragraph – p.45 1st paragraph (Section 4.1 Clustering Ensemble and Sub-space Clustering): “Each of the N sub-spaces Xi ⊂ X is obtained by selecting k features from the complete set of m attributes. … Each partition Pi is obtained by applying DBSCAN [13] to sub-space Xi. DBSCAN is a powerful clustering algorithm that discovers clusters of arbitrary shapes and sizes [7], relying on a density-based notion of clusters: clusters are high-density regions of the space, separated by low-density areas. This algorithm perfectly fits our unsupervised traffic analysis, because it is not necessary to specify a-priori difficult to set parameters such as the number of clusters to identify. Results provided by applying DBSCAN to sub-space Xi are twofold: a set of p(i) clusters {$C_1^i$, $C_2^i$, .., $C_{p(i)}^i$} and a set of q(i) outliers {$o_1^i$, $o_2^i$, .., $o_{q(i)}^i$}.”).), and 
wherein further classifying the selection of data points comprises: 
querying, for each data point of said selection of data points fed into the supervised model, nearest-neighbors of said each data point in the selection of data points, wherein the nearest-neighbor is based on a computed distance of said each data point (Casas p.46 Algorithm 1. EA4RO Algorithm; Casas p.45 Section 4.2 Ranking Outliers Using Evidence Accumulation, 2nd paragraph: examiner’s note: Casas teaches running the DBSCAN clustering algorithm (“nearest-neighbor algorithm”) within the EA4RO algorithm by searching (“querying”) through the selection of data points (Algorithm 1, lines 4-10), which in turn further classifies the selection of data points by constructing a “dissimilarity vector $D \in \mathbb{R}^n$ in which it accumulates the distance between the different outliers $o_j^i$ found in each sub-space i = 1, ..,N and the centroid of the corresponding sub-space-biggest-cluster $C_{max}^i$. The idea is to clearly highlight those flows that are far from the normal-operation traffic at each of the different sub-spaces, statistically represented by $C_{max}^i$.”. Casas further teaches (Casas p.46 Algorithm 1. EA4RO Algorithm; Casas p.46 1st paragraph (Section 4.2 Ranking Outliers Using Evidence Accumulation)) that the distances are computed and added to the dissimilarity vector D (Algorithm 1 line 9), where “… instead of using a simple Euclidean distance as a measure of dissimilarity, we compute the Mahalanobis distance dM between outliers and the centroid of the biggest cluster. The Mahalanobis distance takes into account the correlation between samples, dividing the standard Euclidean distance by the variance of the samples. This permits to boost the degree of abnormality of an outlier when the variance of the samples is smaller.”, such that this distance calculation corresponds to “wherein the nearest-neighbor is based on a computed distance of said each data point”. Casas further teaches applying a weighting factor to this computed distance (Algorithm 1 line 9), such that “The weighting factor wi is used as an outlier-boosting parameter, as it gives more relevance to those outliers that are “less probable”: wi takes bigger values when the size $n_{max_i}$ of cluster $C_{max}^i$ is closer to the total number of flows n.”, which in turn identifies the nearest-neighbors relative to the cluster based on the distance, which then leads to a ranking (Algorithm 1 line 11) based on the data points included within dissimilarity vector D (“In the last part of EA4RO, flows are ranked according to the dissimilarity obtained in D, and the anomaly detection threshold Th is set.”) using the distances compared against the existing clusters $C_{max}^i$, where clusters $C_{max}^i$ represent class groupings with the identified nearest-neighbors.).  
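For illustration only, the evidence-accumulation step cited from Casas Section 4.2 (accumulating, for each outlier in each sub-space, a weighted distance to the centroid of the biggest cluster and then ranking flows by the accumulated dissimilarity) can be sketched as follows. This is a simplified sketch and not Algorithm 1 of Casas: the per-dimension variance scaling stands in for the full Mahalanobis distance, the weight formula `n / (n - n_max + eps)` is an assumption chosen only so that the weight grows as the biggest cluster approaches the total number of flows, and all names and data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def scaled_distance(point, cluster):
    """Distance from point to the cluster centroid, each axis divided by its variance.

    Simplification: a diagonal stand-in for the Mahalanobis distance.
    """
    c = centroid(cluster)
    total = 0.0
    for d in range(len(c)):
        var = sum((p[d] - c[d]) ** 2 for p in cluster) / len(cluster)
        total += (point[d] - c[d]) ** 2 / (var if var > 0 else 1.0)
    return math.sqrt(total)


def accumulate(n, subspace_results, eps=1e-3):
    """Accumulate weighted outlier distances over all sub-spaces into D.

    subspace_results: list of (biggest_cluster_points, {flow_index: outlier_point}).
    """
    D = [0.0] * n
    for biggest_cluster, outliers in subspace_results:
        # Assumed weight: larger when the biggest cluster holds almost all flows.
        w = n / (n - len(biggest_cluster) + eps)
        for flow_index, point in outliers.items():
            D[flow_index] += w * scaled_distance(point, biggest_cluster)
    return D


# Hypothetical results for n = 5 flows over two sub-spaces.
n = 5
results = [
    ([(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)], {4: (9.0, 9.0)}),
    ([(0.0,), (2.0,), (1.0,)], {3: (4.0,), 4: (10.0,)}),
]
D = accumulate(n, results)
ranked = sorted(range(n), key=lambda i: D[i], reverse=True)
```

In this sketch flow 4 is an outlier in both sub-spaces, so its accumulated dissimilarity is the largest and it is ranked first.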
Regarding original Claim 12, Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici, in even further view of Casas teaches
(Original) The computer-implemented method according to claim [[11]]10, 
wherein detecting anomalies further comprises: triggering an anomaly alert based on a rating associated with said each data point ([Casas p.46 2nd paragraph (Section 4.2 Ranking Outliers Using Evidence Accumulation): “The computation of Th is simply achieved by finding the value for which the slope of the sorted dissimilarity values in $D_{rank}$ presents a major change. In the evaluation section we explain how to perform this computation with an example of real traffic analysis. Anomaly detection is finally done as a binary thresholding operation on D: if D(i) > Th, UNADA flags an anomaly in flow yi.”] [Casas p.49 Figure 2.a: examiner’s note: Casas teaches applying a threshold on $D_{rank}$ to detect attacks and generating anomaly alerts (Casas p.48 Section 5.2 Detecting Attacks in MAWI Traffic, 1st paragraph: “Setting the detection threshold according to the previously discussed approach results in Th1. Indeed, if we focus on the shape of the ranked dissimilarity in figure 2.(a), we can clearly appreciate a major change in the slope after the 5th ranked flow. Note however that both attacks can be easily detected and isolated from the anomalous but yet legitimate traffic without false alarms, using for example the threshold Th2 on D.”).]).  
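For illustration only, the thresholding cited from Casas (setting Th where the slope of the sorted dissimilarity values presents a major change, then flagging flow i when D(i) > Th) can be sketched as follows. This is a minimal sketch and not the procedure of Casas: placing the threshold at the midpoint of the largest drop between consecutive ranked values is an assumed heuristic for "major change in slope", and the function names and sample values are hypothetical.

```python
def slope_change_threshold(D):
    """Pick a threshold at the largest drop in the descending-sorted values of D.

    Assumes D holds at least two values.
    """
    ranked = sorted(D, reverse=True)
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    k = max(range(len(drops)), key=drops.__getitem__)
    # Midpoint between the two values on either side of the largest drop.
    return (ranked[k] + ranked[k + 1]) / 2


def flag_anomalies(D, th):
    """Binary thresholding: indices of flows whose dissimilarity exceeds th."""
    return [i for i, d in enumerate(D) if d > th]


# Hypothetical accumulated dissimilarity values, one per flow.
D = [0.2, 0.1, 9.0, 0.3, 7.5, 0.25]
th = slope_change_threshold(D)
alerts = flag_anomalies(D, th)
```

In this sketch each index in `alerts` corresponds to a flow for which an anomaly alert would be triggered based on its rating.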
Regarding original Claim 14, Williams in view of An, in further view of Zhou, in even further view of Tuor, in even further view of Elovici, in even further view of Casas teaches
(Original) The computer-implemented method according to claim 10, 
wherein: the supervised model is coupled to a validation expert system (Williams Figure 5, elements 518, 520, 522, 530: examiner’s note: Williams teaches a domain expert 530 receiving information from scoring process 522 (producing anomaly scores) from outputs from the model(s) 518 (implemented as a combination of unsupervised auto-encoder model followed by the supervised nearest-neighbor model) and (optionally) combination function 520; this connection flow 518, 520, (520) and 530 represents a coupling between the supervised model and the domain expert (representing a validation expert system) (Williams paragraphs [0100]-[0102]: “… once the Model(s) 518 have been stored, a test of the system's performance will execute prior to any runtime scoring. Both testing and runtime scoring utilize Scoring Process 522, which applies Model(s) 518 to the input data and executes Combination Function 520 to select the correct predicted classification, when appropriate. … Scoring Process 522 assigns a score to incoming data, ranking said data as a member of a class (or label), or as an anomalous data point. Runtime scoring delivers new data to the Scoring Process and makes those results available to the Domain Expert Analysis component 530.”).), and 
wherein the method further comprises: feeding the validation expert system with a sample of outputs from the supervised model, said outputs comprising data points as further classified by the supervised model, for the validation expert system to validate anomaly ratings associated to data points corresponding to said sample (Examiner’s note: Williams teaches a domain expert decision 536 within the domain expert 530 receiving scores from scoring process 522 (Williams paragraphs [0218]-[0221]: “… anomaly detectors may be trained such that the length of the sequence of access pattern records may vary and multiple time windows may be used to analyze the data. … if Model(s) 518 are trained with the Training Corpus 508, new data may be ingested upon its availability from the file, database, and application servers and delivered to the Scoring Process 522. If the sequence of recent access records may be classified as similar to a known pattern of authorized usage, or as an anomaly, then the access record may be called to a security Domain Experts attention using User Interface 532 and Alerts 534, or processed automatically for further action with Decision 536 or via an external method. … In at least one of the various embodiments, if there may be a pending security investigation or anomalous access pattern detected for a given user, group, or content area, the old data may be maintained and the Model(s) 518 not retrained until it is certain that the new data does not represent unauthorized usage or an anomalous pattern of behavior. Feedback gathered during Domain Expert Decision 536 may also considered if deciding when it may be appropriate to retrain on new access record data.”).).  
Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Williams, Jr. et al., U.S. PGPUB 2015/0254555, published 9/10/2015 [hereafter referred to as Williams] in view of An et al., Variational Autoencoder based Anomaly Detection using Reconstruction Probability, published December 27, 2015, 18 pages [hereafter referred to as An] as applied to .
Regarding amended Claim 18, Williams in view of An teaches
(Currently Amended) The computerized system 
wherein: the system comprises a memory storing both an inference model, which is a trained, unsupervised machine learning model, and a nearest-neighbor classifier model, which is a supervised machine learning model (Williams Figure 5, element 516: examiner’s note: Referring to Williams Figure 5, Williams teaches the network/sensor computer system includes a training process and a model structure process that defines the machine learning model used for classification and analysis (Williams paragraph [0092]), where the model structure defines the structure of each model implemented in the system, where the system can include unsupervised and supervised models (Williams paragraph [0095]), as well as including a combination of models (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”), with the auto-encoder representing an inference model, the outputs of the auto-encoder representing the selected outputs resulting from the inference model, and the k-nearest neighbor model detecting anomalies representing a supervised machine learning model classifier (where the detection or prediction of anomalies is considered a form of classification).), and 
wherein the system is further configured to: select outputs from data as classified with said inference model and feed the selected outputs into the supervised, machine learning model (Williams Figure 5, elements 512, 516: examiner’s note: Referring to Williams Figure 5, the network computer/sensor computer system (Williams Figure 3, elements 300, 302, 326; paragraphs [0070]-[0072]) includes a training process using a model structure that defines the machine learning model used for classification and analysis in the system (Williams paragraph [0092]), which can include unsupervised and supervised models (Williams paragraph [0095]) as well as including a combination of models (Williams paragraph [0097]: “Model Structure 516 may also include a specification of a combination of the machine learning models described above, together with additional machine learning models that consume the output of DLNN models. For example, configuring an auto-encoder to reduce the dimensionality of input data, followed by a k-Nearest-Neighbor model used to detect anomalies in the reduced dimensionality space.”), where the auto-encoder represents an inference model performing the initial data classification, the outputs of the auto-encoder represent the selected outputs resulting from the inference model (“selecting outputs from the classification performed thanks to the inference model”), and the k-nearest neighbor model represents a supervised machine learning model classifier for detecting anomalies. Given the specified model combination of the auto-encoder, followed by a k-nearest-neighbor model, it logically follows that the outputs from the auto-encoder will be selected to be used as inputs into the k-nearest-neighbor-model.) …  
However, Williams in view of An does not teach
… [feed the selected outputs] … so as to detect said anomalies based on outputs from the supervised model.
Casas teaches
… [feed the selected outputs] … so as to detect said anomalies based on outputs from the supervised model (Examiner’s note: Casas teaches applying the DBSCAN clustering algorithm (which is a form of a nearest-neighbors algorithm) by performing a query on a subset of selected data points Xi of lower dimension (Casas p.45 1st paragraph) provided into the DBSCAN algorithm (where the data points Xi were selected through a set of constraints for a particular set of k features out of possible m attributes), and getting a set of clusters Pi (each of which represents a class label for those data points within each cluster) and an associated set of q(i) outliers (“said anomalies are detected based on outputs from the supervised model”), where the DBSCAN clustering algorithm represents a “supervised, machine learning model” as it is performing further classification of the data points into clusters (Casas p.44 last paragraph – p.45 1st paragraph (Section 4.1 Clustering Ensemble and Sub-space Clustering): “Each of the N sub-spaces Xi ⊂ X is obtained by selecting k features from the complete set of m attributes. … Each partition Pi is obtained by applying DBSCAN [13] to sub-space Xi. DBSCAN is a powerful clustering algorithm that discovers clusters of arbitrary shapes and sizes [7], relying on a density-based notion of clusters: clusters are high-density regions of the space, separated by low-density areas. This algorithm perfectly fits our unsupervised traffic analysis, because it is not necessary to specify a-priori difficult to set parameters such as the number of clusters to identify. Results provided by applying DBSCAN to sub-space Xi are twofold: a set of p(i) clusters {$C_1^i$, $C_2^i$, .., $C_{p(i)}^i$} and a set of q(i) outliers {$o_1^i$, $o_2^i$, .., 
                            
                                
                                    o
                                
                                
                                    q
                                    (
                                    i
                                    )
                                
                                
                                    i
                                
                            
                        
                    }.”).).
Both Williams in view of An and Casas are analogous art since they both teach anomaly detection using machine learning techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take sorted, normalized anomaly scores associated with the classified outputs from the auto-encoder of Williams in view of An and use them as inputs into the DBSCAN clustering algorithm of Casas as a way to further classify the selection of data points. The motivation to combine is taught by Casas, since the DBSCAN clustering algorithm is ideal for clustering data points in lower dimensions, which leads to improved anomaly classification and detection performance in the system (Casas p.45 1st paragraph: “Using small values for k provides several advantages: firstly, doing clustering in low-dimensional spaces is more efficient and faster than clustering in bigger dimensions. Secondly, density-based clustering algorithms such as DBSCAN provide better results in low-dimensional spaces [7], because high-dimensional spaces are usually sparse, making it difficult to distinguish between high and low density regions.”).
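For illustration only, the sub-space clustering step quoted from Casas (select k of m attributes, apply DBSCAN to each sub-space, collect clusters and outliers) can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Casas, Williams, An, or the application: the function names `dbscan` and `subspace_partitions`, the parameters `eps` and `min_pts`, the enumeration of every k-feature combination, and the sample flows are all hypothetical.

```python
import math
from itertools import combinations

NOISE = -1  # label used here for outliers (an assumption of this sketch)


def dbscan(points, eps, min_pts):
    """Minimal density-based clustering; returns one label per point."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not dense enough; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster += 1
    return labels


def subspace_partitions(flows, k, eps, min_pts):
    """Cluster every k-feature sub-space; map features -> (clusters, outliers)."""
    m = len(flows[0])
    partitions = {}
    for features in combinations(range(m), k):
        sub = [tuple(flow[c] for c in features) for flow in flows]
        clusters, outliers = {}, []
        for idx, label in enumerate(dbscan(sub, eps, min_pts)):
            if label == NOISE:
                outliers.append(idx)
            else:
                clusters.setdefault(label, []).append(idx)
        partitions[features] = (list(clusters.values()), outliers)
    return partitions
```

Calling `subspace_partitions(flows, k=2, eps=0.3, min_pts=3)` on a list of equal-length numeric tuples returns, for each pair of attribute indices, the index groups forming clusters and the indices flagged as outliers in that sub-space.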
Regarding amended Claim 19, Williams in view of An, in further view of Casas teaches
(Currently Amended) The computerized system 
wherein: the system further comprises a validation expert system configured to couple to the supervised model (Williams Figure 5, elements 518, 520, 522, 530: examiner’s note: Williams teaches a domain expert 530 receiving information from scoring process 522 (producing anomaly scores) from outputs from the model(s) 518 (implemented as a combination of unsupervised auto-encoder model followed by the supervised nearest-neighbor model) and (optionally) combination function 520; this connection flow 518, 520, (520) and 530 represents a coupling between the supervised model and the domain expert (representing a validation expert system) (Williams paragraphs [0100]-[0102]: “… once the Model(s) 518 have been stored, a test of the system's performance will execute prior to any runtime scoring. Both testing and runtime scoring utilize Scoring Process 522, which applies Model(s) 518 to the input data and executes Combination Function 520 to select the correct predicted classification, when appropriate. … Scoring Process 522 assigns a score to incoming data, ranking said data as a member of a class (or label), or as an anomalous data point. Runtime scoring delivers new data to the Scoring Process and makes those results available to the Domain Expert Analysis component 530.”).), 
so as for the validation expert system to take as input a sample of outputs from the supervised model (Williams Figure 5, elements 518, 520, 522, 530: examiner’s note: Williams teaches a domain expert 530 receiving information from scoring process 522 (producing anomaly scores) from outputs from the model(s) 518 (implemented as a combination of unsupervised auto-encoder model followed by the supervised nearest-neighbor model) and (optionally) combination function 520; this connection flow 518, 520, (520) and 530 represents a  (Williams paragraphs [0100]-[0102]: “… once the Model(s) 518 have been stored, a test of the system's performance will execute prior to any runtime scoring. Both testing and runtime scoring utilize Scoring Process 522, which applies Model(s) 518 to the input data and executes Combination Function 520 to select the correct predicted classification, when appropriate. … Scoring Process 522 assigns a score to incoming data, ranking said data as a member of a class (or label), or as an anomalous data point. Runtime scoring delivers new data to the Scoring Process and makes those results available to the Domain Expert Analysis component 530.”).) and 
the supervised model to take as input a fraction of outputs obtained from the validation expert system (Examiner’s note: Casas teaches applying the DBSCAN clustering algorithm (which is a form of a nearest-neighbors algorithm) by performing a query on a subset of selected data points Xi of lower dimension (Casas p.45 1st paragraph) provided into the DBSCAN algorithm (where the data points Xi were selected through a set of constraints for a particular set of k features out of possible m attributes), and getting a set of clusters Pi (each of which represents a class label for those data points within each cluster) and an associated set of q(i) outliers (“said anomalies are detected based on outputs from the supervised model”), where the DBSCAN clustering algorithm represents a “supervised, machine learning model” as it is performing further classification of the data points into clusters (Casas pp.44 last paragraph – p.45 1st paragraph (Section 4.1 Clustering Ensemble and Sub-space Clustering): “Each of the N sub-spaces Xi ⊂ X is obtained by selecting k features from the complete set of m attributes. … Each partition Pi is obtained by applying DBSCAN [13] to sub-space Xi. DBSCAN is a powerful clustering algorithm that discovers clusters of arbitrary shapes and sizes [7], relying on a density-based notion of clusters: clusters are high-density regions of the space, separated by low-density areas. This algorithm perfectly fits our unsupervised traffic analysis, because it is not necessary to specify a-priori difficult to set parameters such as the number of clusters to identify. Results provided by applying DBSCAN to sub-space Xi are twofold: a set of p(i) clusters {$C_1^i$, $C_2^i$, .., $C_{p(i)}^i$} and a set of q(i) outliers {$o_1^i$, $o_2^i$, .., $o_{q(i)}^i$}.”). Casas further teaches in Casas p.45 last paragraph – p.46 1st paragraph (Section 4.2 Ranking Outliers Using Evidence Accumulation) and Casas p.46 Algorithm 1 that $\delta_i$ is defined as “the maximum neighborhood distance of a sample to identify dense regions”, which represents the set of points that were already classified by the supervised model and located within each existing cluster $C_{max}^i$, and is set in Algorithm 1 line 5 to “a fraction of the average distance between flows in sub-space Xi (we take a fraction 1/10), which is estimated from 10% of the flows, randomly selected.”; this is interpreted as the feed-back of the fraction of inputs obtained from the validation expert system to be used as inputs to the supervised model.).
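For illustration only, the parameter setting quoted from Casas Algorithm 1 line 5 (the neighborhood distance taken as 1/10 of the average distance between flows in a sub-space, estimated from a random 10% of the flows) can be sketched as follows. This is a minimal sketch under stated assumptions, not Casas's implementation: the function name `estimate_delta`, its parameter names, the use of Euclidean distance, the fixed random seed, and the floor of two sampled flows are all hypothetical choices made so that the example is reproducible.

```python
import math
import random
from itertools import combinations


def estimate_delta(subspace_points, fraction=0.1, sample_share=0.1, seed=0):
    """Return fraction * (average pairwise distance of a random sample)."""
    if len(subspace_points) < 2:
        raise ValueError("need at least two flows to measure a distance")
    rng = random.Random(seed)  # seeded so repeated runs pick the same sample
    # Sample roughly sample_share of the flows, but never fewer than two.
    size = max(2, round(len(subspace_points) * sample_share))
    size = min(size, len(subspace_points))
    sample = rng.sample(list(subspace_points), size)
    pairs = list(combinations(sample, 2))
    mean_distance = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return fraction * mean_distance
```

The returned value would then be passed as the neighborhood-distance argument of whatever density-based clustering routine is applied to that sub-space.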

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121