DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-10 are pending.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.

Claims 1 and 9:
Claims 1 and 9 recite the limitation “the autoencoder”. There is insufficient antecedent basis for this limitation in the claims. It is unclear whether the limitation refers to “an autoencoder for each cluster” or to “each autoencoder”, which may be different subject matter. One way to overcome the rejection is to link “an autoencoder for each cluster” and “each autoencoder” together, e.g., by amending the latter to read “each autoencoder of the cluster”.
Claims 1 and 9 recite the limitation “the probabilistic models”, which lacks antecedent basis.
Claim 9 also recites the limitation “the respective autoencoder”, which lacks antecedent basis. One way to overcome the rejection is to amend the limitation to read “a respective autoencoder” or “the

Claims 2-8 and 10 are rejected under 35 U.S.C. 112(b) for the same reasons as given for their respective base claims. Some of these claims also recite the same limitations as their respective base claims, which lack antecedent basis.


Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ranjan et al. (US8402543) in view of Anderson et al. (US20160306971).

Regarding claims 1 and 9, Ranjan teaches a method of anomaly detection for network traffic communicated by devices via a computer network, the method comprising:
(Ranjan, Fig. 1, “a system (100) for machine learning based botnet detection”, c5:25-60)
clustering a set of time series, each time series including a plurality of time windows of data corresponding to network communication characteristics for a device;
(Ranjan, Fig. 3, ST303, “Include the malicious data instance and the non-malicious data instance in a training data set including a collection of malicious data instances and non-malicious data instances”; malicious data instance cluster, non-malicious data instance cluster; the data instances can be time sequences, “supervised machine learning process…counting the number of traffic data units (e.g., flows, packets, bytes, etc.) exchanged between a set of servers and a set of clients that are known malicious clients and known non-malicious clients. Features based on such statistical counts represent a measure of communication activities associated with known malicious clients and known non-malicious clients and are included in a training data set for generating the classification model”, c4:55-end, c5:1-10; further, “any number of features may be included in the feature vector for each data instance in the training data set (134)”, c10:10-15; that is, the plurality of data instances in either the malicious data instance cluster or the non-malicious data instance cluster can include, e.g., only one respective feature, for simplicity of discussions)
training an autoencoder for each cluster based on a time series in the cluster;
(Ranjan, Fig. 3, ST304, “In Step 304, using a pre-determined machine learning algorithm, a classification model is generated based on the training data set. In particular, when the classification model is applied to each malicious data instance by a classifier, the classifier generates a malicious label. When the classification model is applied to each non-malicious data instance by the classifier, the classifier generates a non-malicious label. In one or more embodiments, the pre-determined machine learning algorithm includes a support vector machine (SVM) algorithm and the classification model includes a decision surface of the SVM”; c13:60-end, c14:1-5; “a variation of a SVM classifier known as least-square support vector machine (LS-SVM) is used”, c15:1-5; Fig. 1, online classifier 126 is mapped to the claimed autoencoder)
generating, for each autoencoder, a set of reconstruction errors for the autoencoder based on testing the autoencoder with data from time windows of at least a subset of the time series from which the autoencoder was trained;
(Ranjan, Fig. 3, ST303-304, use the malicious data instance and the non-malicious data instance to train a classification model (classifier 126 or autoencoder); the trained model is used for classifying the un-classified data (ST306-308); obviously, the same trained model generated from a particular classifier or an autoencoder may be applied to different classifiers or autoencoders for classifying/labeling the trained network traffic features; ST307, since each data instance in either the above malicious data instance cluster or the above non-malicious data instance cluster produces (reconstructs) a correct label output from the classification model, in this case the classification error of reconstructing the label output of the classification model should be smaller than a predetermined threshold)
generating a probabilistic model of reconstruction errors for the autoencoder; and
(Ranjan, the classification error rate in Fig. 4A can equivalently be presented using the number of features on the x-axis in place of the percentage of the features; note that the classification error rate shown is always associated with a certain underlying statistical or probabilistic process; the error rate is either per the malicious data instance cluster or per the non-malicious data instance cluster; “the pre-determined machine learning algorithm includes a support vector machine (SVM) algorithm”, c10:25-30)
	Ranjan does not expressly disclose but Anderson teaches:
	a probabilistic model of reconstruction errors;
(Anderson, “SVMs and Gaussian processes were used for the classification process”, [0092]; that is, the classification errors in a particular range of features as shown in Fig. 1A of Ranjan can be modeled as a Gaussian process due to its simplicity and generality (Anderson, “Gaussian processes are a good probabilistic alternative to support vector machines for kernel learning. A Gaussian process can be completely specified by a mean function m and covariance (kernel) function K, although the mean function is often taken to be zero without loss of generality”, [0055]))
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Anderson into the system or method of Ranjan in order to best estimate the classification errors in a machine learning model using SVM and Gaussian process for simplicity and generality. The combination of Ranjan and Anderson also teaches other enhanced capabilities.
	The combination of Ranjan and Anderson further teaches:
generating an aggregation of the probabilistic models for, in use, detecting reconstruction errors for a time series of data corresponding to network communication characteristics for a device as anomalous.
(Ranjan, Fig. 4A; following the above discussion, when only one feature is included in either the malicious data instance cluster or the non-malicious data instance cluster, the error rate is the highest under a single-feature statistical model; when two or more features are included in a data instance, it is seen that the classification error rate can be significantly lower due to the underlying and well-known joint/aggregate statistical detection in an aggregate multi-feature statistical/probabilistic model; Fig. 3, ST308, “In Step 308, if the classification label of the unclassified data instance is the malicious label, the unclassified client is classified as associated with a botnet (i.e., labeled as malicious)”, c16:10-15)
(Note the 112(b) rejection on claims 1 and 9)
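For illustration only, and without characterizing the disclosure of Ranjan or Anderson, the per-cluster training and reconstruction-error limitations recited in claim 1 can be sketched as follows. The sketch rests on stated assumptions: `MeanReconstructor` is a hypothetical stand-in for an autoencoder (it reconstructs every window as the mean training window), and the names `windows` and `reconstruction_errors_per_cluster`, as well as the window size, are chosen here and do not come from the application or the references.

```python
import math


def windows(series, size):
    """Split one time series into consecutive, non-overlapping time windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


class MeanReconstructor:
    """Hypothetical stand-in for an autoencoder (assumption, not a real one).

    It "reconstructs" any window as the element-wise mean of its training windows.
    """

    def fit(self, train_windows):
        n = len(train_windows)
        width = len(train_windows[0])
        self.template = [sum(w[j] for w in train_windows) / n for j in range(width)]
        return self

    def reconstruction_error(self, window):
        # Root-mean-square difference between the window and its reconstruction.
        squared = sum((a - b) ** 2 for a, b in zip(window, self.template))
        return math.sqrt(squared / len(window))


def reconstruction_errors_per_cluster(clusters, size):
    """Train one model per cluster, then test it on windows of the same series."""
    all_errors = []
    for cluster in clusters:
        wins = [w for series in cluster for w in windows(series, size)]
        model = MeanReconstructor().fit(wins)
        all_errors.append([model.reconstruction_error(w) for w in wins])
    return all_errors


# Two toy clusters, each holding a single time series.
toy_clusters = [[[1, 2, 1, 2, 1, 2, 1, 2]], [[5, 5, 9, 9, 5, 5, 9, 9]]]
toy_errors = reconstruction_errors_per_cluster(toy_clusters, size=2)
```

In this toy input, the first cluster repeats one window pattern, so its errors are all zero. The second cluster alternates between two patterns, so each of its windows deviates from the mean window by the same amount.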

Regarding claim 2, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the clusters are defined based on the autoencoder for each cluster converting each time series to a vector of features for the time series and a clustering algorithm clusters the vectors.
(Ranjan, see comments on claim 1 regarding malicious data instance cluster, non-malicious data instance cluster; “any number of data instances may be included in the training data set (134) and any number of features may be included in the feature vector for each data instance in the training data set (134)”, c10:10-15)

Regarding claim 3, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the set of reconstruction errors for each autoencoder is generated based on the autoencoder processing each time series in a corresponding cluster of time series.
(Ranjan, Fig. 3, ST304, “In Step 304, using a pre-determined machine learning algorithm, a classification model is generated based on the training data set. In particular, when the classification model is applied to each malicious data instance by a classifier, the classifier generates a malicious label. When the classification model is applied to each non-malicious data instance by the classifier, the classifier generates a non-malicious label”; c13:60-end; producing outputs in a classification model using various data instances as inputs is a classification output reconstruction process)

Regarding claim 4, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the clusters are defined based on a random subdivision of the set of time series.
(Ranjan, see comments on claim 1; the labeled malicious data instance cluster and the non-malicious data instance cluster can be aggregated with unlabeled data instances, c11:30-55; e.g., “For the URL data, half is used for training and the remainder used for testing. The test data is randomly split into five subsets. Each subset is further split into two further subsets of equal size, one as "labeled" (i.e., malicious or non-malicious) data to be used for updating the classification model and the other as "unlabeled" (i.e., unclassified) data”, c21:30-40)
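For illustration only, a random subdivision of a set of time series into clusters, as recited in claim 4, can be sketched as below. The function name `random_subdivision`, the use of series identifiers, and the round-robin assignment after shuffling are assumptions made for this sketch and are not taken from the application or from Ranjan.

```python
import random


def random_subdivision(series_ids, n_clusters, seed=None):
    """Randomly partition series identifiers into n_clusters groups.

    The identifiers are shuffled, then dealt out round-robin so that group
    sizes differ by at most one.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(series_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_clusters] for i in range(n_clusters)]


# Ten hypothetical series, identified 0..9, split into three clusters.
example_clusters = random_subdivision(range(10), 3, seed=0)
```

Every series lands in exactly one cluster, and no property of the data influences the assignment.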

Regarding claim 5, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 4, wherein the set of reconstruction errors for each autoencoder is generated based on the autoencoder processing each of the time series.
(Ranjan, see comments on claim 1; Fig. 3, ST304, “In Step 304, using a pre-determined machine learning algorithm, a classification model is generated based on the training data set. In particular, when the classification model is applied to each malicious data instance by a classifier, the classifier generates a malicious label. When the classification model is applied to each non-malicious data instance by the classifier, the classifier generates a non-malicious label”; c13:60-end; classification outputs are reconstructed in a classification model based on various data instances as inputs)

Regarding claim 6, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein each probabilistic model is a Gaussian model of reconstruction errors for the autoencoder.
(Anderson, “SVMs and Gaussian processes were used for the classification process”, [0092])
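For illustration only, a Gaussian model of a set of reconstruction errors, as recited in claim 6, can be sketched by fitting a mean and a standard deviation to the errors and evaluating the resulting density. This is a univariate Gaussian fit written for this sketch; it is not the Gaussian process classifier described in Anderson, and the names `fit_gaussian` and `gaussian_pdf` and the variance floor are assumptions.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(errors):
    """Return (mu, sigma) of a univariate Gaussian fitted to the errors."""
    mu = mean(errors)
    sigma = pstdev(errors)  # population standard deviation
    return mu, max(sigma, 1e-9)  # floor avoids division by zero (assumption)


def gaussian_pdf(x, mu, sigma):
    """Density of the Gaussian N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical reconstruction errors observed for one model.
mu, sigma = fit_gaussian([1.0, 2.0, 3.0])
typical = gaussian_pdf(2.0, mu, sigma)   # error near the fitted mean
unusual = gaussian_pdf(10.0, mu, sigma)  # error far from the fitted mean
```

An error far from the fitted mean receives a much lower density than a typical error, which is what makes such a model usable for flagging unusual errors.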

Regarding claim 7, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 6, wherein the aggregation of the probabilistic models is a Gaussian mixture model.
(Anderson, “The subroutine label may be modeled using a multiclass Gaussian process”, [0028]; eq. (8); using a multiclass Gaussian in classification is advantageous; e.g., “For the Gaussian process, including the neighbor information pushes the accuracy from 90% to 94%, and the average probability of the true class improves from 0.6552 to 0.7382”, [0078])
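For illustration only, an aggregation of per-cluster Gaussian models into a Gaussian mixture, as recited in claim 7, can be sketched as a weighted sum of component densities. The sketch assumes the components are already fitted and simply combined with fixed weights (no expectation-maximization step), and it is not the multiclass Gaussian process of Anderson. The names `mixture_density` and `is_anomalous` and the threshold value are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the Gaussian N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Density of a Gaussian mixture given as (weight, mu, sigma) triples.

    The weights are assumed to be non-negative and to sum to one.
    """
    return sum(w * gaussian_pdf(x, mu, sigma) for w, mu, sigma in components)


def is_anomalous(error, components, threshold):
    """Flag an error whose mixture density falls below the threshold."""
    return mixture_density(error, components) < threshold


# Two hypothetical per-cluster error models with equal weight.
components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
flag_between = is_anomalous(5.0, components, threshold=1e-3)
flag_typical = is_anomalous(0.0, components, threshold=1e-3)
```

An error lying between the two component means is unlikely under both components and is flagged, while an error at a component mean is not.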

Regarding claim 8, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the aggregation of the probabilistic models is a hidden Markov model.
(Anderson, “wherein the subroutines are modeled as a Markov chain with the categories as nodes of a Markov chain graph”, [claim 9])

Regarding claim 10, the combination of Ranjan and Anderson teaches its/their respective base claim(s).
The combination further teaches the non-transitory computer-readable storage medium storing a computer program element comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer system to perform the method as claimed in claim 1.
(Ranjan, Anderson, see comments on claim 1)


Response to Arguments
Applicant's arguments filed on 9/16/2022 with respect to one or more of the pending claims have been fully considered but they are not persuasive.

Regarding claim 1, Applicant argues that the combination of the cited references fails to teach “generating, for each autoencoder, a set of reconstruction errors for each the autoencoder based on testing the autoencoder with data from time windows of at least a subset of the time series from which the respective autoencoder was trained” as recited in claim 1.
The Examiner respectfully disagrees.
Figs. 4A-4B of Ranjan show training and testing a classification model and its classification error rates over time with different percentages of total features. The test data is time series data (Fig. 3, “the malicious data instance and the non-malicious data instance”). For simplicity of discussion, consider the data set with the data point of 100% of the features. The test data in Day 1 may be considered as the original training data set that generates a set of original classification errors which are around 0.025% (Fig. 4A). The test data in Day 2 is the same set of data as that of Day 1 (reads on “from time windows of at least a subset of the time series…”). The Day 2 data can be considered as the test data for the same model trained in Day 1, and it generates a set of different classification error rates around 0.02% (Fig. 4B).
Therefore, the combination of the cited references teaches the claimed limitations in question.


Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN (JAMES) YANG whose telephone number is (571)272-9874. The examiner can normally be reached on MON-FRI: 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached at (571)272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/JIANXUN YANG/
Primary Examiner, Art Unit 2664				9/27/2022