Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 7/13/2022.
Applicant’s arguments/remarks made in amendment filed 7/13/2022.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 7/13/2022 has been entered.
 
Claims 3, 6, 8, 11, 14, and 16 are cancelled.
Claims 1, 2, 4, 7, 9, 10, 12, 15, 17, 18, and 20 are amended.
Claims 1, 2, 4, 5, 7, 9, 10, 12, 13, 15, and 17-20 are presented for examination.
Response to Arguments
Applicant presents arguments with regard to the rejections under 35 U.S.C. § 112(b) and the prior art of record.  Each is addressed.
Applicant argues that “Applicant has amended these claims to overcome the rejections under 35 U.S.C. § 112(b).” (Remarks, page 9, paragraph 3.) Examiner agrees. The rejections under 35 U.S.C. § 112(b) have been withdrawn.
Applicant argues that the prior art of record does not disclose the limitations of the amended claims.  The argument is moot in view of new grounds of rejection.  See detailed rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 7, 9, 10, 12, 15, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Huda et al (Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data, herein Huda), and Collell et al (A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data, herein Collell).
Regarding claim 1,
	Huda teaches a method for detecting an anomaly in operation of one or more machines, the method comprising:  (Huda, page 211, paragraph 2, line 1, “In this paper, we propose a semi-supervised approach that automatically integrates the knowledge about unknown malware from already available and cheap unlabeled data into the detection system.”  In other words, approach is method, detection system is detecting, and unknown malware is an anomaly.) 
	obtaining a plurality of unlabeled sensor data samples from one or more sensors associated with the one or more machines; (Huda, page 213, paragraph 5, line 3 “As shown in Fig. 1, traditionally, control networks are composed of programmable logic controllers (PLCs) that receive data from the sensors of the physical devices.”

[Image omitted: figure reproduced from Huda.]

In other words, receive data from the sensors is obtaining a plurality of unlabeled sensor data, physical devices are one or more machines, and from the mapping in the above limitation, unlabeled data is unlabeled data.) 
	training a set of clustering models in parallel using the unlabeled sensor data and user-provided partial label information including a set of normal labels to generate a set of estimated labels; (Huda, Fig. 4: Sand Box block, receives Known Files (Labeled) and Unknown Files (Unlabeled).

[Image omitted: Huda, Fig. 4.]

In other words, Training of Classifier block is training, Unsupervised Clustering block is set of clustering models, Known Files are user-provided partial label information, benign is including a set of normal labels, unsupervised clustering outputs are estimated labels, and the output sent to the Extract information block is the set of estimated labels.)
	Thus far, Huda does not explicitly teach applying a random sample generator to generate multiple sets of labeled training samples based on the plurality of unlabeled sensor data samples and the set of estimated labels;
	Collell teaches applying a random sample generator to generate multiple sets of labeled training samples based on the plurality of unlabeled sensor data samples and the set of estimated labels (Collell, Algorithm 1, step 1.2, and, page 332, column 2, paragraph 4, line 2 “One of the simplest ways to resample is to sample each instance with equal probability with replacement, i.e., a non-parametric bootstrap, as in the original bagging algorithm [28].”

[Image omitted: excerpt reproduced from Collell.]

In other words, sample each instance with equal probability with replacement, i.e., a non-parametric bootstrap is applying a random sample generator, and, from Algorithm 1, step 1.2, generate n training data sets is generate multiple sets of labeled training data samples.);	
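For illustration only, and forming no part of the mapping above, the non-parametric bootstrap quoted from Collell (sampling each instance with equal probability, with replacement) might be sketched as follows. The function name `bootstrap_sets`, the toy readings, and the 0/1 label encoding are assumptions introduced here; they do not appear in Huda, Collell, or the instant application.

```python
import random

def bootstrap_sets(samples, labels, n_sets, seed=0):
    """Draw n_sets training sets by sampling instances uniformly with replacement."""
    rng = random.Random(seed)
    size = len(samples)
    sets = []
    for _ in range(n_sets):
        # Each index is drawn with equal probability; repeats are allowed.
        idx = [rng.randrange(size) for _ in range(size)]
        sets.append(([samples[i] for i in idx], [labels[i] for i in idx]))
    return sets

# Hypothetical sensor readings and estimated labels (0 = normal, 1 = abnormal).
readings = [0.1, 0.2, 0.15, 0.9, 0.12]
estimated = [0, 0, 0, 1, 0]
training_sets = bootstrap_sets(readings, estimated, n_sets=3)
```

Each resulting set keeps every sampled reading paired with its estimated label, so each set could be handed to a separate base learner.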
	Collell teaches training a set of feed-forward neural network (FNN) models in parallel wherein a respective FNN model of the set of trained FNN models is trained using a corresponding set of labeled training samples (Collell, page 332, column 2, paragraph 1, line 2 “Bagging generally performs well with unstable base learners for whom small changes in the training data lead to large changes in the learned model [28]. For example, decision trees (DT) and neural networks (NN) are unstable classifiers and thus suitable for bagging.” And page 332, paragraph 3, line 1, “Furthermore, different aggregation methods can be used to combine the outputs of the base classifiers, e.g., hard-voting to make crisp class assignments or soft-voting for probabilistic predictions.” And, page 331, column 2, paragraph 2, line 1 “We consider the standard classification setting where a learning algorithm learns from the training data tuples [x_i, y_i], i = 1, …, N, where x_i ∈ X are features that can be either continuous, ordinal or categorical and y_i ∈ C = {1,…,m} are discrete class labels.” In other words, bagging is set of neural networks in parallel, neural networks is feed-forward neural network (FNN) models, training learning algorithm is training set of FNN models, and training data tuples… discrete class labels is trained using a corresponding set of labeled training samples.);
	Collell teaches obtaining, for [an observed sensor (see Huda, page 213, paragraph 5, line 3) page 4 of office action] data sample, a set of predicted labels outputted by the set of trained FNN models, wherein each trained FNN model outputs a predicted label (Collell, Algorithm 1, “Each base classifier i gives a probabilistic estimate…for each label k ϵ {1…m} given a test instance x.” (where m is the number of classes) In other words, test instance x is data sample, n base classifiers is set of trained FNN models, and each base classifier i gives estimate is each trained FNN model outputs a predicted label.);
	Collell teaches computing an average of the set of predicted labels outputted by the set of trained FNN models (Collell, Algorithm 1, Compute averages of probabilistic predictions for each class [equation image omitted]. In other words, compute averages is computing an average, from prior mapping, label is label, n base classifiers is set of trained FNN models, and predictions is predictions.); and
	Collell teaches determining whether an anomaly is present in the operation of the one or more machines based on whether the average of the set of predicted labels is greater than a user-specified threshold. (Collell, Algorithm 1, page 330, column 1, paragraph 1, line 1 “Dealing with a class imbalance in classification is an important problem that poses major challenges [1]. Imbalanced data sets frequently appear in real-world problems, such as in fault and anomaly detection [2,3], fraudulent phone call detection [4] and medical decision-making [5], to name a few.” And, page 332, column 2, paragraph 2, line 1 “Clearly, different sampling mechanisms (Algorithm 1, step 1.2) and different thresholds (step 2.4) can be used which will yield different models and different outputs.  All the methods tested here use a variation of Algorithm 1, and we will discuss their sampling mechanisms and thresholds in the next section.” Examiner notes that Collell is explicitly pointing out that the method they develop can be applied to a range of real-world problems, for example, anomaly detection. In other words, Algorithm 1 can be applied to anomaly detection is determining whether an anomaly is present, and different thresholds (step 2.4) can be used is a user-specified threshold. Examiner notes that the average of the set of predicted labels was previously mapped.)
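For illustration only, the last two limitations of claim 1 (averaging the predicted labels of the ensemble members and comparing the average against a user-specified threshold) might be sketched as follows. The sketch tracks the claim language rather than any particular step of Collell's Algorithm 1; the function names, the example member outputs, and the threshold of 0.5 are assumptions introduced here.

```python
def average_prediction(predictions):
    """Mean of the predicted labels (or abnormal-class probabilities) of all members."""
    return sum(predictions) / len(predictions)

def is_anomaly(predictions, threshold):
    """Flag an anomaly only when the ensemble average is strictly greater than the threshold."""
    return average_prediction(predictions) > threshold

# Hypothetical outputs of four trained ensemble members for one observed data sample.
member_outputs = [0.9, 0.7, 0.8, 0.4]
flag = is_anomaly(member_outputs, threshold=0.5)
```

Because the comparison is strict, an average exactly equal to the threshold is not flagged, which matches the claim's "greater than" wording.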
	Both Collell and Huda are directed to classifying data and detecting anomalies, among other things. Huda teaches a semi-supervised approach that combines knowledge about unknown malware from already available and unlabeled data into a detection system using clustering and support vector machines as classifiers for anomaly detection, but does not explicitly teach a set of feed-forward neural networks as classifiers. Collell teaches using bagging, bootstrap sampling and a set of feed-forward neural networks as classifiers, among other things.  In view of the teaching of Huda, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Collell into Huda. This would result in being able to use clustering, bagging, bootstrapping and neural networks as classifiers for detecting anomalies in networked and cyber-physical systems.
	One of ordinary skill in the art would be motivated to do this because combining clustering with bagging classification models such as neural networks helps with class imbalance in classification, such as detecting anomalies in settings where the vast majority of data is non-anomalous. (Collell, page 330, column 1, paragraph 1, line 1 “Dealing with a class imbalance in classification is an important problem that poses major challenges [1]. Imbalanced data sets frequently appear in real-world problems, such as in fault and anomaly detection [2,3], fraudulent phone call detection [4] and medical decision-making [5], to name a few.”) 
Regarding claim 2,
	The combination of Huda and Collell teaches the method of claim 1,
	further comprising pre-processing the sensor data samples prior to generating the set of estimated labels.  (Huda, page 213, paragraph 3, line 3 “All the log files from the execution of malware in the database are processed to calculate the global API list and corresponding frequencies.  This is used as the training and test database (malware and benign database).  The database is then used for training and testing the proposed and existing approaches.”  In other words, log files are sensor data samples, and, all the log files… are processed to calculate the global API list and corresponding frequencies is pre-processing the sensor data samples prior to generating the set of estimated labels.)
Regarding claim 4,
	The combination of Huda and Collell teaches the method of claim 1,
	wherein each label in the set of estimated labels is: a normal label; or an abnormal label.  (Huda, Fig. 4: “Known Files (Labeled)”, and page 213, paragraph 1, line 1 “However, existing static and dynamic analyses-based detection engines require a supervised engine with a set of labeled malware and benign files.” In other words, set of labeled malware and benign files is a set of estimated labels, benign is normal, and malware is abnormal.) 
Regarding claim 7,
	The combination of Huda and Collell teaches the method of claim 1,
	wherein applying the random sample generator to generate the multiple sets of labeled training samples further comprises: (Collell, Algorithm 1, step 1.2, and, page 332, column 2, paragraph 4, line 2, See mapping of claim 1.)
	computing a set of weights associated with the set of clustering models by using the user-provided partial label information and the set of estimated labels; (Huda, page 225, paragraph 1, line 1 “The mean of the cluster centers are calculated based on the members’ TF-IDF weights, and the distance of the examples that form the centers are measured using cosine similarity [3].” And, page 223 paragraph 2, line 1 “First, all the data from the training and test sets are merged into one set for unsupervised clustering, whereas the labels of the training data removed.  A Global K-means [28] algorithm is used to cluster the data with cosine similarity [3] distance measure. This clusters the data into three groups (cluster-0, cluster-1, and cluster-2), as present in Fig. 11.”  

[Image omitted: Huda, Fig. 11.]

In other words, TF-IDF weights are weights, Global K-means is set of clustering models and labeled data from training and test sets is user provided data and initial set of estimated labels.)
using the set of weights and the set of estimated labels to compute a set of abnormal label probabilities; (Collell, Algorithm 1, step 2.3 “Compute averages of probabilistic predictions for each class.” In other words, weights are implicitly required in neural networks, estimated labels were previously mapped, probabilistic is probabilities, and compute averages for each class is compute a set of abnormal label probabilities because the set of abnormal labels is one of the classes.) and
applying a Bernoulli random sample generator to generate the multiple sets of labeled training samples based on the set of abnormal label probabilities (The specification of the instant application recites “Label generator 110, can in addition to building a set of clustering models, estimate a Bernoulli probability for each sample using the initial set of label outputs generated from the clustering models and partial normal labels Yh provided by user input 108.  The Bernoulli probability can be interpreted as a probabilistic soft label of a sample.” Collell, Algorithm 1, and page 332, column 2, paragraph 4, line 1 “In this section, we briefly describe the different resampling mechanisms that can be used in step 1.2 in Algorithm 1.” And page 332, column 2, paragraph 5, line 1 “Commonly used resampling mechanisms for imbalanced data (here called rebalancing methods) try to balance the class proportions. Perhaps the most popular undersampling mechanism used for ensemble learning is referred to as exactly balancing (EB).  EB resampling preserves the minority class instances such that the class proportions are exactly balanced.”  Examiner notes that imbalanced classes, in this case, refers to the situation where the majority of data is normal and a small percentage is abnormal.  Random sampling may underrepresent the class of abnormal samples.  Further, it is trivial to sample entirely from one or the other of the two classes, by a simple adjustment to Algorithm 1.  Examiner is interpreting that “based on the set of abnormal label probabilities” is using a sampling technique such as EB which guarantees that abnormal labels are proportionately represented. In other words, probabilistic is Bernoulli, from Algorithm 1, generate n training data sets by sampling S is generate multiple training sets of labeled training samples, and EB sampling is based on the set of abnormal label probabilities.)
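For illustration only, the claim language "applying a Bernoulli random sample generator ... based on the set of abnormal label probabilities" might be read as drawing one 0/1 label per sample, with the probability of drawing 1 (abnormal) equal to that sample's abnormal label probability. The sketch below reflects that reading of the claim and the quoted specification passage; it is not taken from Collell, and the function name `bernoulli_label_sets` and the example probabilities are assumptions introduced here.

```python
import random

def bernoulli_label_sets(abnormal_probs, n_sets, seed=0):
    """For each of n_sets label sets, draw label 1 (abnormal) with probability p per sample."""
    rng = random.Random(seed)
    return [
        # rng.random() lies in [0, 1), so p = 0.0 never yields 1 and p = 1.0 always does.
        [1 if rng.random() < p else 0 for p in abnormal_probs]
        for _ in range(n_sets)
    ]

# Hypothetical per-sample abnormal label probabilities (soft labels).
probs = [0.0, 0.05, 0.95, 1.0]
label_sets = bernoulli_label_sets(probs, n_sets=4)
```

Unlike exactly balanced resampling, which fixes class proportions, this draw lets the labels themselves vary from one generated set to the next according to the soft labels.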
Claims 9, 10, and 12 are apparatus claims corresponding to method claims 1, 2, and 4, respectively.  Otherwise, they are the same.  It is implicit that a computer implemented method requires a computer/apparatus in order to execute.  Therefore, claims 9, 10, and 12 are rejected for the same reasons as claims 1, 2, and 4, respectively.
Claim 15 is an apparatus claim corresponding to method claim 7.  Otherwise, they are the same.  Therefore, claim 15 is rejected for the same reasons as claim 7.
Claim 17 is a non-transitory computer-readable storage claim corresponding to method claim 1.  Other than that, they are the same.  It is implicit that a computer implemented method requires one or more non-transitory computer-readable storage devices in order to execute.  Therefore, claim 17 is rejected for the same reasons as claim 1.
Claim 18 is a non-transitory computer-readable storage claim corresponding to method claim 4.  Otherwise, they are the same.  Therefore, claim 18 is rejected for the same reasons as claim 4.
Claim 20 is a non-transitory computer-readable storage medium claim corresponding to method claim 7.  Otherwise, they are the same. Therefore, claim 20 is rejected for the same reasons as claim 7.
Claims 5, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Huda, Collell, and Gopalakrishnan et al (US 2016/0285700 A1, herein Gopalakrishnan).
Regarding claim 5,
	The combination of Huda and Collell teaches the method of claim 1,
	Thus far, the combination of Huda and Collell does not explicitly teach wherein at least one clustering model in the set of clustering models includes a Gaussian Mixture Model (GMM).  
	Gopalakrishnan teaches wherein at least one clustering model in the set of clustering models includes a Gaussian Mixture Model (GMM) (Gopalakrishnan, FIG. 1, and page 3, column 1, paragraph [0043], line 4 “In an embodiment, the likelihood is computed according to a Gaussian Mixture Model (GMM) model or a Hidden Markov Model (HMM) model built (i.e., parameters learned) from the historical data.  In an embodiment, the mathematical formula for predicting X_{t+1} is as follows: [equation image omitted] where [symbol image omitted] is the value predicted using the primary predictor and [symbol image omitted] is the value predicted using the alternate predictor, which in this embodiment, uses the immediate previous value of X.” and paragraph [0062], line 1 “In the absence of labeled data, multiple cluster-based analytical models can be applied on traffic data to organize it into several groups.”

[Image omitted: figure reproduced from Gopalakrishnan.]

In other words, a Gaussian Mixture Model used as predictor or alternate predictor is at least one clustering model, and multiple cluster-based analytical models can be applied is set of clustering models.)
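For illustration only, clustering unlabeled data with a Gaussian Mixture Model might be sketched as below: a two-component, one-dimensional mixture fitted by expectation-maximization, with each sample assigned to its more likely component. This is a generic textbook procedure and is not asserted to be the implementation of Gopalakrishnan or of the instant application; the function names, the fixed two components, the initialization at the data extremes, and the toy data are all assumptions introduced here.

```python
import math

def fit_gmm_1d(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    means = [min(xs), max(xs)]   # deterministic initialization at the data extremes
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300   # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue   # leave an empty component unchanged
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(spread, 1e-6)   # variance floor for numerical stability
    return weights, means, variances

def assign_clusters(xs, params):
    """Assign each sample to the component with the highest weighted log-density."""
    weights, means, variances = params
    labels = []
    for x in xs:
        scores = [
            math.log(w) - ((x - m) ** 2) / (2 * v) - 0.5 * math.log(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        labels.append(scores.index(max(scores)))
    return labels

# Hypothetical unlabeled measurements forming two well-separated groups.
traffic = [1.0, 1.1, 0.9, 1.05, 10.0, 10.2, 9.8]
params = fit_gmm_1d(traffic)
clusters = assign_clusters(traffic, params)
```

The cluster indices produced this way carry no inherent meaning; mapping a cluster to "normal" or "abnormal" would require additional information such as the user-provided partial labels recited in claim 1.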
	Both Gopalakrishnan and the combination of Huda and Collell are directed to adaptive anomaly detection in cyber-physical systems, among other things.  The combination of Huda and Collell teaches a system and method for unsupervised clustering combined with classifiers such as neural networks for anomaly detection, but does not explicitly teach using a Gaussian Mixture Model for clustering.  Gopalakrishnan teaches a system and method for anomaly detection using a Gaussian Mixture Model for clustering unlabeled data. In view of the teaching of the combination of Huda and Collell, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gopalakrishnan into the combination of Huda and Collell.  This would result in being able to use a Gaussian Mixture Model, as well as other clustering methods, for clustering unlabeled data for use in anomaly detection.
	One of ordinary skill in the art would be motivated to do this because detecting patterns in unlabeled data is important for exploiting historical data for the purpose of detecting future anomalies. (Gopalakrishnan, paragraph [0003], line 3 “Algorithms based on machine learning principles are capable of powerful pattern recognition and are therefore desirable as they can automatically uncover and exploit the structure within the historical data to characterize the nature of traffic behaviors and predict future performance (KPIs, traffic etc.) given the past and present.”)
Claim 13 is an apparatus claim corresponding to method claim 5.  Otherwise, they are the same.  Therefore, claim 13 is rejected for the same reasons as claim 5.
Claim 19 is a non-transitory computer-readable storage medium claim corresponding to method claim 5.  Otherwise, they are the same.  Therefore, claim 19 is rejected for the same reasons as claim 5.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/B.I.R./Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124