Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
DETAILED ACTION

In the reply filed on 03/29/2021, Applicant elected Group I (claims 1-19), without traverse, in response to the Restriction Requirement set forth in the Office Action mailed January 27, 2021. Non-elected claims 20-59 have been canceled. Claims 60-61 are new. Claims 1-19 and 60-61 are pending.

This action is in response to the communication filed on March 29, 2021.



Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
 A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-6, 11-19 and 60-61 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Honig et al. (U.S. Patent No. 8,887,281).

With respect to claims 1, 60 and 61, Honig et al. teaches:
obtaining a data set comprising a plurality of data samples, each of the plurality of data samples associated with respective values for a set of features (abstract; Fig. 1, a sensor is configured to gather information regarding the operation of the computer system, to format the information in a data record, and to transmit the data record. A database is configured to receive the data record from the sensor and to store the data record); identifying a respective data type of each of the features (col. 2, lines 8-30, Data mining IDSs collect data from sensors which monitor some aspect of a system. Sensors may monitor network activity, system calls used by user processes, or file system access. They extract predictive features from the raw data stream being monitored to produce formatted data that can be used for detection);
automatically generating an anomaly detection blueprint based on the respective data types of one or more of the features, the anomaly detection blueprint comprising a 
executing the machine-executable module, thereby performing the anomaly detection procedure, wherein performing the anomaly detection procedure includes identifying a subset of the plurality of data samples as a set of anomalous data samples (col. 5, lines 26-46, The detection model generator may be configured to generate a set of support vectors.  The detector may be configured to map a data record to the feature space and determine the location of the data record in the feature space with respect to the decision boundary).

With respect to claim 2, Honig et al. teaches wherein the identified data type of the particular feature is a numerical data type, and wherein the anomaly detection procedure corresponding to the anomaly detection blueprint does not include a task of performing normalization, standardization, or ridit transformation of the respective values of the plurality of data samples for the particular feature having the numerical data type (col. 5, lines 46-59, data analysis engine is a feature extractor configured to extract a feature

With respect to claim 3, Honig et al. teaches wherein the features include a particular feature, the data type of the particular feature being a numerical data type, the plurality of data samples include one or more first data samples and one or more second data samples, wherein the respective value of the particular feature for each of the first data samples is missing and wherein the respective value of the particular feature for each of the second data samples is non-missing, and the tasks of the anomaly detection procedure corresponding to the anomaly detection blueprint include a missing value imputation task comprising replacing the respective missing value of the particular feature of each of the first data samples with a median of the non-missing values of the particular feature for the second data samples (col. 14, lines 13-43, The model specifies information for evaluating the model which follows the version information. The exemplary algorithm requires information and statistics about each feature in the data, and the values observed for that feature).
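
For illustration only, the median imputation task recited in claim 3 can be sketched in Python as follows. This sketch is not drawn from Honig et al. or from the application; the data layout (one list of values for the particular feature, with None marking a missing value) is an assumption.

```python
from statistics import median
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace each missing value (None) with the median of the non-missing values.

    Assumption: the values of one numerical feature are held in a plain list,
    with None marking a missing value.
    """
    present = [v for v in values if v is not None]  # the "second data samples"
    if not present:
        raise ValueError("cannot impute: every value of the feature is missing")
    fill = median(present)
    # Missing entries (the "first data samples") receive the median.
    return [fill if v is None else v for v in values]
```

For example, impute_median([3.0, None, 1.0, 10.0, None]) fills both gaps with 3.0, the median of 3.0, 1.0 and 10.0.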
 
With respect to claim 4, Honig et al. teaches wherein the respective value for the particular feature is missing, and the tasks of the anomaly detection procedure corresponding to the anomaly detection blueprint include a feature engineering task comprising: adding a new feature to the set of features and determining the respective value of the new feature for each of the plurality of data samples, the respective value of the new feature for each of the plurality of data samples indicating whether the respective data sample
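
For illustration only, the feature engineering task recited in claim 4 (adding a new feature whose value indicates whether a sample's value for the particular feature is missing) can be sketched as below. The dictionary-per-sample layout and the "_is_missing" suffix are hypothetical choices and are not taken from Honig et al. or from the application.

```python
from typing import Dict, List


def add_missing_indicator(samples: List[Dict[str, object]], feature: str) -> List[Dict[str, object]]:
    """Return copies of the samples with a new 0/1 feature flagging a missing value.

    Assumption: each sample is a dict of feature name -> value, and a value is
    missing when it is None or the key is absent.
    """
    new_name = f"{feature}_is_missing"  # hypothetical naming convention
    out = []
    for sample in samples:
        row = dict(sample)  # leave the caller's data unchanged
        row[new_name] = 1 if sample.get(feature) is None else 0
        out.append(row)
    return out
```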
 
With respect to claim 5, Honig et al. teaches obtaining a respective anomaly score for each of the plurality of data samples, the respective anomaly score for each data sample indicating a predicted extent to which the data sample is anomalous; and identifying, based on the anomaly scores, the set of anomalous data samples from the plurality of data samples; and the method further includes: determining a correlation between the respective anomaly score or the respective anomaly classification and the respective value of the label for each of the plurality of data samples; responsive to the correlation being less than a threshold correlation, removing the set of anomalous data samples from the plurality of data samples; and otherwise, responsive to the correlation being at least the threshold correlation, retaining the set of anomalous data samples in the plurality of data samples (Fig. 1; col. 7, lines 49-64, types of data have fundamentally different properties. In addition, detection models can also vary greatly. The challenge in automating these processes is the need to support these different types of data and different types of detection models. The methods for building these detection models as well as executing them in real time vary greatly for each type of detection model).
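
For illustration only, the remove-or-retain logic recited in claim 5 can be sketched as follows. The sketch assumes a Pearson correlation between the anomaly scores and numeric label values and compares the signed correlation to the threshold; neither choice is specified by the claim or by Honig et al.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (sx * sy)


def filter_by_label_correlation(scores: Sequence[float], labels: Sequence[float],
                                anomalous_idx: Sequence[int], threshold: float) -> List[int]:
    """Return the indices of the data samples that are kept."""
    if pearson(scores, labels) < threshold:
        # Correlation below threshold: remove the anomalous samples.
        drop = set(anomalous_idx)
        return [i for i in range(len(scores)) if i not in drop]
    # Correlation at least the threshold: retain every sample.
    return list(range(len(scores)))
```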
 
With respect to claim 6, Honig et al. teaches plurality of data samples, replacing the respective value of the particular feature having the categorical data type with a

With respect to claim 11, Honig et al. teaches, for particular features having the free text data type: identifying a plurality of terms that occur most frequently within a combined free text corpus comprising the values for the respective particular feature for the plurality of data samples; and generating a sample-term matrix, wherein each row of the sample-term matrix corresponds to a respective data sample in the plurality of data samples, wherein each column of the sample-term matrix corresponds to a respective term in the plurality of terms that occur most frequently, and wherein each element of the sample-term matrix indicates whether the term corresponding to the column of the element occurs in the data sample corresponding to the row of the element, within the values of the respective particular feature (col. 5, lines 26-46, The detection model generator may be configured to generate a set of support vectors. The detector may be configured to map a data record to the feature space and determine the location of the data record in the feature space with respect to the decision boundary).
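
For illustration only, the sample-term matrix recited in claim 11 can be sketched as below: the k most frequent terms of the combined corpus become the columns, and each element is 1 when the column's term occurs in the row's sample. The tokenizer (lowercase alphanumeric runs) and the parameter k are assumptions; the claim does not specify either, and the sketch is not drawn from Honig et al.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase alphanumeric tokens; no tokenizer is specified.
    return re.findall(r"[a-z0-9]+", text.lower())


def sample_term_matrix(texts: List[str], k: int) -> Tuple[List[str], List[List[int]]]:
    """Return (terms, matrix): rows are samples, columns are the k most frequent terms."""
    # Count every term occurrence across the combined free text corpus.
    corpus_counts = Counter(tok for text in texts for tok in tokenize(text))
    terms = [term for term, _ in corpus_counts.most_common(k)]
    matrix = []
    for text in texts:
        present = set(tokenize(text))
        # Element is 1 if the column's term occurs in this sample's text.
        matrix.append([1 if term in present else 0 for term in terms])
    return terms, matrix
```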

With respect to claim 12, Honig et al. teaches wherein a quantity of columns in the compact matrix is less than a quantity of columns in the sample-term matrix, and wherein each row of the compact matrix corresponds to a respective data sample in the plurality of

With respect to claim 13, Honig et al. teaches wherein the set of anomalous data samples is identified using an anomaly detection process, and wherein the anomaly detection process is selected from a group of anomaly selection processes based, at least in part, on a number of data samples in the data set and/or on a storage size of the data set (col. 5, lines 26-46, feature extractor configured to extract a feature from a single data record or a plurality of data records. This data analysis engine may be configured to append the feature data to the data records in the database).

With respect to claim 14, Honig et al. teaches wherein the storage size of the data set is less than a storage size threshold, and wherein the group of anomaly selection processes consists of an isolation forest process, a double median absolute deviance (MAD) process, a one class support vector machine (SVM) process, a local outlier factor (LOF)
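
For illustration only, one common formulation of a double median absolute deviation (MAD) outlier score, one of the processes named in claims 14-16, is sketched below: separate MADs are computed for values at or below the median and at or above it, and each value is scored against the MAD of its own side. This formulation is an assumption; neither the claims nor Honig et al. define the process, and the usual 1.4826 consistency constant is omitted.

```python
from statistics import median
from typing import List


def double_mad_scores(values: List[float]) -> List[float]:
    """Score each value by its distance from the median in units of its side's MAD."""
    m = median(values)
    # Lower and upper MADs; both lists are non-empty for any non-empty input.
    left_mad = median([abs(v - m) for v in values if v <= m])
    right_mad = median([abs(v - m) for v in values if v >= m])
    scores = []
    for v in values:
        mad = left_mad if v <= m else right_mad
        if mad == 0:
            # Degenerate side: only values away from the median are infinitely far.
            scores.append(0.0 if v == m else float("inf"))
        else:
            scores.append(abs(v - m) / mad)
    return scores
```

Higher scores indicate values farther from the bulk of the data; a threshold on the score would then flag anomalous samples.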

With respect to claim 15, Honig et al. teaches wherein the number of data samples in the data set is greater than a first sample number threshold and less than a second sample number threshold, wherein the storage size of the data set is less than a storage size threshold, and wherein the group of anomaly selection processes consists of an isolation forest process, a double median absolute deviance (MAD) process, and a Mahalanobis distance process (col. 5, lines 26-46, feature extractor configured to extract a feature from a single data record or a plurality of data records).

With respect to claim 16, Honig et al. teaches wherein (1) the number of data samples in the data set is greater than a first sample number threshold and a second sample number threshold, or (2) the storage size of the data set is greater than a storage size threshold, and wherein the group of anomaly selection processes consists of a double median absolute deviance (MAD) process and a Mahalanobis distance process (section 28-30).

With respect to claim 17, Honig et al. teaches a respective anomaly score indicating an extent to which the respective data sample is anomalous; adding the anomaly scores to the data set as respective values of a label of the plurality of data samples, thereby generating a labeled data set; and applying a supervised anomaly detection model to the labeled data set to identify the set of anomalous data samples (col. 5, lines 26-46,

With respect to claim 18, Honig et al. teaches determining, by an unsupervised anomaly detection process, for each of the plurality of data samples, a respective anomaly score indicating an extent to which the respective data sample is anomalous, and wherein the set of anomalous data samples comprises a fraction of the plurality of data samples having greatest anomaly scores (col. 5, lines 26-46, feature extractor configured to extract a feature from a single data record or a plurality of data records. This data analysis engine may be configured to append the feature data to the data records in the database).
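
For illustration only, selecting the fraction of data samples having the greatest anomaly scores, as recited in claim 18, can be sketched as follows. Rounding the selected count up and returning sample indices in their original order are assumptions, not requirements of the claim or teachings of Honig et al.

```python
import math
from typing import List, Sequence


def top_fraction_anomalies(scores: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the given fraction of samples with the greatest anomaly scores."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = math.ceil(fraction * len(scores))  # assumption: round the count up
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # report indices in original sample order
```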

With respect to claim 19, Honig et al. teaches assigning a respective value of a label to each of the plurality of data samples based on the identified set of anomalous data samples, the respective value of the label assigned to each data sample indicating whether the respective data sample is anomalous; and using the labeled data samples as training data to train a supervised anomaly detection model to infer whether data samples are anomalous based on the values of the features associated with the data samples (col. 5, lines 26-59, feature extractor configured to extract a feature from a single data record or a plurality of data records. This data analysis engine may be configured to append the feature data to the data records in the database).


Allowable Subject Matter

Claims 7-10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISAAC M WOO whose telephone number is (571)272-4043.  The examiner can normally be reached on 9:00 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached on 571-272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ISAAC M WOO/Primary Examiner, Art Unit 2163