DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendment filed on 1/12/2021 has been entered. Claims 1 and 10 stand amended. Claim 3 stands cancelled. New claim 20 is added. Claims 1, 2, and 4-20 are currently pending.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 5, 7, 10-12, 15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng et al. in US Patent Application Publication № 2009/0319457, hereinafter called Cheng, in combination with Statnikov et al. in US Patent № 7,912,698, hereinafter called Statnikov.



In regard to claim 1, Cheng teaches a method comprising:
receiving a set of data for which one or more labels are identifiable, the set of data including majority class samples and minority class samples (“If traditional learning methods are directly applied on skewed data, they tend to be biased towards the majority class and ignore the minority class, since the goal of such methods is to minimize the error rate” paragraph 0005);
performing a plurality of classification processes on the minority class samples of the set of data to automatically identify, by each classification process, a label group (i.e. class) in the minority class samples (“Cascaded Feature Ensemble Module 104: To handle the partial feature coverage problem, feature selection (2.1) is iteratively applied to find multiple disjoint feature sets to represent the data in different features spaces. Multiple classifiers are constructed (2.2) using Module 102, based on different feature sets and then a voting scheme is defined (2.3) that computes the consensus among the learned classifiers. That is, for a given structured input data, each classifier will predict a class. Since there are multiple classifiers, the assigned class for this data is the class predicted by the majority of classifiers.” Paragraph 0020);
voting, by each classification process, to determine a selected label group for the minority class samples (“…and then a voting scheme is defined (2.3) that computes the consensus among the learned classifiers.” Paragraph 0020);
clustering the selected group of minority class samples to generate a clustered minority dataset (i.e. into subgraphs, “More specifically, for a training set, multiple disjoint subsets of frequent subgraphs are progressively selected.” Paragraph 0035);
generating a dedicated machine learning classifier using undersampling of the majority class samples and the clustered minority dataset (“Balanced Data Ensemble Module 106: Given a set of graph data with skewed prior class distribution, a sampling technique (without replacement) is used (3.1) to draw repeated samples of the positive class and under-samples of the negative class to achieve a balanced class distribution.” paragraph 0021); and
generating a curated labeled dataset, the curated labeled dataset including the second clustered minority dataset and the selected second label group (“Step 3. All of the classifiers learned in multiple Cascaded Feature Ensembles are collected. For a given structured input data set, the assigned class for this data is the class predicted by the majority of classifiers.” paragraph 0026).
However, although Cheng teaches inserting the diverse resistance classifier into the plurality of classification processes and iterating the performing of the plurality of classification processes on the minority class samples (“Then, the selected feature set F, is removed from F and the same process is repeated on the remaining set of features, until kf features sets are selected with kf corresponding classifiers constructed.” paragraph 0036), and the voting, by each classification process, to determine a selected label group for the minority class samples (“Given a test example x, each classifier C, outputs an estimated posterior probability f'(x). The final prediction is derived by combining probability outputs from kf models,” paragraph 0036), Cheng fails to explicitly teach performing a plurality of classification processes, including using the dedicated machine learning classifier, on the minority class samples of the set of data to automatically identify, by each classification process, a second label group in the minority class samples; voting to determine a selected second label group of the minority class samples; and clustering the selected second group of minority class samples to generate a second clustered minority dataset.
Statnikov teaches performing a plurality of classification processes, including using the dedicated machine learning classifier, on the minority class samples of the set of data to automatically identify, by each classification process, a second label group in the minority class samples; voting to determine a selected second label group of the minority class samples; and clustering the selected second group of minority class samples to generate a second clustered minority dataset (“As seen in Iteration 1 220, a first data subset D1 221 is used as the testing set and a second D2 222, a third D3 223, a fourth D4 227 and a fifth D5 229 data subsets are used as the training set 210. Each subsequent iteration uses a different subset (e.g., Iteration 2 230 uses D2 , Iteration 3 240 uses D3 , Iteration 4 uses D4 and Iteration 5 uses D5) as the testing set and the remaining data subsets as the training sets.” column 6, line 16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the data labeling and learning system taught by Cheng to include multiple iterations over the remaining subsets of data, as taught by Statnikov. It would have been obvious because it represents the application of a known technique (i.e. the iterative training with different subsets, as taught by Statnikov) to a known method (i.e. the classification method taught by Cheng) ready for improvement to yield only predictable results (i.e. the specific training method taught by Cheng will be iterated over different subsets of the data).
In regard to claim 10, it is substantially similar to claim 1 and accordingly is rejected under similar reasoning.

In regard to claim 2, Cheng further teaches that the machine learning classifier further comprises a diverse resistance classifier (i.e. uses a plurality of classifiers, i.e. cascaded feature ensembles, “Step 2. For each balanced sample data subset, using Cascaded Feature Ensembles (204-1 through 204-m) to build multiple classifiers” paragraph 0025).
In regard to claim 11, it is substantially similar to claim 2 and accordingly is rejected under similar reasoning.

In regard to claim 12, Cheng further teaches inserting the diverse resistance classifier into the plurality of classification processes and iterating the performing of the plurality of classification processes on the minority class samples (“Then, the selected feature set F, is removed from F and the same process is repeated on the remaining set of features, until kf features sets are selected with kf corresponding classifiers constructed.” paragraph 0036).

In regard to claim 5, Cheng further teaches that voting to determine a selected label group further comprises assigning an equal weight to each of the plurality of classifiers (the combination function shown in paragraph 0038 applies equal weighting to each element in the summation).
In regard to claim 15, it is substantially similar to claim 5 and accordingly is rejected under similar reasoning.

In regard to claim 7, Cheng further teaches transforming the minority class samples into a format for each of the plurality of classifiers (“Given a set of frequent subgraphs F, feature selection is applied to get a subset of features Fi then data is transformed into this feature space and a classifier is built on top of it.” paragraph 0036).
In regard to claim 17, it is substantially similar to claim 7 and accordingly is rejected under similar reasoning.

In regard to claim 20, Cheng teaches a method comprising:
receiving a set of data for which one or more labels are identifiable, the set of data including majority class samples and minority class samples (“If traditional learning methods are directly applied on skewed data, they tend to be biased towards the majority class and ignore the minority class, since the goal of such methods is to minimize the error rate” paragraph 0005);
performing a bootstrap run that generates a clustered minority dataset from the received minority class samples (“Cascaded Feature Ensemble Module 104: To handle the partial feature coverage problem, feature selection (2.1) is iteratively applied to find multiple disjoint feature sets to represent the data in different features spaces. Multiple classifiers are constructed (2.2) using Module 102, based on different feature sets and then a voting scheme is defined (2.3) that computes the consensus among the learned classifiers. That is, for a given structured input data, each classifier will predict a class. Since there are multiple classifiers, the assigned class for this data is the class predicted by the majority of classifiers.” Paragraph 0020); 
generating a dedicated machine learning classifier using undersampling of the majority class samples and the clustered minority dataset (“Balanced Data Ensemble Module 106: Given a set of graph data with skewed prior class distribution, a sampling technique (without replacement) is used (3.1) to draw repeated samples of the positive class and under-samples of the negative class to achieve a balanced class distribution.” paragraph 0021); and
generating a curated labeled dataset, the curated labeled dataset including the iterative clustered minority dataset and the selected iterative label group (“Step 3. All of the classifiers learned in multiple Cascaded Feature Ensembles are collected. For a given structured input data set, the assigned class for this data is the class predicted by the majority of classifiers.” Paragraph 0026).
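However, although Cheng teaches inserting the diverse resistance classifier into the plurality of classification processes and iterating the performing of the plurality of classification processes on the minority class samples (“Then, the selected feature set F, is removed from F and the same process is repeated on the remaining set of features, until kf features sets are selected with kf corresponding classifiers constructed.” paragraph 0036), the voting, by each classification process, to determine a selected label group for the minority class samples (“Given a test example x, each classifier C, outputs an estimated posterior probability f'(x).” paragraph 0036), and the clustering of the selected group of minority class samples into a clustered minority dataset used to generate the curated dataset (“classifier C, outputs an estimated posterior probability f'(x). The final prediction is derived by combining probability outputs from kf models,” paragraph 0036), Cheng fails to explicitly teach performing a plurality of classification processes, including using the dedicated machine learning classifier, on the minority class samples of the set of data to automatically identify, by each classification process, a second label group in the minority class samples; voting to determine a selected second label group of the minority class samples; and clustering the selected second group of minority class samples to generate a second clustered minority dataset.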
Statnikov teaches performing a plurality of classification processes, including using the dedicated machine learning classifier, on the minority class samples of the set of data to automatically identify, by each classification process, a second label group in the minority class samples; voting to determine a selected second label group of the minority class samples; and clustering the selected second group of minority class samples to generate a second clustered minority dataset (“As seen in Iteration 1 220, a first data subset D1 221 is used as the testing set and a second D2 222, a third D3 223, a fourth D4 227 and a fifth D5 229 data subsets are used as the training set 210. Each subsequent iteration uses a different subset (e.g., Iteration 2 230 uses D2 , Iteration 3 240 uses D3 , Iteration 4 uses D4 and Iteration 5 uses D5) as the testing set and the remaining data subsets as the training sets.” column 6, line 16).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the data labeling and learning system taught by Cheng to include multiple iterations over the remaining subsets of data, as taught by Statnikov. It would have been obvious because it represents the application of a known technique (i.e. the iterative training with different subsets, as taught by Statnikov) to a known method (i.e. the classification method taught by Cheng) ready for improvement to yield only predictable results (i.e. the specific training method taught by Cheng will be iterated over different subsets of the data).

Claims 4, 6, 8, 13, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng and Statnikov as applied to claims 1, 5, 10, or 14 above, as applicable, and further in view of Honig.

In regard to claim 4, Cheng and Statnikov teach the method of claim 1, as above.
However, Cheng fails to explicitly teach that the plurality of classifiers further comprises a NIDS alert classifier, a killchain classifier and a SIEM classifier.
Honig teaches that the plurality of classifiers further comprises a NIDS alert classifier (“An IDS detects intrusions by monitoring a network or system and analyzing an audit stream collected from the network or system to look for clues of malicious behavior,” column 1, line 54), a killchain classifier (i.e. a classifier that detects sequences of actions that appear malicious, “Frequently, records by themselves are not meaningful, but in combination with other records they could represent an attack. The data warehouse 14 has the capability to provide the feature extractor 28 with any subset of data necessary. This could be the past n records for use with algorithms based on sequences, or those that compute temporal statistical features of connections or sessions. The flexibility of this system allows any group of record to be used to create a feature.” column 17, line 41) and a SIEM classifier (i.e. a classifier that identifies malicious software based on rules, “Many widely used and commercially available IDSs are signature-based systems. As is known in the art, a signature based system matches features observed from the audit stream to a set of signatures hand crafted by experts and stored in a signature database.” column 1, line 8).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-classifier machine learning system taught by Cheng to include the learning of the malicious activity detection classifiers, as taught by Honig. It would have been obvious because it represents the application of known techniques (i.e. the malicious activity detection classifier techniques taught by Honig) to a known system (i.e. the multiple classifier learning system taught by Cheng) ready for improvement to yield predictable results (i.e. the multiple classifiers will use the malicious activity detection techniques).

In regard to claim 13, Cheng and Statnikov teach the method of claim 10, as above. However, Cheng fails to explicitly teach that the first classifier is a NIDS alert classifier and the second classifier is a killchain classifier.
Honig teaches that the first classifier is a NIDS alert classifier (“An IDS detects intrusions by monitoring a network or system and analyzing an audit stream collected from the network or system to look for clues of malicious behavior,” column 1, line 54) and the second classifier is a killchain classifier (i.e. a classifier that detects sequences of actions that appear malicious, “Frequently, records by themselves are not meaningful, but in combination with other records they could represent an attack. The data warehouse 14 has the capability to provide the feature extractor 28 with any subset of data necessary. This could be the past n records for use with algorithms based on sequences, or those that compute temporal statistical features of connections or sessions. The flexibility of this system allows any group of record to be used to create a feature.” column 17, line 41).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-classifier machine learning system taught by Cheng to include the learning of the malicious activity detection classifiers, as taught by Honig. It would have been obvious because it represents the application of known techniques (i.e. the malicious activity detection classifier techniques taught by Honig) to a known system (i.e. the multiple classifier learning system taught by Cheng) ready for improvement to yield predictable results (i.e. the multiple classifiers will use the malicious activity detection techniques).

In regard to claim 14, Honig further teaches a SIEM classifier that processes the minority class samples in the set of data to automatically identify a label group in the set of data and votes on the selected label group (i.e. a classifier that identifies malicious software based on rules, “Many widely used and commercially available IDSs are signature-based systems. As is known in the art, a signature based system matches features observed from the audit stream to a set of signatures hand crafted by experts and stored in a signature database.” column 1, line 8).

In regard to claim 6, Cheng and Statnikov teach the method of claim 5, as above. However, Cheng fails to explicitly teach adjusting a weight of one or more of the classifiers.
Honig teaches adjusting a weight of one or more of the classifiers (“A model, in this case, is the set of support vectors and their weights.” Column 23 line 27).
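It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-classifier machine learning system taught by Cheng to include adjusting a weight of one or more of the classifiers, as taught by Honig.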

In regard to claim 8, Cheng and Statnikov teach the method of claim 1, as above. However, Cheng fails to explicitly teach that the data set further comprises malware data.
Honig teaches that the data set further comprises malware data (“An IDS detects intrusions by monitoring a network or system and analyzing an audit stream collected from the network or system to look for clues of malicious behavior.” Column 1 line 54).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-classifier machine learning system taught by Cheng to include the learning of the malicious activity detection classifiers, as taught by Honig. It would have been obvious because it represents the substitution of one known element (i.e. the malicious activity data set taught by Honig) for another (i.e. the data set taught by Cheng) to obtain predictable results (i.e. the classification will be performed on malware data).
In regard to claim 18, it is substantially similar to claim 8 and accordingly is rejected under similar reasoning.

Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng and Statnikov as applied to claims 1 and 10 above, as applicable, and further in view of Reddy et al. in US Patent Application Publication № 2019/0305957, hereinafter called Reddy.

In regard to claim 9, Cheng and Statnikov teach the method of claim 1, as above. However, Cheng fails to explicitly teach that clustering the selected group of minority class samples further comprises using a DBSCAN clustering method.
Reddy teaches that clustering the selected group of minority class samples further comprises using a DBSCAN clustering method (“Examples include trained decision trees or classification trees, neural classification networks, support vector machines, and clustering algorithms like DB-SCAN trained to cluster historical examples into trustworthy and untrustworthy clusters” paragraph 0128).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-classifier machine learning system taught by Cheng to cluster minority class samples using the DBSCAN method, as taught by Reddy. It would have been obvious because it represents the substitution of one known element (i.e. the DBSCAN clustering taught by Reddy) for another (i.e. the clustering taught by Cheng) to obtain predictable results (i.e. the minority data will be clustered using the DBSCAN method).
In regard to claim 19, it is substantially similar to claim 9 and accordingly is rejected under similar reasoning.


Response to Arguments
Applicant’s arguments, see pages 6-8, filed 1/12/2021, with respect to the rejection(s) of claim(s) 1-3, 5, 7, 10-12, and 17 under 35 USC 102 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Cheng and Statnikov. For more information, please refer to the relevant sections above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ARTHUR GANGER whose telephone number is (571)272-0270.  The examiner can normally be reached on 10:00 AM - 7:30 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Robert Beausoliel can be reached on (571) 272-3645.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ROBERT W BEAUSOLIEL JR/           Supervisory Patent Examiner, Art Unit 2167