Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 19, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites “features and the model scores obtained from the first machine learning model”. It is unclear whether both the features and the model scores are obtained from the first machine learning model, or only the model scores, because the second line of the claim recites only “obtaining model scores from a first machine learning model”.
For purposes of examination, the claim is interpreted as: “based on features not necessarily obtained from the first machine learning model and the model scores obtained from the first machine learning model”.
Claims 2-18 depend on independent claim 1 and therefore inherit the same deficiency.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, 
Step 1: The claim recites a method and therefore falls within the statutory category of a process.
2A Prong 1: The limitation of learning how to differentiate between two groups based on at least one of: features and the model scores obtained from the first machine learning model is a mental process (abstract idea), as it merely recites differentiating two groups based on their features and scores, which can be done in the human mind.
The limitation of based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records is a mental process, as it merely recites determining, based on the ranking scores, how each data record contributes to the differentiation between the two groups, which can be done in the human mind or with the aid of pen and paper.
2A Prong 2: This judicial exception is not integrated into a practical application. The recited machine learning models merely indicate a field of use or technological environment (MPEP 2106.05(h)). The limitation of obtaining model scores from a first machine learning model is insignificant extra-solution activity. The limitation of training a second machine learning model amounts to well-understood, routine, and conventional activity (Iskandar, 0045). The limitation of applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group amounts to well-understood, routine, and conventional activity (Iskandar, 0045).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation of obtaining model scores from a first machine learning model amounts to mere data gathering (MPEP 2106.05(g)). The limitation of training a second machine learning model amounts to well-understood, routine, and conventional activity (Iskandar, 0045). The limitation of applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group amounts to well-understood, routine, and conventional activity (Iskandar, 0045). The recited machine learning models merely indicate a field of use or technological environment (MPEP 2106.05(h)).

Claim 19 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Claim 20 is a non-transitory computer readable storage medium claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 2, the limitation of wherein at least a portion of the features that are available for the data records of both groups are used is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 3, the limitation of wherein the first group includes data records in a target window and the second group includes data records in a reference window is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 4, the limitation of further comprising removing index-correlated features prior to training the second machine learning model is insignificant extra-solution activity, as it merely recites removing unwanted features from the data, which is a pre-solution activity (MPEP 2106.05(g)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 5, the limitation of further comprising removing time-correlated features prior to training the second machine learning model is insignificant extra-solution activity, as it merely recites removing unwanted features from the data, which is a pre-solution activity (MPEP 2106.05(g)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 6, 
2A Prong 1: The limitation of calculating corresponding values of a measure of correlation for each shuffle is a mathematical concept. The limitation of selecting a maximum observed value among the shuffles to be a threshold is a mental process. The limitation of determining a value for the measure of correlation without shuffling is a mental process. 
2A Prong 2: This judicial exception is not integrated into a practical application. The limitation of obtaining a data series associated with a distribution of values that generated the data records is insignificant extra-solution activity. The limitation of wherein removing time-correlated features is also insignificant extra-solution activity, as it merely recites removing unnecessary data, which is a pre-solution activity. The limitation of removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold is insignificant extra-solution activity, as it recites removing a feature based on the value determined without shuffling, which is a pre-solution activity. The limitation of shuffling the data series randomly a predetermined number of times is insignificant extra-solution activity.
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The limitation of obtaining a data series associated with a distribution of values that generated the data records is insignificant extra-solution activity, because it is directed to mere data gathering (MPEP 2106.05(g)). The limitation of wherein removing time-correlated features amounts to selecting a particular data source or type of data to be manipulated (MPEP 2106.05(g)). The limitation of removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold is insignificant extra-solution activity, as it recites removing a feature based on the value determined without shuffling, which amounts to selecting a particular data source or type of data to be manipulated (MPEP 2106.05(g)). The limitation of shuffling the data series randomly a predetermined number of times recites selecting a particular data source or type of data to be manipulated (MPEP 2106.05(g)). The limitation of prior to training the second machine learning model is merely a field of use or technological environment (MPEP 2106.05(h)).
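For illustration only, the shuffling-based removal recited in claim 6 can be sketched as follows. The sketch assumes that the measure of correlation is taken between a feature's data series and its time index, and it substitutes an absolute Pearson correlation for the claimed measure; the function names, the number of shuffles, and the seed are hypothetical and do not appear in the claims or the cited references.

```python
import random
from statistics import mean


def abs_pearson(xs, ys):
    """Absolute Pearson correlation (assumed stand-in for the claimed measure)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return abs(cov) / (var_x * var_y) ** 0.5


def is_time_correlated(series, num_shuffles=100, seed=0):
    """Return True if the feature would be removed under the claim 6 steps."""
    rng = random.Random(seed)
    index = list(range(len(series)))  # assumed time index of the records

    # Shuffle the series a predetermined number of times and record the
    # measure of correlation for each shuffle.
    shuffled_values = []
    for _ in range(num_shuffles):
        shuffled = series[:]
        rng.shuffle(shuffled)
        shuffled_values.append(abs_pearson(index, shuffled))

    # The maximum observed value among the shuffles is the threshold.
    threshold = max(shuffled_values)

    # Value of the measure without shuffling; remove if above the threshold.
    return abs_pearson(index, series) > threshold
```

A steadily increasing series such as `[0.1 * i for i in range(50)]` would be flagged for removal under these assumptions, while a constant series would not.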

Regarding claim 7, the limitation of wherein the measure of correlation is sensitive to non-linear relations is a mathematical concept, as it recites giving higher weight to the non-linear relations compared to other types of relations.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 8, the limitation of wherein the measure of correlation includes a Maximal Information Coefficient (MIC) is a mathematical concept, as it recites using a mathematical function to measure correlations.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 9, the limitation of wherein shuffling the data series randomly a predetermined number of times includes choosing the predetermined number of times to ensure a statistical confidence above a threshold is insignificant extra-solution activity.
The limitation of wherein shuffling the data series randomly a predetermined number of times includes choosing the predetermined number of times to ensure a statistical confidence above a threshold merely recites shuffling the data randomly and choosing the number of shuffles, which is selecting a particular data source or type of data to be manipulated (MPEP 2106.05(g)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 10, the limitation of wherein the second machine learning model includes a measure of feature importance for correlated features is a mathematical concept, as it recites measuring feature importance which requires a mathematical function.
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 11, the limitation of wherein the second machine learning model is a Gradient Boosted Decision Trees (GBDT) model is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 12, the limitation of further comprising outputting an explanation report is insignificant extra-solution activity. The limitation of further comprising outputting an explanation report amounts to necessary data gathering and outputting (MPEP 2106.05(g)). The limitation of in response to an anomaly in data records of at least one of the first group and the second group is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 13, the limitation of further comprising outputting an explanation report is insignificant extra-solution activity. The limitation of further comprising outputting an explanation report amounts to data gathering and outputting (MPEP 2106.05(g)). The limitation of including window start and end timestamps is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 14, the limitation of further comprising outputting an explanation report is insignificant extra-solution activity. The limitation of further comprising outputting an explanation report amounts to data gathering and outputting (MPEP 2106.05(g)). The limitation of including a feature importance ranking list based at least in part on the ranking scores is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 15, the limitation of further comprising outputting an explanation report is insignificant extra-solution activity. The limitation of further comprising outputting an explanation report amounts to data gathering and outputting (MPEP 2106.05(g)). The limitation of including a list of a predetermined number of top data records is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 16, the limitation of wherein the list of a predetermined number of top data records includes feature values used by the second machine learning model is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 17, the limitation of further comprising outputting an explanation report is insignificant extra-solution activity. The limitation of further comprising outputting an explanation report amounts to data gathering and outputting (MPEP 2106.05(g)). The limitation of including a validation curve to show how well a ranking of the data records can lower a monitoring value is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Regarding claim 18, the limitation of wherein the validation curve includes values using a target window with a predetermined number of top events removed is merely a field of use or technological environment (MPEP 2106.05(h)).
This judicial exception is not integrated into a practical application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 10, 12, 14-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1).

Regarding claim 1, Cantrell teaches a method comprising: obtaining model scores from a first machine learning model ([Cantrell, column 1, lines 42-48] “In general, such an anomaly detection model may comprise a model object that is used to score multivariate observation data originating from a given data source and a set of anomaly thresholds that are used to evaluate the scored observation data for purposes of determining whether anomalies exist in the multivariate observation data.” The scoring of the observation data by the model object corresponds to obtaining model scores from a first machine learning model);
training a second machine learning model to learn how to differentiate between two groups based on at least one of: features and the model scores obtained from the first machine learning model ([Cantrell, column 28, line 56 - column 29, line 6] “As shown in FIG. 6, the disclosed process may involve: (1) at block 602, using a first set of training data originating from the given data source to extract a model object for an anomaly detection model; (2) at block 604, using a second set of training data originating from the given data source and the extracted model object to establish starting values for a set of anomaly thresholds that includes at least one respective anomaly threshold for each of a given set of variables included in the multivariate data originating from the given data source; (3) at block 606, optionally using the starting values for the set of anomaly thresholds and a set of test data originating from the given data source to differentiate between the variables included within the given set of variables that are unlikely to have anomalous values (which may be referred to herein as “non-anomalous variables”) and the remaining variables in the given set of variables that may have anomalous values (which may be referred to herein as “anomalous variables”)” ); 
However, Cantrell fails to teach applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group, and based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records.
Iskandar teaches applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group ([Iskandar, 0029] “Multivariate analysis code 150 could include a multivariate model 152 … Examples of multivariate models that could be stored in multivariate analysis code 150 could include models developed using Support Vector Machine Regression, Naïve Bayes Regression, and Logistic Regression.” Support Vector Machine Regression, Naïve Bayes Regression, and Logistic Regression are machine learning models.
[Iskandar, 0045] “At block 312, computing system 110 could be employed to determine one or more process data 140 causing a variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 of the multivariate model output data 160. The variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 can be determined by generating a ranking 181, of one or more process data 140 contributing most to a difference between the two ranges 161, 162. Computing system 110 could display a ranking 181 in table 183 with relative contributions 185 on first view 210 of user interface 120.” The multivariate model is interpreted as a machine learning model. The first range of the data corresponds to the first group, and the second range corresponds to the second group); and
based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records ([Iskandar, 0045] “At block 312, computing system 110 could be employed to determine one or more process data 140 causing a variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 of the multivariate model output data 160. The variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 can be determined by generating a ranking 181, of one or more process data 140 contributing most to a difference between the two ranges 161, 162. Computing system 110 could display a ranking 181 in table 183 with relative contributions 185 on first view 210 of user interface 120.”).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Cantrell and Iskandar, to use the method of determining a ranking score and applying a machine learning model to a group of data to differentiate between two groups of data, as taught by Iskandar, to implement the machine learning method taught by Cantrell. The suggestion and/or motivation to do so is to improve the performance of the system, as the method can quickly determine the main causes of variability between two sets of data (Iskandar, 0049).
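For illustration only, the claim 1 limitations, under the interpretation adopted in the 35 U.S.C. 112(b) rejection, can be sketched as follows. The sketch is not drawn from Cantrell or Iskandar. It assumes that each data record is a pair of one feature and one score from the first model, that the second model is a simple logistic regression, and that the relative contribution is the ranking score normalized over the first group. All names and data values are hypothetical.

```python
import math


def predict(weights, bias, row):
    """Probability that a record belongs to the first (target) group."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.1):
    """Fit a logistic regression by batch gradient descent (assumed second model)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            err = predict(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def rank_target_records(target, reference):
    """Return (record index, ranking score, relative contribution), best first."""
    # Train the second model to differentiate the two groups.
    rows = target + reference
    labels = [1.0] * len(target) + [0.0] * len(reference)
    weights, bias = train_logistic(rows, labels)

    # Apply the second model to each record in the first group.
    scores = [predict(weights, bias, row) for row in target]

    # Assumed definition: contribution = score normalized over the first group.
    total = sum(scores)
    order = sorted(range(len(target)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i], scores[i] / total) for i in order]


# Hypothetical records: [feature value, score from the first model].
REFERENCE = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.2]]
TARGET = [[0.2, 0.2], [0.1, 0.3], [2.0, 0.9], [0.3, 0.1]]
```

Calling `rank_target_records(TARGET, REFERENCE)` places the record that differs most from the reference group (index 2 in this hypothetical data) at the top of the ranking.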

Claim 19 is a system claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Claim 20 is a non-transitory computer readable storage medium claim having limitations similar to those of method claim 1. Therefore, it is rejected under the same rationale as claim 1.

Regarding claim 2, Cantrell in view of Iskandar teaches wherein at least a portion of the features that are available for the data records of both groups are used ([Iskandar, 0045] “At block 312, computing system 110 could be employed to determine one or more process data 140 causing a variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 of the multivariate model output data 160. The variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 can be determined by generating a ranking 181, of one or more process data 140 contributing most to a difference between the two ranges 161, 162. Computing system 110 could display a ranking 181 in table 183 with relative contributions 185 on first view 210 of user interface 120.” The multivariate model is interpreted as a machine learning model. The first range of the data corresponds to the first group, and the second range corresponds to the second group).

Regarding claim 3, Cantrell in view of Iskandar teaches wherein the first group includes data records in a target window and the second group includes data records in a reference window ([Cantrell, column 34, lines 54-67] “Indeed, as shown in FIG. 8B, the tightened values 804A, 804B of the given variable's upper and lower anomaly thresholds resulting from a multiplier value of 0.5 may lead to the detection of 2 consecutive exceedances at relative capture times of 0.25 and 0.5, but no other exceedances. Thus, if asset data platform 102 is employing an example sliding window approach where a univariate anomaly is detected whenever there are 2 univariate exceedances within a window of 3 consecutive data points, asset data platform 102 may determine that the tightened values 804A, 804B of the given variable's upper and lower anomaly thresholds lead to the detection of 1 window amounting to anomalies out of a total of 20 windows evaluated, for an anomaly percentage of 5%.” The sliding window in which the anomaly is detected corresponds to the target window, and the other windows correspond to the reference window).
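For illustration only, the sliding window rule quoted from Cantrell (an anomaly whenever there are 2 exceedances within a window of 3 consecutive data points) can be sketched as follows. The sketch assumes overlapping windows that advance one data point at a time, which the quoted passage does not specify, so it is not intended to reproduce the window counts given in the quotation. The function name and parameters are hypothetical.

```python
def count_anomalous_windows(values, lower, upper, window=3, min_exceedances=2):
    """Return (number of anomalous windows, number of windows evaluated)."""
    # A data point is an exceedance if it falls outside the anomaly thresholds.
    exceed = [v < lower or v > upper for v in values]

    # Assumed: overlapping windows with a stride of one data point.
    total = max(len(values) - window + 1, 0)
    flagged = sum(
        1
        for start in range(total)
        if sum(exceed[start:start + window]) >= min_exceedances
    )
    return flagged, total
```

Under this reading, the flagged windows would map to the target window and the remaining windows to the reference window.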

Regarding claim 10, Cantrell in view of Iskandar teaches wherein the second machine learning model includes a measure of feature importance for correlated features ([Iskandar, 0009] “Fully automated data analysis systems cannot always be developed to detect every potential fault or issue that could occur on a piece of equipment. Similarly, strictly manually directed techniques for data analysis, such as computer routines for summarizing data, are too cumbersome to be cost effective in large data analysis problems. The shortfalls of fully automated data analysis systems and strictly manual techniques are especially true in attempts to determine what data or parameters are important in a correlation analysis exercise”).

Regarding claim 12, Cantrell in view of Iskandar teaches further comprising outputting an explanation report in response to an anomaly in data records of at least one of the first group and the second group ([Cantrell, Abstract] “A computing system may create an anomaly detection model to detect anomalies in multivariate data originating from a given data source by extracting a model object for the anomaly detection model using a first set of training data originating from the given data source, establishing starting values of a set of anomaly thresholds for the anomaly detection model using the extracted model object and a second set of training data originating from the given data source, and refining the starting values of the set of anomaly thresholds for at least a subset of the variables included in the multivariate data using the extracted model object and a set of test data.” The paragraph teaches the first and second dataset and anomaly in data records.
[Cantrell, column 22, line 12-23] “In addition to the aforementioned components, an asset may also be equipped with a set of on-board components that enable the asset to capture and report operating data. To illustrate, FIG. 4 is simplified block diagram showing some on-board components for capturing and reporting operating data that may be included within or otherwise affixed to an example asset 400. As shown, these on-board components may include sensors 402, a processor 404, data storage 406, a communication interface 408, and perhaps also a local analytics device 410, all of which may be communicatively coupled by a communication link 412 that may take the form of a system bus, a network, or other connection mechanism.” The paragraph teaches the process of reporting the operation, which comprises the anomaly detection mentioned above.).

Regarding claim 14, Cantrell in view of Iskandar teaches further comprising outputting an explanation report including a feature importance ranking list based at least in part on the ranking scores ([Iskandar, 0036] “Execution of multivariate analysis routine 158 on various ranges (e.g., 161, 162) could be used to generate rankings 180 of one or more process data 140 contributing most to a difference between the various ranges. For example, execution of multivariate analysis routine 158 on first range 161 and second range 162 could be used to generate a ranking 181, as represented by table 183, of the one or more process data 140 contributing most to a difference between the two ranges 161, 162. Ranking 181 could also include the relative contributions 185 of each process data 140 included in the ranking. Computing system 110 could be configured to display one or more rankings 180 as one or more tables (e.g., table 183) on first view 210.” The paragraph teaches displaying (reporting) the ranking, which comprises the contribution of each process data to the difference between the two ranges of data and can be interpreted as feature importance).

Regarding claim 15, Cantrell in view of Iskandar teaches further comprising outputting an explanation report including a list of a predetermined number of top data records ([Iskandar, 0045] “At block 312, computing system 110 could be employed to determine one or more process data 140 causing a variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 of the multivariate model output data 160. The variability of the multivariate model output data 160 in the first range 161 when compared to the second range 162 can be determined by generating a ranking 181, of one or more process data 140 contributing most to a difference between the two ranges 161, 162. Computing system 110 could display a ranking 181 in table 183 with relative contributions 185 on first view 210 of user interface 120.”
[Iskandar, 0046] “At block 314, a user can analyze multivariate model output data 160, ranking 181 in table 183, and process equipment 130 to determine if any further action should be taken.”).

Regarding claim 16, Cantrell in view of Iskandar teaches wherein the list of a predetermined number of top data records includes feature values used by the second machine learning model ([Iskandar, 0046] “At block 314, a user can analyze multivariate model output data 160, ranking 181 in table 183, and process equipment 130 to determine if any further action should be taken. At block 316, a user could decide whether or not to take further action by adjusting an existing multivariate fault or adding a new multivariate fault. If a user decides not to adjust an existing multivariate fault or add a new fault, then the user could either end the analysis of the multivariate model output data 160 or execute method 300 again on a different selection of first range 161 and second range 162. Both ranges 161, 162 could be changed or only one of the ranges 161, 162 could be changed. Computing system 110 could also be configured to allow selection of a different multivariate analysis method. A multivariate analysis using a different method could then be run again on the same ranges 161, 162.”).

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1) and further in view of Loo (US 8010296 B2).

Regarding claim 4, Cantrell in view of Iskandar teaches the method of claim 1. 
However, Cantrell in view of Iskandar fails to teach further comprising removing index-correlated features prior to training the second machine learning model.
Loo teaches further comprising removing index-correlated features prior to training the second machine learning model ([Loo, column 2, line 64 – column 3, line 2] “The present invention also provides a method for analyzing a set of indexed data to compress the set of data. The method comprises the steps of identifying and removing portions of the set of data having insufficient discriminatory power based on ensemble statistics of the set of indexed data, thereby providing a set of compressed indexed data.” Loo thus teaches removing indices that have insufficient discriminatory power.
[Loo, column 13, lines 17-25] “Alternatively, if a training set is available, a supervised feature extraction module 400 can be used to detect and remove points that have little discriminatory power, as illustrated in FIG. 6. Feature extraction in this case is an optimization problem whose objective is to find a combination of molecular weights (features) that yield the best classification performance under a given classification algorithm. This kind of optimization problem may be approached through stochastic search methods, such as a genetic algorithm 410.” This paragraph teaches that the removal of indices happens before training).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention and having the teachings of Cantrell, Iskandar, and Loo, to use the method of removing index-correlated features prior to training the machine learning model, as taught by Loo, to implement the machine learning system taught by Cantrell and Iskandar. The suggestion and/or motivation to do so is to improve the efficiency of the system, as removing non-discriminatory indices reduces the usage of memory space (Loo, Abstract).

Regarding claim 5, Cantrell in view of Iskandar teaches the method of claim 1. 
However, Cantrell in view of Iskandar does not specifically teach further comprising removing time-correlated features prior to training the second machine learning model.
Loo teaches further comprising removing time-correlated features prior to training the second machine learning model ([Loo, column 1, line 16-23] “The term "indexed data" or "spectrum" refers to a collection of measured values called responses. Each response may or may not be related to one or more of its neighbor elements. When a unique index, either one-dimensional or multi-dimensional, is assigned to each response, the data are considered to be indexed. The index values represent values of a physical parameter such as time, distance, frequency, mass, weight or category.” This paragraph teaches that the index values include time, and therefore time-correlated features.
[Loo, column 2, line 64 – column 3, line 2] “The present invention also provides a method for analyzing a set of indexed data to compress the set of data. The method comprises the steps of identifying and removing portions of the set of data having insufficient discriminatory power based on ensemble statistics of the set of indexed data, thereby providing a set of compressed indexed data.” Loo thus teaches removing indexed data that has insufficient discriminatory power).

Claims 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1), in view of Loo (US 8010296 B2), in view of Hunter (US 20150107334 A1), and further in view of Dadkhani (US 20180349466 A1).

Regarding claim 6, Cantrell in view of Iskandar and further in view of Loo teaches removing time-correlated features prior to training the second machine learning model ([Loo, column 1, line 16-23] “The term "indexed data" or "spectrum" refers to a collection of measured values called responses. Each response may or may not be related to one or more of its neighbor elements. When a unique index, either one-dimensional or multi-dimensional, is assigned to each response, the data are considered to be indexed. The index values represent values of a physical parameter such as time, distance, frequency, mass, weight or category.” This paragraph teaches that the index values include time, and therefore time-correlated features.
[Loo, column 2, line 64 – column 3, line 2] “The present invention also provides a method for analyzing a set of indexed data to compress the set of data. The method comprises the steps of identifying and removing portions of the set of data having insufficient discriminatory power based on ensemble statistics of the set of indexed data, thereby providing a set of compressed indexed data.” Loo thus teaches removing indexed data that has insufficient discriminatory power).
However, Cantrell in view of Iskandar, and further in view of Loo does not specifically teach that the removing includes: obtaining a data series associated with a distribution of values that generated the data records; shuffling the data series randomly a predetermined number of times; calculating corresponding values of a measure of correlation for each shuffle; selecting a maximum observed value among the shuffles to be a threshold; determining a value for the measure of correlation without shuffling; and removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold.
Hunter teaches obtaining a data series associated with a distribution of values that generated the data records; shuffling the data series randomly a predetermined number of times; and calculating corresponding values of a measure of correlation for each shuffle ([Hunter, 0031] “A stochastic sequence can be a random sequence or a pseudo-random sequence. A sufficiently complex pattern can be sufficiently stochastic over the time scale of the experiment. Whether a pattern is sufficiently stochastic can be determined by auto correlating the pattern with itself to ensure there is no correlation with the sequence. For example, such sequences can be generated with an arbitrary specified first-order probability distribution function and an arbitrary specified first order auto-correlation function. A set of numbers having a desired probability distribution function are generated. These values are given an independent (white) auto-correlation function by double stochastic interchange. The desired auto-correlation function is then obtained by stochastically shuffling the series to minimize a sum of squares criterion between the desired and actual auto-correlation functions.”).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Cantrell, Iskandar, Loo, and Hunter to use the method of shuffling the data series randomly and measuring correlation values, as taught by Hunter, to implement the machine learning method, as taught by Cantrell, Iskandar, and Loo. The suggestion and/or motivation to do so is to ensure a high statistical confidence and to make more accurate prediction (Hunter, 0031).
However, Cantrell in view of Iskandar, in view of Loo, and further in view of Hunter does not specifically teach selecting a maximum observed value among the shuffles to be a threshold; determining a value for the measure of correlation without shuffling; and removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold.
Dadkhani teaches selecting a maximum observed value among the shuffles to be a threshold; determining a value for the measure of correlation without shuffling; and removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold ([Dadkhani, 0005] “In one example, a method for determining correlations between data metrics and generating user interfaces indicating the data metrics and correlations includes identifying, by a processor, each pair of data metrics in a web analytics data set. The method further includes determining, by the processor, a Maximal Information Coefficient (MIC) score for each pair of data metrics. The MIC score for each pair of data metrics indicates a strength of a correlation between the pair of data metrics. The method further includes generating, by the processor, an interactive user interface graphically displaying each pair of correlated data metrics having an MIC score above a threshold. The interactive user interface indicates the strength of the correlation between each displayed pair of correlated data metrics. The method further includes receiving, by the processor, user input indicating an adjustment to the threshold. The method also includes modifying, by the processor, the interactive user interface in response to receiving the user input by adding pairs of correlated data metrics to the interactive user interface or removing pairs of correlated data metrics from the user interface based on the adjustment to the threshold.” The Maximal Information Coefficient comprises a method of finding the maximal score between each pair of data metrics.).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Cantrell, Iskandar, Loo, Hunter, and Dadkhani to use the method of selecting a maximum observed value among the shuffles to be a threshold; determining a value for the measure of correlation without shuffling; and removing a feature if the value for the measure of correlation without shuffling of the feature is larger than the threshold, as taught by Dadkhani, to implement the machine learning method, as taught by Cantrell, Iskandar, Loo, and Hunter. The suggestion and/or motivation to do so is to improve the efficiency of the correlation-finding method, as the process removes redundancy in the data (Dadkhani, 0003).
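For illustration only, the sequence of steps recited in claim 6 can be sketched as follows. This sketch is not taken from the application or from any cited reference. The function names (abs_pearson, shuffle_threshold, is_removed), the default of 99 shuffles, and the use of absolute Pearson correlation as the measure of correlation are all assumptions made for the example; claim 8 recites a Maximal Information Coefficient, which could be passed in as the measure argument instead.

```python
import random
from statistics import mean


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; a stand-in measure chosen for this sketch."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation
    return abs(cov) / (vx * vy) ** 0.5


def shuffle_threshold(series, feature, n_shuffles, rng, measure=abs_pearson):
    """Shuffle the data series n_shuffles times and return the maximum
    observed value of the measure of correlation among the shuffles."""
    shuffled = list(series)
    observed = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        observed.append(measure(shuffled, feature))
    return max(observed)


def is_removed(series, feature, n_shuffles=99, seed=0, measure=abs_pearson):
    """Return True if the unshuffled measure is larger than the threshold."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    threshold = shuffle_threshold(series, feature, n_shuffles, rng, measure)
    return measure(series, feature) > threshold


if __name__ == "__main__":
    timestamps = list(range(50))              # hypothetical time index
    trend = [2 * t + 1 for t in timestamps]   # feature that tracks time
    print(is_removed(timestamps, trend))
```

In this sketch a feature that tracks the time index exceeds every shuffled value and is flagged for removal, while a constant feature is kept because its measure never exceeds the threshold.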

Regarding claim 7, Cantrell in view of Iskandar in view of Loo in view of Hunter and further in view of Dadkhani teaches wherein the measure of correlation is sensitive to non-linear relations ([Dadkhani, 0045] “A significant advantage of using MIC to identify correlations between pairs of metrics in a web analytics dataset is its ability to capture a wide range of pairwise correlations from nonlinear to even nonfunctional relationships.”).

Regarding claim 8, Cantrell in view of Iskandar in view of Loo in view of Hunter and further in view of Dadkhani teaches wherein the measure of correlation includes a Maximal Information Coefficient (MIC) ([Dadkhani, 0005] “In one example, a method for determining correlations between data metrics and generating user interfaces indicating the data metrics and correlations includes identifying, by a processor, each pair of data metrics in a web analytics data set. The method further includes determining, by the processor, a Maximal Information Coefficient (MIC) score for each pair of data metrics. The MIC score for each pair of data metrics indicates a strength of a correlation between the pair of data metrics ...”).

Regarding claim 9, Cantrell in view of Iskandar in view of Loo in view of Hunter and further in view of Dadkhani teaches wherein shuffling the data series randomly a predetermined number of times includes choosing the predetermined number of times to ensure a statistical confidence above a threshold ([Hunter, 0031] “A stochastic sequence can be a random sequence or a pseudo-random sequence. A sufficiently complex pattern can be sufficiently stochastic over the time scale of the experiment. Whether a pattern is sufficiently stochastic can be determined by auto correlating the pattern with itself to ensure there is no correlation with the sequence. For example, such sequences can be generated with an arbitrary specified first-order probability distribution function and an arbitrary specified first order auto-correlation function. A set of numbers having a desired probability distribution function are generated. These values are given an independent (white) auto-correlation function by double stochastic interchange. The desired auto-correlation function is then obtained by stochastically shuffling the series to minimize a sum of squares criterion between the desired and actual auto-correlation functions.”).
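For illustration only, one conventional way to relate the number of shuffles to a statistical confidence is the standard permutation-test argument: if a feature is in fact unrelated to the data series and ties are ignored, the unshuffled value exceeds the maximum of N shuffled values with probability 1/(N+1). Requiring 1/(N+1) to be at most 1 minus the confidence gives N of at least 1/(1 - confidence) - 1. The function name shuffles_for_confidence is hypothetical, and nothing in the record states that the application or Hunter computes the number of shuffles this way.

```python
import math
from fractions import Fraction


def shuffles_for_confidence(confidence):
    """Smallest number of shuffles N with 1 / (N + 1) <= 1 - confidence.

    Assumes the permutation-test setting described in the lead-in.
    The confidence is converted through its decimal string so that
    values such as 0.9 are treated exactly rather than as binary floats.
    """
    level = Fraction(str(confidence))
    if not 0 < level < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - level
    return max(1, math.ceil(1 / alpha - 1))


if __name__ == "__main__":
    for level in (0.9, 0.95, 0.99):
        print(level, shuffles_for_confidence(level))
```

Under these assumptions, a 95% confidence corresponds to 19 shuffles and a 99% confidence to 99 shuffles.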

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1) and further in view of Sicurelli (US 20190212977 A1).

Regarding claim 11, Cantrell in view of Iskandar teaches the method of claim 10. 
However, Cantrell in view of Iskandar does not specifically teach wherein the second machine learning model is a Gradient Boosted Decision Trees (GBDT) model.
Sicurelli teaches wherein the second machine learning model is a Gradient Boosted Decision Trees (GBDT) model ([Sicurelli, 0027] “As described above, a ML-trained model or function may be applied to the set of candidate geographic coordinates. In particular embodiments, a confidence score for each candidate geographic coordinate of place 102 is calculated using the ML-trained model applied to one or more features, described below, associated with each candidate geographic coordinate. The ML algorithm may access the features of the candidate geographic coordinates. In particular embodiments, the ML algorithm (e.g., gradient-boosted decision tree) may optimize a predictor function or computer model. As an example and not by way of limitation, gradient boosting is a ML technique that may be used to model classification problems that produces a prediction model in the form of an ensemble of prediction models (e.g., decision trees) ... Examples of other features may include a time difference between a time the candidate geographic coordinates is being ranked and an average time of the location data associated with place 102 or whether the candidate geographic coordinates is within the bounding box of a city polygon.”).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Cantrell, Iskandar, and Sicurelli to use the method of Gradient Boosted Decision Trees, as taught by Sicurelli, to implement the machine learning method, as taught by Cantrell and Iskandar. The suggestion and/or motivation to do so is to optimize the system, as the GBDT algorithm helps optimize the predictor function or computer model (Sicurelli, 0027).
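For illustration only, the general technique of gradient boosting with decision trees can be sketched as below. This is a deliberately small version that uses single-split trees (stumps) and squared-error loss; it is not the implementation of Sicurelli, Cantrell, Iskandar, or the application. The function names, the learning rate, the number of trees, and the toy rows (a hypothetical feature value paired with a hypothetical score from a first model) are assumptions made for the example.

```python
def fit_stump(rows, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[j] <= cut]
            right = [r for row, r in zip(rows, residuals) if row[j] > cut]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, cut, lmean, rmean)
    if best is None:
        # Every row is identical, so no split exists: predict the mean.
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, row):
    j, cut, left_value, right_value = stump
    return left_value if row[j] <= cut else right_value


def fit_gbdt(rows, targets, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [t - p for t, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, rows)]
    return base, learning_rate, stumps


def gbdt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


if __name__ == "__main__":
    # Hypothetical rows: [feature value, score from a first model].
    rows = [[float(i % 3), i / 10.0] for i in range(10)]
    targets = [1.0 if score >= 0.5 else 0.0 for _, score in rows]
    model = fit_gbdt(rows, targets)
    print(round(gbdt_predict(model, [1.0, 0.2]), 3),
          round(gbdt_predict(model, [1.0, 0.8]), 3))
```

The stump learner is used only to keep the sketch short; a practical GBDT model would normally use deeper trees and a tested library.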

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1) and further in view of Chen (US 20170314961 A1).

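Regarding claim 13, Cantrell in view of Iskandar teaches the method of claim 1.
However, Cantrell in view of Iskandar does not specifically teach further comprising outputting an explanation report including window start and end timestamps.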
Chen teaches further comprising outputting an explanation report including window start and end timestamps ([Chen, 0028] “The system 102 may receive input 112, which may include time series data 114, and a sliding window segmentation module 116 may perform sliding window segmentation on the time series data 114. A feature extraction module 118 may extract features of system dynamics by linear or nonlinear subspace composition that represent the temporal evolution of the system 102, and a modeling module 120 and/or an analytic engine 122 may model system dynamics based on the features extracted by the feature extraction module 118. A model integrator module 124 may be implemented to combine information from different models, and to generate an overall report of system operation. The system may generate output in block 128, which may include a temporal system dynamics model 130 and anomalies 132 detected by an anomaly detection module in block 126.”).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Cantrell, Iskandar, and Chen to use the method of outputting an explanation report including window start and end timestamps, as taught by Chen, to implement the machine learning method, as taught by Cantrell and Iskandar. The suggestion and/or motivation to do so is to improve the performance of the system, as the start and end times of the sliding window are needed to validate the performance of the system (Chen, 0028).

Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Cantrell (US 11181894 B2) in view of Iskandar (US 20150302311 A1) and further in view of Barker (US 20150331963 A1).

Regarding claim 17, Cantrell in view of Iskandar teaches the method of claim 1 further comprising outputting an explanation report ([Cantrell, column 22, line 12-23] “In addition to the aforementioned components, an asset may also be equipped with a set of on-board components that enable the asset to capture and report operating data. To illustrate, FIG. 4 is simplified block diagram showing some on-board components for capturing and reporting operating data that may be included within or otherwise affixed to an example asset 400. As shown, these on-board components may include sensors 402, a processor 404, data storage 406, a communication interface 408, and perhaps also a local analytics device 410, all of which may be communicatively coupled by a communication link 412 that may take the form of a system bus, a network, or other connection mechanism.” The paragraph teaches the process of reporting the operation.). 
However, Cantrell in view of Iskandar does not specifically teach the report including a validation curve to show how well a ranking of the data records can lower a monitoring value.
Barker teaches the report including a validation curve to show how well a ranking of the data records can lower a monitoring value ([Barker, 0045] “The graph data curves 540, 544 in FIG. 5 show how well each set of parameter values in the penalized regression models fits for a range of tuning parameter values. To implement a penalized regression model, a set of selected parameters is used in a training regimen, from which a training graph curve 544 is drawn, as known to those skilled in the art. After training, a desired data set, such as “actual data” on which analysis is to be performed, may be provided to the trained penalized regression model, and a validation curve 540 may be generated. That is, training and validation data are used when a form of cross-validation is utilized to pick the best value of the tuning parameter. Alternatively, if an information criteria is used to select the tuning parameter instead, then there will only be a single line of an information criteria curve in the window 508, as known to those skilled in the art” Barker does not explicitly teach using a validation curve to show how well a ranking of the data records can lower a monitoring value, but the input data validated by the validation curve can be interpreted as a ranking of the data, which is taught by Iskandar).
Before the effective filing date of the invention to a person of ordinary skill in the art, it would have been obvious, having the teachings of Cantrell, Iskandar, and Barker to use an explanation report that includes a validation curve, as taught by Barker, to implement the machine learning method, as taught by Cantrell and Iskandar. The suggestion and/or motivation to do so is to test the performance of the system, as the validation curve shows how much the datasets correlate with each other, which helps validate the performance of the system (Barker, 0045).

Regarding claim 18, Cantrell in view of Iskandar and further in view of Barker teaches wherein the validation curve includes values using a target window with a predetermined number of top events removed ([Barker, 0045] “The graph data curves 540, 544 in FIG. 5 show how well each set of parameter values in the penalized regression models fits for a range of tuning parameter values. To implement a penalized regression model, a set of selected parameters is used in a training regimen, from which a training graph curve 544 is drawn, as known to those skilled in the art. After training, a desired data set, such as “actual data” on which analysis is to be performed, may be provided to the trained penalized regression model, and a validation curve 540 may be generated. That is, training and validation data are used when a form of cross-validation is utilized to pick the best value of the tuning parameter. Alternatively, if an information criteria is used to select the tuning parameter instead, then there will only be a single line of an information criteria curve in the window 508, as known to those skilled in the art” The paragraph teaches using a validation curve, and the input dataset can be the values using a target window, which is taught by Cantrell).
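For illustration only, the kind of validation curve recited in claims 17 and 18 can be sketched as follows: the records in a target window are ranked, the top k ranked records are removed, and a monitoring value is recomputed for each k. The function names, the sample values, and the choice of a simple mean as the monitoring value are assumptions made for the example and are not taken from the application, Barker, Cantrell, or Iskandar.

```python
def validation_curve(records, ranking_scores, monitoring_value, max_removed):
    """Return (k, value) pairs, where value is the monitoring value of the
    target window after the k highest-ranked records have been removed."""
    order = sorted(range(len(records)),
                   key=lambda i: ranking_scores[i], reverse=True)
    curve = []
    for k in range(max_removed + 1):
        removed = set(order[:k])
        kept = [rec for i, rec in enumerate(records) if i not in removed]
        curve.append((k, monitoring_value(kept)))
    return curve


def mean_value(records):
    """Hypothetical monitoring value: the mean of the remaining records."""
    return sum(records) / len(records) if records else 0.0


if __name__ == "__main__":
    window = [10.0, 10.0, 100.0, 10.0, 50.0]   # records in the target window
    scores = [0.1, 0.2, 0.9, 0.1, 0.7]         # ranking from the second model
    for k, value in validation_curve(window, scores, mean_value, 2):
        print(k, value)
```

With these sample values the monitoring value falls from 36.0 to 20.0 to 10.0 as the two top-ranked records are removed, which is the behavior a useful ranking would be expected to show.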

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Regarding anomaly detection using a machine learning model:
US 20060084059 A1
US 20190327251 A1
US 20130342402 A1
The references above teach anomaly detection in a plurality of sets of data using one or more machine learning models.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUN KWON whose telephone number is (571)272-2072. The examiner can normally be reached on M-F 7:30AM – 4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JUN KWON/
Patent Examiner, Art Unit 2127
/LUIS A SITIRICHE/
Primary Examiner, Art Unit 2126