DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 21 May 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 9-10, and 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al.

Regarding claim 1, Modi et al. discloses a system, comprising: 

a non-transitory memory (Modi et al., fig. 2(214)); and 

one or more hardware processors (Modi et al., fig. 2(222)) coupled with the non-transitory memory and configured to read instructions from the non-transitory memory (Modi et al., para [0027] and fig. 2) to cause the system to perform operations comprising: 

generating a first plurality of decision trees as a first layer of a gradient boosting tree (“The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence,” Modi et al., para [0041]. And, “In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308),” Modi et al., para [0042]. Thus, these two excerpts together show that the ensemble learning model may include multiple layers, each layer comprising models. And, the models may be decision trees.)

determining, for each of the first plurality of decision trees, a prediction error based on a set of training data (“The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence. In this implementation, a tree later in the sequence learns to “correct” errors from predictions generated by earlier decision trees,” Modi et al., para [0041].); 

calculating a collective prediction error for the first layer of the gradient boosting tree based on the prediction errors determined for each of the first plurality of decision trees (“In this implementation, a tree later in the sequence learns to “correct” errors from predictions generated by earlier decision trees,” Modi et al., para [0041].); and 

generating a second plurality of decision trees as a second layer of the gradient boosting tree based on the collective prediction error calculated for the first layer of the gradient boosting tree (“In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308). Some implementations can include a number of layers of prediction models. In some embodiments, features of machine learning component 302 can also be determined,” Modi et al., para [0042].).  

Modi et al., though, does not disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features. Agrawal et al. is cited to disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features (“Because the second model input dataset includes transaction data associated with one or more breach incidents, the patterns and metrics associated with fraud can be determined and applied to the first model input dataset to detect fraud. At least partly based on the comparison of the two datasets, one or more machine-learning prediction models can be trained in step 308,” Agrawal et al., para [0074].). Agrawal et al. benefits Modi et al. by using machine-learning systems to quickly and efficiently detect merchant breaches (Agrawal et al., para [0003]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Agrawal et al. to extend the applicability of the resource deployment predictions of Modi et al.
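For illustration only, the operations mapped above for claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of Modi et al., Agrawal et al., or the application: the depth-one trees, the random choice of one feature per tree, the averaging of per-tree errors into a collective error, and every function name and data value are hypothetical.

```python
import random

def fit_stump(rows, targets, feature):
    """Fit a depth-one regression tree (stump) on a single feature column."""
    values = sorted({row[feature] for row in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= thr]
        right = [t for row, t in zip(rows, targets) if row[feature] > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - lmean) ** 2 for t in left)
               + sum((t - rmean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    if best is None:  # constant feature: fall back to predicting the mean
        mean = sum(targets) / len(targets)
        return (feature, 0.0, mean, mean)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, row):
    feature, thr, left_value, right_value = stump
    return left_value if row[feature] <= thr else right_value

def fit_layer(rows, targets, n_trees, rng):
    """One layer: several stumps, each on a randomly chosen feature (assumption)."""
    n_features = len(rows[0])
    return [fit_stump(rows, targets, rng.randrange(n_features))
            for _ in range(n_trees)]

def per_tree_errors(layer, rows, targets):
    """Prediction error of each tree in the layer, one residual per training row."""
    return [[t - predict_stump(s, row) for row, t in zip(rows, targets)]
            for s in layer]

def collective_error(errors):
    """Collective error of the layer: per-row mean of the per-tree errors."""
    return [sum(col) / len(col) for col in zip(*errors)]

def mse(residuals):
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical training data: three numeric features and a binary risk label.
rng = random.Random(0)
rows = [[i % 5, i % 3, (i * 7) % 11] for i in range(30)]
labels = [1.0 if row[0] + row[2] > 6 else 0.0 for row in rows]

layer1 = fit_layer(rows, labels, n_trees=3, rng=rng)
residual1 = collective_error(per_tree_errors(layer1, rows, labels))

# The second layer is generated from the first layer's collective error.
layer2 = fit_layer(rows, residual1, n_trees=3, rng=rng)
residual2 = collective_error(per_tree_errors(layer2, rows, residual1))
```

In this sketch `residual2` is the error remaining after both layers, so comparing `mse(residual1)` with `mse(residual2)` shows whether the second layer corrected errors left by the first.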

Regarding claim 2, Modi et al., as modified by Agrawal et al., discloses the system of claim 1, wherein the operations further comprise: 

selecting, for each decision tree in the first plurality of decision trees from a set of features, one or more risk detection features (“Because the second model input dataset includes transaction data associated with one or more breach incidents, the patterns and metrics associated with fraud can be determined and applied to the first model input dataset to detect fraud. At least partly based on the comparison of the two datasets, one or more machine-learning prediction models can be trained in step 308,” Agrawal et al., para [0074].). 

Regarding claim 3, Modi et al., as modified by Agrawal et al., discloses the system of claim 2, wherein the set of features comprise features associated with an electronic transaction, features associated with a user device that submitted the electronic transaction, and features associated with a user account used in the electronic transaction (“The transaction data includes, for each transaction of the plurality of transactions, an authorization request, a portable financial device identifier, and at least one of the following: transaction amount, transaction time, transaction type, merchant identifier, merchant type, or any combination thereof,” Agrawal et al., para [0005]. “Transaction amount/time/type”, etc. are features associated with an electronic transaction. The “portable financial device identifier” is a feature associated with a user device that submitted the electronic transaction. And, “authorization request” is a feature associated with a user account used in the electronic transaction.).  

Regarding claim 9, Modi et al. discloses a method, comprising: 

selecting, by one or more hardware processors (Modi et al., fig. 2(222)) from a set of features, subsets of features for a first plurality of models (“Implementations of the random forest classifier include decision trees that are trained by data set 304 (e.g., using subsets of the training data per tree). The random forest algorithm can then aggregate votes from these decision trees to arrive at a prediction,” Modi et al., para [0040].); 

generating, by the one or more hardware processors, the first plurality of models for a first layer of a machine learning model (“The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence,” Modi et al., para [0041]. And, “In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308),” Modi et al., para [0042].); 

determining, by the one or more hardware processors, a collective prediction error associated with the first layer of the machine learning model based on a comparison between predicted risks obtained from the first plurality of models and labeled risks associated with a set of training data (“The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence. In this implementation, a tree later in the sequence learns to “correct” errors from predictions generated by earlier decision trees,” Modi et al., para [0041]. This excerpt shows that there is a comparison (i.e., computed error) between the predictions and the training data. And, “Data set 304 can be any set of data capable of configuring machine learning component 302 to generate predictions, such as training data (e.g., a set of features with corresponding labels, such as labeled data for supervised learning),” Modi et al., para [0033]. This excerpt shows that the data may be labeled.); and 

generating, by the one or more hardware processors, a second plurality of models for a second layer of the machine learning model based on the collective prediction error associated with the first layer of the machine learning model (“In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308). Some implementations can include a number of layers of prediction models. In some embodiments, features of machine learning component 302 can also be determined,” Modi et al., para [0042].).  

Modi et al., though, does not disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features. Agrawal et al. is cited to disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features (“Because the second model input dataset includes transaction data associated with one or more breach incidents, the patterns and metrics associated with fraud can be determined and applied to the first model input dataset to detect fraud. At least partly based on the comparison of the two datasets, one or more machine-learning prediction models can be trained in step 308,” Agrawal et al., para [0074].). Agrawal et al. benefits Modi et al. by using machine-learning systems to quickly and efficiently detect merchant breaches (Agrawal et al., para [0003]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Agrawal et al. to extend the applicability of the resource deployment predictions of Modi et al.

Regarding claim 10, Modi et al., as modified by Agrawal et al., discloses the method of claim 9, further comprising: 

selecting, for each model in the first plurality of models from the set of training data, a subset of training data (“In some embodiments, machine learning component 302 can be an ensemble learning model. For example, machine learning component 302 can include a random forest classifier that includes multiple machine learning components whose predictions are combined. Implementations of the random forest classifier include decision trees that are trained by data set 304 (e.g., using subsets of the training data per tree). The random forest algorithm can then aggregate votes from these decision trees to arrive at a prediction,” Modi et al., para [0040].); and 

training each model in the first plurality of models using the corresponding subset of training data (“In some embodiments, machine learning component 302 can be an ensemble learning model. For example, machine learning component 302 can include a random forest classifier that includes multiple machine learning components whose predictions are combined. Implementations of the random forest classifier include decision trees that are trained by data set 304 (e.g., using subsets of the training data per tree). The random forest algorithm can then aggregate votes from these decision trees to arrive at a prediction,” Modi et al., para [0040]. See also para [0010], which describes applying training and test data to the ML model.).  

Regarding claim 16, Modi et al. discloses a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: 

selecting, from a set of Implementations of the random forest classifier include decision trees that are trained by data set 304 (e.g., using subsets of the training data per tree). The random forest algorithm can then aggregate votes from these decision trees to arrive at a prediction,” Modi et al., para [0040]. See also para [0010], which describes applying training and test data to the ML model.); 

generating the first plurality of models for a first layer of a machine learning model, wherein each of the first plurality of models is configured to receive input values corresponding to a corresponding subset of The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence,” Modi et al., para [0041]. And, “In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308),” Modi et al., para [0042].); 

determining a prediction error associated with the first plurality of models based on a comparison between outputs obtained from the first plurality of models and labeled risks associated with a set of training data (“The gradient boosting algorithm can similarly leverage an ensemble learning technique with multiple decision trees trained using data set 304, however the gradient boosting algorithm can align decision trees in sequence. In this implementation, a tree later in the sequence learns to “correct” errors from predictions generated by earlier decision trees,” Modi et al., para [0041]. This excerpt shows that there is a comparison (i.e., computed error) between the predictions and the training data. And, “Data set 304 can be any set of data capable of configuring machine learning component 302 to generate predictions, such as training data (e.g., a set of features with corresponding labels, such as labeled data for supervised learning),” Modi et al., para [0033]. This excerpt shows that the data may be labeled.); and 

generating a second plurality of models for a second layer of the machine learning model based on the prediction error associated with the first plurality of models (“In some embodiments, machine learning component 302 can be multiple models stacked, for example with the output of a first model feeding into the input of a second model. For example, an ensemble learning model can include multiple layers of machine learning models with varying architecture where the predictions output by a first model serve as input for a second model that in turn generates next predictions (e.g., prediction 308). Some implementations can include a number of layers of prediction models. In some embodiments, features of machine learning component 302 can also be determined,” Modi et al., para [0042].).  

Modi et al., though, does not disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features. Agrawal et al. is cited to disclose wherein each of the first plurality of decision trees is configured to receive input values corresponding to one or more risk detection features (“Because the second model input dataset includes transaction data associated with one or more breach incidents, the patterns and metrics associated with fraud can be determined and applied to the first model input dataset to detect fraud. At least partly based on the comparison of the two datasets, one or more machine-learning prediction models can be trained in step 308,” Agrawal et al., para [0074].). Agrawal et al. benefits Modi et al. by using machine-learning systems to quickly and efficiently detect merchant breaches (Agrawal et al., para [0003]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Agrawal et al. to extend the applicability of the resource deployment predictions of Modi et al.

Regarding claim 17, Modi et al., as modified by Agrawal et al., discloses the non-transitory machine-readable medium of claim 16, wherein the operations further comprise: 

receiving a request to determine a risk associated with a transaction request (“The method 300 includes receiving transaction data associated with a plurality of transactions between one or more financial device holders and one or more merchants in step 302. The transactions may be those occurring in an electronic payment processing network, in which case they may be processed via authorization requests at a transaction processing server,” Agrawal et al., para [0074]. The transaction data is used to determine a fraud (i.e., risk) alert.); 

obtaining input values associated with the transaction request and corresponding to the set of features (“The method 300 also includes receiving fraudulent transaction data representative of one or more previously identified data-breach incidents, in step 304. For example, if a data breach was self-reported by a merchant, the fraudulent transaction data may include transaction data associated with transactions for the reporting merchant that occurred around and/or after the time of data breach. Moreover, if the data breach was previously detected by the system’s data breach detection models, transactions associated with one or more detected breaches may be used as a baseline for future breach determinations,” Agrawal et al., para [0074]. The transaction data are input values.); 

providing the input values to the machine learning model (“With further reference to FIG. 3, and in further non-limiting embodiments or aspects, the method 300 includes generating, based at least partly on the transaction data and the fraudulent transaction data, a first model input dataset and a second model input dataset in step 306,” Agrawal et al., para [0075].); 

obtaining an output value corresponding to the risk associated with the transaction request from the machine learning model (“With further reference to FIG. 3, and in further non-limiting embodiments or aspects, the method 300 includes determining one or more merchants that have likely been breached in step 312. For example, if any one prediction model indicates that a merchant has been breached, then the merchant may be determined to have been breached (i.e., an “OR” evaluation). Alternatively, if more than one or all of the prediction models indicate that a merchant has been breached, then the merchant may be determined to have been breached (i.e., an “AND” evaluation). Each machine learning prediction model that is employed may be compared against its own model-specific threshold, indicative of a likelihood of breach, and may be ensembled with another model type to create a combined breach likelihood score,” Agrawal et al., para [0076]. The output value may be an indication that a merchant has been breached.).

Regarding claim 18, Modi et al., as modified by Agrawal et al., discloses the non-transitory machine-readable medium of claim 16, wherein the set of risk detection features comprise features associated with an electronic transaction, features associated with a user device that submitted the electronic transaction, and features associated with a user account used in the electronic transaction (“The transaction data includes, for each transaction of the plurality of transactions, an authorization request, a portable financial device identifier, and at least one of the following: transaction amount, transaction time, transaction type, merchant identifier, merchant type, or any combination thereof,” Agrawal et al., para [0005]. “Transaction amount/time/type”, etc. are features associated with an electronic transaction. The “portable financial device identifier” is a feature associated with a user device that submitted the electronic transaction. And, “authorization request” is a feature associated with a user account used in the electronic transaction.).  

Claim(s) 4 and 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., and further in view of US 20190377819, hereinafter referred to as Filliben et al.

Regarding claim 4, Modi et al., as modified by Agrawal et al., discloses the system of claim 2, wherein the selecting comprises selecting different one or more risk detection features for different decision trees in the first plurality of decision trees (“Other examples of techniques that may be used in step 604 include, but are not limited to bagged decision trees, random forest, gradient tree boosting, and/or stacking. In one example, a random forest approach with multiple decision trees may comprise using samples drawn with replacement, selecting random subset of features for each tree, and/or randomizing splitting thresholds,” Filliben et al., para [0079]. Selecting random sets of features is selecting different one or more features for each decision tree.). Filliben et al. benefits Modi et al. by constraining the learners in the machine learning ensemble (Filliben et al., para [0079]), thereby tailoring the training feature set so that the machine learning ensemble provides better predictions. Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Filliben et al. to enhance the resource deployment predictions of Modi et al. 

Regarding claim 11, Modi et al., as modified by Agrawal et al., discloses the method of claim 10, but not further comprising: 

modifying the set of training data;

selecting, for each model in the second plurality of models from the modified set of training data, a subset of training data; and 

training each model in the second plurality of models using the corresponding subset of training data.

Filliben et al. is cited to disclose modifying the set of training data (“Other examples of techniques that may be used in step 604 include, but are not limited to bagged decision trees, random forest, gradient tree boosting, and/or stacking. In one example, a random forest approach with multiple decision trees may comprise using samples drawn with replacement, selecting random subset of features for each tree, and/or randomizing splitting thresholds,” Filliben et al., para [0079]. Samples drawn with replacement or selecting random subset of features is modifying the set of training data.); 

selecting, for each model in the second plurality of models from the modified set of training data, a subset of training data (“Other examples of techniques that may be used in step 604 include, but are not limited to bagged decision trees, random forest, gradient tree boosting, and/or stacking. In one example, a random forest approach with multiple decision trees may comprise using samples drawn with replacement, selecting random subset of features for each tree, and/or randomizing splitting thresholds,” Filliben et al., para [0079].); and 

training each model in the second plurality of models using the corresponding subset of training data (“In one example, a stacking approach may include where several base models are trained using available data—a combiner model is trained using the outputs of the several base models as input, and creates a final output or prediction. Stacking may involve tuning hyper-parameters,” Filliben et al., para [0079]. See also para [0010], which describes applying training and test data to the ML model.). Filliben et al. benefits Modi et al. by constraining the learners in the machine learning ensemble (Filliben et al., para [0079]), thereby tailoring the training feature set so that the machine learning ensemble provides better predictions. Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Filliben et al. to enhance the resource deployment predictions of Modi et al. 

Regarding claim 12, Modi et al., as modified by Agrawal et al. and Filliben et al., discloses the method of claim 11, wherein the modifying the set of training data comprises removing at least a portion of training data from the set of training data (“Other examples of techniques that may be used in step 604 include, but are not limited to bagged decision trees, random forest, gradient tree boosting, and/or stacking. In one example, a random forest approach with multiple decision trees may comprise using samples drawn with replacement, selecting random subset of features for each tree, and/or randomizing splitting thresholds,” Filliben et al., para [0079]. Selecting a subset of features may be removing a portion of training data from the set of training data.).  
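For illustration only, the random forest techniques quoted from Filliben et al., para [0079] (samples drawn with replacement and a random subset of features for each tree) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name, its parameters, the feature names, and the data are hypothetical and do not appear in Filliben et al.

```python
import random

def draw_tree_inputs(rows, feature_names, n_trees, n_features_per_tree, seed=0):
    """For each tree, draw a bootstrap sample of rows and a random feature subset."""
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        # Samples drawn with replacement: the same row may appear more than once.
        sample_idx = rng.choices(range(len(rows)), k=len(rows))
        # Random subset of features for this tree, drawn without replacement.
        features = sorted(rng.sample(feature_names, n_features_per_tree))
        plans.append({"rows": [rows[i] for i in sample_idx], "features": features})
    return plans

# Hypothetical transaction records and feature names.
feature_names = ["amount", "time", "type", "merchant_type"]
rows = [{"amount": 10 * i, "time": i, "type": i % 2, "merchant_type": i % 3}
        for i in range(8)]
plans = draw_tree_inputs(rows, feature_names, n_trees=4, n_features_per_tree=2)
```

Each entry of `plans` gives one tree its own training subset and its own features, so different trees can end up with different features and different rows.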

Claim(s) 5 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., and further in view of EP 3901839, hereinafter referred to as Huu.

Regarding claim 5, Modi et al., as modified by Agrawal et al., discloses the system of claim 2, but not wherein the operations further comprise:

modifying the set of features by removing at least one feature based on the prediction errors determined for the first plurality of decision trees; and 

selecting, for each decision tree in the second plurality of decision trees from the modified set of features, a corresponding subset of risk detection features.

Huu is cited to disclose modifying the set of features by removing at least one feature based on the prediction errors determined for the first plurality of decision trees (“Gradient Boosting Decision Tree (GBDT) successively stacks many decision trees which at each step try to fix the residual errors from the previous steps. The final score produced by the GBDT is the sum of the individual scores obtained by the decision trees for an input vector. Overfitting in GBDT can be reduced by removing the input values that have the least impact on the output from the training data. One way to determine which input variable has the lowest predictive value is to determine the input variable that is used for the first time in the latest decision tree in the GBDT. This method of identifying the low-predictive features to be removed does not require that earlier trees be regenerated to generate the new GBDT. Since the removed feature was already not used in the earlier trees, those trees already ignore the removed feature,” Huu, Abstract.); and 

selecting, for each decision tree in the second plurality of decision trees from the modified set of features, a corresponding subset of risk detection features (Huu, Abstract.). Huu benefits Modi et al. by removing from the training data the input values that have the least impact on the output in order to reduce the model complexity, thereby reducing overfitting in the GBDT (Huu, col. 3, lines 10-13). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Huu to avoid overfitting the GBDT of Modi et al. 
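For illustration only, the feature-removal approach quoted from the Huu Abstract (removing the input variable that is used for the first time in the latest decision tree, while keeping the earlier trees that never use it) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: representing each tree as the list of features it splits on, the tie-breaking by feature order, leaving never-used features in place, and all names are hypothetical.

```python
def first_use_index(trees):
    """Map each feature to the index of the first tree in the sequence that uses it."""
    first = {}
    for idx, used_features in enumerate(trees):
        for feature in used_features:
            first.setdefault(feature, idx)
    return first

def prune_latest_feature(trees, features):
    """Remove the feature whose first use occurs latest in the tree sequence.

    Trees generated before that first use do not reference the removed
    feature, so they are kept as-is; later trees would be regenerated.
    """
    first = first_use_index(trees)
    used = [f for f in features if f in first]
    removed = max(used, key=lambda f: first[f])  # ties: earliest in `features`
    kept_features = [f for f in features if f != removed]
    kept_trees = trees[:first[removed]]
    return removed, kept_features, kept_trees

# Hypothetical boosted sequence: each tree is listed by the features it splits on.
trees = [["amount"], ["amount", "merchant_type"], ["time", "amount"]]
features = ["amount", "time", "merchant_type"]
removed, kept_features, kept_trees = prune_latest_feature(trees, features)
```

Here `time` first appears in the third tree, so it is the feature removed, and the first two trees are retained without regeneration.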

Regarding claim 20, Modi et al., as modified by Agrawal et al., discloses the non-transitory machine-readable medium of claim 16, but not wherein the operations further comprise: 

modifying the set of features by removing at least one feature based on the prediction errors determined for the first plurality of decision trees; and 

selecting, for each decision tree in the second plurality of decision trees from the modified set of features, a corresponding subset of risk detection features.

Huu is cited to disclose modifying the set of risk detection features by removing at least one risk detection feature based on the prediction error determined for the first plurality of models (“Gradient Boosting Decision Tree (GBDT) successively stacks many decision trees which at each step try to fix the residual errors from the previous steps. The final score produced by the GBDT is the sum of the individual scores obtained by the decision trees for an input vector. Overfitting in GBDT can be reduced by removing the input values that have the least impact on the output from the training data. One way to determine which input variable has the lowest predictive value is to determine the input variable that is used for the first time in the latest decision tree in the GBDT. This method of identifying the low-predictive features to be removed does not require that earlier trees be regenerated to generate the new GBDT. Since the removed feature was already not used in the earlier trees, those trees already ignore the removed feature,” Huu, Abstract.); and 

selecting, for each model in the second plurality of models from the modified set of risk detection features, a subset of risk detection features (Huu, Abstract.). Huu benefits Modi et al. by removing from the training data the input values that have the least impact on the output in order to reduce the model complexity, thereby reducing overfitting in the GBDT (Huu, col. 3, lines 10-13). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Huu to avoid overfitting the GBDT of Modi et al. 

Claim(s) 6 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., and further in view of US 11100158, hereinafter referred to as Han et al.

Regarding claim 6, Modi et al., as modified by Agrawal et al., discloses the system of claim 1, wherein each of the second plurality of decision trees is configured to use one or more risk detection features for detecting a risk of a transaction (“Because the second model input dataset includes transaction data associated with one or more breach incidents, the patterns and metrics associated with fraud can be determined and applied to the first model input dataset to detect fraud. At least partly based on the comparison of the two datasets, one or more machine-learning prediction models can be trained in step 308,” Agrawal et al., para [0074].), but not wherein the one or more risk detection features used by the second plurality of decision trees has overlapping features with the one or more risk detection features used by the first plurality of decision trees.

Han et al. is cited to disclose wherein the one or more risk detection features used by the second plurality of decision trees has overlapping features with the one or more risk detection features used by the first plurality of decision trees (“The candidate features 302 for the second model may be the same set of features as the candidate features 202 (FIG. 2) of the first model, or different with overlapping features. The candidate features 302 for the second model at least includes the feature selected for the first model, fs,” Han et al., col. 6, lines 51-57.). Han et al. benefits Modi et al. by providing a fast feature selection method for training the model (Han et al., Abstract). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Han et al. to enhance the training feature selection of Modi et al.  


Regarding claim 19, Modi et al., as modified by Agrawal et al., discloses the non-transitory machine-readable medium of claim 16, but not wherein the selecting comprises selecting different subsets of risk detection features for different models in the first plurality of models. Han et al. is cited to disclose wherein the selecting comprises selecting different subsets of risk detection features for different models in the first plurality of models (“The candidate features 302 for the second model may be the same set of features as the candidate features 202 (FIG. 2) of the first model, or different with overlapping features. The candidate features 302 for the second model at least includes the feature selected for the first model, fs,” Han et al., col. 6, lines 51-57.). Han et al. benefits Modi et al. by providing a fast feature selection method for training the model (Han et al., Abstract). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Han et al. to enhance the training feature selection of Modi et al.


Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., and further in view of US 20200027021, hereinafter referred to as Sastry et al.

Regarding claim 7, Modi et al., as modified by Agrawal et al., discloses the system of claim 1, but not wherein the calculating the collective prediction error comprises determining an average of the prediction errors determined for the first plurality of decision trees. Sastry et al. is cited to disclose wherein the calculating the collective prediction error comprises determining an average of the prediction errors determined for the first plurality of decision trees (“Various artificial intelligence algorithms can be used to calibrate the candidate photoresist models to best fit the silicon data. These include regression models, artificial neural networks, decision trees, genetic algorithms, and support vector machines. The various regression approaches can include stepwise regression with sensitivity analysis, lasso or elastic-net based regularized regression, and ridge regression, and the total weighted mean-squared error between predicted photoresist contours and photoresist contours extracted from SEM images can serve as the regression cost,” Sastry et al., para [0063].). Sastry et al. benefits Modi et al. by providing a metric for measuring prediction errors (Sastry et al., para [0063]), thereby providing a standard by which to correct errors. Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Sastry et al. to improve the resource deployment predictions of Modi et al.

Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., further in view of US 20200027021, hereinafter referred to as Sastry et al., and further in view of US 20180349382, hereinafter referred to as Kumaran et al.

Regarding claim 8, Modi et al., as modified by Agrawal et al. and Sastry et al., discloses the system of claim 7, wherein the average is a weighted average of the prediction errors determined for the first plurality of decision trees (“Various artificial intelligence algorithms can be used to calibrate the candidate photoresist models to best fit the silicon data. These include regression models, artificial neural networks, decision trees, genetic algorithms, and support vector machines. The various regression approaches can include stepwise regression with sensitivity analysis, lasso or elastic-net based regularized regression, and ridge regression, and the total weighted mean-squared error between predicted photoresist contours and photoresist contours extracted from SEM images can serve as the regression cost,” Sastry et al., para [0063].), but not wherein the operations further comprise assigning weights to the first plurality of decision trees based on the one or more risk detection features associated with each of the first plurality of decision trees. 

Kumaran et al. is cited to disclose wherein the operations further comprise assigning weights to the first plurality of decision trees based on the one or more risk detection features associated with each of the first plurality of decision trees (“FIG. 6A is one example of a decision tree with exemplary features and weight values,” Kumaran et al., para [0019].). Kumaran et al. benefits Modi et al. by providing varying weights to the decision trees in order to adjust search engine rankings (Kumaran et al., para [0040]). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Kumaran et al. to improve the resource deployment predictions of Modi et al.
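
For illustration only, the "average" and "weighted average" of per-tree prediction errors recited in claims 7 and 8 can be sketched as follows. This sketch is not taken from Sastry et al. or Kumaran et al.; the function name, the error values, and the weights are hypothetical, and the weights are assumed to have been assigned to the trees beforehand.

```python
from typing import Optional, Sequence


def collective_prediction_error(
    errors: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-tree prediction errors into one collective error.

    With no weights this is the plain average of the errors; with weights
    it is the weighted average sum(w_i * e_i) / sum(w_i).
    """
    if weights is None:
        # Equal weights reduce the weighted average to a plain average.
        weights = [1.0] * len(errors)
    if len(errors) != len(weights):
        raise ValueError("errors and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(e * w for e, w in zip(errors, weights))
    return weighted_sum / total_weight


# Hypothetical per-tree errors for a first layer of two decision trees.
plain = collective_prediction_error([0.2, 0.4])
weighted = collective_prediction_error([0.2, 0.4], weights=[3.0, 1.0])
```

In the weighted case, the tree given the larger weight contributes more to the collective error, so the result moves toward that tree's error.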
    
Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over US 20210357835, hereinafter referred to as Modi et al., in view of WO 2020023003, hereinafter referred to as Agrawal et al., and further in view of WO 2021012220, hereinafter referred to as Zhang et al.

Regarding claim 13, Modi et al., as modified by Agrawal et al., discloses the method of claim 11, but not wherein the modifying the set of training data comprises inserting additional training data into the set of training data. Zhang et al. is cited to disclose wherein the modifying the set of training data comprises inserting additional training data into the set of training data (“It should be noted that after implementing the embodiments of the present invention to obtain attack samples (adversarial samples) that successfully perform evasion attacks, in the decision tree training process, by adding the adversarial samples to the training data set, the improvement of the decision tree can be significantly improved safety,” Zhang et al., highlighted passage on p. 10 of the attached translation PDF.). Zhang et al. benefits Modi et al. by incorporating adversarial machine learning to prevent attackers from inferring sensitive information from training data and target models (Zhang et al., Background). Therefore, it would be obvious for one skilled in the art to combine the teachings of Modi et al. with those of Zhang et al. to enhance the security of Modi et al.

Allowable Subject Matter
Claims 14-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Other prior art is noted on the attached PTO-892. In particular, the examiner notes Feng et al., which is a seminal work on Multi-Layered Gradient Boosting Decision Trees.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656