DETAILED ACTION
This action is in response to the claims filed 05/19/2022 for application 15/959,040. Claims 1, 2, 12-14, and 18-20 have been amended and claim 11 has been canceled. Claims 1-10 and 12-20 are currently pending. 

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/19/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2-10 and 15-17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 2 recites the limitation "the one or more machine learning models".  There is insufficient antecedent basis for this limitation in the claim.


Claims 3-10 and 15-17 are rejected as being dependent on a rejected base claim without curing any of the deficiencies.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 12, 13, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro et al. ("“Why Should I Trust You?” Explaining the Predictions of Any Classifier", hereinafter "Ribeiro") in view of Gupta et al. ("US 20190156216 A1" cited by Applicant in the IDS filed 05/19/2022, hereinafter, "Gupta") and further in view of Katuwal et al. ("Machine Learning Model Interpretability for Precision Medicine", hereinafter "Katuwal").

Regarding claim 1, Ribeiro teaches A method, comprising: 
receiving an indication of a selection of an entry associated with a linear surrogate machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]), wherein the linear surrogate machine learning model approximates an output associated with a machine learning model (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); 
in response to the received indication of the selection of the entry associated with the linear surrogate machine learning model (“We want to minimize the locality-aware loss L(f, g, πx) without making any assumptions about f, since we want the explainer to be model-agnostic. Thus, in order to learn the local behavior of f as the interpretable inputs vary, we approximate L(f, g, πx) by drawing samples, weighted by πx. We sample instances around x′ by drawing nonzero elements of x′ uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z′ ∈ {0, 1}^d′ (which contains a fraction of the nonzero elements of x′), we recover the sample in the original representation z ∈ R^d and obtain f(z), which is used as a label for the explanation model. Given this dataset Z of perturbed samples with the associated labels, we optimize Eq. (1) to get an explanation ξ(x).” [pg. 3, § 3.3 Sampling for Local Exploration, ¶1]), dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]),
a first non-linear surrogate machine learning model (“Second, our choice of G (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2])
retraining the linear surrogate machine learning model and/or the first non-linear surrogate machine learning model in response to a determination that the output associated with the linear surrogate machine learning model does not correlate with the output associated with the first non-linear surrogate model for more than a threshold number of entries (“If one notes that a classifier is untrustworthy, a common task in machine learning is feature engineering, i.e. modifying the set of features and retraining in order to improve generalization. Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize. We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training, for B = 10 instances with K = 10 words in each explanation (an interface similar to Figure 2, but with a single algorithm).” [pg. 8, §6.3, ¶1-2; Implies retraining a linear/non-linear surrogate model. See further: pgs. 8-9, §6.4, ¶1-2; The experiment proceeds as follows: we first present a balanced set of 10 test predictions (without explanations), where one wolf is not in a snowy background (and thus the prediction is “Husky”) and one husky is (and is thus predicted as “Wolf”). We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly. We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies. After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions. Since this task requires some familiarity with the notion of spurious correlations and generalization]).
However, Ribeiro fails to explicitly teach wherein dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models includes updating a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model; 
determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Gupta teaches wherein dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models includes updating a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model (“Yet another existing approach to model interpretation includes training surrogate models. In this approach, a decision tree is learned using the training data for the model, where instead of predicting the true classification of the data, the decision tree is trained to predict the classification that is predicted by the model itself. Then, the paths of the tree are output in the form of rules that the model used in making the predictions. However, surrogate models, such as decision trees, have a single root node, and hence all rules extracted from such trees by definition include the root node attribute in their description. Further, even on a relatively simple data set decision trees can become complex with paths that span several features. This can lead to rules that have a large number of unintelligible feature value pairs.” [¶0023; See further: ¶0040 discloses, Instead, a given instance of the input data is perturbed (i.e., the input data is varied), and a locally faithful linear model is trained in the locality of the instance. The weights (probabilities) of the different features then approximate the marginal contribution values. An optimization to speed up computation can be performed by excluding all instances in the input data that are already covered by the conditions obtained from a particular instance of the training data. In other words, the algorithm can consider instances of the data that are not yet covered by one of the conditions generated from the training data. 
Each condition includes a single feature, and a value for categorical features or a range of values for numerical features. The result after considering all instances is an exhaustive list of conditions that were important at each instance level, including subsets of the conditions that were important in classifying instances of that class.])
Ribeiro and Gupta are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME, which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explain the behavior of a machine learning model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings by implementing a decision tree surrogate model to indicate a path associated with an entry as taught by Gupta. One would have been motivated to make this modification in order to gain a better interpretation of the model [¶0022, Gupta].
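For illustration only, the notion of a decision tree surrogate whose path for a selected entry is output as rules (Gupta, ¶0023) can be sketched as follows. This is a minimal sketch and not the implementation of Gupta or of the applicant: the `Node` class, the `trace_path` helper, the feature names, and the thresholds are all hypothetical, and the tree is built by hand rather than learned from a model's predictions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a hypothetical surrogate decision tree."""
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # split threshold at an internal node
    left: Optional["Node"] = None      # branch taken when value <= threshold
    right: Optional["Node"] = None     # branch taken when value > threshold
    prediction: Optional[str] = None   # set only on leaf nodes


def trace_path(root: Node, entry: Dict[str, float]) -> Tuple[List[str], str]:
    """Return the rules along the path that `entry` follows, plus the leaf label."""
    rules: List[str] = []
    node = root
    while node.prediction is None:
        value = entry[node.feature]
        if value <= node.threshold:
            rules.append(f"{node.feature} <= {node.threshold}")
            node = node.left
        else:
            rules.append(f"{node.feature} > {node.threshold}")
            node = node.right
    return rules, node.prediction


# Hand-built tree (assumed for illustration; a real surrogate would be
# trained to reproduce the predictions of the model being explained).
tree = Node(
    "age", 40.0,
    left=Node(prediction="low"),
    right=Node(
        "income", 50.0,
        left=Node(prediction="medium"),
        right=Node(prediction="high"),
    ),
)

selected_entry = {"age": 52.0, "income": 30.0}
path, label = trace_path(tree, selected_entry)
print(path, label)
```

The returned list of rules is the kind of per-entry path that an interpretation view could highlight when the entry is selected.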
Ribeiro/Gupta fails to explicitly teach determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Katuwal teaches determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model (“A test subject was randomly selected to demonstrate the individual-level model interpretation derived from LIME. The non-linear decision function of the RF model is approximated by a sparse linear model in the neighborhood of the test patient (red “X”; see Fig. 2). At first, perturbed data points or instances are created around the test patient X. Then, a sparse linear model is fitted on the RF model’s prediction for these perturbed instances where prediction of each perturbed instance is weighted inversely with its distance from the test patient X. Finally using the sparse linear model unique to each patient, an explanation containing important features’ contribution during the decision process of the RF model for the patient is extracted” [pg. 2, bottom left col – top right col; See also Figure 2.]);
Ribeiro, Gupta, and Katuwal are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME, which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explain the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s teachings by determining the correlation between the outputs of the linear/non-linear models as taught by Katuwal. One would have been motivated to make this modification in order to find simple and truthful explanations of the decision process of complex models [pg. 3, right col, top para, Katuwal].
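For illustration only, the local approximation described in the passages quoted from Ribeiro and Katuwal (perturb around an instance, weight each perturbed instance by its proximity, fit a linear model to the complex model's predictions) can be sketched in one dimension. This is a minimal sketch under stated assumptions, not the LIME library or either reference's code: `black_box` is a stand-in non-linear model, the perturbation offsets are fixed rather than randomly sampled, `fit_local_linear` and `kernel_width` are hypothetical names, and the exponential kernel is one common choice of proximity measure.

```python
import math
from typing import Callable, List, Tuple


def black_box(x: float) -> float:
    """Stand-in for a complex non-linear model (assumed for illustration)."""
    return x * x


def fit_local_linear(
    f: Callable[[float], float],
    x0: float,
    offsets: List[float],
    kernel_width: float,
) -> Tuple[float, float]:
    """Fit y = slope * z + intercept to f near x0 by weighted least squares."""
    zs = [x0 + d for d in offsets]   # perturbed instances around x0
    ys = [f(z) for z in zs]          # labels come from the model being explained
    # Proximity weights: nearer perturbations count more.
    ws = [math.exp(-((z - x0) ** 2) / kernel_width ** 2) for z in zs]

    total = sum(ws)
    mean_z = sum(w * z for w, z in zip(ws, zs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (z - mean_z) * (y - mean_y) for w, z, y in zip(ws, zs, ys))
    var = sum(w * (z - mean_z) ** 2 for w, z in zip(ws, zs))

    slope = cov / var
    intercept = mean_y - slope * mean_z
    return slope, intercept


# Symmetric, deterministic offsets so the result is reproducible.
slope, intercept = fit_local_linear(
    black_box, x0=3.0, offsets=[-0.2, -0.1, 0.0, 0.1, 0.2], kernel_width=0.5
)
print(slope, intercept)
```

With symmetric offsets the fitted slope matches the derivative of the stand-in model at the instance, which is the sense in which the linear surrogate is faithful only locally.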

Regarding claim 2, Ribeiro/Gupta/Katuwal teaches The method of claim 1, wherein the one or more machine learning models include one or more non-linear surrogate models (“For example, a model that predicts sepia-toned images to be retro cannot be explained by presence of absence of super pixels. Second, our choice of G (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation.” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2]).

Regarding claim 12, Ribeiro/Gupta/Katuwal teaches The method of claim 1, wherein a plurality of branches associated with the decision tree surrogate model are based on input data associated with the machine learning model, wherein the input data associated with the machine learning model includes a plurality of entries (“We use two sentiment analysis datasets (books and DVDs, 2000 instances each) where the task is to classify product reviews as positive or negative. We train decision trees (DT), logistic regression with L2 regularization (LR), nearest neighbors (NN), and support vector machines with RBF kernel (SVM), all using bag of words as features. We also include random forests (with 1000 trees) trained with the average word2vec embedding (RF), a model that is impossible to interpret without a technique like LIME. We use the implementations and default parameters of scikit-learn, unless noted otherwise. We divide each dataset into train (1600 instances) and test (400 instances)” [pg. 6, § 5.1 Experiment Setup, ¶1]), wherein each entry has a one or more features and one or more corresponding feature values (“Intuitively, we want I such that features that explain many different instances have higher importance scores. In Figure 5, we show a toy example W, with n = d′ = 5, where W is binary (for simplicity). The importance function I should score feature f2 higher than feature f1, i.e. I2 > I1, since feature f2 is used to explain more instances” [pg. 4, § 4 Submodular Pick for Explaining Models, ¶4; importance scores would be equivalent to feature values]).

Regarding claim 13, Ribeiro/Gupta/Katuwal teaches The method of claim 11, wherein dynamically updating the one or more interpretation views associated with the one or more machine learning models includes highlighting a path of the decision tree surrogate model, wherein the highlighted path is specific to the selected entry (“We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed). The explanations for each classifier are then presented to a set of 5 users in a new round of interaction, which results in 50 new classifiers. We do a final round, after which we have 250 classifiers, each with a path of interaction tracing back to the first 10 subjects. The explanations and instances shown to each user are produced by SP-LIME or RP-LIME. We show the average accuracy on the religion dataset at each interaction round for the paths originating from each of the original 10 subjects (shaded lines), and the average across all paths (solid lines) in Figure 10. It is clear from the figure that the crowd workers are able to improve the model by removing features they deem unimportant for the task” [pg. 8, § 6.3 Can non-experts improve a classifier, ¶2-3]).

Regarding claim 19, Ribeiro teaches A system, comprising: 
a processor configured to: 
receive an indication of a selection of an entry associated with a linear surrogate machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]), wherein the linear surrogate machine learning model approximates an output associated with a machine learning model (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); 
in response to the received indication of the selection of the entry associated with the linear surrogate machine learning model (“We want to minimize the locality-aware loss L(f, g, πx) without making any assumptions about f, since we want the explainer to be model-agnostic. Thus, in order to learn the local behavior of f as the interpretable inputs vary, we approximate L(f, g, πx) by drawing samples, weighted by πx. We sample instances around x′ by drawing nonzero elements of x′ uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z′ ∈ {0, 1}^d′ (which contains a fraction of the nonzero elements of x′), we recover the sample in the original representation z ∈ R^d and obtain f(z), which is used as a label for the explanation model. Given this dataset Z of perturbed samples with the associated labels, we optimize Eq. (1) to get an explanation ξ(x).” [pg. 3, § 3.3 Sampling for Local Exploration, ¶1]), dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]),
a first non-linear surrogate machine learning model (“Second, our choice of G (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2])
retraining the linear surrogate machine learning model and/or the first non-linear surrogate machine learning model in response to a determination that the output associated with the linear surrogate machine learning model does not correlate with the output associated with the first non-linear surrogate model for more than a threshold number of entries (“If one notes that a classifier is untrustworthy, a common task in machine learning is feature engineering, i.e. modifying the set of features and retraining in order to improve generalization. Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize. We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training, for B = 10 instances with K = 10 words in each explanation (an interface similar to Figure 2, but with a single algorithm).” [pg. 8, §6.3, ¶1-2; Implies retraining a linear/non-linear surrogate model. See further: pgs. 8-9, §6.4, ¶1-2; The experiment proceeds as follows: we first present a balanced set of 10 test predictions (without explanations), where one wolf is not in a snowy background (and thus the prediction is “Husky”) and one husky is (and is thus predicted as “Wolf”). We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly. We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies. After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions. Since this task requires some familiarity with the notion of spurious correlations and generalization]).
a memory coupled to the processor and configured to provide the processor with instructions (“In practice, explaining random forests with 1000 trees using scikit-learn (http://scikit-learn.org) on a laptop with N = 5000 takes under 3 seconds without any optimizations such as using gpus or parallelization” [pg. 4, §3.4 Sparse Linear Explanations, ¶2; processors and memory are implicit.])
However, Ribeiro fails to explicitly teach wherein to dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models, the processor is configured to update a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model; 
determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Gupta teaches wherein to dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models, the processor is configured to update a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model (“Yet another existing approach to model interpretation includes training surrogate models. In this approach, a decision tree is learned using the training data for the model, where instead of predicting the true classification of the data, the decision tree is trained to predict the classification that is predicted by the model itself. Then, the paths of the tree are output in the form of rules that the model used in making the predictions. However, surrogate models, such as decision trees, have a single root node, and hence all rules extracted from such trees by definition include the root node attribute in their description. Further, even on a relatively simple data set decision trees can become complex with paths that span several features. This can lead to rules that have a large number of unintelligible feature value pairs.” [¶0023; See further: ¶0040 discloses, Instead, a given instance of the input data is perturbed (i.e., the input data is varied), and a locally faithful linear model is trained in the locality of the instance. The weights (probabilities) of the different features then approximate the marginal contribution values. An optimization to speed up computation can be performed by excluding all instances in the input data that are already covered by the conditions obtained from a particular instance of the training data. In other words, the algorithm can consider instances of the data that are not yet covered by one of the conditions generated from the training data. 
Each condition includes a single feature, and a value for categorical features or a range of values for numerical features. The result after considering all instances is an exhaustive list of conditions that were important at each instance level, including subsets of the conditions that were important in classifying instances of that class.])
Ribeiro and Gupta are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME, which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explain the behavior of a machine learning model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings by implementing a decision tree surrogate model to indicate a path associated with an entry as taught by Gupta. One would have been motivated to make this modification in order to gain a better interpretation of the model [¶0022, Gupta].
Ribeiro/Gupta fails to explicitly teach determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Katuwal teaches determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model (“A test subject was randomly selected to demonstrate the individual-level model interpretation derived from LIME. The non-linear decision function of the RF model is approximated by a sparse linear model in the neighborhood of the test patient (red “X”; see Fig. 2). At first, perturbed data points or instances are created around the test patient X. Then, a sparse linear model is fitted on the RF model’s prediction for these perturbed instances where prediction of each perturbed instance is weighted inversely with its distance from the test patient X. Finally using the sparse linear model unique to each patient, an explanation containing important features’ contribution during the decision process of the RF model for the patient is extracted” [pg. 2, bottom left col – top right col; See also Figure 2.]);
Ribeiro, Gupta, and Katuwal are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME, which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explain the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s teachings by determining the correlation between the outputs of the linear/non-linear models as taught by Katuwal. One would have been motivated to make this modification in order to find simple and truthful explanations of the decision process of complex models [pg. 3, right col, top para, Katuwal].

Regarding claim 20, Ribeiro teaches A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for (“In practice, explaining random forests with 1000 trees using scikit-learn (http://scikit-learn.org) on a laptop with N = 5000 takes under 3 seconds without any optimizations such as using gpus or parallelization” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2; processors and memory are implicit.]): 
receiving an indication of a selection of an entry associated with a linear surrogate machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]), wherein the linear surrogate machine learning model approximates an output associated with a machine learning model (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); 
in response to the received indication of the selection of the entry associated with the linear surrogate machine learning model (“We want to minimize the locality-aware loss L(f, g, πx) without making any assumptions about f, since we want the explainer to be model-agnostic. Thus, in order to learn the local behavior of f as the interpretable inputs vary, we approximate L(f, g, πx) by drawing samples, weighted by πx. We sample instances around x′ by drawing nonzero elements of x′ uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z′ ∈ {0, 1}^d′ (which contains a fraction of the nonzero elements of x′), we recover the sample in the original representation z ∈ R^d and obtain f(z), which is used as a label for the explanation model. Given this dataset Z of perturbed samples with the associated labels, we optimize Eq. (1) to get an explanation ξ(x).” [pg. 3, § 3.3 Sampling for Local Exploration, ¶1]), dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]),
a first non-linear surrogate machine learning model (“Second, our choice of G (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2])
retraining the linear surrogate machine learning model and/or the first non-linear surrogate machine learning model in response to a determination that the output associated with the linear surrogate machine learning model does not correlate with the output associated with the first non-linear surrogate model for more than a threshold number of entries (“If one notes that a classifier is untrustworthy, a common task in machine learning is feature engineering, i.e. modifying the set of features and retraining in order to improve generalization. Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize. We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training, for B = 10 instances with K = 10 words in each explanation (an interface similar to Figure 2, but with a single algorithm).” [pg. 8, §6.3, ¶1-2; Implies retraining a linear/non-linear surrogate model. See further: pgs. 8-9, §6.4, ¶1-2; The experiment proceeds as follows: we first present a balanced set of 10 test predictions (without explanations), where one wolf is not in a snowy background (and thus the prediction is “Husky”) and one husky is (and is thus predicted as “Wolf”). We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly. We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies. After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions. Since this task requires some familiarity with the notion of spurious correlations and generalization]).
However, Ribeiro fails to explicitly teach wherein dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models includes updating a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model;
determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Gupta teaches wherein dynamically updating one or more interpretation views associated with one or more other surrogate machine learning models includes updating a first surrogate machine learning model comprising a decision tree surrogate model that indicates a path associated with the selected entry associated with the linear surrogate machine learning model through the updated decision tree surrogate model (“Yet another existing approach to model interpretation includes training surrogate models. In this approach, a decision tree is learned using the training data for the model, where instead of predicting the true classification of the data, the decision tree is trained to predict the classification that is predicted by the model itself. Then, the paths of the tree are output in the form of rules that the model used in making the predictions. However, surrogate models, such as decision trees, have a single root node, and hence all rules extracted from such trees by definition include the root node attribute in their description. Further, even on a relatively simple data set decision trees can become complex with paths that span several features. This can lead to rules that have a large number of unintelligible feature value pairs.” [¶0023; See further: ¶0040 discloses, Instead, a given instance of the input data is perturbed (i.e., the input data is varied), and a locally faithful linear model is trained in the locality of the instance. The weights (probabilities) of the different features then approximate the marginal contribution values. An optimization to speed up computation can be performed by excluding all instances in the input data that are already covered by the conditions obtained from a particular instance of the training data. In other words, the algorithm can consider instances of the data that are not yet covered by one of the conditions generated from the training data. 
Each condition includes a single feature, and a value for categorical features or a range of values for numerical features. The result after considering all instances is an exhaustive list of conditions that were important at each instance level, including subsets of the conditions that were important in classifying instances of that class.])
Ribeiro and Gupta are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings by implementing a decision tree surrogate model to indicate a path associated with an entry as taught by Gupta. One would have been motivated to make this modification in order to gain a better interpretation of the model. [¶0022, Gupta]
Ribeiro/Gupta fails to explicitly teach determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model;
Katuwal teaches determining whether an output associated with the linear surrogate machine learning model correlates with an output associated with the first non-linear surrogate model (“A test subject was randomly selected to demonstrate the individual-level model interpretation derived from LIME. The non-linear decision function of the RF model is approximated by a sparse linear model in the neighborhood of the test patient (red “X”; see Fig. 2). At first, perturbed data points or instances are created around the test patient X. Then, a sparse linear model is fitted on the RF model’s prediction for these perturbed instances where prediction of each perturbed instance is weighted inversely with its distance from the test patient X. Finally using the sparse linear model unique to each patient, an explanation containing important features’ contribution during the decision process of the RF model for the patient is extracted” [pg. 2, bottom left col – top right col; See also Figure 2.]);
Ribeiro, Gupta, and Katuwal are all in the same field of endeavor of model interpretation, thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s teachings by determining the correlation between the outputs of the linear/non-linear models as taught by Katuwal. One would have been motivated to make this modification in order to find simple and truthful explanations of the decision process of complex models. [pg. 3, right col, top para, Katuwal]
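For illustration only, the local-surrogate procedure quoted from Katuwal and the correlation determination recited in the claim can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the one-feature black_box function, the evenly spaced perturbations, the exponential kernel and its width, and the use of a Pearson coefficient as the measure of correlation are all hypothetical choices.

```python
import math


def black_box(x):
    # Hypothetical non-linear model standing in for the RF model (assumption).
    return x * x


def fit_local_linear(x0, offsets, kernel_width=0.5):
    """Fit a distance-weighted line to black-box outputs around x0."""
    xs = [x0 + d for d in offsets]
    ys = [black_box(x) for x in xs]
    # Nearer perturbations receive larger weights (assumed exponential kernel).
    ws = [math.exp(-((d / kernel_width) ** 2)) for d in offsets]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    slope = sum(
        w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys)
    ) / sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept, xs, ys


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Perturb symmetrically around the selected instance x0 = 1.0.
offsets = [(i - 5) / 10 for i in range(11)]
slope, intercept, xs, ys = fit_local_linear(1.0, offsets)

# Compare the linear surrogate's outputs with the non-linear model's outputs.
surrogate_outputs = [intercept + slope * x for x in xs]
correlation = pearson(surrogate_outputs, ys)
```

Under these assumptions the fitted slope approximates the local gradient of the non-linear function at the selected instance, and the correlation value could then be compared against whatever threshold a given implementation adopts.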

Claims 3-7 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Gupta and Katuwal and further in view of Lei et al. ("Distribution-Free Predictive Inference For Regression", hereinafter "Lei").
Regarding claim 3, Ribeiro/Gupta/Katuwal teaches the method of claim 2 but fails to explicitly teach wherein one of the one or more non-linear surrogate models includes a feature importance model.
Lei teaches wherein one of the one or more non-linear surrogate models includes a feature importance model (“In this section, we discuss the problem of estimating the importance of each variable in a prediction model… First, our method is not limited to linear regression. Second, the spirit of our approach is to focus on predictive quantities and we want to measure variable importance directly in terms of prediction.” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶1]).
Ribeiro, Gupta, Katuwal and Lei are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings, in particular, Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 4, Ribeiro/Gupta/Katuwal/Lei teaches The method of claim 3, where Lei further teaches wherein the feature importance model is configured to output one or more features, wherein the one or more features have a corresponding global feature importance value (“For a more global measure of variable importance, we can focus on the distribution of ∆j (Xn+1, Yn+1), marginally over (Xn+1, Yn+1). We rely on a splitting approach, where the index set used for the training of µ and µ(−j) is I1 ⊊ {1, . . . , n}, a proper subset.” [pg. 36, § 6.2 Global Measures of Variable Importance, ¶1]) and a corresponding local feature importance value (“As with the guarantees from conformal inference, the coverage statement (15) is marginal over Xn+1, and in general, does not hold conditionally at Xn+1 = x. But, to summarize the effect of covariate j, we can still plot the intervals Wj(Xi) for i = 1, . . . , n, and loosely interpret these as making local statements about variable importance” [pg. 34, § 6.1 Local Measure of Variable Importance, ¶3]).
Ribeiro, Gupta, Katuwal and Lei are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings, in particular, Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 5, Ribeiro/Gupta/Katuwal/Lei teaches The method of claim 4, where Ribeiro teaches wherein the corresponding global feature importance value associated with a feature is based at least in part on a number of times the feature is used in a random forest model (“We train decision trees (DT), logistic regression with L2 regularization (LR), nearest neighbors (NN), and support vector machines with RBF kernel (SVM), all using bag of words as features. We also include random forests (with 1000 trees) trained with the average word2vec embedding (RF), a model that is impossible to interpret without a technique like LIME. We use the implementations and default parameters of scikit-learn, unless noted otherwise. We divide each dataset into train (1600 instances) and test (400 instances).” [pg. 6, § 5.1 Experiment Setup, ¶1; See §5.4 Can I trust this model, ¶1 discloses random forest model]).

Regarding claim 6, Ribeiro/Gupta/Katuwal/Lei teaches The method of claim 5, where Lei teaches wherein the corresponding global feature importance value associated with the feature is based at least in part on a level of the random forest model that the feature was used to split the random forest model (“For a more global measure of variable importance, we can focus on the distribution of ∆j (Xn+1, Yn+1), marginally over (Xn+1, Yn+1). We rely on a splitting approach, where the index set used for the training of µ and µ(−j) is I1 ⊊ {1, . . . , n}, a proper subset. Denote by I2 its complement, and by Dk = {(Xi , Yi) : i ∈ Ik}, k = 1, 2 the data samples in each index set.” [pg. 36, § 6.2 Global Measures of Variable Importance, ¶1; Lei further discloses using random forest models: “The only exception is the random forest estimator, which gave stable errors over a variety of tuning choices; hence it is represented by a single point in each plot (corresponding to 500 trees in the low-dimensional problems, and 1000 trees in the high-dimensional problems). All curves in the figures represent an average over 50 repetitions, and error bars indicating the standard errors. In all cases, we used the split conformal method for computational efficiency.” [pg. 24, ¶2]]).
Ribeiro, Gupta, Katuwal and Lei are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings, in particular, Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 7, Ribeiro/Gupta/Katuwal/Lei teaches The method of claim 4, where Lei teaches wherein the corresponding local feature importance value is computed using a leave-one-covariate-out mechanism (“Our proposal, leave-one-covariate-out or LOCO inference, proceeds as follows” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶2]; See further §6.1).
Ribeiro, Gupta, Katuwal and Lei are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings, in particular, Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]
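For illustration only, the leave-one-covariate-out idea cited from Lei can be sketched as follows: a model is fitted on a training split with and without covariate j, and the increase in absolute prediction error on held-out entries serves as a per-entry importance value. This is a minimal sketch under stated assumptions and is not Lei's implementation: the ordinary least-squares estimator, the helper names, and the toy data are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, ys):
    """Least-squares fit with an intercept; returns a prediction function."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return lambda r: beta[0] + sum(b * v for b, v in zip(beta[1:], r))


def drop_covariate(rows, j):
    """Return copies of the rows with covariate j removed."""
    return [[v for i, v in enumerate(r) if i != j] for r in rows]


def loco_excess_errors(train_x, train_y, test_x, test_y, j):
    """Per-entry excess error |y - mu_minus_j(x)| - |y - mu(x)| on held-out data."""
    full = fit_ols(train_x, train_y)
    reduced = fit_ols(drop_covariate(train_x, j), train_y)
    reduced_test = drop_covariate(test_x, j)
    return [
        abs(y - reduced(r_red)) - abs(y - full(r))
        for r, r_red, y in zip(test_x, reduced_test, test_y)
    ]


# Toy data (assumption): the response depends only on the first covariate.
rows = [[float(i), float(b)] for i, b in enumerate([0, 1, 1, 0, 0, 1, 1, 0])]
ys = [3.0 * r[0] for r in rows]
train_x, train_y = rows[0::2], ys[0::2]
test_x, test_y = rows[1::2], ys[1::2]

delta_first = loco_excess_errors(train_x, train_y, test_x, test_y, 0)
delta_second = loco_excess_errors(train_x, train_y, test_x, test_y, 1)
```

In this toy setting, removing the first covariate raises the held-out error substantially while removing the second leaves it essentially unchanged, which is the kind of local, per-entry signal the quoted passage describes.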

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Gupta, Katuwal, and Lei and further in view of Lee et al. ("US 20150379429 A1", cited by Applicant in the IDS filed 02/17/2022, hereinafter "Lee").

Regarding claim 8, Ribeiro/Gupta/Katuwal/Lei teaches The method of claim 4, further comprising: 
Ribeiro teaches comparing the corresponding global feature importance value associated with a feature with the corresponding local feature importance value associated with the feature (“Another essential criterion is local fidelity. Although it is often impossible for an explanation to be completely faithful unless it is the complete description of the model itself, for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted. We note that local fidelity does not imply global fidelity: features that are globally important may not be important in the local context, and vice versa. While global fidelity would imply local fidelity, identifying globally faithful explanations that are interpretable remains a challenge for complex models.” [pg. 3, top left col, ¶2; implies comparing global and local feature importance values]); and 
However, Ribeiro/Gupta/Katuwal/Lei fails to explicitly teach determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value.
Lee teaches determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value (“To avoid such undesirable scenarios, a technique for pruning selected parameters may be employed in some embodiments. According to such a technique, when certain triggering conditions are met (e.g., when the number of features for which parameters are stored exceeds a threshold), a fraction of the features that contribute least to the models' predictions may be identified as pruning victims. An efficient in-memory technique to estimate quantile boundary values (e.g., the 20% of the features that contribute the least to the model's predictions) for parameters may be used in some embodiments, without requiring copying of the parameters or an explicit sort operation.” [¶0257]).
Ribeiro, Gupta, Katuwal, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Regarding claim 9, Ribeiro/Gupta/Katuwal/Lei/Lee teaches The method of claim 8, where Lee teaches in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value, investigating the feature importance model (“According to such a technique, when certain triggering conditions are met (e.g., when the number of features for which parameters are stored exceeds a threshold), a fraction of the features that contribute least to the models' predictions may be identified as pruning victims. An efficient in-memory technique to estimate quantile boundary values (e.g., the 20% of the features that contribute the least to the model's predictions) for parameters may be used in some embodiments, without requiring copying of the parameters or an explicit sort operation. Entries (e.g., parameter values) for the pruning victims identified may be removed from the feature set 5025, thus reducing the memory consumed. However, additional learning iterations may be performed even after pruning some features. Thus, the feature set size may grow and shrink repeatedly as more observation records are considered, more features are added, and more features are pruned.” [¶0257; additional learning/techniques would correspond to “investigating”.]).
Ribeiro, Gupta, Katuwal, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Regarding claim 10, Ribeiro/Gupta/Katuwal/Lei/Lee teaches The method of claim 8, where Lee teaches in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is less than a threshold value, forgoing an investigation of the feature importance model (“At least some of the parameter vector entries may be removed based on the adjusted weights in some embodiments (element 6116). For example, entries whose weights fall below a rejection threshold may be removed. In some embodiments, an efficient quantile boundary estimation technique similar to that discussed in the context of FIG. 52 and FIG. 54 may be applied to the absolute values of the feature weights, and parameter vector entries whose weights fall in the lowest X % may be removed.” [¶0305; removing entries would imply forgoing an investigation.]).
Ribeiro, Gupta, Katuwal, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Gupta and Katuwal and further in view of Lee.

Regarding claim 14, Ribeiro/Gupta/Katuwal teaches the method of claim 1 but fails to explicitly teach wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model.
Lee teaches wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model (“For example, in a greedy pruning technique 3650, the unpruned tree 3604 may be analyzed in a top-down fashion, selecting the path that leads to the node with the highest PUM value at each split in the tree. The cumulative PUM values of the nodes encountered during the greedy top-down traversal may be tracked, as well as the total number of nodes encountered.” [¶0213]).
Ribeiro, Gupta, Katuwal, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings in order to find the frequency of a path used by a decision tree as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Gupta and Katuwal and further in view of Goldstein et al. ("Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation", hereinafter "Goldstein").

Regarding claim 15, Ribeiro/Gupta/Katuwal teaches the method of claim 2 but fails to explicitly teach wherein one of the one or more non-linear surrogate models includes a partial dependence plot.
Goldstein teaches wherein one of the one or more non-linear surrogate models includes a partial dependence plot (“The resulting graphic, which is called a partial dependence plot, displays the average value of f̂ as a function of xS. For the remainder of the paper we consider a single predictor of interest at a time (|S| = 1) and write xS without boldface accordingly” [pg. 4, top para]).
Ribeiro, Gupta, Katuwal, and Goldstein are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]
Regarding claim 16, Ribeiro/Gupta/Katuwal/Goldstein teaches The method of claim 15, where Goldstein teaches wherein the partial dependence plot indicates a dependence of a prediction label of the partial dependence plot on a feature having a particular value (“The goal of this article is to present Individual Conditional Expectation (ICE) plots, a toolbox for visualizing models produced by “black box” algorithms. These algorithms use training data {xi, yi}, i = 1, . . . , N (where xi = (xi,1, . . . , xi,p) is a vector of predictors and yi is the response) to construct a model f̂ that maps the features x to fitted values f̂(x). Though these algorithms can produce fitted values that enjoy low generalization error, it is often difficult to understand how the resultant f̂ uses x to generate predictions. The ICE toolbox helps visualize this mapping.” [pg. 1-2, § 1 Introduction, ¶1]).
Ribeiro, Gupta, Katuwal, and Goldstein are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]

Regarding claim 17, Ribeiro/Gupta/Katuwal/Goldstein teaches The method of claim 15, where Goldstein teaches wherein the partial dependence plot indicates an average prediction label based on all entries associated with the partial dependence plot having a corresponding feature with a same particular value (“Each subset of predictors S has its own partial dependence function fS, which gives the average value of f when xS is fixed and xC varies over its marginal distribution dP (xC). As neither the true f nor dP (xC) are known, we estimate Equation 1 by computing [equation omitted] where {xC1, ..., xCN} represent the different values of xC that are observed in the training data. Note that the approximation here is twofold: we estimate the true model with f̂, the output of a statistical learning algorithm, and we estimate the integral over xC by averaging over the N xC values observed in the training set.” [pg. 3, § 2.2 Friedman’s PDP, ¶3; xS is fixed thus corresponds to a same particular value.]).
Ribeiro, Gupta, Katuwal, and Goldstein are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]
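For illustration only, the averaging described in the Goldstein passage (fixing the feature of interest at a particular value and averaging the model output over the remaining feature values observed in the training data) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the toy_model, and the training rows are hypothetical and do not come from Goldstein.

```python
def partial_dependence(model, rows, feature_index, value):
    """Average model output with one feature fixed at `value` for every row."""
    total = 0.0
    for row in rows:
        modified = list(row)
        modified[feature_index] = value  # fix the feature of interest
        total += model(modified)
    return total / len(rows)


def toy_model(row):
    # Hypothetical fitted model standing in for a black-box predictor (assumption).
    return 2.0 * row[0] + row[1]


# Hypothetical training entries: [feature of interest, other feature].
training_rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

# One point of the partial dependence curve per particular feature value.
curve = [
    (v, partial_dependence(toy_model, training_rows, 0, v))
    for v in (0.0, 1.0, 2.0)
]
```

Each point on the resulting curve is an average prediction over all entries with the feature of interest set to the same particular value, which is the reading applied to claim 17 above.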
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Gupta, Katuwal, and Lei and further in view of Goldstein.

Regarding claim 18, Ribeiro/Gupta/Katuwal teaches The method of claim 1, wherein the one or more interpretation views associated with one or more machine learning models includes a view associated with a decision tree surrogate model (“Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule list” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶1]).
Ribeiro/Gupta/Katuwal fails to explicitly teach a view associated with a feature importance surrogate model and a view associated with a partial dependence plot.
Lei teaches a view associated with a feature importance surrogate model (“In this section, we discuss the problem of estimating the importance of each variable in a prediction model… First, our method is not limited to linear regression. Second, the spirit of our approach is to focus on predictive quantities and we want to measure variable importance directly in terms of prediction.” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶1]).
Ribeiro, Gupta, Katuwal and Lei are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s teachings, in particular, Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]
Ribeiro/Gupta/Katuwal/Lei fails to explicitly teach a view associated with a partial dependence plot.
Goldstein teaches a view associated with a partial dependence plot (“The resulting graphic, which is called a partial dependence plot, displays the average value of f̂ as a function of xS. For the remainder of the paper we consider a single predictor of interest at a time (|S| = 1) and write xS without boldface accordingly” [pg. 4, top para]).
Ribeiro, Gupta, Katuwal, Lei, and Goldstein are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Gupta discloses a method for generating rules that globally explains the behavior of a machine learning model. Katuwal teaches a method of model interpretability for precision medicine. Lei discloses a variable importance method called LOCO inference. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Gupta’s/Katuwal’s/Lei’s teachings to further implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]
Response to Arguments
Regarding the 35 U.S.C. 112(b) Rejection:
Applicant’s amendments to claim 2 appear to overcome the previous 112(b) rejection; however, the amendments to claim 1 appear to have introduced an antecedent basis issue in claim 2. Therefore, a new 112(b) rejection is now applied.

Regarding the 35 U.S.C. § 101 Rejection:
Applicant’s amendments to claims 1, 19, and 20 have overcome the 101 rejection. Therefore, the previous rejection has been withdrawn. 

Regarding the 35 U.S.C. § 102/103 Rejections:
Applicant’s arguments regarding the Hall reference qualifying as an exception under 35 U.S.C. 102(b)(1) are not persuasive. Applicant is required to file an affidavit or declaration to establish that a disclosure is not prior art under 35 U.S.C. 102(a) due to an exception in 35 U.S.C. 102(b). For further details, please refer to MPEP §2153.01(a).

Applicant’s arguments that the prior art of Ribeiro fails to disclose the limitation “in response to the receive indication of the selection of the entry associated with the linear surrogate machine learning model, dynamically updating one or more interpretations views associated with one or more other surrogate machine learning models” have been considered but are not persuasive. Ribeiro appears to teach these features (pg. 3, § 3.2 - 3.3). Please see the 103 rejection above.

Applicant’s arguments regarding the other newly amended features of Claims 1, 19, and 20 have been considered but are moot because these limitations are now taught by the newly presented art of Gupta and Katuwal. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.


Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122


/BRIAN M SMITH/Primary Examiner, Art Unit 2122