DETAILED ACTION
This action is in response to the claims filed 04/20/2018 for application 15/959,040. Claims 1-20 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 11/05/2019, 01/21/2022, and 02/17/2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 3 recites the limitation "the one or more non-linear surrogate models" in line 9.  There is insufficient antecedent basis for this limitation in the claim.


Claims 4-10 are rejected as being dependent on a rejected base claim without curing any of the deficiencies.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 1 recites, in part, receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry. The limitations of receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “a machine learning model” and “one or more machine learning models”. These elements that are recited are only generally linked to the judicial exception. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing one or more machine learning models to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. The claim is not patent eligible.  

Regarding claim 2, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the one or more machine learning models include one or more non-linear models. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 3, the rejection of claim 2 is further incorporated, and further, the claim recites: wherein one of the one or more non-linear surrogate models includes a feature importance model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 4, the rejection of claim 3 is further incorporated, and further, the claim recites: wherein the feature importance model is configured to output one or more features, wherein the one or more features have a corresponding global feature importance value and a corresponding local feature importance value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 5, the rejection of claim 4 is further incorporated, and further, the claim recites: wherein the corresponding global feature importance value associated with a feature is based at least in part on a number of times the feature is used in a random forest model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 6, the rejection of claim 5 is further incorporated, and further, the claim recites: wherein the corresponding global feature importance value associated with the feature is based at least in part on a level of the random forest model that the feature was used to split the random forest model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 7, the rejection of claim 4 is further incorporated, and further, the claim recites wherein the corresponding local feature importance value is computed using a leave-one-covariate out mechanism. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 8, the rejection of claim 4 is further incorporated, and further, the claim recites comparing the corresponding global feature importance value associated with a feature with the corresponding local feature importance value associated with the feature; and determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal a threshold value. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 9, the rejection of claim 8 is further incorporated, and further, the claim recites in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value, investigating the feature importance model. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 10, the rejection of claim 8 is further incorporated, and further, the claim recites in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is less than a threshold value, forgoing an investigation of the feature importance model. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, thus recites a judicial exception.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 11, the rejection of claim 2 is further incorporated, and further, the claim recites wherein one of the one or more non-linear surrogate models includes a decision tree surrogate model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 12, the rejection of claim 11 is further incorporated, and further, the claim recites wherein a plurality of branches associated with the decision tree surrogate model are based on input data associated with the machine learning model, wherein the input data associated with the machine learning model includes a plurality of entries, wherein each entry has a one or more features and one or more corresponding feature values. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 13, the rejection of claim 11 is further incorporated, and further, the claim recites wherein dynamically updating the one or more interpretation views associated with the one or more machine learning models includes highlighting a path of the decision tree surrogate model, wherein the highlighted path is specific to the selected entry. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 14, the rejection of claim 11 is further incorporated, and further, the claim recites wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 15, the rejection of claim 2 is further incorporated, and further, the claim recites wherein one of the one or more non-linear surrogate models includes a partial dependence plot. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 16, the rejection of claim 15 is further incorporated, and further, the claim recites wherein the partial dependence plot indicates a dependence of a prediction label of the partial dependence plot on a feature having a particular value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 17, the rejection of claim 15 is further incorporated, and further, the claim recites wherein the partial dependence plot indicates an average prediction label based on all entries associated with the partial dependence plot having a corresponding feature with a same particular value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 18, the rejection of claim 1 is further incorporated, and further, the claim recites wherein the one or more interpretation views associated with one or more machine learning models includes a view associated with a feature importance surrogate model, a view associated with a decision tree surrogate model, and a view associated with a partial dependence plot. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible. 

Regarding claim 19, 
Step 1 Analysis: Claim 19 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 19 recites, in part, receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry. The limitations of receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “a machine learning model” and “one or more machine learning models”. These elements that are recited are only generally linked to the judicial exception. Additionally, the claim recites the additional elements “processor” and “memory”. These elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing one or more machine learning models to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. Additionally, the processor and memory amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.  

Regarding claim 20, 
Step 1 Analysis: Claim 20 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 20 recites, in part, receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry. The limitations of receiving an indication of a selection of an entry and dynamically updating one or more interpretation views based on the selected entry, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements – “a machine learning model” and “one or more machine learning models”. These elements that are recited are only generally linked to the judicial exception. Additionally, the claim recites the additional elements “computer program product” and “non-transitory computer readable storage medium”. These elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating an index) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing one or more machine learning models to perform the steps of the claimed process amount to no more than generally linking the elements to the judicial exception. Additionally, the computer program product and non-transitory computer readable storage medium amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.  

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-9 and 11-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Hall et al. ("Machine Learning Interpretability with H2O Driverless AI" cited by Applicant in the IDS filed 11/05/2019, hereinafter "Hall").

Regarding claim 1, Hall teaches A method, comprising: 
receiving an indication of a selection of an entry associated with a machine learning model (“When a single observation, x, is selected, its path through htree is highlighted. The path of x(i) through htree can be helpful when analyzing the logic or validity of g(x(i)).” [pg. 12, ¶1]); and 
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry (“This alone is an obstacle to interpretation, but when using these types of algorithms as interpretation tools or with interpretation tools, it is important to remember that details of explanations can change across multiple accurate models. This instability of explanations is a driving factor behind the presentation of multiple explanatory results in Driverless AI, enabling users to find explanatory information that is consistent across multiple modeling and interpretation techniques.” [pg. 10, § The Multiplicity of Good Models, ¶1]).

Regarding claim 2, Hall teaches The method of claim 1, wherein the one or more machine learning models include one or more non-linear models (“Driverless AI provides both global and local explanations for complex, nonlinear, non-monotonic machine learning models.” [pg. 7, Scope, ¶4]).

Regarding claim 3, Hall teaches The method of claim 2, wherein one of the one or more non-linear surrogate models includes a feature importance model (“In Driverless AI, decision tree surrogate, ICE, K-LIME, and partial dependence are all model-agnostic techniques, whereas LOCO and random forest feature importance are model-specific techniques.” [pg. 8, § Application Domain, ¶1]).

Regarding claim 4, Hall teaches The method of claim 3, wherein the feature importance model is configured to output one or more features, wherein the one or more features have a corresponding global feature importance value and a corresponding local feature importance value (“Feature importance measures the effect that a feature has on the predictions of a model. Global feature importance measures the overall impact of an input feature on the Driverless AI model predictions while taking nonlinearity and interactions into consideration. Global feature importance values give an indication of the magnitude of a feature’s contribution to model predictions for all observations. Unlike regression parameters, they are often unsigned and typically not directly related to the numerical predictions of the model. Local feature importance describes how the combination of the learned model rules or parameters and an individual observation's attributes affect a model’s prediction for that observation while taking nonlinearity and interactions into effect…Figure 9 displays the global and local feature importance values for the credit card default data, sorted in descending order from the globally most important feature to the globally least important feature. Local feature importance values are displayed under the global feature importance value for each feature.” [pg. 20-21, § Feature Importance, ¶1]).

Regarding claim 5, Hall teaches The method of claim 4, wherein the corresponding global feature importance value associated with a feature is based at least in part on a number of times the feature is used in a random forest model (“In figure 9, PAY_0, PAY_2, LIMIT_BAL, PAY_3, and BILL_AMT1 are the top 5 most important features globally. As expected, this result is well aligned with the results of the decision tree surrogate model discussed in section 2.2. Taking the results of two interpretability techniques into consideration, it is extremely likely that timing of the customer's first 3 payments, PAY_0, PAY_2, and PAY_3, are the most important global features for any g(x) prediction.” [pg. 21, § Random Forest Feature Importance, ¶3]).

Regarding claim 6, Hall teaches The method of claim 5, wherein the corresponding global feature importance value associated with the feature is based at least in part on a level of the random forest model that the feature was used to split the random forest model (“Here Θb is the set of splitting rules for each tree htree,b. As explained in [5], at each split in each tree htree,b, the improvement in the split-criterion is the importance measure attributed to the splitting feature. The importance measure is accumulated over all trees separately for each feature. The aggregated feature importance values are then scaled between 0 and 1, such that the most important feature has an importance value of 1.” [pg. 21, § Random Forest Feature Importance, ¶2]).
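
As an illustrative sketch only (not code from Hall or from the application), the aggregation described in the passage cited above can be expressed as follows. The function name scaled_importance, the representation of each tree as a list of (feature, improvement) pairs, and the numeric improvement values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: one (splitting feature, split-criterion improvement)
# pair per split, one list of such pairs per tree in the forest.
Split = Tuple[str, float]


def scaled_importance(forest: List[List[Split]]) -> Dict[str, float]:
    """Accumulate improvements per feature over all trees, then scale so the
    most important feature has an importance value of 1."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in forest:
        for feature, improvement in tree:
            totals[feature] += improvement
    top = max(totals.values(), default=0.0)
    if top == 0.0:
        # No split improved the criterion; every feature gets zero importance.
        return {feature: 0.0 for feature in totals}
    return {feature: total / top for feature, total in totals.items()}


# Made-up improvement values, for illustration only.
forest = [
    [("PAY_0", 4.0), ("PAY_2", 1.0)],
    [("PAY_0", 2.0), ("LIMIT_BAL", 3.0)],
]
importance = scaled_importance(forest)
```

In this sketch a feature that splits more often, or whose splits yield larger improvements, accumulates a larger total before scaling.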

Regarding claim 7, Hall teaches The method of claim 4, wherein the corresponding local feature importance value is computed using a leave-one-covariate out mechanism (“Leave-one-covariate-out (LOCO) provides a mechanism for calculating feature importance values for any model g on a per-observation basis x(i) by subtracting the model's prediction for an observation of data” [pg. 22, § LOCO Feature Importance, ¶1]).
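
As an illustrative sketch only, a per-observation leave-one-covariate-out calculation of the kind described in the cited passage might look like the following. The quoted passage is truncated, so the way a feature is "left out" here (substituting a baseline value), the function name loco_importance, and the toy model are assumptions rather than Hall's implementation. The scaling so that the most important feature has a value of 1 follows the passage cited for claim 8.

```python
from typing import Callable, Dict

Row = Dict[str, float]


def loco_importance(
    model: Callable[[Row], float], row: Row, baseline: Row
) -> Dict[str, float]:
    """Local importance of each feature for one observation.

    Assumption: a feature is 'left out' by replacing its value with a
    baseline value before re-scoring the observation.
    """
    full_prediction = model(row)
    raw: Dict[str, float] = {}
    for name in row:
        perturbed = dict(row)
        perturbed[name] = baseline[name]
        # Prediction without the feature minus prediction with it.
        raw[name] = model(perturbed) - full_prediction
    largest = max(abs(value) for value in raw.values()) or 1.0
    # Scale magnitudes so the most important feature has a value of 1.
    return {name: abs(value) / largest for name, value in raw.items()}


def toy_model(r: Row) -> float:
    # Hypothetical stand-in for the model g.
    return 0.5 * r["a"] + 2.0 * r["b"]


local = loco_importance(toy_model, {"a": 2.0, "b": 1.0}, {"a": 0.0, "b": 0.0})
```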

Regarding claim 8, Hall teaches The method of claim 4, further comprising: 
comparing the corresponding global feature importance value associated with a feature with the corresponding local feature importance value associated with the feature (“Although LOCO feature importance values can be signed quantities, they are scaled between 0 and 1 such that the most important feature for an observation of data, x, has an importance value of 1 for direct global versus local comparison to random forest feature importance in Driverless AI.” [pg. 22, § LOCO Feature Importance, ¶1]); and 
determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal a threshold value (“Living square footage, sqft_living, is linearly associated with increases in price with a correlation coefficient of greater than 0.6. There is also a linearly increasing trend between the number of bathrooms and the home price. The more bathrooms, the higher the home price. Hence, inputs related to square footage and the number of bathrooms are expected to be globally important in the Driverless AI model.” [pg. 32, Below Table 4]).
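
As an illustrative sketch only, the comparing and determining steps recited in claim 8 can be pictured as a per-feature threshold test on values that share a 0-to-1 scale. The function name features_to_investigate, the use of an absolute difference, and the numeric values are assumptions for illustration and are not drawn from Hall or from the application.

```python
from typing import Dict, List


def features_to_investigate(
    global_importance: Dict[str, float],
    local_importance: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Return features whose global and local importance values differ by at
    least the threshold (assumption: absolute difference is used)."""
    flagged = []
    for feature, global_value in global_importance.items():
        difference = abs(global_value - local_importance[feature])
        if difference >= threshold:
            flagged.append(feature)
    return flagged


# Made-up importance values on a common 0-to-1 scale.
global_values = {"sqft_living": 1.0, "bathrooms": 0.5}
local_values = {"sqft_living": 0.25, "bathrooms": 0.5}
flagged = features_to_investigate(global_values, local_values, threshold=0.5)
```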

Regarding claim 9, Hall teaches The method of claim 8, in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value, investigating the feature importance model (“Continuing with the global versus local analysis, explanations for the most expensive home are considered briefly. In figure 31, the two features sqft_living and sqft_above are the most important features locally along with bathrooms and yr_built. The data indicate the most expensive home has eight bathrooms, 12050 square feet of total square footage in which 8570 is allocated for living space (not including the basement), and the home was built in 1910. Following global explanations and reasonable expectations, this most expensive home has characteristics that justify it’s high prediction for price… For the selected homes, global and local explanations are reasonable when compared to one-another and to logical expectations. In practice, explanations for several different types of homes, and especially for outliers and other anomalous observations, should be investigated and analyzed to enhance understanding and trust in the Driverless AI model.” [pg. 38, Below Figure 31]).

Regarding claim 11, Hall teaches The method of claim 2, wherein one of the one or more non-linear surrogate models includes a decision tree surrogate model (“For the purposes of interpretation in Driverless AI, g is considered to represent the entire pipeline, including both the feature transformations and model, and the surrogate model is a decision tree (htree).” [pg. 11, § Decision Tree Surrogate Model, ¶2]).
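
As an illustrative sketch only, the idea of a decision tree surrogate (a simple tree trained on the predictions of a more complex model g rather than on the original labels) can be shown with a one-split tree over a single feature. The function name fit_stump_surrogate, the squared-error split criterion, and the toy model are assumptions; Hall's surrogate is a full decision tree over all inputs.

```python
from typing import Callable, Sequence, Tuple


def fit_stump_surrogate(
    xs: Sequence[float], model: Callable[[float], float]
) -> Tuple[float, float, float]:
    """Fit a depth-1 tree to model(x); return (threshold, left_mean, right_mean).

    Rows with x < threshold go left, all others go right.
    """
    targets = [model(x) for x in xs]  # surrogate is trained on g's predictions
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors of the two leaf means (assumed criterion).
        sse = sum((t - left_mean) ** 2 for t in left)
        sse += sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct input values")
    return best[1], best[2], best[3]


def complex_model(x: float) -> float:
    # Hypothetical stand-in for the model g.
    return 0.0 if x < 3 else 1.0


stump = fit_stump_surrogate([1.0, 2.0, 3.0, 4.0], complex_model)
```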

Regarding claim 12, Hall teaches The method of claim 11, wherein a plurality of branches associated with the decision tree surrogate model are based on input data associated with the machine learning model (“Figure 4 displays the decision tree surrogate, htree, for an example probability of default model, g, created with Driverless AI using the UCI repository credit card default data [10]. The PAY_0 feature is likely the most important feature in g due to its place in the initial split in htree and its second occurrence on the third level of htree. First level interactions between PAY_0 and PAY_2 and between PAY_0 and PAY_5 are visible along with several second level interactions. Following the decision path to the lowest probability leaf node in htree (figure 4 lower left) shows that customers who pay their first (PAY_0) and second (PAY_2) month bills on time are the least likely to default according to htree.” [pg. 12, Below Figure 4, See Figure 4]), wherein the input data associated with the machine learning model includes a plurality of entries, wherein each entry has a one or more features and one or more corresponding feature values (“Figure 9 displays the global and local feature importance values for the credit card default data, sorted in descending order from the globally most important feature to the globally least important feature. Local feature importance values are displayed under the global feature importance value for each feature. In figure 9, PAY_0, PAY_2, LIMIT_BAL, PAY_3, and BILL_AMT1 are the top 5 most important features globally. As expected, this result is well aligned with the results of the decision tree surrogate model discussed in section 2.2.” [pg. 21, ¶2]).

Regarding claim 13, Hall teaches The method of claim 11, wherein dynamically updating the one or more interpretation views associated with the one or more machine learning models includes highlighting a path of the decision tree surrogate model, wherein the highlighted path is specific to the selected entry (“When an observation of data is selected using the K-LIME plot, discussed in section 2.3, htree can also provide a degree of local interpretability. When a single observation, x, is selected, its path through htree is highlighted. The path of x through htree can be helpful when analyzing the logic or validity of g(x(i)).” [pg. 12, Below Figure 4]).

Regarding claim 14, Hall teaches The method of claim 11, wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model (“The thickness of the edges in this path indicate that this is a very common decision path through htree.” [pg. 12, Below Figure 4]).
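
As an illustrative sketch only, the highlighted path for a selected observation and the edge thickness described in the passages cited for claims 13 and 14 can be modeled as a tree traversal plus a count of how many observations use each edge. The tree layout, node names, thresholds, and data rows below are hypothetical and are not taken from Hall's Figure 4.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]
# Internal node: (feature, threshold, left child, right child); leaf: None.
Node = Optional[Tuple[str, float, str, str]]

TREE: Dict[str, Node] = {
    "root": ("PAY_0", 2.0, "n1", "leaf_c"),
    "n1": ("PAY_2", 2.0, "leaf_a", "leaf_b"),
    "leaf_a": None,
    "leaf_b": None,
    "leaf_c": None,
}


def path_for(row: Row) -> List[str]:
    """Node ids visited by one observation: the path to highlight."""
    node = "root"
    path = [node]
    while TREE[node] is not None:
        feature, threshold, left, right = TREE[node]
        node = left if row[feature] < threshold else right
        path.append(node)
    return path


def edge_counts(rows: List[Row]) -> Counter:
    """Number of observations traversing each edge: a proxy for edge width."""
    counts: Counter = Counter()
    for row in rows:
        path = path_for(row)
        counts.update(zip(path, path[1:]))
    return counts


rows = [
    {"PAY_0": 0.0, "PAY_2": 0.0},
    {"PAY_0": 1.0, "PAY_2": 3.0},
    {"PAY_0": 4.0, "PAY_2": 0.0},
]
selected_path = path_for(rows[0])
widths = edge_counts(rows)
```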

Regarding claim 15, Hall teaches The method of claim 2, wherein one of the one or more non-linear surrogate models includes a partial dependence plot (“The partial dependence plots show how different values of a feature affect the average prediction of the Driverless AI model. Figure 13 displays the partial dependence plot for sex and indicates that predicted survival increases dramatically for female passengers.” [pg. 27, § Partial Dependence Plots, ¶1]).

Regarding claim 16, Hall teaches The method of claim 15, wherein the partial dependence plot indicates a dependence of a prediction label of the partial dependence plot on a feature having a particular value (“Partial dependence plots show the partial dependence as a function of specific values of our feature subset Xj. The plots show how machine-learned response functions change based on the values of an input feature of interest, while taking nonlinearity into consideration and averaging out the effects of all other input features. Partial dependence plots enable increased transparency in g and enable the ability to validate and debug g by comparing a feature’s average predictions across its domain to known standards and reasonable expectations.” [pg. 18, § Partial Dependence and Individual Conditional Expectation, ¶4]).

Regarding claim 17, Hall teaches wherein the partial dependence plot indicates an average prediction label based on all entries associated with the partial dependence plot having a corresponding feature with a same particular value (“Equation 5 essentially states that the partial dependence of a given feature Xj is the average of the response function g, setting the given feature Xj = xj and using all other existing feature vectors of the complement set xf) as they exist in the dataset… Partial dependence plots show the partial dependence as a function of specific values of our feature subset Xj. The plots show how machine-learned response functions change based on the values of an input feature of interest, while taking nonlinearity into consideration and averaging out the effects of all other input features.” [pg. 18, § Partial Dependence and Individual Conditional Expectation, ¶4]).
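
For illustration only, the averaging described in the passage cited for claims 16 and 17 can be sketched as follows: for each candidate value of the feature of interest, that feature is set to the value in every entry, all other feature values are left as they exist in the data, and the resulting predictions are averaged. The response function `g`, the feature names, and the three entries are hypothetical stand-ins and are not taken from Hall.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction per grid value, holding the other features as observed."""
    curve = {}
    for value in grid:
        # Copy each row with the feature of interest overwritten; originals are untouched.
        modified = [{**row, feature: value} for row in rows]
        curve[value] = mean(predict(row) for row in modified)
    return curve


def g(row):
    """Hypothetical response function standing in for a trained model."""
    return row["age"] + (10 if row["sex"] == "female" else 0)


# Hypothetical entries.
rows = [
    {"age": 10, "sex": "male"},
    {"age": 20, "sex": "female"},
    {"age": 30, "sex": "male"},
]

pd_sex = partial_dependence(g, rows, "sex", ["male", "female"])
```

Plotting `pd_sex` against the grid values would give one point per feature value, each point being an average over all entries.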

Regarding claim 18, Hall teaches The method of claim 1, wherein the one or more interpretation views associated with one or more machine learning models includes a view associated with a feature importance surrogate model (“Currently in Driverless AI, a random forest surrogate model hRF consisting of B decision trees htree,b is trained on the predictions of the Driverless AI model.” [pg. 21, § Random Forest Feature Importance, ¶1]), a view associated with a decision tree surrogate model (“A surrogate model is a data mining and engineering technique in which a generally simpler model is used to explain another usually more complex model or phenomenon. Given our learned function g and set of predictions, g(X) = Ŷ, we can train a surrogate model h: X, Ŷ → h, such that h(X) ≈ g(X) [2]. To preserve interpretability, the hypothesis set for h is often restricted to linear models or decision trees.” [pg. 11, § Decision Tree Surrogate Model, ¶1]), and a view associated with a partial dependence plot (“The partial dependence plots show how different values of a feature affect the average prediction of the Driverless AI model.” [pg. 27, § Partial Dependence Plots, ¶1]).

Regarding claim 19, Hall teaches A system, comprising: 
a processor configured to: 
receive an indication of a selection of an entry associated with a machine learning model (“When a single observation, x, is selected, its path through htree is highlighted. The path of x(i) through htree can be helpful when analyzing the logic or validity of g(x(i)).” [pg. 12, ¶1]); and 
dynamically update one or more interpretation views associated with one or more machine learning models based on the selected entry (“This alone is an obstacle to interpretation, but when using these types of algorithms as interpretation tools or with interpretation tools, it is important to remember that details of explanations can change across multiple accurate models. This instability of explanations is a driving factor behind the presentation of multiple explanatory results in Driverless AI, enabling users to find explanatory information that is consistent across multiple modeling and interpretation techniques.” [pg. 10, § The Multiplicity of Good Models, ¶1]); and a memory coupled to the processor and configured to provide the processor with instructions (“Driverless AI runs on commodity hardware. It was also specifically designed to take advantage of graphical processing units (GPUs), including multi-GPU workstations and servers such as the NVIDIA DGX-1 for order-of-magnitude faster training.” [pg. 5, About H2O Driverless AI, ¶2; use of memory is implicit]).

Regarding claim 20, Hall teaches A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for (“Driverless AI runs on commodity hardware. It was also specifically designed to take advantage of graphical processing units (GPUs), including multi-GPU workstations and servers such as the NVIDIA DGX-1 for order-of-magnitude faster training.” [pg. 5, About H2O Driverless AI, ¶2; use of memory is implicit]): 
receiving an indication of a selection of an entry associated with a machine learning model (“When a single observation, x, is selected, its path through htree is highlighted. The path of x(i) through htree can be helpful when analyzing the logic or validity of g(x(i)).” [pg. 12, ¶1]); and 
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry (“This alone is an obstacle to interpretation, but when using these types of algorithms as interpretation tools or with interpretation tools, it is important to remember that details of explanations can change across multiple accurate models. This instability of explanations is a driving factor behind the presentation of multiple explanatory results in Driverless AI, enabling users to find explanatory information that is consistent across multiple modeling and interpretation techniques.” [pg. 10, § The Multiplicity of Good Models, ¶1]).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hall in view of Lee et al. ("US 20150379429 A1", cited by Applicant in the IDS filed 02/17/2022, hereinafter "Lee").

Regarding claim 10, Hall teaches The method of claim 8, where Hall further teaches in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value (“Following the global versus local analysis motif, local contributions to model predictions for a single home are also analyzed and compared to global explanations and reasonable expectations.” [pg. 36, Local Explanations, ¶1]).
Although Hall teaches determining a difference between the corresponding local/global feature importance values and removing problematic features (pg. 14), the reference does not explicitly teach determining that the difference is less than a threshold value and then forgoing an investigation of the feature importance model.
Lee teaches is less than a threshold value, forgoing an investigation of the feature importance model (“At least some of the parameter vector entries may be removed based on the adjusted weights in some embodiments (element 6116). For example, entries whose weights fall below a rejection threshold may be removed. In some embodiments, an efficient quantile boundary estimation technique similar to that discussed in the context of FIG. 52 and FIG. 54 may be applied to the absolute values of the feature weights, and parameter vector entries whose weights fall in the lowest X % may be removed.” [¶0305; removing entries would imply forgoing an investigation.]).
Hall and Lee are both in the same field of endeavor of machine learning model interpretations. Hall discloses machine learning interpretability with H2O Driverless AI. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hall’s teachings by removing certain features below a threshold value as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]
Backup Rejection
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, 11-13, 19 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ribeiro et al. ("“Why Should I Trust You?” Explaining the Predictions of Any Classifier", hereinafter "Ribeiro").

Regarding claim 1, Ribeiro teaches A method, comprising: 
receiving an indication of a selection of an entry associated with a machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); and 
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]).
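
For illustration only, the quantity L(f, g, πx) referenced in the cited passage can be sketched as a proximity-weighted squared error between the model f and an explanation g over samples near the selected instance x. The exponential kernel, the squared-error form, the kernel width, and the example functions are assumptions made for this sketch; the passage quoted above only states that L measures how unfaithful g is to f in the locality defined by πx.

```python
import math


def proximity(x, z, width=1.0):
    """Assumed proximity measure: exponential kernel on squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / width ** 2)


def local_loss(f, g, x, samples, width=1.0):
    """Proximity-weighted squared error of explanation g against model f around x."""
    return sum(proximity(x, z, width) * (f(z) - g(z)) ** 2 for z in samples)


def f(z):
    """Hypothetical non-linear model."""
    return z[0] ** 2


def g_tangent(z):
    """Hypothetical linear explanation: tangent line of f at z[0] = 1."""
    return 2 * z[0] - 1


def g_constant(z):
    """Hypothetical poor explanation that ignores the input."""
    return 0.0


x = (1.0,)
samples = [(0.5,), (0.9,), (1.0,), (1.1,), (1.5,)]

loss_tangent = local_loss(f, g_tangent, x, samples)
loss_constant = local_loss(f, g_constant, x, samples)
```

An explanation that tracks f near x yields a smaller value than one that does not, and a different selected x changes both the weights and the preferred g.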

Regarding claim 2, Ribeiro teaches The method of claim 1, wherein the one or more machine learning models include one or more non-linear models (“For example, a model that predicts sepia-toned images to be retro cannot be explained by presence of absence of super pixels. Second, our choice of G (sparse linear models) means that if the underlying model is highly non-linear even in the locality of the prediction, there may not be a faithful explanation.” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2]).

Regarding claim 11, Ribeiro teaches The method of claim 2, wherein one of the one or more non-linear surrogate models includes a decision tree surrogate model (“Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule list” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶1]).

Regarding claim 12, Ribeiro teaches The method of claim 11, wherein a plurality of branches associated with the decision tree surrogate model are based on input data associated with the machine learning model, wherein the input data associated with the machine learning model includes a plurality of entries (“We use two sentiment analysis datasets (books and DVDs, 2000 instances each) where the task is to classify product reviews as positive or negative. We train decision trees (DT), logistic regression with L2 regularization (LR), nearest neighbors (NN), and support vector machines with RBF kernel (SVM), all using bag of words as features. We also include random forests (with 1000 trees) trained with the average word2vec embedding (RF), a model that is impossible to interpret without a technique like LIME. We use the implementations and default parameters of scikit-learn, unless noted otherwise. We divide each dataset into train (1600 instances) and test (400 instances)” [pg. 6, § 5.1 Experiment Setup, ¶1]), wherein each entry has a one or more features and one or more corresponding feature values (“Intuitively, we want I such that features that explain many different instances have higher importance scores. In Figure 5, we show a toy example W, with n = d′ = 5, where W is binary (for simplicity). The importance function I should score feature f2 higher than feature f1, i.e. I2 > I1, since feature f2 is used to explain more instances” [pg. 4, Submodular Pick for Explaining Models, ¶4; importance scores would be equivalent to feature values]).

Regarding claim 13, Ribeiro teaches The method of claim 11, wherein dynamically updating the one or more interpretation views associated with the one or more machine learning models includes highlighting a path of the decision tree surrogate model, wherein the highlighted path is specific to the selected entry (“We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed). The explanations for each classifier are then presented to a set of 5 users in a new round of interaction, which results in 50 new classifiers. We do a final round, after which we have 250 classifiers, each with a path of interaction tracing back to the first 10 subjects. The explanations and instances shown to each user are produced by SP-LIME or RP-LIME. We show the average accuracy on the religion dataset at each interaction round for the paths originating from each of the original 10 subjects (shaded lines), and the average across all paths (solid lines) in Figure 10. It is clear from the figure that the crowd workers are able to improve the model by removing features they deem unimportant for the task” [pg. 8, § 6.3 Can non-experts improve a classifier, ¶2-3]).

Regarding claim 19, Ribeiro teaches A system, comprising: 
a processor configured to: 
receive an indication of a selection of an entry associated with a machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); and 
dynamically update one or more interpretation views associated with one or more machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]); and 
a memory coupled to the processor and configured to provide the processor with instructions (“In practice, explaining random forests with 1000 trees using scikit-learn (http://scikit-learn.org) on a laptop with N = 5000 takes under 3 seconds without any optimizations such as using gpus or parallelization” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2; processors and memory are implicit.]).

Regarding claim 20, Ribeiro teaches A computer program product, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for (“In practice, explaining random forests with 1000 trees using scikit-learn (http://scikit-learn.org) on a laptop with N = 5000 takes under 3 seconds without any optimizations such as using gpus or parallelization” [pg. 4, § 3.4 Sparse Linear Explanations, ¶2; processors and memory are implicit.]): 
receiving an indication of a selection of an entry associated with a machine learning model (“Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. We further use πx(z) as a proximity measure between an instance z to x, so as to define locality around x” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2]); and 
dynamically updating one or more interpretation views associated with one or more machine learning models based on the selected entry (“Finally, let L(f, g, πx) be a measure of how unfaithful g is in approximating f in the locality defined by πx. In order to ensure both interpretability and local fidelity, we must minimize L(f, g, πx) while having Ω(g) be low enough to be interpretable by humans.” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶2; minimizing a loss function would imply dynamically updating.]).

Claim Rejections - 35 USC § 103

Claims 3-7 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Lei et al. ("Distribution-Free Predictive Inference For Regression", hereinafter "Lei").

Regarding claim 3, Ribeiro teaches The method of claim 2, but fails to explicitly teach wherein one of the one or more non-linear surrogate models includes a feature importance model.
Lei teaches wherein one of the one or more non-linear surrogate models includes a feature importance model (“In this section, we discuss the problem of estimating the importance of each variable in a prediction model… First, our method is not limited to linear regression. Second, the spirit of our approach is to focus on predictive quantities and we want to measure variable importance directly in terms of prediction.” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶1]).
Ribeiro and Lei are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 4, Ribeiro/Lei teaches The method of claim 3, where Lei further teaches wherein the feature importance model is configured to output one or more features, wherein the one or more features have a corresponding global feature importance value (“For a more global measure of variable importance, we can focus on the distribution of ∆j (Xn+1, Yn+1), marginally over (Xn+1, Yn+1). We rely on a splitting approach, where the index set used for the training of µ and µ(−j) is I1 ⊊ {1, . . . , n}, a proper subset.” [pg. 36, § 6.2 Global Measures of Variable Importance, ¶1]) and a corresponding local feature importance value (“As with the guarantees from conformal inference, the coverage statement (15) is marginal over Xn+1, and in general, does not hold conditionally at Xn+1 = x. But, to summarize the effect of covariate j, we can still plot the intervals Wj(Xi) for i = 1, . . . , n, and loosely interpret these as making local statements about variable importance” [pg. 34, § 6.1 Local Measure of Variable Importance, ¶3]).
Ribeiro and Lei are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 5, Ribeiro/Lei teaches The method of claim 4, where Ribeiro teaches wherein the corresponding global feature importance value associated with a feature is based at least in part on a number of times the feature is used in a random forest model (“We train decision trees (DT), logistic regression with L2 regularization (LR), nearest neighbors (NN), and support vector machines with RBF kernel (SVM), all using bag of words as features. We also include random forests (with 1000 trees) trained with the average word2vec embedding (RF), a model that is impossible to interpret without a technique like LIME. We use the implementations and default parameters of scikit-learn, unless noted otherwise. We divide each dataset into train (1600 instances) and test (400 instances).” [pg. 6, § 5.1 Experiment Setup, ¶1; See §5.4 Can I trust this model, ¶1 discloses random forest model]).

Regarding claim 6, Ribeiro/Lei teaches The method of claim 5, where Lei teaches wherein the corresponding global feature importance value associated with the feature is based at least in part on a level of the random forest model that the feature was used to split the random forest model (“For a more global measure of variable importance, we can focus on the distribution of ∆j (Xn+1, Yn+1), marginally over (Xn+1, Yn+1). We rely on a splitting approach, where the index set used for the training of µ and µ(−j) is I1 ⊊ {1, . . . , n}, a proper subset. Denote by I2 its complement, and by Dk = {(Xi , Yi) : i ∈ Ik}, k = 1, 2 the data samples in each index set.” [pg. 36, § 6.2 Global Measures of Variable Importance, ¶1; Lei further discloses using random forest models: “The only exception is the random forest estimator, which gave stable errors over a variety of tuning choices; hence it is represented by a single point in each plot (corresponding to 500 trees in the low-dimensional problems, and 1000 trees in the high-dimensional problems). All curves in the figures represent an average over 50 repetitions, and error bars indicating the standard errors. In all cases, we used the split conformal method for computational efficiency.” [pg. 24, ¶2]]).
Ribeiro and Lei are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]

Regarding claim 7, Ribeiro/Lei teaches The method of claim 4, where Lei teaches wherein the corresponding local feature importance value is computed using a leave-one-covariate out mechanism (“Our proposal, leave-one-covariate-out or LOCO inference, proceeds as follows” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶2]; See further §6.1).
Ribeiro and Lei are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]
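
For illustration only, a leave-one-covariate-out computation of the kind referenced for claim 7 can be sketched as follows: the prediction error for one entry is measured with all covariates, measured again with covariate j left out, and the increase in absolute error is taken as the local importance of covariate j for that entry. The nearest-neighbor predictor is a stand-in chosen for brevity, and the training data and the test entry are hypothetical; neither is taken from Lei, whose quoted passage notes only that the method is not limited to linear regression.

```python
def knn_predict(train_X, train_y, x, features, k=1):
    """Stand-in predictor: mean response of the k nearest training rows,
    with distance measured only over the listed feature indices."""
    def squared_distance(row):
        return sum((row[j] - x[j]) ** 2 for j in features)

    order = sorted(range(len(train_X)), key=lambda i: squared_distance(train_X[i]))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def loco_delta(train_X, train_y, x, y, j, k=1):
    """Increase in absolute prediction error at (x, y) when covariate j is left out."""
    all_features = list(range(len(x)))
    reduced_features = [f for f in all_features if f != j]
    full_error = abs(y - knn_predict(train_X, train_y, x, all_features, k))
    reduced_error = abs(y - knn_predict(train_X, train_y, x, reduced_features, k))
    return reduced_error - full_error


# Hypothetical data: the response depends on covariate 0 only; covariate 1 is noise.
train_X = [(0, 0.1), (1, 0.3), (2, 0.2), (3, 0.0)]
train_y = [0, 10, 20, 30]
x_new, y_new = (2, 0.3), 20

delta_0 = loco_delta(train_X, train_y, x_new, y_new, j=0)
delta_1 = loco_delta(train_X, train_y, x_new, y_new, j=1)
```

In this toy setting, leaving out the informative covariate increases the error at the selected entry, while leaving out the noise covariate does not.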

Claims 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Lei and further in view of Lee et al. ("US 20150379429 A1", cited by Applicant in the IDS filed 02/17/2022, hereinafter "Lee").

Regarding claim 8, Ribeiro/Lei teaches The method of claim 4, further comprising: 
Ribeiro teaches comparing the corresponding global feature importance value associated with a feature with the corresponding local feature importance value associated with the feature (“Another essential criterion is local fidelity. Although it is often impossible for an explanation to be completely faithful unless it is the complete description of the model itself, for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted. We note that local fidelity does not imply global fidelity: features that are globally important may not be important in the local context, and vice versa. While global fidelity would imply local fidelity, identifying globally faithful explanations that are interpretable remains a challenge for complex models.” [pg. 3, top left col, ¶2; implies comparing global and local feature importance values]); and 
However, Ribeiro/Lei fails to explicitly teach determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal a threshold value.
Lee teaches determining whether a difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal a threshold value (“To avoid such undesirable scenarios, a technique for pruning selected parameters may be employed in some embodiments. According to such a technique, when certain triggering conditions are met (e.g., when the number of features for which parameters are stored exceeds a threshold), a fraction of the features that contribute least to the models' predictions may be identified as pruning victims. An efficient in-memory technique to estimate quantile boundary values (e.g., the 20% of the features that contribute the least to the model's predictions) for parameters may be used in some embodiments, without requiring copying of the parameters or an explicit sort operation.” [¶0257]).
Ribeiro, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Regarding claim 9, Ribeiro/Lei/Lee teaches The method of claim 8, where Lee teaches in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is greater than or equal to a threshold value, investigating the feature importance model (“According to such a technique, when certain triggering conditions are met (e.g., when the number of features for which parameters are stored exceeds a threshold), a fraction of the features that contribute least to the models' predictions may be identified as pruning victims. An efficient in-memory technique to estimate quantile boundary values (e.g., the 20% of the features that contribute the least to the model's predictions) for parameters may be used in some embodiments, without requiring copying of the parameters or an explicit sort operation. Entries (e.g., parameter values) for the pruning victims identified may be removed from the feature set 5025, thus reducing the memory consumed. However, additional learning iterations may be performed even after pruning some features. Thus, the feature set size may grow and shrink repeatedly as more observation records are considered, more features are added, and more features are pruned.” [¶0257; additional learning/techniques would correspond to “investigating”.]).
Ribeiro, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Regarding claim 10, Ribeiro/Lei/Lee teaches The method of claim 8, where Lee teaches in response to determining that the difference between the corresponding global feature importance value associated with the feature and the corresponding local feature importance value is less than a threshold value, forgoing an investigation of the feature importance model (“At least some of the parameter vector entries may be removed based on the adjusted weights in some embodiments (element 6116). For example, entries whose weights fall below a rejection threshold may be removed. In some embodiments, an efficient quantile boundary estimation technique similar to that discussed in the context of FIG. 52 and FIG. 54 may be applied to the absolute values of the feature weights, and parameter vector entries whose weights fall in the lowest X % may be removed.” [¶0305; removing entries would imply forgoing an investigation.]).
Ribeiro, Lei, and Lee are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Lei’s teachings in order to implement a threshold value for comparing features as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]
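For illustration only, the threshold comparison recited in claims 9 and 10 can be sketched as follows. The sketch is not taken from Ribeiro, Lei, Lee, or the application. The function name `should_investigate`, the use of an absolute difference, and the sample values are assumptions made for the example.

```python
def should_investigate(global_importance, local_importance, threshold):
    """Decide whether the feature importance model warrants investigation.

    Hypothetical helper: treats the "difference" as an absolute difference,
    which is an assumption and is not stated in the claims.
    """
    difference = abs(global_importance - local_importance)
    # Claim 9: difference >= threshold -> investigate.
    # Claim 10: difference < threshold -> forgo the investigation.
    return difference >= threshold


# Illustrative values only (not drawn from any cited reference).
large_gap = should_investigate(0.9, 0.2, threshold=0.5)
small_gap = should_investigate(0.5, 0.4, threshold=0.5)
```

Under these assumptions, a large gap between the global and local importance values triggers an investigation, and a small gap does not.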

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Lee.

Regarding claim 14, Ribeiro teaches The method of claim 11; however, Ribeiro fails to explicitly teach wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model.
Lee teaches wherein a width of a path of the decision tree surrogate model indicates a frequency of which the path is used by the decision tree surrogate model (“For example, in a greedy pruning technique 3650, the unpruned tree 3604 may be analyzed in a top-down fashion, selecting the path that leads to the node with the highest PUM value at each split in the tree. The cumulative PUM values of the nodes encountered during the greedy top-down traversal may be tracked, as well as the total number of nodes encountered.” [¶0213]).
Ribeiro and Lee are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lee discloses a method for evaluating machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings in order to find the frequency of a path used by a decision tree as taught by Lee. One would have been motivated to make this modification in order to come up with more accurate and faster predictions. [¶0224, Lee]

Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Goldstein et al. ("Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation", hereinafter "Goldstein").

Regarding claim 15, Ribeiro teaches The method of claim 2; however, Ribeiro fails to explicitly teach wherein one of the one or more non-linear surrogate models includes a partial dependence plot.
Goldstein teaches wherein one of the one or more non-linear surrogate models includes a partial dependence plot (“The resulting graphic, which is called a partial dependence plot, displays the average value of f̂ as a function of xS. For the remainder of the paper we consider a single predictor of interest at a time (|S| = 1) and write xS without boldface accordingly” [pg. 4, top para]).
Ribeiro and Goldstein are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]

Regarding claim 16, Ribeiro/Goldstein teaches The method of claim 15, where Goldstein teaches wherein the partial dependence plot indicates a dependence of a prediction label of the partial dependence plot on a feature having a particular value (“The goal of this article is to present Individual Conditional Expectation (ICE) plots, a toolbox for visualizing models produced by “black box” algorithms. These algorithms use training data {xi, yi}, i = 1, ..., N (where xi = (xi,1, . . . , xi,p) is a vector of predictors and yi is the response) to construct a model f̂ that maps the features x to fitted values f̂(x). Though these algorithms can produce fitted values that enjoy low generalization error, it is often difficult to understand how the resultant f̂ uses x to generate predictions. The ICE toolbox helps visualize this mapping.” [pg. 1-2, § 1 Introduction, ¶1]).
Ribeiro and Goldstein are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]

Regarding claim 17, Ribeiro/Goldstein teaches The method of claim 15, where Goldstein teaches wherein the partial dependence plot indicates an average prediction label based on all entries associated with the partial dependence plot having a corresponding feature with a same particular value (“Each subset of predictors S has its own partial dependence function fS, which gives the average value of f when xS is fixed and xC varies over its marginal distribution dP (xC). As neither the true f nor dP (xC) are known, we estimate Equation 1 by computing where {xC1, ..., xCN} represent the different values of xC that are observed in the training data. Note that the approximation here is twofold: we estimate the true model with f̂, the output of a statistical learning algorithm, and we estimate the integral over xC by averaging over the N xC values observed in the training set.” [pg. 3, § 2.2 Friedman’s PDP, ¶3; xS is fixed and thus corresponds to a same particular value.]).
Ribeiro and Goldstein are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s teachings to implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]
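For illustration only, the averaging described in the Goldstein passage cited for claim 17 (fix the feature of interest at a particular value, then average the model's predictions over the observed values of the remaining features) can be sketched as follows. The sketch is not code from Goldstein or the application. The names `partial_dependence`, `toy_model`, and `X_train`, and the toy data, are assumptions made for the example.

```python
import numpy as np


def partial_dependence(predict, X, feature_index, grid):
    """Estimate partial dependence of `predict` on one feature.

    For each grid value, the chosen feature is set to that value in every
    row of X and the resulting predictions are averaged.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature_index] = value  # same particular value in all rows
        averages.append(float(np.mean(predict(X_fixed))))
    return np.array(averages)


def toy_model(X):
    # Hypothetical fitted model used only for this example.
    return 2.0 * X[:, 0] + X[:, 1]


# Hypothetical training data: two features, three entries.
X_train = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
pd_values = partial_dependence(toy_model, X_train, feature_index=0, grid=[0.0, 1.0])
```

With this toy model, each returned value equals twice the grid value plus the mean of the second feature, which is the average prediction over all entries sharing that particular feature value.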

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Lei and further in view of Goldstein.

Regarding claim 18, Ribeiro teaches The method of claim 1, wherein the one or more interpretation views associated with one or more machine learning models includes a view associated with a decision tree surrogate model (“Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees, or falling rule list” [pg. 3, § 3.2 Fidelity-Interpretability Trade-off, ¶1]).
Ribeiro fails to explicitly teach a view associated with a feature importance surrogate model and a view associated with a partial dependence plot.
Lei teaches a view associated with a feature importance surrogate model (“In this section, we discuss the problem of estimating the importance of each variable in a prediction model… First, our method is not limited to linear regression. Second, the spirit of our approach is to focus on predictive quantities and we want to measure variable importance directly in terms of prediction.” [pg. 32, § 6 Model-Free Variable Importance: LOCO, ¶1]).
Ribeiro and Lei are both in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s non-linear model to implement the feature/variable importance model as taught by Lei. One would have been motivated to make this modification in order to determine the importance of each feature in a prediction model. [pg. 32, § 6 Model-Free Variable Importance: LOCO]
Ribeiro/Lei fails to explicitly teach a view associated with a partial dependence plot.
Goldstein teaches a view associated with a partial dependence plot (“The resulting graphic, which is called a partial dependence plot, displays the average value of f̂ as a function of xS. For the remainder of the paper we consider a single predictor of interest at a time (|S| = 1) and write xS without boldface accordingly” [pg. 4, top para]).
Ribeiro, Lei, and Goldstein are all in the same field of endeavor of model interpretation and thus are analogous. Ribeiro discloses LIME which explains the predictions of a classifier. Lei discloses a variable importance method called LOCO inference. Goldstein teaches individual conditional expectation plots for visualizing a model. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ribeiro’s/Lei’s teachings to further implement the partial dependence plot of Goldstein. One would have been motivated to make this modification in order to visualize the partial relationship between the predicted output and features. [Abstract, Goldstein]

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Ribeiro et al. ("Model-Agnostic Interpretability of Machine Learning") discloses a model-agnostic explanation approach (LIME).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122