DETAILED ACTION
This is the first office action regarding application number 16/124,047, filed September 6, 2018.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Specification
Paragraph [0061] of the specification identifies a non-patent literature reference that is not listed in any submitted Information Disclosure Statement, and no copy of that reference was included with the application submission:
Ben-Haim et al., A Streaming Parallel Decision Tree Algorithm, Journal of Machine Learning Research 11 (2010) 849-872, published 2/2010 (http://www.jmlr.org/papers/v11/ben-haim10a.html)
If applicant wishes to have the above non-patent literature reference considered, applicant must list it in a new Information Disclosure Statement and submit a copy of the reference. See 37 CFR 1.98.
The disclosure is objected to because of the following informalities:
Paragraphs [00008], line 2, and [00009], line 2: typographical error; “block box” should read “black box”. Appropriate correction is required.
Paragraph [00020]: The single run-on sentence comprising this paragraph needs to be fixed. Appropriate correction is required.
Paragraph [00031]: Figure 1 does not contain element 126 as indicated in this paragraph. Appropriate correction is required.
Paragraph [00061]: This paragraph contains an embedded hyperlink (http://www.jmlr.org/papers/v11/ben-haim10a.html), which is not permitted within the specification. See MPEP 608.01(VII). This issue would be resolved by either a proper citation of the weblink (using top-level domain name without any prefix such as http://) or a proper citation of the actual reference located within this link (Ben-Haim et al., “A Streaming Parallel Decision Tree Algorithm”, Journal of Machine Learning Research 11 (2010) 849-872.). Appropriate correction is required.
Paragraph [00069], last sentence: typographical error; “6062%” should read “60.62%” to be consistent with Figure 5. This value is marked as element 520 in Figure 5, but element 520 is not identified in the specification. Appropriate correction is required.
Paragraph [00085]: Figure 6 element 647 is not described in this paragraph, or anywhere else in the specification. Appropriate correction is required.
Paragraph [00087]: Figure 7 does not contain element “memory controller 703” as indicated in this paragraph. Appropriate correction is required.
Paragraph [00091]: Figure 7 does not contain element 712 as indicated in this paragraph. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined in Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to Step 2B, where it is determined whether the claim is a patent-eligible application of the exception. If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult MPEP 2106 for more details of the analysis.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more than the abstract idea itself, and hence is not patent-eligible subject matter.
Regarding Claim 1, 
Step 1: The claim recites one or more non-transitory computer-readable media; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim recites the following abstract ideas:
create a set of perturbed input data points (Pk) from P by changing the value of at least one feature of P for each perturbed input data point (Under its broadest reasonable ; …
analyze the predictions m(Pk) for the perturbed input data points to determine which features are most influential to the prediction (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as analyzing the predictions for the perturbed input data points to determine which features are most influential represents a decision-making process, which is a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind, using a generic computer as a tool to perform the mental process. See MPEP 2106.04(a)(2)(III-C).); …
Step 2A Prong 2: This claim further recites:
instructions, which when executed by one or more processors of a computing system, causes the computing system to (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.): 
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P (This claim element is directed to a form of pre-solution/insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.); …
obtain a prediction m(Pk) for each of the perturbed input data points (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See ; 
output the analysis results to a user (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
instructions, which when executed by one or more processors of a computing system, causes the computing system to (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.): 
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P (This claim element is directed to storing and retrieving information in memory, which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example iv.); …
obtain a prediction m(Pk) for each of the perturbed input data points (This claim element is directed to retrieving or transmitting data over a network (where the machine learning model m is located in a computer-readable storage medium), which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example i. Additionally, this claim element is also directed to storing and retrieving information in memory (after retrieval of the prediction data m(Pk) over the network), which is also a well-understood, routine, and conventional activity, and hence does not ; 
output the analysis results to a user (This claim element is directed to necessary data gathering and outputting, which does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(g).).  
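For illustration only, the perturb, predict, and analyze sequence recited in Claim 1 can be sketched in a few lines of ordinary Python. This is a hypothetical sketch, not applicant's disclosed implementation: the function names, the toy model, the per-feature candidate values, and the influence measure (the fraction of single-feature perturbations that change m(P)) are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a data point P maps feature names to values.
Point = Dict[str, float]
Model = Callable[[Point], object]


def perturb(p: Point, candidates: Dict[str, List[float]]) -> List[Tuple[str, Point]]:
    """Create perturbed points Pk, each changing exactly one feature of P.

    `candidates` (an assumption of this sketch) lists replacement values per feature.
    Returns (changed_feature, Pk) pairs.
    """
    perturbed = []
    for feature, values in candidates.items():
        for v in values:
            if v != p[feature]:
                pk = dict(p)  # copy P, then change one feature
                pk[feature] = v
                perturbed.append((feature, pk))
    return perturbed


def feature_influence(m: Model, p: Point, candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each feature by the fraction of its perturbations for which m(Pk) != m(P)."""
    base = m(p)
    changed = {f: 0 for f in candidates}
    total = {f: 0 for f in candidates}
    for feature, pk in perturb(p, candidates):
        total[feature] += 1
        if m(pk) != base:  # obtain m(Pk) and compare it with m(P)
            changed[feature] += 1
    return {f: (changed[f] / total[f] if total[f] else 0.0) for f in candidates}


# Usage with a hypothetical toy model that only looks at "income".
def toy_model(x: Point) -> str:
    return "approve" if x["income"] > 50 else "deny"


P = {"income": 60.0, "age": 30.0}
scores = feature_influence(toy_model, P, {"income": [40.0, 70.0], "age": [20.0, 50.0]})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['income', 'age']
```

A flip rate is only one possible influence measure; the claim language reproduced above does not specify one.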
Regarding Claim 2,
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the values of multiple features of P are changed for at least some of the perturbed data points (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as identifying changed values for some of the perturbed data points represents a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind. See MPEP 2106.04(a)(2)(III).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 3,
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as inferring rules is interpreted as identifying conditions to distinguish between predictions of perturbed points, which represents a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind. See MPEP 2106.04(a)(2)(III).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 4,
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a pre-defined distance of m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as inferring rules is interpreted as identifying conditions to distinguish between predictions of perturbed points based on a pre-defined distance metric, which represents a mathematical relationship, a way of organizing and manipulating information using mathematical correlations. See MPEP 2106.04(a)(2)(I-A) example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 5, 
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a pre-defined distance based on a Euclidean, L_1 norm, max_norm, or KL-divergence method represents a mathematical calculation. See MPEP 2106.04(a)(2)(I-C).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
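For illustration, each distance named in Claim 5 reduces to a short arithmetic formula. The sketch below is hypothetical: it assumes that m(P) and m(Pk) are equal-length lists of floats (for example, class-probability vectors) and, for KL-divergence, that both are probability distributions with q_i > 0 wherever p_i > 0. The helper within_distance is an assumed rendering of the “within a pre-defined distance” test of Claim 4, not claim language.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(p: Vector, q: Vector) -> float:
    # sqrt(sum_i (p_i - q_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def l1_norm(p: Vector, q: Vector) -> float:
    # sum_i |p_i - q_i|
    return sum(abs(a - b) for a, b in zip(p, q))


def max_norm(p: Vector, q: Vector) -> float:
    # max_i |p_i - q_i|
    return max(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p: Vector, q: Vector) -> float:
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def within_distance(m_p: Vector, m_pk: Vector,
                    distance: Callable[[Vector, Vector], float], eps: float) -> bool:
    # Hypothetical test: is m(Pk) within distance eps of m(P)?
    return distance(m_p, m_pk) <= eps


# Hypothetical prediction vectors for P and one perturbed point Pk.
m_p, m_pk = [0.7, 0.3], [0.4, 0.6]
for name, d in [("euclidean", euclidean), ("l1", l1_norm), ("max", max_norm), ("kl", kl_divergence)]:
    print(name, round(d(m_p, m_pk), 4), within_distance(m_p, m_pk, d, 0.5))
```

Each function is a closed-form calculation over the two vectors; the choice of eps = 0.5 in the usage lines is arbitrary.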
Regarding Claim 6,
Step 1: The claim recites the one or more computer-readable media of claim 4; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 4, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the rules include a threshold above or below which the prediction m(Pk) changes from m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as identifying the rules (e.g., conditions) that change the predictions based on threshold values represents a mathematical relationship, a way of organizing and manipulating information using mathematical correlations. See MPEP 2106.04(a)(2)(I-A) example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
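Claims 3, 4, and 6 recite inferring rules, including a threshold above or below which m(Pk) changes from m(P). As a hypothetical illustration only, such a rule can be found for a single feature by scanning candidate cut points between sorted perturbed values. The single-feature scan, the midpoint cut points, and the function name infer_threshold are assumptions of this sketch; the claims as reproduced above do not specify a rule-inference technique.

```python
from typing import List, Optional, Tuple


def infer_threshold(values: List[float], changed: List[bool]) -> Optional[Tuple[float, str]]:
    """Find a cut t on one feature separating perturbed points whose prediction changed.

    values[i]  -- the feature value of the i-th perturbed point Pk
    changed[i] -- True if m(Pk) != m(P) for that point
    Returns (t, "above") if the prediction changes exactly when value > t,
    (t, "below") if it changes exactly when value < t, or None if no single cut works.
    """
    pairs = sorted(zip(values, changed))
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no cut point exists between equal values
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [c for _, c in pairs[: i + 1]]
        right = [c for _, c in pairs[i + 1:]]
        if not any(left) and all(right):
            return t, "above"
        if all(left) and not any(right):
            return t, "below"
    return None


# Hypothetical perturbations of "income": predictions change only for low values.
print(infer_threshold([70.0, 40.0, 55.0, 45.0], [False, True, False, True]))  # (50.0, 'below')
```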
Regarding Claim 7,
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein create a set of perturbed data points further includes to operate on P using a pre-defined perturbation function D (This claim element is considered a form of applying mere instructions (i.e., a pre-defined perturbation function D) on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
wherein create a set of perturbed data points further includes to operate on P using a pre-defined perturbation function D (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception .  
Regarding Claim 8,
Step 1: The claim recites the one or more computer-readable media of claim 7; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 7, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
D takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n} (This claim element places an additional limitation on the pre-defined perturbation function D by describing its inputs and outputs, as well as generally linking the system to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
D takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n} (As analyzed in Step 2A Prong 2, type definitions and a general linking to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  
Regarding Claim 9,
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
instructions that, when executed, cause the computing system to (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.)
access a histogram for each of the features of P, that indicates which bin of the histogram the value of each of the features in P falls within (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
instructions that, when executed, cause the computing system to (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.)
access a histogram for each of the features of P, that indicates which bin of the histogram the value of each of the features in P falls within (This claim element is directed to storing and retrieving information in memory, which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example iv.).  
Regarding Claim 10, 
Step 1: The claim recites the one or more computer-readable media of claim 9; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 9, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein changing the value of at least one feature of P for each perturbed input data point Pk includes selecting a new value for that feature from a different bin of the histogram (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining (i.e., selecting) new feature values from different histogram bins represents a decision-making process, which is a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind. See MPEP 2106.04(a)(2)(III).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
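The histogram steps of Claims 9 and 10 can likewise be illustrated with a hypothetical sketch, assuming fixed per-feature bin edges and using the midpoint of each other bin as the new feature value. The bin representation, the midpoint choice, and the function names are assumptions; Claim 10 recites only that the new value is selected from a different bin of the histogram.

```python
from bisect import bisect_right
from typing import List


def bin_index(edges: List[float], value: float) -> int:
    """Index i of the bin with edges[i] <= value < edges[i + 1] (last bin includes its right edge)."""
    if not edges[0] <= value <= edges[-1]:
        raise ValueError("value lies outside the histogram range")
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def values_from_other_bins(edges: List[float], value: float) -> List[float]:
    """Candidate new values for a feature: the midpoint of every bin except the one holding `value`."""
    current = bin_index(edges, value)
    return [(edges[i] + edges[i + 1]) / 2
            for i in range(len(edges) - 1) if i != current]


# Hypothetical histogram for an "income" feature with four equal-width bins.
income_edges = [0.0, 25.0, 50.0, 75.0, 100.0]
print(bin_index(income_edges, 60.0))               # 2
print(values_from_other_bins(income_edges, 60.0))  # [12.5, 37.5, 87.5]
```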
Regarding Claim 11, 
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
instructions that, when executed, cause the computing system to (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.)
display a prediction explanation graphic for either P or a Pk, to the user (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.),
the prediction explanation graphic including each of the features of the input data point, the value of each of those features, and a relative importance indication of each feature (This claim element places an additional limitation on the prediction explanation graphic by describing its contents, as well as generally linking the system to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
instructions that, when executed, cause the computing system to (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.)
display a prediction explanation graphic for either P or a Pk, to the user (This claim element is directed to necessary data gathering and outputting, which does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(g).),
the prediction explanation graphic including each of the features of the input data point, the value of each of those features, and a relative importance indication of each feature (As analyzed in Step 2A Prong 2, type definitions and a general linking to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding Claim 12, 
Step 1: The claim recites the one or more computer-readable media of claim 1; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 1, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the model is proprietary (This claim element places an additional limitation on the type of model (i.e., a proprietary model), as well as generally linking the system to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).), 
accessible via a remote server (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), …
instructions that, when executed, cause the computing system to (This claim element is considered a form of applying mere instructions on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.)
access the model over a network link between the computing system and the remote server (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
wherein the model is proprietary (As analyzed in Step 2A Prong 2, type definitions and a general linking to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.), 
accessible via a remote server (This claim element is directed to retrieving or transmitting data over a network, which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example i.), … 
instructions that, when executed, cause the computing system to (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.)
access the model over a network link between the computing system and the remote server (This claim element is directed to retrieving or transmitting data over a network, which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example i.).  
Regarding Claim 13, 
Step 1: The claim recites a computing system; therefore, it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim recites the following abstract ideas:
create a set of perturbed input data points (Pk) from P by changing the value of at least one feature of P for each perturbed input data point (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as creating a set of perturbed input ; …
analyze the predictions m(Pk) for the perturbed input data points to determine which features are most influential to the prediction (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as analyzing the predictions for the perturbed input data points to determine which features are most influential represents a decision-making process, which is a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind, using a generic computer as a tool to perform the mental process. See MPEP 2106.04(a)(2)(III-C).); …
Step 2A Prong 2: This claim further recites:
one or more processors to implement a model characterization engine, the model characterization engine to (This claim element is directed to generally linking the system to a technological environment (by reciting one or more processors in a computer environment). Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).)
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P (This claim element is directed to a form of pre-solution/insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.); …
obtain a prediction m(Pk) for each of the perturbed input data points (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.); 
output the analysis results to a user (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
one or more processors to implement a model characterization engine, the model characterization engine to (As analyzed in Step 2A Prong 2, a general linking to a technological environment does not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P (This claim element is directed to storing and retrieving information in memory, which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example iv.); …
obtain a prediction m(Pk) for each of the perturbed input data points (This claim element is directed to retrieving or transmitting data over a network (where the machine learning model m is located in a computer-readable storage medium), which is a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example i. Additionally, this claim element is also directed to storing and retrieving information in memory (after retrieval of the prediction data m(Pk) over the network), which is also a well-understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example iv.); 
output the analysis results to a user (This claim element is directed to necessary data gathering and outputting, which does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(g).).  
Regarding Claim 14, 
Step 1: The claim recites the computing system of claim 13, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 13, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the model characterization engine is to access the model from a remote server by inputting P and Pk to it, and receiving m(P) and m(Pk) from it, over a network link (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.).  
Step 2B: This claim further recites:
wherein the model characterization engine is to access the model from a remote server by inputting P and Pk to it, and receiving m(P) and m(Pk) from it, over a network link (This claim element is directed to retrieving or transmitting data over a network, which is a well-known, understood, routine, and conventional activity, and hence does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d)(II) list 1, example i.).  
Regarding Claim 15, 
Step 1: The claim recites the computing system of claim 13, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 13, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein the model characterization engine creates the set of perturbed data points by operating on P using a pre-defined perturbation function D (This claim element is considered a form of applying mere instructions (i.e., a pre-defined perturbation function D) on a generic computer to implement a judicial exception. See MPEP 2106.05(f). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.)
that takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n} (This claim element places an additional limitation on the pre-defined perturbation function D by describing its inputs and outputs, as well as generally linking the system to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
wherein the model characterization engine creates the set of perturbed data points by operating on P using a pre-defined perturbation function D (As analyzed in Step 2A Prong 2, applying mere instructions on a generic computer to implement a judicial exception does not integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.)
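that takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n} (As analyzed in Step 2A Prong 2, type definitions and a general linking to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  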
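For illustration only, a perturbation function D of the kind recited in claim 15 (one input point P in, a set of n perturbed points out) may be sketched as follows. The resampling strategy, the dictionary representation of a point, and the names `perturb` and `candidate_values` are assumptions made for this sketch; they are not taken from the claims or the specification.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical representation: a data point maps feature name -> value.
Point = Dict[str, float]


def perturb(p: Point, candidate_values: Dict[str, Sequence[float]], n: int, seed: int = 0) -> List[Point]:
    """Hypothetical perturbation function D: one input point P -> n perturbed points.

    Each Pk_i is a copy of P in which one randomly chosen feature is set to a
    different value drawn from that feature's candidate values.
    """
    rng = random.Random(seed)  # seeded so the same P always yields the same set of Pk
    # Only features with at least one alternative value can actually be changed.
    changeable = sorted(
        f for f in p if any(v != p[f] for v in candidate_values.get(f, ()))
    )
    if not changeable:
        raise ValueError("no feature of P has an alternative value")
    perturbed: List[Point] = []
    for _ in range(n):
        pk = dict(p)  # copy, so P itself is never modified
        f = rng.choice(changeable)
        pk[f] = rng.choice([v for v in candidate_values[f] if v != p[f]])
        perturbed.append(pk)
    return perturbed
```

Under these assumptions, each returned Pk differs from P in exactly one feature, which corresponds to the limitation of claim 1 of "changing the value of at least one feature of P for each perturbed input data point."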
Regarding Claim 16, 
Step 1: The claim recites the computing system of claim 13, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 13, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein to analyze the predictions the model characterization engine is further to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as inferring rules is interpreted as identifying conditions to distinguish between predictions of perturbed points, which represents a mental process (observations, evaluations, judgments, opinions) that is implementable in the human mind. See MPEP 2106.04(a)(2)(III).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
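For illustration only, the first step implied by claim 16, separating the perturbed points whose prediction m(Pk) differs from m(P) from those whose prediction is the same, may be sketched as below. The function name `partition_by_prediction` and the use of a model returning a discrete label are assumptions; the subsequent rule inference recited in the claim is not shown.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a data point maps feature name -> value.
Point = Dict[str, float]


def partition_by_prediction(
    m: Callable[[Point], int], p: Point, perturbed: List[Point]
) -> Tuple[List[Point], List[Point]]:
    """Split the perturbed points into (changed, unchanged) relative to m(P)."""
    base = m(p)
    changed: List[Point] = []
    unchanged: List[Point] = []
    for pk in perturbed:
        # Each m(Pk) is evaluated once and compared with m(P).
        (changed if m(pk) != base else unchanged).append(pk)
    return changed, unchanged
```

The comparison performed here (whether two labels are equal) is the same observation and evaluation that the analysis above characterizes as implementable in the human mind.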
Regarding Claim 17, 
Step 1: The claim recites the computing system of claim 13, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 13, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein analyze the predictions the model characterization engine is further to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a pre-defined distance of m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as inferring rules is interpreted as identifying conditions to distinguish between predictions of perturbed points based on a pre-defined distance metric, which represents a mathematical relationship, a way of organizing and manipulating information using mathematical correlations. See MPEP 2106.04(a)(2)(I-A) example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding Claim 18, 
Step 1: The claim recites the computing system of claim 17, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of claim 17, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract idea:
wherein the pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as determining a pre-defined distance based on a Euclidean, L_1 norm, max_norm, or KL-divergence method represents a mathematical calculation. See MPEP 2106.04(a)(2)(I-C).).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
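For reference, the four distances recited in claim 18 have standard mathematical definitions. They are shown below for prediction vectors x and y with n components. For KL-divergence, the inputs are assumed to be probability vectors p and q, for example class-probability outputs of m (an assumption about the model's output). The formulas show that each recited option is a closed-form mathematical calculation. Note that KL-divergence is not symmetric and is therefore not a metric in the strict sense.

```latex
\begin{align}
d_{2}(x,y)       &= \Big(\sum_{i=1}^{n} (x_i - y_i)^2\Big)^{1/2} && \text{(Euclidean)}\\
d_{1}(x,y)       &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert       && \text{($L_1$ norm)}\\
d_{\infty}(x,y)  &= \max_{1 \le i \le n} \lvert x_i - y_i \rvert && \text{(max norm)}\\
D_{\mathrm{KL}}(p \,\|\, q) &= \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} && \text{(KL-divergence; $q_i > 0$ wherever $p_i > 0$)}
\end{align}
```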
Regarding Claim 19, 
Step 1: The claim recites the computing system of claim 16, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 16, and hence inherits the same abstract ideas mentioned above. This claim further recites the following abstract ideas:
wherein the rules include a threshold above or below which the prediction m(Pk) changes from m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as identifying the rules (e.g., conditions) that change the predictions based on threshold values represents a mathematical relationship, a way of organizing and manipulating information using mathematical correlations. See MPEP 2106.04(a)(2)(I-A) example iv.), or 
wherein the rules … changes at least by a pre-defined distance from m(P) (Under its broadest reasonable interpretation, this claim element recites a judicial exception, as inferring rules is interpreted as identifying conditions to distinguish between predictions of perturbed points based on a pre-defined distance metric, which represents a mathematical relationship, a way of organizing and manipulating information using mathematical correlations. See MPEP 2106.04(a)(2)(I-A) example iv.).  
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
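For illustration only, a threshold rule of the type recited in claim 19, namely a value of a feature above or below which m(Pk) changes from m(P), may be located by scanning candidate values of a single feature. The function name `change_thresholds`, the one-feature-at-a-time scan, and the discrete-label model are assumptions made for this sketch.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical representation: a data point maps feature name -> value.
Point = Dict[str, float]


def change_thresholds(
    m: Callable[[Point], int], p: Point, feature: str, values: Iterable[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) for one feature of P.

    lower: largest tested value below P[feature] at which m(Pk) != m(P).
    upper: smallest tested value above P[feature] at which m(Pk) != m(P).
    Either is None if no change was observed on that side.
    """
    base = m(p)
    x0 = p[feature]
    lower: Optional[float] = None
    upper: Optional[float] = None
    for v in sorted(set(values)):
        pk = dict(p)
        pk[feature] = v  # perturb only the feature under test
        if m(pk) == base:
            continue
        if v > x0 and upper is None:
            upper = v  # first changing value above P[feature] in ascending order
        elif v < x0:
            lower = v  # overwritten in ascending order, so ends at the largest one
    return lower, upper
```

For a toy model that predicts 1 only when 20 < income < 60, a point with income 50 yields the pair (20.0, 60.0) when the values 10, 20, 30, 60, and 70 are tested. These are the values below and above which the prediction changes.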
Regarding Claim 20, 
Step 1: The claim recites the computing system of claim 13, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim is a dependent claim of Claim 13, and hence inherits the same abstract ideas mentioned above.
Step 2A Prong 2: This claim further recites:
wherein to output the analysis results to the user, the model characterization engine is further to cause the computing system to display a prediction explanation graphic for either P or a Pk to the user (This claim element is directed to a form of insignificant extra-solution activity for use in a claimed process. See MPEP 2106.05(g). This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), 
the prediction explanation graphic including each of the features of P or Pk, as the case may be, the value of each of those features, and a relative importance indication of each feature (This claim element places an additional limitation on the prediction explanation graphic by describing its contents, as well as generally linking the system to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).  
Step 2B: This claim further recites:
wherein to output the analysis results to the user, the model characterization engine is further to cause the computing system to display a prediction explanation graphic for either P or a Pk to the user (This claim element is directed to necessary data gathering and outputting, which does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(g).), 
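the prediction explanation graphic including each of the features of P or Pk, as the case may be, the value of each of those features, and a relative importance indication of each feature (As analyzed in Step 2A Prong 2, type definitions and a general linking to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).  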

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding Claim 5,
Claim 5 recites the limitation "wherein the pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence" in lines 1-2.  There is insufficient antecedent basis for this limitation in the claim, since the term “the pre-defined distance” is not mentioned earlier in this claim or in parent claim 1. For the purposes of examination, this claim limitation will be interpreted as “wherein [a] pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence”.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 7-8, 11-15, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Shteingart et al., U.S. PGPUB 2016/0379133, published 12/29/2016 [hereafter referred to as Shteingart].
Regarding Claim 1, Shteingart teaches
One or more non-transitory computer-readable media including instructions, which when executed by one or more processors of a computing system ([Shteingart paragraph [0014]: computer-readable medium comprises computer storage media.] [Shteingart paragraph [0017]: embodiments are implemented by program modules containing computer-executable instructions.] [Shteingart Figure 3; paragraphs [0042]-[0043]: computing device 300 containing a CPU processor and computer storage media containing computer-readable instructions, program modules, data structures.]), causes the computing system to: 
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P ([Shteingart Figure 1, elements 110, 155, 170, 171; paragraphs [0020]-[0022]: a classifier 110 (“machine learning model m”), an input data set 155, each feature of the data set (“an input data point P to m, including one or more features”) (“The classifier 110 is a component of the system that is configured to classify a data set according to a set of rules. The set of rules that are used by the classifier 110 are designed to look at the data set that is input and each feature of the data set and determine a particular output based on the combination of the features of the data set. … The classifier 110 can use any rules or processes available to classify or otherwise produce the output from the input data, such as training data 150, first data set 155 and second data set 157 as input and results 170 and 171 and output. … The output 170/171 of the classifier 110 can simply contain the determined result.”).] [Shteingart paragraph [0024]: results 170 corresponds to the prediction from the first data set 155 (“a prediction m(P) of m for P”) (“…results 170 from the original classification of the unperturbed data set (i.e. first data set 155)”).] [Shteingart paragraph [0023]: a perturber 130 changing feature values (“one or more features”) (“The perturber 130 is a component of the system that is configured to take one data set (e.g., first data set 155) in the test data and change or modify one of the features …”).]); 
create a set of perturbed input data points (Pk) from P by changing the value of at least one feature of P for each perturbed input data point ([Shteingart Figure 1, elements 115, 130, 160-1, 160-2, … 160-N; paragraph [0023]: a perturber 130 taking an input and modifying features to create a set of perturbed data (“create a set of perturbed input data points Pk from P by changing the value of at least one feature of P for each perturbed input data point”) (“The perturber 130 swaps a first feature in the first data set 155 with the corresponding first feature in the second data set 156 to create a perturbed data set 16[0]-1. The perturber 130 repeats this process for each of the features in the first data set 155, to create additional perturbed data sets 16[0]-2, 16[0]-N (all collectively referred to herein as perturbed data set 161). This process can also be repeated several times for the same feature in an attempt to gain better statistics for the feature. In some embodiments the perturber 130 can select multiple features to replace in creating a perturbed data set 161.”).]); 
obtain a prediction m(Pk) for each of the perturbed input data points ([Shteingart paragraph [0024]: feature comparison module performing analysis based on output results 171 from the perturbed data sets 161 (“obtain a prediction m(Pk) for each of the perturbed input data points”) to determine which particular feature caused a change to be more or less in favor of a particular result (“The feature comparison module 140 is a component of the system that is configured to receive the output from the classifier 110 from each of the perturbed data sets 161 and to compare the results 171 of that classification with the results 170 from the original classification of the unperturbed data set (i.e. first data set 155). For each feature in the data set the feature comparison module 140 computes a deviation statistic for that feature such as mean label or score deviation. This deviation statistic indicates or represents the change caused by the subbing of the particular feature. The results 171 from the classifier 110 may indicate that the particular feature caused the change to be more or less in favor of a particular result. That is the result may be stronger towards the original result for the first data set 155 or move the score closer to the opposite result.”).]); 
analyze the predictions m(Pk) for the perturbed input data points to determine which features are most influential to the prediction ([Shteingart paragraph [0025]: feature comparison module performing analysis based on output results 171 from each of the perturbed data sets 161 (“analyze the predictions m(Pk) for the perturbed input data points”) to determine which particular feature caused a change to be more or less in favor of a particular result (“…to determine which features are most influential to the prediction”) (“The feature comparison module 140 identifies the feature or features in the perturbed data set that was changed and compares that feature with the original feature of the first data set 155. In this way the feature comparison module 140 can identify the amount of change that occurred when the feature was subbed out. This allows for the system to judge the overall impact of a specific feature change beyond the actual change in the score. If the two features used are actually quite close, then the expectation would be that there would be little to no change in the classification. … The feature comparison module 140 builds a table of features and the corresponding probabilities received for the perturbed data sets such that the impact of the feature on the results can be observed, reported and displayed as a representation 180.”).]); and  
output the analysis results to a user ([Shteingart paragraph [0038]: displaying the representation results generated by the feature comparison module to a user on a display device (“output the analysis results to a user”) (“The representation that was generated by the feature comparison module 140 is output. This is illustrated at step 240. The output could be simply storing the representation in a data storage system for later retrieval. The output could be formatted and displayed to a user on a display device associated with a computing device.”).]).  
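For illustration only, the examiner's reading of Shteingart paragraphs [0023]-[0025] may be summarized in the following simplified sketch. Each feature of P is swapped with the corresponding feature of another data point, the model score is recomputed for each perturbed point, and a per-feature deviation statistic is used to rank feature influence. The function names, the numeric score function, and the choice of a signed mean score deviation ranked by magnitude are assumptions. Shteingart describes the statistic only as “mean label or score deviation.”

```python
from typing import Callable, Dict, List

# Hypothetical representation: a data point maps feature name -> value.
Point = Dict[str, float]


def swap_perturbations(p: Point, q: Point) -> Dict[str, Point]:
    """For each feature f, copy P and replace P[f] with Q[f] (one perturbed point per feature)."""
    return {f: {**p, f: q[f]} for f in p}


def feature_deviation(score: Callable[[Point], float], p: Point, donors: List[Point]) -> Dict[str, float]:
    """Mean change in score, per feature, when that feature of P is swapped with each donor's value."""
    if not donors:
        raise ValueError("at least one donor point is required")
    base = score(p)
    totals = {f: 0.0 for f in p}
    for q in donors:
        for f, pk in swap_perturbations(p, q).items():
            totals[f] += score(pk) - base
    return {f: t / len(donors) for f, t in totals.items()}


def most_influential(deviation: Dict[str, float]) -> List[str]:
    """Order features by the magnitude of their mean deviation, largest first."""
    return sorted(deviation, key=lambda f: abs(deviation[f]), reverse=True)
```

For a linear score 2a + 0.5b, a point P with a = b = 1, and donor points (a, b) = (0, 0) and (-1, 3), the mean deviations are -3.0 for a and 0.25 for b. Feature a is therefore ranked as most influential.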
Regarding Claim 2, Shteingart teaches
The one or more computer readable media of claim 1, 
wherein the values of multiple features of P are changed for at least some of the perturbed data points ([Shteingart paragraph [0019]: perturbing data values by selecting values from a randomly chosen data point with an opposite label (“wherein the values of multiple features of P are changed for at least some of the perturbed data points”) (“It is based on evaluating the effect of perturbing each feature by bootstrapping it with the negative (opposite) samples and measuring the change in the classifier output. For instance, assume classification of a feature vector of a positively labeled instance. To assess the importance of a given feature value in the classified feature vector, a random negatively labeled instance is taken out of the training set and replaces the feature at question (i.e. the one of the features in the positively labeled feature vector) with a corresponding feature from this set. Then, by classifying the modified feature vector and comparing its predicted label and classifier output (score and label) it is possible to measure and observe the effect of changing each feature.”).] [Shteingart paragraph [0023]: perturbing multiple features of P (“In some embodiments the perturber 130 can select multiple features to replace in creating a perturbed data set 161.”).]).  
Regarding Claim 7, Shteingart teaches
The one or more computer readable media of claim 1, 
wherein create a set of perturbed data points further includes to operate on P using a pre-defined perturbation function D ([Shteingart Figure 1, elements 115, 130, 160-1, 160-2, … 160-N; paragraph [0023]: a perturber 130 taking an input and modifying features on a data point to create a set of perturbed data, where the perturber process is interpreted as an implementation of an algorithm (e.g., a pre-defined perturbation function D) (“create a set of perturbed input data points further includes to operate on P using a pre-defined perturbation function D”) (“The perturber 130 is a component of the system that is configured to take one data set (e.g. first data set 155) in the test data and change or modify one of the features that make up that data set with a corresponding feature from another one of the data sets in the test data. … The perturber 130 swaps a first feature in the first data set 155 with the corresponding first feature in the second data set 156 to create a perturbed data set 161-1. The perturber 130 repeats this process for each of the features in the first data set 155, to create additional perturbed data sets 161-2, 161-N (all collectively referred to herein as perturbed data set 161). This process can also be repeated several times for the same feature in an attempt to gain better statistics for the feature. In some embodiments the perturber 130 can select multiple features to replace in creating a perturbed data set 161.”).]).  
Regarding Claim 8, Shteingart teaches
The one or more computer readable media of claim 7, 
D takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n} ([Shteingart Figure 1, elements 115, 130, 160-1, 160-2, … 160-N; paragraph [0023]: a perturber 130 taking an input and modifying features to create a set of perturbed data, where the perturber is interpreted as a ‘pre-defined function D’ that takes a single data point 155 and outputs perturbed data sets 160-1, 160-2, 160-N (“D takes as input a single point P and outputs a set of n *perturbed* points {Pk_1, Pk_2, ... Pk_n}”) (“The perturber 130 is a component of the system that is configured to take one data set (e.g. first data set 155) in the test data and change or modify one of the features that make up that data set with a corresponding feature from another one of the data sets in the test data. … The perturber 130 swaps a first feature in the first data set 155 with the corresponding first feature in the second data set 156 to create a perturbed data set 16[0]-1. The perturber 130 repeats this process for each of the features in the first data set 155, to create additional perturbed data sets 16[0]-2, 16[0]-N (all collectively referred to herein as perturbed data set 161).”).]).  
Regarding Claim 11, Shteingart teaches
The one or more computer readable media of claim 1, further comprising instructions that, when executed, cause the computing system to 
display a prediction explanation graphic for either P or a Pk, to the user ([Shteingart claim 19: “…wherein the representation is a graphical representation…”] [Shteingart paragraph [0037]: feature comparison module building a representation that reflects the perturbed data sets (“… for ... a Pk”) (“The feature comparison module 140 builds a table (or other representation) of features and the corresponding probabilities received for the perturbed data sets. This is illustrated at step 235.”).] [Shteingart paragraph [0038]: feature comparison module building a representation 180 shown on display to a user (“display a prediction explanation graphic … to the user”) (“The representation that was generated by the feature comparison module 140 is output. This is illustrated at step 240. The output could be simply storing the representation in a data storage system for later retrieval. The output could be formatted and displayed to a user on a display device associated with a computing device.”).]), 
the prediction explanation graphic including each of the features of the input data point, the value of each of those features, and a relative importance indication of each feature ([Shteingart paragraph [0037]: feature comparison module building a representation showing features and their associated values (“the prediction explanation graphic including each of the features of the input data point, the value of each of those features …”), and aggregated information from a feature scorer (“At this step the feature comparison module 140 builds the representation such that the impact of the feature on the results can be observed, reported and displayed. In some approaches the feature comparison module 140 may receive results from the use of multiple different data sets from the training data 150. In this instance the feature scorer can aggregate the effects of each of the features in the overall representation. By doing this the effects of a particular feature from the negative result is normalized. In some approaches the feature comparison module 140 can highlight those features where the normalization process had the greatest effect. This could be where in some of the data sets that feature had more of an impact than in others, such as where the value associated with the feature was significantly different than the value that was present in the first data set 155.”).] [Shteingart paragraph [0025]: feature comparison module identifying overall impact of a specific feature change beyond the actual change in the score, and representing the aggregated results by way of a feature scorer, with the aggregated results generated by the feature scorer reflecting “a relative importance indication of each feature” (“The feature comparison module 140 identifies the feature or features in the perturbed data set that was changed and compares that feature with the original feature of the first data set 155. In this way the feature comparison module 140 can identify the amount of change that occurred when the feature was subbed out. This allows for the system to judge the overall impact of a specific feature change beyond the actual change in the score. If the two features used are actually quite close, then the expectation would be that there would be little to no change in the classification. … The feature comparison module 140 builds a table of features and the corresponding probabilities received for the perturbed data sets such that the impact of the feature on the results can be observed, reported and displayed as a representation 180. In some approaches the feature comparison module 140 may receive results from the use of multiple different data sets from the training data 150. In this instance the feature scorer can aggregate the effects of each of the features in the overall report or representation 180.”).]).  
Regarding Claim 12, Shteingart teaches
The one or more computer readable media of claim 1, 
wherein the model is proprietary ([Shteingart paragraph [0018]: classifiers that are trained to perform a classification for a specific application/field using specific data (e.g., detect fraud, detect illness), where algorithm details are not disclosed and thus require explanation and interpretation for understanding, which is being interpreted as being proprietary to a specific use case or application (“wherein the model is proprietary”) (“In many machine learning systems in general, and binary classification in particular, there is a need to decipher or understand what the reason is for deciding the output classification is positive ( e.g. fraud in cyber security or illness in medicine) rather than negative ( e.g. genuine in cyber security or healthy in medicine). … One tedious solution is to analyze the classification model and understanding what lead to the observed output. With modern machine learning this analysis becomes challenging and requires specialists. This is especially true when using models which don't assume data feature independence.”).]), 
accessible via a remote server ([Shteingart paragraph [0013]: storage device containing computer code (including the classifier model) that can be used for communicating, propagating, or transporting programs for use (i.e., over a computer network) (“…the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.”).] [Shteingart paragraph [0046]: storage device containing the program instructions can be distributed across the network, and executed at the local terminal and at a remote computer (“accessible via a remote server”) (“Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example a remote computer may store an example of the process described as software. … Alternatively the local computer may download pieces of the software as needed, or distributively process by executing some software instructions at the local terminal and some at the remote computer (or computer network).”).]), and 
further comprising instructions that, when executed, cause the computing system to 
access the model over a network link between the computing system and the remote server ([Shteingart paragraph [0046]: storage device containing the program instructions can be distributed across the network and executed at the local terminal and at a remote computer; in the context of the model being located at a remote storage server, this is interpreted as a way to “access the model over a network link between the computing system and the remote server” (“Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example a remote computer may store an example of the process described as software. … Alternatively the local computer may download pieces of the software as needed, or distributively process by executing some software instructions at the local terminal and some at the remote computer (or computer network).”).]).  
Regarding Claim 13, Shteingart teaches
A computing system comprising: 
one or more processors to implement a model characterization engine (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.), 
the model characterization engine to: 
access a machine learning model m, an input data point P to m, P including one or more features, and a prediction m(P) of m for P (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.); 
create a set of perturbed input data points (Pk) from P by changing the value of at least one feature of P for each perturbed input data point (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.); 
obtain a prediction m(Pk) for each of the perturbed input data points (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.); 
analyze the predictions m(Pk) for the perturbed input data points to determine which features are most influential to the prediction (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.); and 
output the analysis results to a user (This claim element is similar to a corresponding claim element in Claim 1, and hence is rejected under similar rationale.).  
Regarding Claim 14, Shteingart teaches
The computing system of claim 13, 
wherein the model characterization engine is to access the model from a remote server by inputting P and Pk to it, and receiving m(P) and m(Pk) from it, over a network link ([Shteingart paragraph [0014]: computer-readable medium comprises computer storage media.] [Shteingart paragraph [0017]: embodiments are implemented by program modules containing computer-executable instructions.] [Shteingart Figure 3; paragraphs [0042]-[0043]: computing device 300 containing a CPU processor and computer storage media.] [Shteingart paragraph [0013]: storage device containing computer code (including the classifier model) that can be used for communicating, propagating, or transporting programs for use, where the model program code is located on such storage devices (“…the subject matter may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.”).] [Shteingart paragraph [0046]: storage device containing the program instructions can be distributed across the network (“over a network link”) and executed at the local terminal and at a remote computer; in the context of the model being at a remote storage server, this is interpreted as sending inputs to the model and receiving outputs back from the model over the network link (“inputting P and Pk to it, and receiving m(P) and m(Pk) from it”) (“Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example a remote computer may store an example of the process described as software. … Alternatively the local computer may download pieces of the software as needed, or distributively process by executing some software instructions at the local terminal and some at the remote computer (or computer network).”).]).  
Regarding Claim 15, Shteingart teaches
The computing system of claim 13, 
wherein the model characterization engine creates the set of perturbed data points by operating on P using a pre-defined perturbation function D (This claim element is similar to a corresponding claim element in Claim 7, and hence is rejected under similar rationale.)
that takes as input a single point P and outputs a set of n perturbed points {Pk_1, Pk_2, ... Pk_n} (This claim element is similar to a corresponding claim element in Claim 8, and hence is rejected under similar rationale.).  
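For illustration only, the perturbation and analysis steps recited in Claims 13 and 15 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation or Shteingart's: the names perturb, feature_influence, and toy_model, and the deterministic plus/minus step shift, are hypothetical. Each perturbed point Pk differs from P in exactly one feature, and a feature's influence is taken to be the fraction of its perturbations that change m(Pk) from m(P).

```python
from typing import Callable, Dict, List, Sequence

Point = Dict[str, float]

def perturb(p: Point, n: int, step: float = 1.0) -> List[Point]:
    """Hypothetical perturbation function D: maps a single point P to n points.
    Each output changes exactly one feature of P, cycling through the features;
    successive passes shift by +step, -step, +2*step, -2*step, and so on."""
    features = list(p)
    out: List[Point] = []
    for k in range(n):
        f = features[k % len(features)]
        rnd = k // len(features)              # which pass over the features
        sign = 1.0 if rnd % 2 == 0 else -1.0
        pk = dict(p)
        pk[f] = p[f] + sign * step * (rnd // 2 + 1)
        out.append(pk)
    return out

def feature_influence(m: Callable[[Point], int], p: Point,
                      pks: Sequence[Point]) -> Dict[str, float]:
    """For each feature, the fraction of perturbations of that feature
    for which m(Pk) differs from m(P)."""
    base = m(p)
    flips = {f: 0 for f in p}
    tries = {f: 0 for f in p}
    for pk in pks:
        flipped = m(pk) != base
        for f in p:
            if pk[f] != p[f]:
                tries[f] += 1
                flips[f] += int(flipped)
    return {f: flips[f] / tries[f] if tries[f] else 0.0 for f in p}

def toy_model(x: Point) -> int:
    """Hypothetical classifier that depends only on feature 'a'."""
    return int(x["a"] > 0.5)

P = {"a": 1.0, "b": 0.0}
Pk = perturb(P, n=4)
scores = feature_influence(toy_model, P, Pk)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)  # {'a': 0.5, 'b': 0.0} ['a', 'b']
```

Ranking features by this flip rate is one simple way to surface the "most influential" features; Shteingart's deviation statistics (for example, mean label or score deviation) are a different measure.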
Regarding Claim 20, Shteingart teaches
The computing system of claim 13, 
wherein to output the analysis results to the user, the model characterization engine is further to cause the computing system to 
display a prediction explanation graphic for either P or a Pk to the user (This claim element is similar to a corresponding claim element in Claim 11, and hence is rejected under similar rationale.), 
the prediction explanation graphic including each of the features of P or Pk, as the case may be, the value of each of those features, and a relative importance indication of each feature (This claim element is similar to a corresponding claim element in Claim 11, and hence is rejected under similar rationale.).  

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shteingart et al., U.S. PGPUB 2016/0379133, published 12/29/2016 [hereafter referred to as Shteingart] in view of Howard et al., U.S. PGPUB 2002/0169736, published 11/14/2002 [hereafter referred to as Howard].
Regarding Claim 3, Shteingart teaches
The one or more computer readable media of claim 1, 
wherein analyze the predictions further includes to [perform comparisons] that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) ([Shteingart paragraph [0024]: feature comparison module performing comparison of perturbed results 171 against results 170, and deviation statistics are generated for each perturbed feature from the set of perturbed input data sets, where the scores are generated by the scorer for each output (see [Shteingart paragraph [0022]]) (“The feature comparison module 140 is a component of the system that is configured to receive the output from the classifier 110 from each of the perturbed data sets 161 and to compare the results 171 of that classification with the results 170 from the original classification of the unperturbed data set (i.e. first data set 155). For each feature in the data set the feature comparison module 140 computes a deviation statistic for that feature such as mean label or score deviation.”).] [Shteingart paragraph [0024]: feature comparison module using deviation statistics to compare results 170 and 171, where the perturbed result 171 may be stronger towards the original result (“from those whose m(Pk) is the same as m(P)”) or closer to the opposite result (“whose prediction m(Pk) is different than m(P)”) (“The results 171 from the classifier 110 may indicate that the particular feature caused the change to be more or less in favor of a particular result. That is the result may be stronger towards the original result for the first data set 155 or move the score closer to the opposite result.”).]).  
However, Shteingart does not teach
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P).  
Howard teaches
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) ([Howard Figure 2, elements 21, 26, 27, 25, paragraph [0048]: a rule inducer component that receives a series of input data and corresponding outputs from a classifier (“… the computer system 1 comprises a data classifier 21 and a rule inducer 25. A series of input data from a data source 22 is input to the data classifier 21 to produce a corresponding set of outputs 23. These outputs comprise information about which of a number of classes 23 each input is a member of. The series of input data from the data source 22 is also input to the rule inducer 25 as indicated by arrow 26. The rule inducer 25 also receives information about the corresponding series of outputs from the data classifier 21 as indicated by arrow 27. Given these inputs 26, 27 the rule inducer 25 produces a series of rules 24 which describe relationships between the series of input data provided to the data classifier 21 and the corresponding series of outputs produced by the data classifier.”).] [Howard Figure 7; paragraphs [0102]-[0108]: using a rule inducer algorithm to infer rules (shown in [Howard Figure 7]), where there are identified rules associated with different classes C0, C1, C2, C3, where the numbers within the brackets indicate the number of examples (“perturbed data points”) assigned to a particular class (“distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P)”) (“…It is not essential to use the CN2 algorithm for the rule inducer 25, alternative rule induction techniques can be used. A rule inducer is a means by which a rule-based system can learn by example. The process of rule induction involves the creation of rules from a set of examples. The idea is to create rules which describe general concepts of the example set. The term rule inducer is used here to refer to any system which involves the creation of rules from a set of examples. … CN2 is a rule induction algorithm which takes a set of examples (that are vectors of attribute values and information about which class each example is a member of) and generates a set of rules for classifying them. For example, a rule might take the form: “If telephone call number 10 has attribute A and attribute B but not attribute C then it is a member of Class 2. … the rules take the form shown in FIG. 7. This shows 6 IF-THEN rules 71 and a default condition 72. The attribute names 73 correspond to the attributes shown in FIG. 5 and the rules specify various threshold valves for the attributes 74. Each rule has a THEN portion specifying a membership of a particular Class 75. The numbers in square brackets 76 indicate how many examples met the conditions of the rule and were assigned to the particular class.”).]).  
Both Shteingart and Howard are analogous art since both teach understanding and interpreting classifier models through analyzing the corresponding outputs (predictions, class labels) of the inputs (examples, data points) generated by the classifier model.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart and incorporate the rule inducer algorithm of Howard to generate a set of inferred rules for the classifier model. The motivation to combine is taught in Howard, as classifier outputs can be hard to interpret, with few details why a particular input is classified to a certain class. Deriving rules using defined algorithms to induce the rules from the inputs and outputs reduces the amount of complex analysis and known examples needed to characterize the classifier, saving computational time and resource time, as well as providing explanations that can be understood by a user, thus enhancing the usability of the classifier as the user is more confident that the classifier is operating in an expected way ([Howard paragraphs [0006]-[0008]: “One problem is that the output of classifiers is often difficult to interpret. This is especially the case when unsupervised training has been used. The classifier output specifies which of a certain number of classes each input has been placed into. The user is given no explanation of what the classes mean in terms of the particular task or problem domain. Neither is the user provided with any information about why a particular input has been classified in the way that it has. … Previously, users have needed to carry out complex analyses of the classifier in order to obtain these kinds of explanations. Known examples can be input to the classifier and the outputs compared with the expected outputs. However, in order to do this known examples must be available and this is often not the case. Even when known examples can be obtained this is often a lengthy and expensive procedure. … … A further problem is that because these kinds of explanations are not available the user's confidence in the system is reduced. This means that the user is less likely to run the system, thus reducing the value of such a system.”]).
Regarding Claim 16, Shteingart teaches
The computing system of claim 13, 
wherein analyze the predictions further includes to [perform comparisons] that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) (This claim element is similar to a corresponding claim element in Claim 3, and hence is rejected under similar rationale.).
However, Shteingart does not teach
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P).  
Howard teaches
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is the same as m(P) (This claim element is similar to a corresponding claim element in Claim 3, and hence is rejected under similar rationale.).
Both Shteingart and Howard are analogous art since both teach understanding and interpreting classifier models through analyzing the corresponding outputs (predictions, class labels) of the inputs (examples, data points) generated by the classifier model.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart and incorporate the rule inducer algorithm of Howard to generate a set of inferred rules for the classifier model. The motivation to combine is taught in Howard, as classifier outputs can be hard to interpret, with few details why a particular input is classified to a certain class. Deriving rules using defined algorithms to induce the rules from the inputs and outputs reduces the amount of complex analysis and known examples needed to characterize the classifier, saving computational time and resource time, as well as providing explanations that can be understood by a user, thus enhancing the usability of the classifier as the user is more confident that the classifier is operating in an expected way ([Howard paragraphs [0006]-[0008]: “One problem is that the output of classifiers is often difficult to interpret. This is especially the case when unsupervised training has been used. The classifier output specifies which of a certain number of classes each input has been placed into. The user is given no explanation of what the classes mean in terms of the particular task or problem domain. Neither is the user provided with any information about why a particular input has been classified in the way that it has. … Previously, users have needed to carry out complex analyses of the classifier in order to obtain these kinds of explanations. Known examples can be input to the classifier and the outputs compared with the expected outputs. However, in order to do this known examples must be available and this is often not the case. Even when known examples can be obtained this is often a lengthy and expensive procedure. … … A further problem is that because these kinds of explanations are not available the user's confidence in the system is reduced. This means that the user is less likely to run the system, thus reducing the value of such a system.”]).
Regarding Claim 19, Shteingart in view of Howard teaches
The computing system of claim 16, 
wherein the rules include a threshold above or below which the prediction m(Pk) changes from m(P) ([Howard Figure 7; paragraphs [0102]-[0108]: using a rule inducer algorithm to infer rules (shown in [Howard Figure 7]), where there are identified rules associated with different classes C0, C1, C2, where the numbers within the brackets indicate the number of examples (“perturbed data points”) assigned to a particular class, with each rule specifying thresholds above or below for each predicted target class, such that the rules below it capture a different predicted class that is not satisfied by the current rule (“the rules include a threshold above or below which the prediction m(Pk) changes from m(P)”) (“…It is not essential to use the CN2 algorithm for the rule inducer 25, alternative rule induction techniques can be used. A rule inducer is a means by which a rule-based system can learn by example. The process of rule induction involves the creation of rules from a set of examples. The idea is to create rules which describe general concepts of the example set. The term rule inducer is used here to refer to any system which involves the creation of rules from a set of examples. … CN2 is a rule induction algorithm which takes a set of examples (that are vectors of attribute values and information about which class each example is a member of) and generates a set of rules for classifying them. For example, a rule might take the form: “If telephone call number 10 has attribute A and attribute B but not attribute C then it is a member of Class 2. … the rules take the form shown in FIG. 7. This shows 6 IF-THEN rules 71 and a default condition 72. The attribute names 73 correspond to the attributes shown in FIG. 5 and the rules specify various threshold valves for the attributes 74. Each rule has a THEN portion specifying a membership of a particular Class 75. The numbers in square brackets 76 indicate how many examples met the conditions of the rule and were assigned to the particular class.”).]), 
or 
changes at least by a pre-defined distance from m(P).  
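The threshold-style rule discussed for Claim 19 (and the IF-THEN rules of Howard Figure 7) can be illustrated with a single-feature decision stump. This is a deliberately simple stand-in, not the CN2 algorithm used by Howard; the function infer_threshold_rule, the toy model m, and the data are hypothetical. The sketch searches the observed feature values for the threshold t and direction that best separate the perturbed points whose prediction changed from those whose prediction did not.

```python
from typing import List, Optional, Tuple

Rule = Tuple[Optional[float], str, int]  # (threshold t, direction, errors)

def infer_threshold_rule(values: List[float], changed: List[bool]) -> Rule:
    """Return (t, direction, errors) for the single rule
    'IF feature <direction> t THEN m(Pk) differs from m(P)' that
    misclassifies the fewest perturbed points; direction is '<=' or '>'."""
    if not values:
        return (None, "<=", 0)
    best: Rule = (None, "<=", len(values) + 1)
    for t in sorted(set(values)):          # candidate thresholds: observed values
        for direction in ("<=", ">"):
            errors = sum(((v <= t) if direction == "<=" else (v > t)) != c
                         for v, c in zip(values, changed))
            if errors < best[2]:
                best = (t, direction, errors)
    return best

def m(a: float) -> int:
    """Toy model on a single feature a (hypothetical)."""
    return int(a > 0.5)

base = m(1.0)                          # m(P), with feature a = 1.0 in P
values = [0.0, 0.25, 0.5, 1.5, 2.0]    # feature a in the perturbed points Pk
changed = [m(v) != base for v in values]
print(infer_threshold_rule(values, changed))  # prints (0.5, '<=', 0)
```

Here the inferred rule "IF a <= 0.5 THEN the prediction changes" recovers the toy model's decision boundary exactly; real rule inducers search over conjunctions of conditions on many features.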
Claims 4, 5, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shteingart et al., U.S. PGPUB 2016/0379133, published 12/29/2016 [hereafter referred to as Shteingart] in view of Liu et al., Integrating Classification and Association Rule Mining, KDD-98 Proceedings, AAAI, 1998, 7 pages [hereafter referred to as Liu].
Regarding Claim 4, Shteingart teaches
The one or more computer readable media of claim 1, 
wherein analyze the predictions further includes to [perform comparisons] that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a … distance of m(P) ([Shteingart paragraph [0024]: perturbed results against original results are compared, and deviation statistics are generated for each perturbed feature from the set of perturbed input data sets, where the act of comparing is interpreted as distinguishing outputs from the perturbed data points and the input data point, with the generation of scores by the scorer for each output being the focus for the comparison (see [Shteingart paragraph [0022]]) (“wherein analyze the predictions further includes to [perform comparisons] that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P)…”) (“The feature comparison module 140 is a component of the system that is configured to receive the output from the classifier 110 from each of the perturbed data sets 161 and to compare the results 171 of that classification with the results 170 from the original classification of the unperturbed data set (i.e. first data set 155). For each feature in the data set the feature comparison module 140 computes a deviation statistic for that feature such as mean label or score deviation.”).] [Shteingart paragraph [0024]: using deviation statistics to compare results 170 and 171, where the perturbed result 171 may be stronger towards the original result (“from those whose m(Pk) is within a … distance of m(P)”) or closer to the opposite result (“whose prediction m(Pk) is different than m(P)”) (“The results 171 from the classifier 110 may indicate that the particular feature caused the change to be more or less in favor of a particular result. That is the result may be stronger towards the original result for the first data set 155 or move the score closer to the opposite result.”).])…  
However, Shteingart does not teach
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points … whose m(Pk) is within a pre-defined distance of m(P).  
Liu teaches
wherein analyze the predictions further includes to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a pre-defined distance of m(P) ([Liu p.1 col.1 Abstract, 1st paragraph: inferring rules based on items in a database to build a classifier model (“Classification rule mining aims to discover a small set of rules in the database that forms an accurate classifier.”).] [Liu p.2 col.1 Problem Statement, 1st – 2nd paragraphs: each item in a database contains discretized attributes (“features”) classified into known classes (“predictions”), representing sets of perturbed data points (“Our proposed framework assumes that the dataset is a normal relational table, which consists of N cases described by l distinct attributes. These N cases have been classified into q known classes. An attribute can be a categorical (or discrete) or a continuous (or numeric) attribute. In this work, we treat all the attributes uniformly. For a categorical attribute, all the possible values are mapped to a set of consecutive positive integers. For a continuous attribute, its value range is discretized into intervals, and the intervals are also mapped to consecutive positive integers. With these mappings, we can treat a data case as a set of (attribute, integer-value) pairs and a class label. We call each (attribute, integer-value) pair an item.”).] 
[Liu p.2 col.2 Basic concepts used in the CBA-RG algorithm: given a set of rules <condset -> y> (condset is a set of items, y is the class label/prediction), assign a confidence value to each item, and compare each item (sets of perturbed data points) that belongs to the same condset based on a confidence value and a minconf value, where comparing each item against minconf guarantees that each item is within a certain range of minconf (which is a pre-defined distance) (“from those whose m(Pk) is within a pre-defined distance of m(P)”) (“For all the ruleitems that have the same condset, the ruleitem with the highest confidence is chosen as the possible rule (PR) representing this set of ruleitems. If there are more than one ruleitem with the same highest confidence, we randomly select one ruleitem. For example, we have two ruleitems that have the same condset: 1. <{(A, 1), (B, 1)}, (class: 1)>. 2. <{(A, 1), (B, 1)}, (class: 2)>. Assume the support count of the condset is 3. The support count of the first ruleitem is 2, and the second ruleitem is 1. Then, the confidence of ruleitem 1 is 66.7%, while the confidence of ruleitem 2 is 33.3%. With these two ruleitems, we only produce one PR (assume |D| = 10): (A, 1), (B, 1) → (class, 1) [supt = 20%, confd= 66.7%] If the confidence is greater than minconf, we say the rule is accurate. The set of class association rules (CARs) thus consists of all the PRs that are both frequent and accurate.”).]).  
Both Shteingart and Liu are analogous art since both are in the realm of understanding classifier models, and both teach understanding and interpreting perturbed data points through analysis of the data points and their associated class label (predictions).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart and incorporate the comparison and rule generation steps of Liu to generate a set of inferred rules for a classifier model. The motivation to combine is taught in Liu, as this comparison allows for identifying a minimum set of interesting rules that are easily understood by a user (thus improving understandability), and the comparison algorithm does not require loading the entire dataset into memory, thus improving overall performance of the system ([Liu p.1 col.2 last paragraph – p.2 col.1 2nd paragraph: “The framework helps to solve the understandability problem (Clark and Matwin 1993; Pazzani, Mani and Shankle 1997) in classification rule mining. Many rules produced by standard classification systems are difficult to understand because these systems use domain independent biases and heuristics to generate a small set of rules to form a classifier. These biases, however, may not be in agreement with the existing knowledge of the human user, thus resulting in many generated rules that make no sense to the user, while many understandable rules that exist in the data are left undiscovered. With the new framework, the problem of finding understandable rules is reduced to a postprocessing task (since we generate all the rules). … · A related problem is the discovery of interesting or useful rules. The quest for a small set of rules of the existing classification systems results in many interesting and useful rules not being discovered. … Unfortunately, the classification system (we used C4.5) just could not find such rules even though such rules do exist as discovered by our system. … · In the new framework, the database can reside on disk rather than in the main memory. Standard classification systems need to load the entire database into the main memory (e.g., Quinlan 1992), although some work has been done on the scaling up of classification systems (Mahta, Agrawal and Rissanen 1996).”]).
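The support and confidence arithmetic in the Liu passage quoted in the rejection of Claim 4 can be reproduced as a short worked example. The counts (|D| = 10, condset support count 3, ruleitem support counts 2 and 1) come from that passage; the helper rule_stats and the minconf value of 50% are assumptions for illustration, since the quoted passage does not fix a minconf value.

```python
from typing import Dict, Tuple

def rule_stats(rule_counts: Dict[str, int], condset_count: int,
               n_cases: int) -> Dict[str, Tuple[float, float]]:
    """Map each ruleitem's class label to (support, confidence), where
    support = rule count / |D| and confidence = rule count / condset count."""
    return {label: (count / n_cases, count / condset_count)
            for label, count in rule_counts.items()}

# Counts from the quoted Liu example for the condset {(A, 1), (B, 1)}.
stats = rule_stats({"class 1": 2, "class 2": 1}, condset_count=3, n_cases=10)

# The possible rule (PR) is the ruleitem with the highest confidence
# (Liu breaks ties randomly; max() simply keeps the first one it sees).
pr = max(stats, key=lambda label: stats[label][1])
supt, confd = stats[pr]

MINCONF = 0.5  # hypothetical minconf; not specified in the quoted passage
print(f"PR -> {pr}: supt = {supt:.0%}, confd = {confd:.1%}, "
      f"accurate = {confd > MINCONF}")
# prints: PR -> class 1: supt = 20%, confd = 66.7%, accurate = True
```

The computed values match the figures stated in the quoted passage (supt = 20%, confd = 66.7% for ruleitem 1, and 33.3% confidence for ruleitem 2).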
Regarding Claim 5, Shteingart teaches
The one or more computer readable media of claim 1.
However, Shteingart does not teach
wherein [a] pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence.
Liu teaches 
wherein [a] pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence ([Liu p.2 col.2 Basic concepts used in the CBA-RG algorithm: given a set of rules <condset -> y> (condset is a set of items, y is the class label/prediction), assign a confidence value to each item, and compare each item that belongs to the same condset based on a confidence value and a minconf value, where the minconf represents a pre-defined distance (“from those whose m(Pk) is within a pre-defined distance of m(P)”), where the comparisons between conf values between items and the minconf value represent a linear (e.g., Euclidean) distance metric (“For all the ruleitems that have the same condset, the ruleitem with the highest confidence is chosen as the possible rule (PR) representing this set of ruleitems. If there are more than one ruleitem with the same highest confidence, we randomly select one ruleitem. For example, we have two ruleitems that have the same condset: 1. <{(A, 1), (B, 1)}, (class: 1)>. 2. <{(A, 1), (B, 1)}, (class: 2)>. Assume the support count of the condset is 3. The support count of the first ruleitem is 2, and the second ruleitem is 1. Then, the confidence of ruleitem 1 is 66.7%, while the confidence of ruleitem 2 is 33.3%. With these two ruleitems, we only produce one PR (assume |D| = 10): (A, 1), (B, 1) → (class, 1) [supt = 20%, confd= 66.7%] If the confidence is greater than minconf, we say the rule is accurate. The set of class association rules (CARs) thus consists of all the PRs that are both frequent and accurate.”).]).  
Both Shteingart and Liu are analogous art since both are in the realm of understanding classifier models, and both teach understanding and interpreting perturbed data points through analysis of the data points and their associated class label (predictions).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart and incorporate the comparison and rule generation steps of Liu to generate a set of inferred rules for a classifier model. The motivation to combine is taught in Liu, as this comparison allows for identifying a minimum set of interesting rules that are easily understood by a user (thus improving understandability), and the comparison algorithm does not require loading the entire dataset into memory, thus improving overall performance of the system ([Liu p.1 col.2 last paragraph – p.2 col.1 2nd paragraph: “The framework helps to solve the understandability problem (Clark and Matwin 1993; Pazzani, Mani and Shankle 1997) in classification rule mining. Many rules produced by standard classification systems are difficult to understand because these systems use domain independent biases and heuristics to generate a small set of rules to form a classifier. These biases, however, may not be in agreement with the existing knowledge of the human user, thus resulting in many generated rules that make no sense to the user, while many understandable rules that exist in the data are left undiscovered. With the new framework, the problem of finding understandable rules is reduced to a postprocessing task (since we generate all the rules). … · A related problem is the discovery of interesting or useful rules. The quest for a small set of rules of the existing classification systems results in many interesting and useful rules not being discovered. … Unfortunately, the classification system (we used C4.5) just could not find such rules even though such rules do exist as discovered by our system. … · In the new framework, the database can reside on disk rather than in the main memory. Standard classification systems need to load the entire database into the main memory (e.g., Quinlan 1992), although some work has been done on the scaling up of classification systems (Mahta, Agrawal and Rissanen 1996).”]).
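For reference, the four distance measures recited in Claim 5 can be computed between two prediction vectors as sketched below. This assumes, hypothetically, that m(P) and m(Pk) are class-probability vectors of equal length; the function names, the tolerance tau, and the example vectors are illustrative, and the sketch does not represent how Liu computes confidence. KL-divergence is not symmetric, so the order of its arguments matters.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]

def euclidean(p: Vector, q: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def l1_norm(p: Vector, q: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))

def max_norm(p: Vector, q: Vector) -> float:
    return max(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def within(m_p: Vector, m_pk: Vector,
           metric: Callable[[Vector, Vector], float], tau: float) -> bool:
    """True if m(Pk) lies within the pre-defined distance tau of m(P)."""
    return metric(m_p, m_pk) <= tau

m_P = [0.8, 0.2]    # hypothetical class probabilities for P
m_Pk = [0.6, 0.4]   # hypothetical class probabilities for one Pk
for metric in (euclidean, l1_norm, max_norm, kl_divergence):
    print(metric.__name__, round(metric(m_P, m_Pk), 4),
          within(m_P, m_Pk, metric, tau=0.25))
```

With tau = 0.25, this Pk falls within the tolerance under the max norm (0.2) but outside it under the Euclidean distance (about 0.283), so the choice of metric determines which perturbed points count as "within a pre-defined distance" of m(P).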
Regarding Claim 17, Shteingart teaches
The computing system of claim 13, 
wherein to analyze the predictions the model characterization engine is further to [perform comparisons] that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a … distance of m(P) (This claim element is similar to a corresponding claim element in Claim 4, and hence is rejected under similar rationale.).  
However, Shteingart does not teach
wherein to analyze the predictions the model characterization engine is further to infer rules that distinguish those ones of the perturbed data points … whose m(Pk) is within a pre-defined distance of m(P).  
Liu teaches
wherein to analyze the predictions the model characterization engine is further to infer rules that distinguish those ones of the perturbed data points whose prediction m(Pk) is different than m(P) from those whose m(Pk) is within a pre-defined distance of m(P) (This claim element is similar to a corresponding claim element in Claim 4, and hence is rejected under similar rationale.).
Both Shteingart and Liu are analogous art since both are in the realm of understanding classifier models, and both teach understanding and interpreting perturbed data points through analysis of the data points and their associated class label (predictions).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart and incorporate the comparison and rule generation steps of Liu to generate a set of inferred rules for a classifier model. The motivation to combine is taught in Liu, as this comparison allows for identifying a minimum set of interesting rules that are easily understood by a user (thus improving understandability), and the comparison algorithm does not require loading the entire dataset into memory, thus improving overall performance of the system ([Liu p.1 col.2 last paragraph – p.2 col.1 2nd paragraph: “The framework helps to solve the understandability problem (Clark and Matwin 1993; Pazzani, Mani and Shankle 1997) in classification rule mining. Many rules produced by standard classification systems are difficult to understand because these systems use domain independent biases and heuristics to generate a small set of rules to form a classifier. These biases, however, may not be in agreement with the existing knowledge of the human user, thus resulting in many generated rules that make no sense to the user, while many understandable rules that exist in the data are left undiscovered. With the new framework, the problem of finding understandable rules is reduced to a postprocessing task (since we generate all the rules). … · A related problem is the discovery of interesting or useful rules. The quest for a small set of rules of the existing classification systems results in many interesting and useful rules not being discovered. … Unfortunately, the classification system (we used C4.5) just could not find such rules even though such rules do exist as discovered by our system. … · In the new framework, the database can reside on disk rather than in the main memory. Standard classification systems need to load the entire database into the main memory (e.g., Quinlan 1992), although some work has been done on the scaling up of classification systems (Mahta, Agrawal and Rissanen 1996).”]).
Regarding Claim 18, Shteingart in view of Liu teaches
The computing system of claim 17, 
wherein the pre-defined distance is one of Euclidean, L_1 norm, max_norm, or KL-divergence (This claim element is similar to a corresponding claim element in Claim 5, and hence is rejected under similar rationale.).  
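The four distance measures recited in Claim 18 can be illustrated with a minimal sketch, assuming P and Pk are equal-length numeric feature vectors and, for KL-divergence, that both are discrete probability distributions with q_i > 0 wherever p_i > 0. The function names are hypothetical and are not drawn from Shteingart or Liu.

```python
import math
from typing import Iterator, Sequence, Tuple


def _pairs(p: Sequence[float], q: Sequence[float]) -> Iterator[Tuple[float, float]]:
    # Pair up corresponding features; refuse vectors of different length.
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of features")
    return zip(p, q)


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    # Square root of the sum of squared per-feature differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in _pairs(p, q)))


def l1_norm(p: Sequence[float], q: Sequence[float]) -> float:
    # Sum of absolute per-feature differences.
    return sum(abs(a - b) for a, b in _pairs(p, q))


def max_norm(p: Sequence[float], q: Sequence[float]) -> float:
    # Largest absolute per-feature difference (L-infinity distance).
    return max(abs(a - b) for a, b in _pairs(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0.
    # Assumes q_i > 0 wherever p_i > 0.
    return sum(a * math.log(a / b) for a, b in _pairs(p, q) if a > 0)


print(euclidean([0, 0], [3, 4]), l1_norm([0, 0], [3, 4]), max_norm([0, 0], [3, 4]))  # 5.0 7 4
```

Unlike the other three measures, KL-divergence is not symmetric, so the order of its arguments matters.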
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Shteingart et al., U.S. PGPUB 2016/0379133, published 12/29/2016 [hereafter referred to as Shteingart] in view of Liu et al., Integrating Classification and Association Rule Mining, KDD-98 Proceedings, AAAI, 1998, 7 pages [hereafter referred to as Liu] as applied to Claim 4; in further view of Howard et al., U.S. PGPUB 2002/0169736, published 11/14/2002 [hereafter referred to as Howard].
Regarding Claim 6, Shteingart in view of Liu as applied to Claim 4 teaches
The one or more computer readable media of claim 4.
However, Shteingart in view of Liu does not teach
wherein the rules include a threshold above or below which the prediction m(Pk) changes from m(P).  
Howard teaches
wherein the rules include a threshold above or below which the prediction m(Pk) changes from m(P) ([Howard Figure 7; paragraphs [0102]-[0108]: using a rule inducer algorithm to infer rules (shown in Howard Figure 7), where the identified rules are associated with different classes C0, C1, C2, and the numbers within the brackets indicate the number of examples (“perturbed data points”) assigned to a particular class. Each rule specifies threshold values above or below which a given target class is predicted, such that the rules below it capture a different predicted class that is not satisfied by the current rule (“the rules include a threshold above or below which the prediction m(Pk) changes from m(P)”) (“…It is not essential to use the CN2 algorithm for the rule inducer 25, alternative rule induction techniques can be used. A rule inducer is a means by which a rule-based system can learn by example. The process of rule induction involves the creation of rules from a set of examples. The idea is to create rules which describe general concepts of the example set. The term rule inducer is used here to refer to any system which involves the creation of rules from a set of examples. … CN2 is a rule induction algorithm which takes a set of examples (that are vectors of attribute values and information about which class each example is a member of) and generates a set of rules for classifying them. For example, a rule might take the form: “If telephone call number 10 has attribute A and attribute B but not attribute C then it is a member of Class 2. … the rules take the form shown in FIG. 7. This shows 6 IF-THEN rules 71 and a default condition 72. The attribute names 73 correspond to the attributes shown in FIG. 5 and the rules specify various threshold valves for the attributes 74. Each rule has a THEN portion specifying a membership of a particular Class 75. 
The numbers in square brackets 76 indicate how many examples met the conditions of the rule and were assigned to the particular class.”).]).  
Both Shteingart in view of Liu and Howard are analogous art since both teach understanding and interpreting classifier models through analyzing the corresponding outputs.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to enhance the feature comparison module of Shteingart in view of Liu and incorporate the rule inducer algorithm of Howard to generate a set of inferred rules for the classifier model. The motivation to combine is taught in Howard: classifier outputs can be hard to interpret, offering few details as to why a particular input is assigned to a certain class. Deriving rules with a defined algorithm that induces them from the inputs and outputs reduces the complex analysis and the number of known examples needed to characterize the classifier, saving computational and resource time, and provides explanations that a user can understand. This enhances the usability of the classifier, as the user is more confident that the classifier is operating in an expected way ([Howard paragraphs [0006]-[0008]: “One problem is that the output of classifiers is often difficult to interpret. This is especially the case when unsupervised training has been used. The classifier output specifies which of a certain number of classes each input has been placed into. The user is given no explanation of what the classes mean in terms of the particular task or problem domain. Neither is the user provided with any information about why a particular input has been classified in the way that it has. … Previously, users have needed to carry out complex analyses of the classifier in order to obtain these kinds of explanations. Known examples can be input to the classifier and the outputs compared with the expected outputs. However, in order to do this known examples must be available and this is often not the case. Even when known examples can be obtained this is often a lengthy and expensive procedure. … … A further problem is that because these kinds of explanations are not available the user's confidence in the system is reduced. 
This means that the user is less likely to run the system, thus reducing the value of such a system.”])
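The threshold-based rules discussed above can be illustrated with a minimal sketch that, for a single feature, scans candidate values outward from the value in P and reports the first value on each side at which the prediction m(Pk) differs from m(P). The model, the candidate grid, and the name flip_thresholds are hypothetical assumptions; this is not Howard's CN2 rule inducer, which learns IF-THEN rules from labeled examples.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

Model = Callable[[List[float]], Hashable]


def flip_thresholds(
    model: Model, p: List[float], feature: int, candidates: Iterable[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (below, above): scanning outward from p[feature] in each direction,
    the first candidate value at which the prediction changes (None if never)."""
    base = model(p)  # m(P)
    values = sorted(candidates)

    def flips(value: float) -> bool:
        pk = list(p)  # perturbed copy Pk; P itself is left unchanged
        pk[feature] = value
        return model(pk) != base  # m(Pk) differs from m(P)

    # Scan upward from the current value for the first flip above it.
    above = next((v for v in values if v > p[feature] and flips(v)), None)
    # Scan downward from the current value for the first flip below it.
    below = next((v for v in reversed(values) if v < p[feature] and flips(v)), None)
    return below, above


def band_model(x: List[float]) -> str:
    # Hypothetical classifier: "C1" inside the band 2 <= x0 <= 7, "C0" otherwise.
    return "C1" if 2 <= x[0] <= 7 else "C0"


print(flip_thresholds(band_model, [4.0], 0, range(11)))  # (1, 8)
```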
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Shteingart et al., U.S. PGPUB 2016/0379133, published 12/29/2016 [hereafter referred to as Shteingart] in view of BigML.com blog post: Streaming Histograms for Clojure and Java https://blog.bigml.com/2013/02/11/streaming-histograms-for-clojure-and-java/, published online date: 2/11/2013, incorporating directly via hyperlink reference without modification BigML.com Github Histogram README.md https://github.com/bigmlcom/histogram/blob/master/README.md, published online date: 7/29/2015 [hereafter collectively referred to as BigML.com].
Regarding Claim 9, Shteingart teaches
The one or more computer readable media of claim 1, further comprising instructions that, when executed, cause the computing system to …
However, Shteingart does not teach
access a histogram for each of the features of P, that indicates which bin of the histogram the value of each of the features in P falls within.  
BigML.com teaches 
access a histogram for each of the features of P, that indicates which bin of the histogram the value of each of the features in P falls within ([BigML.com (Github Histogram README.md Overview, 2nd paragraph): using histograms for learning, visualization, and analysis (“access a histogram…”) (“The histograms act as an approximation of the underlying dataset. They can be used for learning, visualization, discretization, or analysis. The histograms may be built independently and merged, making them convenient for parallel and distributed algorithms.”).] [BigML.com (Github Histogram README.md Basics, 8th paragraph): a histogram containing values for points within each bin (“…for each of the features of P, that indicates which bin of the histogram the value of each of the features in P falls within”) (“The histogram approximates distributions using a constant number of bins. This bin limit is a parameter when creating a histogram … A bin contains a :count of the points within the bin along with the :mean for the values in the bin.”).]).  
Both Shteingart and BigML.com are analogous art since both are in the same realm of machine learning and perform analysis on machine learning systems. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the feature comparison module of Shteingart and incorporate the step of accessing histograms of BigML.com as a way to access and sample feature values for data points. The motivation to combine is taught in BigML.com: histograms are compact representations of data that retain statistical information (mean, median, percentiles), support incoming streaming data by updating in real time, and can handle parallel and distributed applications, such as creating decision trees. Because histograms are compact representations of data, they also require less memory, thus saving memory usage and data access time and thereby improving the overall performance of the system ([BigML.com (Github Histogram README.md Overview, 2nd paragraph): “The histograms act as an approximation of the underlying dataset. They can be used for learning, visualization, discretization, or analysis. The histograms may be built independently and merged, making them convenient for parallel and distributed algorithms.”] [BigML.com (Streaming Histograms for Clojure and Java, 2nd-5th paragraphs): “The histograms are a handy way to compress streams of numeric data. When you want to summarize a stream using limited memory there are two general options. You can either store a sample of data in hopes that it is representative of the whole (such as a reservoir sample) or you can construct some summary statistics, updating as data arrives. The histogram library provides a tool for the latter approach. … Since the histogram provides an approximation of the data’s original distribution, you can find all the basic stats you’d expect, such as mean, median, and arbitrary percentiles. … The histograms have a few more tricks. 
Along with the primary variable the histograms can track information about secondary numeric or categorical variables. We use this feature when growing decision trees, but it could be useful whenever you want to watch for correlation between variables in a streaming context.”]).
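Accessing a per-feature histogram and identifying which bin each feature value of P falls within can be sketched as follows. This is a simplification under stated assumptions: each histogram is represented here by fixed, sorted bin edges, whereas the BigML.com library keeps a bounded number of adaptive bins, each recording a :count and a :mean. The function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def bin_index(edges: Sequence[float], value: float) -> int:
    """Index i of the bin [edges[i], edges[i+1]) containing value.

    Values below edges[0] are clamped to the first bin, and values at or
    above edges[-1] are clamped to the last bin."""
    if len(edges) < 2:
        raise ValueError("a histogram needs at least two edges (one bin)")
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def bins_for_point(histograms: Sequence[Sequence[float]], p: Sequence[float]) -> List[int]:
    """For each feature of P, the bin of that feature's histogram holding its value."""
    if len(histograms) != len(p):
        raise ValueError("need exactly one histogram per feature")
    return [bin_index(edges, v) for edges, v in zip(histograms, p)]


histograms = [[0, 10, 20, 30], [0.0, 1.0, 2.0]]  # hypothetical edges for two features
print(bins_for_point(histograms, [15, 1.5]))  # [1, 1]
```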
Regarding Claim 10, Shteingart in view of BigML.com teaches
The one or more computer readable media of claim 9, 
wherein changing the value of at least one feature of P for each perturbed input data point Pk includes selecting a new value for that feature from a different bin of the histogram ([BigML.com (Streaming Histograms for Clojure and Java, 2nd paragraph): using histograms to represent streams of numeric data for sampling, where “sampling” is interpreted as selecting values from a list of possible values, and it is therefore apparent that sampling a histogram involves selecting values from different bins of the histogram (“wherein changing the value of at least one feature of P for each perturbed input data point Pk includes selecting a new value for that feature from a different bin of the histogram”) (“The histograms are a handy way to compress streams of numeric data. When you want to summarize a stream using limited memory there are two general options. You can either store a sample of data in hopes that it is representative of the whole (such as a reservoir sample) or you can construct some summary statistics, updating as data arrives.”).]).
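The perturbation recited in Claim 10 can be sketched by picking, for one feature, a bin other than the one the value in P falls within and substituting a value from that bin. Assumptions: fixed bin edges, the bin midpoint as the substituted value (a recorded bin mean or a uniform draw within the bin would serve equally), and a hypothetical function name perturb_from_other_bin; BigML.com does not describe this procedure in these terms.

```python
import bisect
import random
from typing import List, Sequence


def bin_index(edges: Sequence[float], value: float) -> int:
    # Bin [edges[i], edges[i+1]) holding value, clamped to the outermost bins.
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def perturb_from_other_bin(
    p: Sequence[float], feature: int, edges: Sequence[float], rng: random.Random
) -> List[float]:
    """Return a copy Pk of P whose `feature` value is taken from a different bin."""
    n_bins = len(edges) - 1
    if n_bins < 2:
        raise ValueError("need at least two bins to choose a different one")
    current = bin_index(edges, p[feature])
    new_bin = rng.choice([b for b in range(n_bins) if b != current])
    pk = list(p)  # leave the original input data point P unchanged
    pk[feature] = (edges[new_bin] + edges[new_bin + 1]) / 2  # bin midpoint
    return pk


rng = random.Random(0)
pk = perturb_from_other_bin([5.0, 7.0], 0, [0, 10, 20, 30], rng)
print(pk)  # first feature moved to 15.0 or 25.0; second feature unchanged
```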

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121