Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/25/2020 was filed before the mailing date of the first office action. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.	

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101. Claims 1-20 are directed to three methods of different scope: a first group of method claims 1-10, a second group of method claims 11-16, and a third group of method claims 17-20. MPEP 2106.03(I) states that “software expressed as code or a set of instructions detached from any medium is an idea without physical embodiment”. Although Applicant’s specification makes clear that the claimed methods are performed by a computer (see at least paragraphs [0036]-[0037]), Applicant has not claimed these computer elements; therefore, the claimed invention is directed to non-statutory subject matter. Additionally, claims 1-20 fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental Processes” (including observation, evaluation, and opinion) and “Mathematical Concepts” (including mathematical calculations and relationships).
	
Claim 1:
Step 1: see above for the statutory rejection of claim 1.
Step 2A, Prong 1: Claim 1 recites the following abstract ideas:
determining a first set of features of a first sample and a second set of features of a second sample (mental step directed to evaluation – a person could determine which features should belong in a first or second set in their mind);
determining a set of overlapping features of the first and second sets of features (mental step directed to evaluation – a person could determine which features in first or second set would overlap in their mind);
and using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (mental step directed to evaluation, judgement – a person could visualize and analyze features of a machine learning model and determine whether samples are similar in their mind).
Step 2A, Prong 2: Claim 1 recites the following additional elements:
presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Presenting a set of analyzed features is interpreted as transmitting data over a network, which does not integrate the abstract idea into a practical application. Examiner also notes that the visualization technique is performed after any machine learning has occurred and that the claim does not require any additional machine learning techniques to be performed as part of the visualization process.
Step 2B: Claim 1 recites the following additional elements:
presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Presenting a set of analyzed features is interpreted as transmitting data over a network, which does not amount to significantly more (see MPEP 2106.05(d)(II)).
Claim 11:
Step 1: see above for the statutory rejection of claim 11.
Step 2A, Prong 1: Claim 11 recites the following abstract ideas:
determining a first set of features of a first sample and a second set of features of a second sample (mental step directed to evaluation – a person could determine which features should belong in a first or second set in their mind);
determining a set of overlapping features of the first and second sets of features (mental step directed to evaluation – a person could determine which features in first or second set would overlap in their mind);
and using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (mental step directed to evaluation, judgement – a person could visualize and analyze features of a machine learning model and determine whether samples are similar in their mind).
Step 2A, Prong 2: Claim 11 recites the following additional elements:
wherein the first and second sets of features are a function of non-zero outputs from nodes of a predetermined layer of the plurality of layers; and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Defining the features as non-zero outputs from a predetermined layer is interpreted as selecting a particular data source or type of data to be manipulated; presenting a set of analyzed features is interpreted as transmitting data over a network. These elements do not integrate the abstract idea into a practical application.
Step 2B: Claim 11 recites the following additional elements:
wherein the first and second sets of features are a function of non-zero outputs from nodes of a predetermined layer of the plurality of layers; and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Defining the features as non-zero outputs from a predetermined layer is interpreted as selecting a particular data source or type of data to be manipulated; presenting a set of analyzed features is interpreted as transmitting data over a network. These elements do not amount to significantly more (see MPEP 2106.05(g) and MPEP 2106.05(d)(II)).
Claim 17:
Step 1: see above for the statutory rejection of claim 17.
Step 2A, Prong 1: Claim 17 recites the following abstract ideas:
determining a first set of features of a first sample and a second set of features of a second sample (mental step directed to evaluation – a person could determine which features should belong in a first or second set in their mind), wherein the first and second sets of features are based on gradients of nodes of a last convolutional layer of the plurality of layers (defining the first and second feature sets as gradients of nodes is interpreted as a mathematical relationship);
determining a set of overlapping features of the first and second sets of features by rank-ordering outputs of the nodes of the last convolutional layer and selecting a predetermined number of highest ranked overlapping features (mental step directed to evaluation, judgement – a person could determine overlapping features by rank-ordering node outputs and select a number of highest ranked features in their mind);
and using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (mental step directed to evaluation, judgement – a person could visualize and analyze features of a machine learning model and determine whether samples are similar in their mind).
Step 2A, Prong 2: Claim 17 recites the following additional elements:
presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Presenting a set of analyzed features is interpreted as transmitting data over a network, which does not integrate the abstract idea into a practical application.
Step 2B: Claim 17 recites the following additional elements:
presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample. Presenting a set of analyzed features is interpreted as transmitting data over a network, which does not amount to significantly more (see MPEP 2106.05(d)(II)).
The independent claims are not patent eligible.
Dependent claims 2-10, 12-16, and 18-20, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claim 2:
Step 1: see above for the statutory rejection of claim 2.
Step 2A, Prong 1: Claim 2 recites the abstract ideas from claim 1 on which it depends.
Step 2A, Prong 2: Claim 2 recites the following additional elements:
the first sample is an input sample to the machine learning model for classification and the second sample is a nearest neighbor to the first sample. Defining a given sample as an input to a machine learning model and a neighboring sample is interpreted as selecting a particular data source or type of data to be manipulated, which does not integrate the abstract idea into a practical application.
Step 2B: Claim 2 recites the following additional elements:
the first sample is an input sample to the machine learning model for classification and the second sample is a nearest neighbor to the first sample. Defining a given sample as an input to a machine learning model and a neighboring sample is interpreted as selecting a particular data source or type of data to be manipulated, which does not amount to significantly more (see MPEP 2106.05(g)).
Claim 3:
Step 1: see above for the statutory rejection of claim 3.
Step 2A, Prong 1: Claim 3 recites the abstract ideas from claim 1 on which it depends.
Step 2A, Prong 2: Claim 3 recites the following additional elements:
the machine learning model is based on a neural network having a plurality of layers, and wherein the first and second sets of features are non-zero outputs from nodes of a predetermined layer of the plurality of layers. The limitation defines the machine learning model as a neural network but does not actively require machine learning to be performed; therefore, the neural network is interpreted as a generic computer component, which does not integrate the abstract idea into a practical application.
Step 2B: Claim 3 recites the following additional elements:
the machine learning model is based on a neural network having a plurality of layers, and wherein the first and second sets of features are non-zero outputs from nodes of a predetermined layer of the plurality of layers. The limitation defines the machine learning model as a neural network but does not actively require machine learning to be performed; therefore, the neural network is interpreted as a generic computer component, which does not amount to significantly more (see MPEP 2106.05(d)).
Claim 4:
Step 1: see above for the statutory rejection of claim 4.
Step 2A, Prong 1: Claim 4 recites the following abstract ideas:
a ranking of a feature is a function of an output of a node multiplied by a gradient of the node (multiplying an output of a node by a gradient of a node is interpreted as a mathematical calculation).
Step 2A, Prong 2: Claim 4 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 4 does not recite any additional elements and therefore does not amount to significantly more.
Claim 5:
Step 1: see above for the statutory rejection of claim 5.
Step 2A, Prong 1: Claim 5 recites the abstract ideas from claim 3 on which it depends.
Step 2A, Prong 2: Claim 5 recites the following additional elements:
the predetermined layer is a last convolutional layer of the neural network. The limitation defines the predetermined layer as a convolutional layer of a neural network but does not actively require machine learning to be performed; therefore, the layers of a neural network are interpreted as generic computer components, which do not integrate the abstract idea into a practical application.
Step 2B: Claim 5 recites the following additional elements:
the predetermined layer is a last convolutional layer of the neural network. The limitation defines the predetermined layer as a convolutional layer of a neural network but does not actively require machine learning to be performed; therefore, the layers of a neural network are interpreted as generic computer components, which do not amount to significantly more (see MPEP 2106.05(d)).
Claim 6:
Step 1: see above for the statutory rejection of claim 6.
Step 2A, Prong 1: Claim 6 recites the following abstract ideas:
rank-ordering the non-zero outputs from nodes of the predetermined layer (mental step directed to evaluation – a person could rank-order a set of node outputs in their mind);
and selecting a predetermined number of highest ranked features (mental step directed to evaluation, judgement – a person could select a number of highest ranked features in their mind).
Step 2A, Prong 2: Claim 6 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 6 does not recite any additional elements and therefore does not amount to significantly more.
Claim 7:
Step 1: see above for the statutory rejection of claim 7.
Step 2A, Prong 1: Claim 7 recites the following abstract ideas:
inverting the outputs of the nodes of the predetermined layer to maximize activation of the overlapping features (inverting node outputs is interpreted as a mathematical calculation).
Step 2A, Prong 2: Claim 7 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 7 does not recite any additional elements and therefore does not amount to significantly more.
Claim 8:
Step 1: see above for the statutory rejection of claim 8.
Step 2A, Prong 1: Claim 8 recites the following abstract ideas:
determining a Euclidean distance between nodes of an intermediate layer as a function of the non-zero outputs of the nodes of the intermediate layer and gradients of the nodes of the intermediate layer (determining a Euclidean distance between nodes is interpreted as a mathematical calculation).
Step 2A, Prong 2: Claim 8 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 8 does not recite any additional elements and therefore does not amount to significantly more.
Claim 9:
Step 1: see above for the statutory rejection of claim 9.
Step 2A, Prong 1: Claim 9 recites the following abstract ideas:
using a heat map or a feature map to correlate a predetermined number of features of the set of overlapping features (correlating features is interpreted as a mathematical relationship; Examiner notes that using a heat map is a specific way to display data, which is interpreted as transmitting data over a network).
Step 2A, Prong 2: Claim 9 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 9 does not recite any additional elements and therefore does not amount to significantly more.
Claim 10:
Step 1: see above for the statutory rejection of claim 10.
Step 2A, Prong 1: Claim 10 recites the following abstract ideas:
determining areas of the first and second samples that cause the activation of the overlapping features using one of a heat map or a feature map (mental step directed to observation, evaluation – having observed a heat map, a person could determine in their mind which areas of the samples cause the overlapping features to activate).
Step 2A, Prong 2: Claim 10 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 10 does not recite any additional elements and therefore does not amount to significantly more.
Claim 12 is a method claim and its limitation is included in claim 4. Claim 12 is rejected for the same reasons as claim 4.
Claim 13 is a method claim and its limitation is included in claim 5. Claim 13 is rejected for the same reasons as claim 5.
Claim 14 is a method claim and its limitation is included in claim 6. Claim 14 is rejected for the same reasons as claim 6.
Claim 15 is a method claim and its limitation is included in claim 7. Claim 15 is rejected for the same reasons as claim 7.
Claim 16 is a method claim and its limitation is included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Claim 18 is a method claim and its limitation is included in claim 7. Claim 18 is rejected for the same reasons as claim 7.
Claim 19 is a method claim and its limitation is included in claim 8. Claim 19 is rejected for the same reasons as claim 8.
Claim 20 is a method claim and its limitation is included in claim 9. Claim 20 is rejected for the same reasons as claim 9.
Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-14, 16-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wojton et al. (US 20120033863 A1, herein Wojton) in view of Selvaraju et al. (“Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization”, herein Selvaraju).
Regarding claim 1, Wojton teaches a method for analyzing data samples of a machine learning model (para. [0008] recites “These and other aspects, features, and implementations, and combinations of them, can be expressed as methods, means or steps for performing functions, business methods, program products, compositions, apparatus, systems, components, and manufactures, and in other ways”. Fig. 10 and para. [0016] recite “In the training data set, each case is characterized by values 20 that are derived from the data representing the case and are values for corresponding features 22”. Para. [0016] also recites “These features (and combinations of them) can be analyzed to produce scores 24 that represent an expected relative usefulness of the features in classifying cases. The scores can serve as a basis for selecting a preferred feature 26 (or preferred set of features) for use in classifying cases that are in a test data set 28 (labeled as a non-training data set in FIG. 10), having cases that may be partly or completely distinct from the cases of the training data set 10” (i.e. a method to analyze data samples of a classification model)), the method comprising:
determining a first set of features of a first sample and a second set of features of a second sample (para. [0019] recites “a process 100 can be followed that begins with receiving 105 the first subset of cases in the training data set, and receiving the second subset of cases in the training data set 110”. Para. [0020] recites “For each feature, a value of that feature is determined 115 for each case that belongs to the first subset of the training data set, thus defining a first subset of feature values (that is, a feature vector). The process is then repeated 120 for that feature for each case that belongs to the second subset of the training data set, thereby defining a second subset of feature values (a second feature vector)” (i.e. determining features of a first input and a second input));
determining a set of overlapping features of the first and second sets of features (para. [0037] recites “Many characteristics of distributions of values of a feature may be used to determine an uncertainty measure. For example, an uncertainty measure of a value of a feature may relate to a standard-deviation of values of the feature in the distribution, a range of the values of the feature in the distribution, an integral over a region of a histogram of the values of the feature, a maximum probability in a histogram of values of the feature, or an overlap in histograms of values of the features”. Para. [0091] recites “FIG. 8 shows distributions of the values of a feature for both classes for 10 of the 1000 sweeps (corresponding to sweep indexes 0, 100, 200, 300, 400, 500, 600, 700, 800 and 900)”. Figs. 8E and 8G show examples of overlapping features).
However, Wojton does not explicitly teach presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample.
Selvaraju teaches presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (the abstract of Selvaraju recites “We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach – Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept” (i.e. a visualization technique to present analyzed features of a machine learning model)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the feature determination method from Wojton to more robustly determine the features in the visualization method from Selvaraju. Wojton and Selvaraju are both directed to methods of analyzing classification models; therefore, one of ordinary skill in the art would understand how to use the known techniques related to determining input features from Wojton to improve the known classification system from Selvaraju.
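For illustration only, the two determining steps of claim 1, read together with the non-zero-output limitation of claim 3, can be sketched as set operations. The sketch assumes that each sample's features are the indices of nodes with a non-zero output at one layer and that a feature "overlaps" when it is active for both samples. The function names and activation values are hypothetical and are not taken from Wojton, Selvaraju, or Applicant's specification.

```python
from typing import Sequence, Set


def nonzero_features(outputs: Sequence[float]) -> Set[int]:
    """Indices of nodes whose output is non-zero (hypothetical feature definition)."""
    return {i for i, value in enumerate(outputs) if value != 0.0}


def overlapping_features(first: Sequence[float], second: Sequence[float]) -> Set[int]:
    """Features present in both samples' feature sets (set intersection)."""
    return nonzero_features(first) & nonzero_features(second)


# Hypothetical layer outputs for a first and a second sample.
first_sample = [0.0, 1.2, 0.0, 3.4, 0.5]
second_sample = [0.7, 0.9, 0.0, 2.1, 0.0]
print(sorted(overlapping_features(first_sample, second_sample)))  # [1, 3]
```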
Regarding claim 2, the combination of Wojton and Selvaraju teaches the method of claim 1, wherein the first sample is an input sample to the machine learning model for classification and the second sample is a nearest neighbor to the first sample (Wojton para. [0071] recites “The input data may be classified based on the value of the preferred feature (615). A wide variety of classifiers or combinations of classifiers may be used to classify the input data. For example, the classifier may use a nearest-neighbor technique, a maximum-likelihood technique, or a mutual information-maximization technique”. Para. [0074] recites “A nearest-neighbor algorithm could then determine a classification by comparing a value of a preferred feature of an input (testing) data case to that in the templates. As another example, a nearest-neighbor algorithm could determine a classification by comparing a value of a preferred feature of an input (testing) data case to all training-data cases. The classification assigned to the input testing data case may then be the same as the classification of the training data case which was identified to be the nearest neighbor in this analysis” (i.e. the first sample may be a preferred input and the second sample may be a determined nearest-neighbor input)).
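The nearest-neighbor classification quoted from Wojton para. [0074] can be sketched as follows. The sketch assumes that each case is represented by a numeric feature vector and that closeness is measured by Euclidean distance; the quoted passage does not fix the metric. The training cases and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_neighbor(query: Sequence[float], training: List[Case]) -> Tuple[int, str]:
    """Return the index and label of the training case closest to the query."""
    best = min(range(len(training)), key=lambda i: math.dist(query, training[i][0]))
    # The input case is assigned the classification of its nearest neighbor.
    return best, training[best][1]


training_cases = [([0.0, 0.0], "cat"), ([5.0, 5.0], "dog"), ([1.0, 0.5], "cat")]
print(nearest_neighbor([0.9, 0.6], training_cases))  # (2, 'cat')
```

In the terms of claim 2, the query corresponds to the first (input) sample and the returned training case corresponds to the second sample.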
Regarding claim 3, the combination of Wojton and Selvaraju teaches the method of claim 1, wherein the machine learning model is based on a neural network having a plurality of layers, and wherein the first and second sets of features are non-zero outputs from nodes of a predetermined layer of the plurality of layers (Selvaraju section 3 para. 1 recites “convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. The neurons in these layers look for semantic class-specific information in the image (say object parts). Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only” (i.e. the machine learning model is a convolutional neural network with a plurality of layers, the features are outputs from a specific layer of the CNN)).
Regarding claim 4, the combination of Wojton and Selvaraju teaches the method of claim 3, wherein a ranking of a feature is a function of an output of a node multiplied by a gradient of the node (the description of Selvaraju’s fig. 2 recites “Grad-CAM overview: Given an image and a class of interest (e.g., ‘tiger cat’ or any other type of differentiable output) as input, we forward propagate the image through the CNN part of the model and then through task-specific computations to obtain a raw score for the category. The gradients are set to zero for all classes except the desired class (tiger cat), which is set to 1. This signal is then backpropagated to the rectified convolutional feature maps of interest, which we combine to compute the coarse Grad-CAM localization (blue heatmap) which represents where the model has to look to make the particular decision. Finally, we pointwise multiply the heatmap with guided backpropagation to get Guided Grad-CAM visualizations which are both high-resolution and concept-specific” (i.e. the guided-backpropagation output and the gradient-based Grad-CAM heatmap (shown in blue) are multiplied to calculate a final result. Examiner’s Note: Although Selvaraju discusses rank correlation broadly, it does not specifically teach rank-ordering features. However, one of ordinary skill in the art would recognize that the rank-ordering methods from Wojton (see para. [0061]) could be applied to the outputs from Selvaraju in order to compare feature scores or outputs)).
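As a sketch of the claim 4 limitation combined with the rank-ordering of Wojton para. [0061], each node's ranking score can be taken as its output multiplied by its gradient, and the nodes sorted by that score. The output and gradient values are hypothetical; in practice the gradients would be obtained by backpropagation, which is not shown here.

```python
from typing import List, Sequence


def ranking_scores(outputs: Sequence[float], gradients: Sequence[float]) -> List[float]:
    """Score each node as its output multiplied by its gradient."""
    return [o * g for o, g in zip(outputs, gradients)]


def rank_order(outputs: Sequence[float], gradients: Sequence[float]) -> List[int]:
    """Node indices sorted from highest to lowest score (ties keep input order)."""
    scores = ranking_scores(outputs, gradients)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


print(rank_order([0.5, 2.0, 1.0, 0.0], [0.4, 0.3, 0.5, 0.9]))  # [1, 2, 0, 3]
```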
Regarding claim 5, the combination of Wojton and Selvaraju teaches the method of claim 3, wherein the predetermined layer is a last convolutional layer of the neural network (Selvaraju section 3 para. 1 recites “convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. The neurons in these layers look for semantic class-specific information in the image (say object parts). Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only” (i.e. the predetermined layer from which the features are taken is the last convolutional layer of the CNN)).
Regarding claim 6, the combination of Wojton and Selvaraju teaches the method of claim 3, wherein determining a set of overlapping features of the first and second sets of features further comprises: rank-ordering the non-zero outputs from nodes of the predetermined layer; and selecting a predetermined number of highest ranked features (Wojton para. [0059] recites “As shown in FIG. 1, multiple scores may be determined, each score being associated with a feature or combination of features”. Para. [0060] recites “A preferred feature may be identified (135, FIG. 1) based on the scores. This can be done in a wide variety of ways and combinations of ways. For example, a feature associated with the highest score may be identified as the preferred feature”. Para. [0061] recites “two or more scores are each associated with a single feature. A subset of the scores may be identified as desirable and the corresponding subset of features or the combination of them may be identified as preferred features. In some examples, a feature (or a combination of features) may be identified as preferred if an associated score exceeds a threshold (e.g., a pre-determined threshold). In some examples, a set of features (or combination of features) may be rank-ordered based on their associated scores. A feature (or combination of features) may be identified as preferred if it has met a ranking criterion. For example, a feature may be preferred if it is associated with one of the three highest scores” (i.e. outputs can be rank-ordered in order to select a number of highest ranked features)).
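The rank-ordering and selection recited in claim 6 can be sketched under one possible reading, which is an assumption: each sample's non-zero outputs are rank-ordered, a predetermined number k of the highest-ranked nodes is kept for each sample, and the overlapping features are the nodes kept for both. The outputs are hypothetical; k = 3 mirrors the "three highest scores" example in Wojton para. [0061].

```python
from typing import List, Sequence, Set


def top_k_nonzero(outputs: Sequence[float], k: int) -> List[int]:
    """Rank-order the non-zero outputs and keep the k highest-ranked node indices."""
    nonzero = [i for i, value in enumerate(outputs) if value != 0.0]
    nonzero.sort(key=lambda i: outputs[i], reverse=True)
    return nonzero[:k]


def top_k_overlap(first: Sequence[float], second: Sequence[float], k: int) -> Set[int]:
    """Nodes that are among the k highest-ranked nodes for both samples."""
    return set(top_k_nonzero(first, k)) & set(top_k_nonzero(second, k))


first_outputs = [0.9, 0.0, 2.5, 1.1, 0.3]
second_outputs = [1.5, 0.8, 2.0, 0.0, 0.4]
print(sorted(top_k_overlap(first_outputs, second_outputs, k=3)))  # [0, 2]
```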
Regarding claim 8, the combination of Wojton and Selvaraju teaches the method of claim 3, wherein determining a set of overlapping features of the first and second sets of features further comprises determining a Euclidean distance between nodes of an intermediate layer as a function of the non-zero outputs of the nodes of the intermediate layer and gradients of the nodes of the intermediate layer (Wojton para. [0034] recites “Using a value of a feature associated with cases in data subsets, a score can be determined that indicates a degree to which a value of the feature can be used to predict accurately a classification of other cases (125)”. Para. [0035] recites “The score may depend on a first component that is based on the values or the distributions of the values and a second component that is also based on the values or the distributions of the values”. Para. [0046] recites “The second component (i.e. of the score) may comprise a distance metric component (FIG. 4, element 410). The distance-metric component may comprise, for example, a Euclidean distance or a Minkowski distance” (i.e. determining a Euclidean distance between non-zero outputs of a machine learning model). Selvaraju section 3 para. 1 recites “convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. The neurons in these layers look for semantic class-specific information in the image (say object parts). Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only” (i.e. the convolutional neural network from Selvaraju teaches nodes and gradients of an intermediate layer of a neural network)).
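Claim 8 does not specify how the outputs and gradients are combined before the distance is taken. One hypothetical reading, shown for illustration only, weights each node's output at the intermediate layer by its gradient (so zero outputs contribute nothing) and then computes the Euclidean distance between the two samples' weighted vectors. Neither the combination rule nor the values come from Wojton, Selvaraju, or the specification.

```python
import math
from typing import List, Sequence


def gradient_weighted(outputs: Sequence[float], gradients: Sequence[float]) -> List[float]:
    """Weight each node output by its gradient; zero outputs contribute nothing."""
    return [o * g for o, g in zip(outputs, gradients)]


def weighted_distance(out_a: Sequence[float], grad_a: Sequence[float],
                      out_b: Sequence[float], grad_b: Sequence[float]) -> float:
    """Euclidean distance between two samples' gradient-weighted layer outputs."""
    return math.dist(gradient_weighted(out_a, grad_a), gradient_weighted(out_b, grad_b))


print(weighted_distance([1.0, 0.0, 2.0], [0.5, 0.3, 1.0],
                        [1.0, 4.0, 0.0], [0.5, 0.0, 0.2]))  # 2.0
```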
Regarding claim 9, the combination of Wojton and Selvaraju teaches the method of claim 1, wherein presenting the set of overlapping features using a predetermined visualization technique further comprises using a heat map or a feature map to correlate a predetermined number of features of the set of overlapping features (Selvaraju figs. 1-3 show examples of how heat maps are used to correlate overlapping features. The description of fig. 2 recites “Grad-CAM overview: Given an image and a class of interest (e.g., ‘tiger cat’ or any other type of differentiable output) as input, we forward propagate the image through the CNN part of the model and then through task-specific computations to obtain a raw score for the category. The gradients are set to zero for all classes except the desired class (tiger cat), which is set to 1. This signal is then backpropagated to the rectified convolutional feature maps of interest, which we combine to compute the coarse Grad-CAM localization (blue heatmap) which represents where the model has to look to make the particular decision. Finally, we pointwise multiply the heatmap with guided backpropagation to get Guided Grad-CAM visualizations which are both high-resolution and concept-specific”).
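To illustrate the heat map relied upon from Selvaraju, the Grad-CAM localization map can be sketched in plain Python. In Grad-CAM, each feature map k of the last convolutional layer receives a weight equal to the spatial average of the class-score gradients with respect to that map; the feature maps are summed using those weights, and a ReLU is applied. The 2 x 2 maps and gradients below are hypothetical, and the gradients are supplied directly rather than computed by backpropagation.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM heat map: ReLU of the gradient-weighted sum of feature maps."""
    height, width = len(activations[0]), len(activations[0][0])
    # alpha_k: average of the gradients over the spatial positions of map k.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    heat = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(height):
            for j in range(width):
                heat[i][j] += alpha * fmap[i][j]
    # ReLU keeps only positions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in heat]


A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
G = [[[0.5, 0.5], [0.5, 0.5]], [[-0.25, -0.25], [-0.25, -0.25]]]
print(grad_cam(A, G))  # [[0.5, 0.0], [0.0, 0.5]]
```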
Regarding claim 10, the combination of Wojton and Selvaraju teaches the method of claim 1, wherein presenting the set of overlapping features using a predetermined visualization technique further comprises determining areas of the first and second samples that cause the activation of the overlapping features using one of a heat map or a feature map (Selvaraju figs. 1-3 show examples of how heat maps are used to correlate overlapping features. Section 3.2 para. 1 recites “See Figure 1c, where Grad-CAM can easily localize the cat; however, it is unclear from the coarse heatmap why the network predicts this particular instance as ‘tiger cat’. In order to combine the best aspects of both, we fuse Guided Backpropagation and Grad-CAM visualizations via element-wise multiplication (L^c_Grad-CAM is first upsampled to the input image resolution using bilinear interpolation). Fig. 2 bottom-left illustrates this fusion. This visualization is both high-resolution (when the class of interest is ‘tiger cat’, it identifies important ‘tiger cat’ features like stripes, pointy ears and eyes) and class-discriminative (it highlights the ‘tiger cat’ but not the ‘boxer (dog)’)” (i.e., determining areas of the sample that cause activation of the overlapping features using a heat map)).
Regarding claim 11, Wojton teaches a method for analyzing data samples of a machine learning model [based on a neural network having a plurality of layers] (para. [0008] recites “These and other aspects, features, and implementations, and combinations of them, can be expressed as methods, means or steps for performing functions, business methods, program products, compositions, apparatus, systems, components, and manufactures, and in other ways”. Fig. 10 and para. [0016] recite “In the training data set, each case is characterized by values 20 that are derived from the data representing the case and are values for corresponding features 22”. Para. [0016] also recites “These features (and combinations of them) can be analyzed to produce scores 24 that represent an expected relative usefulness of the features in classifying cases. The scores can serve as a basis for selecting a preferred feature 26 (or preferred set of features) for use in classifying cases that are in a test data set 28 (labeled as a non-training data set in FIG. 10), having cases that may be partly or completely distinct from the cases of the training data set 10” (i.e. a method to analyze data samples of a classification model)), the method comprising:
determining a first set of features of a first sample and a second set of features of a second sample (para. [0019] recites “a process 100 can be followed that begins with receiving 105 the first subset of cases in the training data set, and receiving the second subset of cases in the training data set 110”. Para. [0020] recites “For each feature, a value of that feature is determined 115 for each case that belongs to the first subset of the training data set, thus defining a first subset of feature values (that is, a feature vector). The process is then repeated 120 for that feature for each case that belongs to the second subset of the training data set, thereby defining a second subset of feature values (a second feature vector)” (i.e. determining features of a first input and a second input));
determining a set of overlapping features of the first and second sets of features (para. [0037] recites “Many characteristics of distributions of values of a feature may be used to determine an uncertainty measure. For example, an uncertainty measure of a value of a feature may relate to a standard-deviation of values of the feature in the distribution, a range of the values of the feature in the distribution, an integral over a region of a histogram of the values of the feature, a maximum probability in a histogram of values of the feature, or an overlap in histograms of values of the features”. Para. [0091] recites “FIG. 8 shows distributions of the values of a feature for both classes for 10 of the 1000 sweeps (corresponding to sweep indexes 0, 100, 200, 300, 400, 500, 600, 700, 800 and 900)”. Figs. 8E and 8G show examples of overlapping features).
However, Wojton does not explicitly teach a neural network having a plurality of layers; wherein the first and second sets of features are a function of non-zero outputs from nodes of a predetermined layer of the plurality of layers; and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample.
Selvaraju teaches a neural network having a plurality of layers; wherein the first and second sets of features are a function of non-zero outputs from nodes of a predetermined layer of the plurality of layers (Selvaraju section 3 para. 1 recites “convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. The neurons in these layers look for semantic class-specific information in the image (say object parts). Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only” (i.e. the machine learning model is a convolutional neural network with a plurality of layers, the features are outputs from a specific layer of the CNN)); 
and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (the abstract of Selvaraju recites “We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach – Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept” (i.e. a visualization technique to present analyzed features of a machine learning model)).
See claim 1 for motivation to combine.
Claim 12 is a method claim and its limitation is included in claim 4. Claim 12 is rejected for the same reasons as claim 4.
Claim 13 is a method claim and its limitation is included in claim 5. Claim 13 is rejected for the same reasons as claim 5.
Claim 14 is a method claim and its limitation is included in claim 6. Claim 14 is rejected for the same reasons as claim 6.
Claim 16 is a method claim and its limitation is included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Regarding claim 17, Wojton teaches a method for analyzing data samples of a machine learning model [based on a neural network having a plurality of layers] (para. [0008] recites “These and other aspects, features, and implementations, and combinations of them, can be expressed as methods, means or steps for performing functions, business methods, program products, compositions, apparatus, systems, components, and manufactures, and in other ways”. Fig. 10 and para. [0016] recite “In the training data set, each case is characterized by values 20 that are derived from the data representing the case and are values for corresponding features 22”. Para. [0016] also recites “These features (and combinations of them) can be analyzed to produce scores 24 that represent an expected relative usefulness of the features in classifying cases. The scores can serve as a basis for selecting a preferred feature 26 (or preferred set of features) for use in classifying cases that are in a test data set 28 (labeled as a non-training data set in FIG. 10), having cases that may be partly or completely distinct from the cases of the training data set 10” (i.e. a method to analyze data samples of a classification model)), the method comprising:
determining a first set of features of a first sample and a second set of features of a second sample (para. [0019] recites “a process 100 can be followed that begins with receiving 105 the first subset of cases in the training data set, and receiving the second subset of cases in the training data set 110”. Para. [0020] recites “For each feature, a value of that feature is determined 115 for each case that belongs to the first subset of the training data set, thus defining a first subset of feature values (that is, a feature vector). The process is then repeated 120 for that feature for each case that belongs to the second subset of the training data set, thereby defining a second subset of feature values (a second feature vector)” (i.e. determining features of a first input and a second input));
determining a set of overlapping features of the first and second sets of features by rank-ordering outputs of the nodes [of the last convolutional layer] and selecting a predetermined number of highest ranked overlapping features (para. [0037] recites “Many characteristics of distributions of values of a feature may be used to determine an uncertainty measure. For example, an uncertainty measure of a value of a feature may relate to a standard-deviation of values of the feature in the distribution, a range of the values of the feature in the distribution, an integral over a region of a histogram of the values of the feature, a maximum probability in a histogram of values of the feature, or an overlap in histograms of values of the features”. Para. [0091] recites “FIG. 8 shows distributions of the values of a feature for both classes for 10 of the 1000 sweeps (corresponding to sweep indexes 0, 100, 200, 300, 400, 500, 600, 700, 800 and 900)”. Figs. 8E and 8G show examples of overlapping features).
However, Wojton does not explicitly teach a neural network having a plurality of layers; wherein the first and second sets of features are based on gradients of nodes of a last convolutional layer of the plurality of layers; and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample.
Selvaraju teaches a neural network having a plurality of layers; wherein the first and second sets of features are based on gradients of nodes of a last convolutional layer of the plurality of layers (Selvaraju section 3 para. 1 recites “convolutional layers naturally retain spatial information which is lost in fully-connected layers, so we can expect the last convolutional layers to have the best compromise between high-level semantics and detailed spatial information. The neurons in these layers look for semantic class-specific information in the image (say object parts). Grad-CAM uses the gradient information flowing into the last convolutional layer of the CNN to assign importance values to each neuron for a particular decision of interest. Although our technique is fairly general in that it can be used to explain activations in any layer of a deep network, in this work, we focus on explaining output layer decisions only” (i.e. the machine learning model is a convolutional neural network with a plurality of layers, the features are outputs from a specific layer of the CNN)); 
and presenting the set of overlapping features using a predetermined visualization technique to analyze features the machine learning model used to determine the first sample is similar to the second sample (the abstract of Selvaraju recites “We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach – Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say ‘dog’ in a classification network or a sequence of words in captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept” (i.e. a visualization technique to present analyzed features of a machine learning model)).
See claim 1 for motivation to combine.
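For illustration only, the rank-ordering limitation of claim 17 can be read in more than one way; the minimal Python sketch below assumes one hypothetical reading, in which a node "overlaps" when it produces a non-zero output for both samples, the overlapping nodes are rank-ordered by the sum of their two outputs, and a predetermined number n of the highest ranked nodes is selected. The ranking key, the function name, and the sample values are assumptions and are not drawn from Wojton, Selvaraju, or Applicant's specification.

```python
from typing import List, Sequence


def top_overlapping_features(
    outputs_a: Sequence[float], outputs_b: Sequence[float], n: int
) -> List[int]:
    """Return indices of up to n nodes active in both samples, highest combined output first."""
    if len(outputs_a) != len(outputs_b):
        raise ValueError("both samples must come from the same layer")
    # A node overlaps when it produces a non-zero output for both samples.
    overlapping = [
        i for i, (a, b) in enumerate(zip(outputs_a, outputs_b)) if a != 0.0 and b != 0.0
    ]
    # Rank-order the overlapping nodes by the sum of their outputs (assumed ranking key).
    overlapping.sort(key=lambda i: outputs_a[i] + outputs_b[i], reverse=True)
    # Keep only the predetermined number of highest ranked nodes.
    return overlapping[:n]


# Hypothetical last-convolutional-layer outputs for two samples.
top_two = top_overlapping_features([0.9, 0.0, 0.4, 0.7, 0.2], [0.8, 0.5, 0.0, 0.6, 0.3], 2)
```

Here nodes 1 and 2 are excluded because one of the two samples leaves them inactive, so `top_two` holds nodes 0 and 3.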
Claim 19 is a method claim and its limitation is included in claim 8. Claim 19 is rejected for the same reasons as claim 8.
Claim 20 is a method claim and its limitation is included in claim 9. Claim 20 is rejected for the same reasons as claim 9.

Claims 7, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wojton et al (US 20120033863 A1, herein Wojton) in view of Selvaraju et al (“Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization”, herein Selvaraju), in further view of Dosovitskiy et al (“Inverting Visual Representations with Convolutional Networks”, herein Dosovitskiy).
Regarding claim 7, the combination of Wojton and Selvaraju teaches the method of claim 3. 
However, the combination of Wojton and Selvaraju does not teach wherein presenting the set of overlapping features using a predetermined visualization technique further comprises inverting the outputs of the nodes of the predetermined layer to maximize activation of the overlapping features.
Dosovitskiy teaches wherein presenting the set of overlapping features using a predetermined visualization technique further comprises inverting the outputs of the nodes of the predetermined layer to maximize activation of the overlapping features (section 1 para. 2 recites “We train neural networks to invert feature representations in the following sense. Given a feature vector, the network is trained to predict the expected pre-image, that is, the (weighted) average of all natural images which could have produced the given feature vector. The content of this expected pre-image shows image properties which can be confidently inferred from the feature vector”. Section 4.3 para. 1 recites “We took an image of a red apple (Figure 8 top left) from Flickr and modified its hue to make it green or blue. Then we extracted AlexNet FC8 features of the resulting images. Remind that FC8 is the last layer of the network, so the FC8 features, after application of softmax, give the network’s prediction of class probabilities. The largest activation, hence, corresponds to the network’s prediction of the image class. To check how class-dependent the results of inversion are, we passed three versions of each feature vector through the inversion network: 1) just the vector itself, 2) all activations except the 5 largest ones set to zero, 3) the 5 largest activations set to zero” (i.e. inverting the output of a predetermined layer to maximize activation of a feature)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the inversion technique from Dosovitskiy to provide a further level of feature analysis to the classification system of Wojton (as modified by Selvaraju). Dosovitskiy, Wojton, and Selvaraju are all directed to improving methods of image classification; therefore, one of ordinary skill in the art would be motivated to combine this inversion technique with the classification method from Wojton in order to provide more insight on which features best inform and improve the performance of the classifier.
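The feature-vector variants quoted from Dosovitskiy section 4.3 (the vector itself, all activations except the 5 largest set to zero, and the 5 largest activations set to zero) can be illustrated with the short Python sketch below. It is a minimal sketch of the masking step only; the trained inversion network that maps each variant back to an expected pre-image is not reproduced, and the helper names are hypothetical.

```python
from typing import List, Sequence, Set


def _top_k_indices(features: Sequence[float], k: int) -> Set[int]:
    """Indices of the k largest activations."""
    return set(sorted(range(len(features)), key=lambda i: features[i], reverse=True)[:k])


def keep_only_top_k(features: Sequence[float], k: int = 5) -> List[float]:
    """All activations except the k largest are set to zero."""
    top = _top_k_indices(features, k)
    return [f if i in top else 0.0 for i, f in enumerate(features)]


def zero_top_k(features: Sequence[float], k: int = 5) -> List[float]:
    """The k largest activations are set to zero; the rest are unchanged."""
    top = _top_k_indices(features, k)
    return [0.0 if i in top else f for i, f in enumerate(features)]
```

Each variant would then be passed through the inversion network, so that comparing the resulting pre-images shows which image properties depend on the strongest activations.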
Claim 15 is a method claim and its limitation is included in claim 7. Claim 15 is rejected for the same reasons as claim 7.
Claim 18 is a method claim and its limitation is included in claim 7. Claim 18 is rejected for the same reasons as claim 7.

		
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
“Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning” (Papernot et al) teaches a hybrid classifier that combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the deep neural network.
“Learning Deep Features for Discriminative Localization” (Zhou et al) teaches a method for generating class activation maps (CAM) using global average pooling (GAP) in CNNs.
“Augmented Grad-CAM: Heat-Maps Super Resolution Through Augmentation” (Morbidelli et al) teaches a framework to provide a high-resolution visual explanation of CNN outputs using image augmentation to aggregate multiple low-resolution heat-maps computed from augmented copies of the same input image.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/L.M.F./             Examiner, Art Unit 2121
	/Li B. Zhen/             Supervisory Patent Examiner, Art Unit 2121