DETAILED ACTION
This action is in response to applicant’s amendment in application number 16/253,892, filed January 22, 2019.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed September 24, 2021 has been entered. Examiner acknowledges receipt of the amendments to Application 16/253,892, which include: Amendments to the Drawings p.2 and Appendix (1 page), Amendments to the Specification pp.3-5, Amendments to the Claims pp.6-17, and Remarks pp.18-30 (addressing applicant’s amendments).
Regarding applicant’s Remarks on p.18, examiner acknowledges that Claims 1-6, 10, 12, and 14-15 have been amended. Claims 1-20 remain pending in the application. However, examiner notes that amended Claim 10 gives rise to a new claim objection, which is set forth in the Claim Objections section below.
Regarding applicant’s Remarks on p.18, examiner acknowledges applicant’s Amendments to the Specification and Drawings, which overcome each and every specification and drawing objection previously set forth in the Non-Final Office Action mailed June 3, 2021. Therefore, those specification and drawing objections are withdrawn.

Response to Arguments
Examiner acknowledges receipt of applicant’s arguments regarding Application 16/253,892, presented in Remarks pp.18-30.
Regarding applicant’s Remarks on pp.18-25 for Claims 1-20 under 35 U.S.C. 101, examiner has considered applicant’s arguments and finds them persuasive. Accordingly, the §101 rejections previously set forth in the Non-Final Office Action mailed June 3, 2021 are withdrawn.
Regarding applicant’s Remarks on pp.25-30 for Claims 1-8 and 10-20 under 35 U.S.C. 103 as being unpatentable over Cortez et al., Using sensitivity analysis and visualization techniques to open black box data mining models, Information Sciences 225 (2013), Elsevier, pp.1-17, incorporating by reference herein Kewley et al., Data Strip Mining for the Virtual Design of Pharmaceuticals with Neural Networks, IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000, pp.668-679 [hereafter referred as Cortez], in view of Bien et al., Classification by Set Cover: The Prototype Vector Machine, arXiv:0908.2284v1, August 17, 2009 [hereafter referred as Bien], and for Claim 9 under 35 U.S.C. 103 as being unpatentable over Cortez in view of Bien, in further view of Marchiori, Class Conditional Nearest Neighbor for Large Margin Instance Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 2, February 2010, pp.364-370 [hereafter referred as Marchiori], examiner has considered applicant’s arguments and finds them not persuasive. Examiner also notes that applicant’s amendments to the claims necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to applicant’s amended claims are provided in the relevant sections below.
Regarding applicant’s Remarks on pp.25-30, examiner notes that the main thrust of applicant’s arguments (points 1, 2, and 3 on pp.25-29) is directed towards the amended claim limitations, which were not present in the original set of claims. However, certain sub-arguments interspersed within those main arguments need to be addressed.
Applicant has made a sub-argument that the Bien reference does not teach "mapping, by the at least one processor, the features of the plurality of data points to a feature space and the plurality of outputs to a label space," "determining, by the at least one processor, distances between the plurality of data points in the feature space and the label space," and "determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space," as in claim 1. (Emphasis added). Applicant further argues that “Rather, Bien merely describes using epsilon balls to select prototypes based on labeled classes. Indeed, Bien does not teach anything related to determining distances of data points in a label space.” Examiner has considered this argument and finds it not persuasive. As indicated in the Non-Final Office Action mailed June 3, 2021, Bien p.9 Section 4.2 Prototypes not on training points, 2nd paragraph, states: “Another inherent flexibility of the PVM is in the choice of Z, the set of potential prototypes. While Z = X is a standard choice, we have experimented with other possibilities as well. … Z may be further augmented to include other points. For example, one could run K-means on each class's points individually (or on the training set as a whole) and add these L∙K centroids to Z. … Another successful choice for Z is to sample uniformly within the convex hull of each class's training points.” Examiner notes that L refers to the class label space for the set of training points X, such that the set of unlabeled points Z, from which the prototype vector machine forms the set of prototypes, can also include data points examined in the label space (Bien p.1 Section 1. Introduction and Bien p.3 Section 1.1. The set cover integer program). This results in the interpretation that the Bien reference teaches prototype selection in both the feature space and the label space.
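For illustration only, the two operations attributed to Bien in the discussion above (augmenting the candidate set 𝒵 with K-means centroids computed from each class’s points, and identifying which training points lie within distance ϵ of a candidate prototype, i.e., the candidate’s epsilon ball) can be sketched as follows. This is a minimal Python 3.8+ sketch under stated assumptions, not Bien’s implementation: the function names kmeans, augment_candidates, and covered are hypothetical, Euclidean distance is assumed, and the K-means step is a bare Lloyd iteration seeded with the first K points of each class.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Bare Lloyd's K-means (Euclidean); seeds centers with the first k points.
    Illustrative assumption only, not the clustering used by Bien."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers


def augment_candidates(X, y, k):
    """Hypothetical helper: candidate set Z = training points plus K centroids
    computed from each class's points (L*K extra candidate points)."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(tuple(x))
    Z = [tuple(x) for x in X]
    for label in sorted(by_class):
        Z.extend(kmeans(by_class[label], k))
    return Z


def covered(z, X, eps):
    """Hypothetical helper: indices of training points strictly within distance eps of z."""
    return {i for i, x in enumerate(X) if math.dist(x, z) < eps}


# Toy data (hypothetical): two well-separated classes in R^2.
X = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
y = [1, 1, 2, 2]
Z = augment_candidates(X, y, k=1)  # 4 training points + 2 class centroids
print(len(Z), covered(Z[4], X, eps=0.5))  # prints: 6 {0, 1}
```

With K = 1, each of the two classes contributes one centroid, so the candidate set grows from four points to six, and the class-1 centroid’s epsilon ball contains only the two class-1 training points.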
Applicant has made a sub-argument that the Cortez reference does not teach “mapping, by the at least one processor, the features of the plurality of data points to a feature space and the plurality of outputs to a label space," "determining, by the at least one processor, distances between the plurality of data points in the feature space and the label space," and "determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space," as recited by currently amended independent claim 1, and as similarly recited by currently amended independent claim 5,” with an additional sub-argument that “In particular, Cortez generally relates to a visualization approach for extracting knowledge from black box data mining models (e.g., neural networks). Cortez, Abstract. Cortez also does not teach anything related to selecting prototypes "based on the distances between the plurality of data points in the feature space and the label space," as in the amended claims.” Examiner has considered this argument and finds it not persuasive. Applicant’s original Claim 1 is recited as follows:
1. In a digital medium environment for machine-learning interpretation, a computer-implemented method of prototype selection and analysis to determine feature sensitivity comprising:
identifying, by at least one processor, features of a plurality of data points used to generate a plurality of outputs via a machine-learning model; 
mapping, by the at least one processor, the features of the plurality of data points to a feature space and the plurality of outputs to a label space; and 
a step for determining an impact of the features within the machine-learning model using one or more prototypes from the plurality of data points.
As indicated above, the two claim limitations of “identifying … features of a plurality of data points” and “mapping … the features of the plurality of data points to a feature space and the plurality of outputs to a label space” are broadly recited in the context of machine-learning interpretation, and the body of the claim does not strongly associate any of its limitations with prototype selection and analysis. While the term “one or more prototypes” is used in the claim limitation “a step for determining an impact of the features … using one or more prototypes”, under its broadest reasonable interpretation the term “one or more prototypes” is interpreted to mean one or more representative data points. Taken in the context of that claim limitation, the term does not strongly suggest or claim a concept of prototype selection and analysis, but rather a concept of using one or more representative data points to determine an impact of features. Hence, the original claim language of independent Claim 1 permits an interpretation, and a corresponding claim mapping, directed to a machine-learning algorithm that performs a sensitivity analysis for machine-learning interpretation, which requires identifying features, mapping the features to a feature space, and mapping the outputs to a label space. This interpretation allows the Cortez reference to be relied upon for teaching the first two claim limitations.
Examiner also notes that while the Cortez reference proposes a visualization approach for interpreting black-box models, this visualization approach serves as a mechanism for displaying the results of the sensitivity analysis (Cortez p.10 Section 3.3. Visualizations for the sensitivity analysis). The sensitivity analysis algorithm itself, and the equations used for it, involve identifying the data points and mapping feature-space and label-space data points to interpret black-box models, all of which were properly recited and mapped to the appropriate limitations of original Claim 1 (Cortez p.2 Section 2.1 Sensitivity methods, pp.3-4 Section 2.2. Sensitivity measures of input importance).
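For illustration only, a generic one-at-a-time sensitivity analysis of the kind referred to above (varying one input while holding the others at a baseline and measuring the change in the model output) can be sketched as follows. This is a minimal sketch under stated assumptions and is not quoted from Cortez: the function name one_at_a_time_sensitivity is hypothetical, the baseline is assumed to be the per-input median, the raw sensitivity is assumed to be the range of model responses over evenly spaced levels, and the result is assumed to be normalized to percentages.

```python
import statistics


def one_at_a_time_sensitivity(model, X, levels=5):
    """Hypothetical helper: relative importance (%) of each input from a one-at-a-time sweep.

    Assumptions (illustrative only): each input a is swept over `levels` evenly spaced
    values between its observed min and max while every other input is held at its
    median; the spread (max - min) of the model's responses is input a's raw sensitivity.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    p = len(X[0])
    baseline = [statistics.median(row[j] for row in X) for j in range(p)]
    spreads = []
    for a in range(p):
        lo = min(row[a] for row in X)
        hi = max(row[a] for row in X)
        responses = []
        for step in range(levels):
            x = list(baseline)  # all other inputs stay at the median baseline
            x[a] = lo + (hi - lo) * step / (levels - 1)
            responses.append(model(x))
        spreads.append(max(responses) - min(responses))
    total = sum(spreads)
    # Normalize so the importances sum to 100 (all zeros if the model output never changes).
    return [100.0 * s / total if total else 0.0 for s in spreads]


# Usage with a toy "black box" (hypothetical) that ignores its second input.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(one_at_a_time_sensitivity(lambda x: 3.0 * x[0], X))  # prints: [100.0, 0.0]
```

Because the toy model ignores its second input, the entire relative importance is assigned to the first input; the model is treated purely as a function from inputs to outputs, consistent with a black-box setting.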
As indicated in the Non-Final Office Action mailed June 3, 2021, the claim limitation “a step for determining an impact of the features” invoked a 112(f) interpretation to incorporate additional claim limitations: “determining distances between the plurality of data points in the feature space and the label space; determining a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space; and determining, using the set of prototypes, an impact of the features within the machine-learning model”, such that the introduction of these claim limitations introduced 
As noted above, applicant’s remaining arguments are directed to the newly amended claim limitations, which necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to applicant’s amended claims are provided in the relevant sections below.

Claim Objections
Claim 10 is objected to because of the following informality: The amended claim limitation “generate to the gradient for the selected data point based on a first relative direction from the selected data point to the first adjacent prototype and a second relative direction from the selected data point to the second adjacent prototype” recites the phrase “generate to the gradient for the selected data point based on a first relative direction …”, which lacks a target noun for what is being generated. It appears that the phrase should be “generate [[to]] the gradient for the selected data point based on a first relative direction …”, and for purposes of examination, the amended claim limitation will be interpreted using the above corrected phrase. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-4, 14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bien et al., Classification by Set Cover: The Prototype Vector Machine, arXiv:0908.2284v1, August 17, 2009, pp.1-24 [hereafter referred as Bien] in view of Parades et al., Learning prototypes and distances: A prototype reduction technique based on nearest neighbor error minimization, Pattern Recognition 39 (2006), Elsevier Ltd., 2005, pp.180-188 [hereafter referred as Parades].
Regarding amended Claim 1, Bien teaches
(Currently Amended) In a digital medium environment for machine-learning interpretation, a computer-implemented method of prototype selection and analysis to determine feature sensitivity comprising:
determining a set of prototypes by:
identifying, by at least one processor, features of a plurality of data points used to generate a plurality of outputs via a machine-learning model (Examiner’s note: Bien teaches a set of training points $\mathcal{X} = \{x_1, \ldots, x_n\}$ and a set of points $\mathcal{Z} = \{z_1, \ldots, z_m\}$ (“plurality of data points”), where both sets are subsets of $\mathbb{R}^p$, with associated class labels $y_1, \ldots, y_n$ (“plurality of outputs”), where each class label is an element of the set $\{1, \ldots, L\}$, and where the identification of the set of training points (which are used in a machine-learning model to produce the outputs) and the associated class labels corresponds to the “identifying … features of a plurality of data points used to generate a plurality of outputs …” aspect in the context of “determining a set of prototypes” (Bien p.1 last paragraph-p.2 4th paragraph (Section 1. Introduction): “Suppose we are given a set of training set of points $\mathcal{X} = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ with corresponding class labels $y_1, \ldots, y_n \in \{1, \ldots, L\}$ and in addition, a set of unlabeled points $\mathcal{Z} = \{z_1, \ldots, z_m\} \subset \mathbb{R}^p$. Our goal is to choose a relatively small set of prototypes $P_l \subseteq \mathcal{Z}$ for each class $l$ in such a way that the collection $P_1, \ldots, P_L$ represents a summary or distillation of the training set (i.e., someone given only $P_1, \ldots, P_L$ would have a good sense of the original training data, $\mathcal{X}$ and $y$) … In this paper, we introduce the prototype vector machine (PVM), which describes a particular choice for the sets $P_1, \ldots, P_L$.”). Bien further teaches that this method involving the prototype vector machine (PVM) is used to analyze several datasets and to compare against other prototype methods using various R packages and machine-learning datasets (Bien pp.10-17 Section 6. Examples on simulated and real data, Section 6.4. UCI data sets), where these R packages are code modules running on a computer (which contains a processor and non-transitory memory) that processes the inputs and outputs associated with the machine-learning datasets, thus corresponding to the “by at least one processor” and “via a machine-learning model” aspects of the claim limitation.);
mapping, by the at least one processor, the features of the plurality of data points to a feature space and the plurality of outputs to a label space (Examiner’s note: As indicated earlier, Bien teaches generating the collection $P_1, \ldots, P_L$ from a prototype vector machine (PVM), where this collection represents a summary of the training set $\mathcal{X} \subset \mathbb{R}^p$ (where $\mathbb{R}^p$ corresponds to a feature space) and the associated class labels $y_1, \ldots, y_n \in \{1, \ldots, L\}$ (where the associated class labels $\{1, \ldots, L\}$ correspond to a label space), and Bien teaches augmenting $\mathcal{Z}$ to include class label points, where these class label points correspond to “a plurality of outputs to a label space” (Bien p.9 Section 4.2. Prototypes not on training points, 2nd paragraph: “… $\mathcal{Z}$ may be further augmented to include other points. For example, one could run K-means on each class’s points individually (or on the training set as a whole) and add these $L \cdot K$ centroids to $\mathcal{Z}$.”), such that this generated summary corresponds to the “mapping … the features of the plurality of data points to a feature space and the plurality of outputs to a label space” aspect in the context of “determining a set of prototypes” (Bien p.1 last paragraph-p.2 4th paragraph (Section 1. Introduction): “Suppose we are given a set of training set of points $\mathcal{X} = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$ with corresponding class labels $y_1, \ldots, y_n \in \{1, \ldots, L\}$ and in addition, a set of unlabeled points $\mathcal{Z} = \{z_1, \ldots, z_m\} \subset \mathbb{R}^p$. Our goal is to choose a relatively small set of prototypes $P_l \subseteq \mathcal{Z}$ for each class $l$ in such a way that the collection $P_1, \ldots, P_L$ represents a summary or distillation of the training set (i.e., someone given only $P_1, \ldots, P_L$ would have a good sense of the original training data, $\mathcal{X}$ and $y$) … In this paper, we introduce the prototype vector machine (PVM), which describes a particular choice for the sets $P_1, \ldots, P_L$.”). As indicated earlier, Bien further teaches that this method involving the prototype vector machine (PVM) is used to analyze several datasets and to compare against other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (which contains a processor and non-transitory memory), thus corresponding to the “by at least one processor” aspect of the claim limitation.);
determining, by the at least one processor, distances between the plurality of data points in the feature space and the label space (Examiner’s note: Bien teaches using the prototype vector machine (which is an extension of the set cover integer program, Bien pp.3-6 Section 2. The prototype vector machine) to determine distances between data points in the feature space by analyzing a number of elements in 𝒵 (“a plurality of data points”) that are within a distance ϵ of a given data point $x_i$, where the determination of these distances corresponds to “determining … distances between the plurality of data points in the feature space …” (Bien p.3 1st paragraph Section 1.1. The set cover integer program: “The goal is to find the smallest subset of points 𝒫 ⊆ 𝒵 such that every point $x_i \in \mathcal{X}$ is within ϵ of some point in 𝒫 (i.e., there exists $z_j \in \mathcal{P}$ with $d(x_i, z_j) < \epsilon$). Let $B_\epsilon(x) = \{x' \in \mathbb{R}^p : d(x', x) < \epsilon\}$ denote the ball of radius ϵ centered at x. … From a machine learning point of view, set cover can be seen as a clustering problem in which we wish to find the smallest number of clusters such that every point is within ϵ of at least one cluster center.”). Bien further teaches augmenting 𝒵 to include class label points and applying the same set cover integer program to determine distances for the label space, where the determination of these distances corresponds to “determining … distances between the plurality of data points in … a label space” (Bien p.9 Section 4.2. Prototypes not on training points, 2nd paragraph: “… 𝒵 may be further augmented to include other points. For example, one could run K-means on each class’s points individually (or on the training set as a whole) and add these L∙K centroids to 𝒵.”). As indicated earlier, Bien further teaches that this method involving the prototype vector machine (PVM) is used to perform analysis on several datasets and comparisons with other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to the “by at least one processor” aspect of the claim limitation.); and
determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space (Examiner’s note: Bien teaches implementing the prototype vector machine (PVM) to solve the set cover integer program by determining a minimum set of data points (“plurality of data points”; “one or more prototypes”), where, according to Bien Figure 1, the data points within a prototype region are as close to (“adjacent”) each other as possible (where each prototype region containing adjacent data points is represented by an ϵ-ball) (Bien pp.3-5 Section 2. The prototype vector machine and Section 2.1 PVM as an integer program: “The PVM seeks a set of prototypes for each class that is optimal … that will be made precise in what follows. For a given choice of $P_l \subseteq \mathcal{Z}$, we consider the set of ϵ-balls centered at each $z_j \in P_l$ (see Figure 1). A desirable prototype set for class l is one that induces a set of balls which (a) covers as many training points of class l as possible, (b) covers as few training points as possible of classes other than l, and (c) is sparse (i.e., uses as few prototypes as possible for the given ϵ). … We now express the three properties above as an integer program, taking as a starting point the set cover problem of Equation 2. … We define the PVM to be a solution to the following integer program: <refer to Bien p.5 equations (3a) (3b)> …”). Bien further teaches a greedy algorithm that approximates the solution to the set cover integer program (Bien p.8 algorithm, line 2 while loop) by adding the data points from 𝒵 (“prototypes from the plurality of data points”), represented by feature-space/label-space pairs $(z_j, l)$, that have the least ratio of cost to number of points newly covered (refer to Bien p.8 equations for ∆ξ, ∆η, and ∆Obj), where these calculations used for determining ∆Obj = ∆ξ − ∆η − λ in this greedy algorithm over the set of data points in 𝒵 correspond to “determining … a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space” (Bien pp.7-8 Section 3.2 A greedy approach: “At each step, we add the prototype that has the least ratio of cost to number of points newly covered. … At each step we find the $z_j \in \mathcal{Z}$ and class l for which adding $z_j$ to $P_l$ most decreases the objective function. That is, we find the $(z_j, l)$ pair with the best tradeoff of covering previously uncovered training points of class l while avoiding covering points of other classes.”). Bien further teaches that this method involving the prototype vector machine (PVM) is used to perform analysis on several datasets and comparisons with other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to the “by at least one processor” aspect of the claim limitation.); 
determining, by the at least one processor using the set of prototypes, an impact of the features within the machine-learning model (Examiner’s note: According to applicant’s specification paragraph [0032], the term “impact” is defined as “a measure of change to an output of a machine-learning model as a result of a feature input to the machine-learning model. … the model analysis system can determine impact using a variety of different measures, including … a number of prototypes within a label space”. Bien teaches that a well-selected set of prototypes supports interpretability (Bien p.2 2nd paragraph (Section 1. Introduction): “Having a well-selected set of prototypes $P_1, \ldots, P_L \subseteq \mathcal{Z}$ is advantageous for two main reasons: interpretability and classification. For domain specialists, examining a handful of representative examples of each class can be highly informative especially when n is large … a well-chosen set $P_l \subseteq \mathcal{Z}$ of prototypes for class l should capture the full spread of variation within this class while also taking into account how class l differs from other classes.”), where this interpretability enables domain specialists to further analyze and extract additional information from the prototypes, which provide a representative sample of data points for each class (corresponding to a number of prototypes within a label space), thus providing a method for “determining … an impact of the features within the machine-learning model”. As indicated earlier, Bien further teaches that this method involving the prototype vector machine (PVM) is used to perform analysis on several datasets and comparisons with other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to the “by at least one processor” aspect of the claim limitation.) …
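For illustration only, the greedy selection attributed to Bien above (Bien pp.7-8 Section 3.2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Bien's implementation (which the reference describes as using R packages). It assumes Euclidean distance for d(·,·), an explicitly supplied candidate set 𝒵, a per-step gain of ∆Obj = ∆ξ − ∆η − λ (with ∆ξ counting newly covered training points of class l and ∆η counting points of other classes inside the ϵ-ball at z_j), and a stopping rule that ends the loop once no (z_j, l) pair has a positive gain. The function name greedy_prototypes and the parameter lam are hypothetical.

```python
import numpy as np


def greedy_prototypes(X, y, Z, eps, lam):
    """Greedy prototype selection (sketch under the assumptions stated above).

    X: (n, p) training points; y: (n,) integer class labels;
    Z: (m, p) candidate prototype locations; eps: ball radius;
    lam: nonnegative cost charged for each added prototype.
    Returns the chosen (candidate index j, class l) pairs in selection order.
    """
    # inside[i, j] is True when d(x_i, z_j) < eps (Euclidean distance assumed).
    D = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
    inside = D < eps
    covered = np.zeros(len(X), dtype=bool)  # x_i already covered by a ball of its own class
    chosen = []
    while True:
        best, best_gain = None, 0.0
        for j in range(Z.shape[0]):
            for l in np.unique(y):
                same = y == l
                d_xi = np.sum(inside[:, j] & same & ~covered)  # newly covered class-l points
                d_eta = np.sum(inside[:, j] & ~same)           # other-class points inside the ball
                gain = d_xi - d_eta - lam                      # assumed Delta Obj for pair (z_j, l)
                if gain > best_gain:
                    best, best_gain = (j, int(l)), gain
        if best is None:  # no pair has a positive gain, so stop
            return chosen
        chosen.append(best)
        covered |= inside[:, best[0]] & (y == best[1])


# Toy usage: two well-separated classes, candidates placed on the training points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
chosen = greedy_prototypes(X, y, Z=X, eps=1.0, lam=0.5)
```

In this toy usage each class ends up with a single prototype, (z_0, class 0) and (z_2, class 1), because once both clusters are covered every remaining candidate would only add the cost λ without covering new points.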
While Bien teaches using a prototype vector machine to determine an impact of features, as well as suggesting other related adaptive prototype methods such as learning vector quantization (LVQ) involving gradients (Bien p.10 3rd paragraph), Bien does not explicitly teach 
determining, by the at least one processor using the set of prototypes, an impact of the features within the machine-learning model by:
generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes;
determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients.
Parades teaches
determining, by the at least one processor using the set of prototypes, an impact of the features within the machine-learning model by:
generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes (Examiner’s note: Parades teaches a nearest-neighbor classification based method (named “learning prototypes and distances” or LPD) involving calculations of gradients based on applying a gradient descent procedure to an estimate of the nearest-neighbor error (Parades p.181 col.1 Section 2. Approach and p.181 col.2 Section 2.1 Learning the prototypes and their weights), shown in Parades p.181 equation (2) (where the nearest-neighbor error is based on a weighted distance of data points x to a prototype y, Parades p.181 equation (1), where the weight w represents a weight associated with a feature j for each prototype, and where the nearest-neighbor error is expressed in terms of a ratio of the distances from x to its same-class and different-class nearest-neighbor prototypes, Parades p.182 equation (5), resulting in these data points x and their same-class and different-class nearest neighbors corresponding to “the plurality of data points and corresponding adjacent prototypes of the set of prototypes”). Parades further teaches approximating the nearest-neighbor error estimate shown in Parades p.182 equation (2) using a sigmoid function to make it differentiable such that a gradient descent procedure can be applied (with the approximation shown in Parades p.182 equations (4) and (5), and with the corresponding derivatives shown in Parades p.182 equations (7) and (8)). 
Parades teaches that applying these derivatives leads to corresponding gradient descent update equations and the LPD algorithm shown in Parades p.182 Figure 1, where each prototype x in T is visited and the positions and weights associated with the same-class and different-class nearest neighbors of x are updated, eventually resulting in a reduced set of prototypes containing weighted data points (associated with a corresponding feature) that are close to decision boundaries around the given minimum error estimation, with each data point within the reduced set of prototypes reflecting an importance based on relative distances/proximities to same-class or different-class nearest neighbors (Parades p.183 col.1 2nd paragraph-last paragraph, where this data point importance represents an aspect of “determining an impact of the features within the machine-learning model”). Hence, the resulting derivative equations and corresponding LPD algorithm incorporating the gradient descent procedure to determine a set of prototypes taught in Parades describe a method corresponding to “generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes” (Parades p.181 col.1 last paragraph-col.2 2nd paragraph Section 2. Approach: “Let $T = \{x_1, \ldots, x_N\}$ be a training set; i.e., a collection of training vectors or class-labeled points $x_i \in E$, $1 \le i \le N$ in a suitable representation space $E = \mathbb{R}^m$. … We seek to use T to obtain a reduced set of prototypes, $P = \{y_1, \ldots, y_n\} \subset E$, $n \ll N$, and a suitable weighted distance $d: E \times P \to \mathbb{R}$ associated to P, which optimize the NN classification performance.” and Parades p.181 col.2-p.182 col.2 (Section 2.1 Learning the prototypes and their weights): “In order to find both a matrix W and a suitable reduced set of prototypes P that results in a low error rate of the NN classifier, we propose to minimize a criterion index which is an approximation to the NN classification error of T using P and d(∙,∙). … As in previous work [7,12-14], a gradient descent procedure is proposed to minimize this index. This requires J to be differentiable with respect to all the parameters to be optimized … To obtain the partial derivatives from Eqs. (4) to (5), required for gradient descent, it should be noted that J depends on P and W through the distances d(·,·) in two different ways. First, it depends directly through the prototypes and weights involved in the definition of d(·,·) (1). The second, more subtle dependence is due to the fact that, for some $x \in T$, $y_x^{=}$ and $y_x^{\neq}$ may be different as prototype positions and their associated weights are varied. … Using these derivatives leads to the corresponding gradient descent update equations. A simple manner to implement these equations is by visiting each prototype x in T and updating the positions and the weights associated with the same-class and different-class NNs of x. This is shown in the procedure presented in Fig.1.”).);
determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients (Examiner’s note: As indicated earlier, Parades teaches a nearest-neighbor classification based method (named “learning prototypes and distances” or LPD) involving calculating and applying derivatives representing the gradient descent update equations and the LPD algorithm as shown in Parades p.182 Figure 1, where each prototype x in T is visited and the positions and weights associated with the same-class and different-class nearest neighbors of x are updated, eventually resulting in a reduced set of prototypes containing weighted data points (associated with a corresponding feature) that are close to decision boundaries around the given minimum error estimation, with each data point within the reduced set of prototypes reflecting an importance based on relative distances/proximities to same-class or different-class nearest neighbors (Parades p.183 col.1 2nd paragraph-last paragraph, where this data point importance varies based on the relative proximity of the data point x to $y_x^{=}$ or $y_x^{\neq}$ according to the effects of the gradient update equations (where the gradient update equations correspond to a mechanism that produces “locally sensitive directions of the features”)), thus corresponding to “determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients” (Parades p.183 col.1 2nd paragraph (Section 2.1 Learning the prototypes and their weights): “The effects of the update equations in the LPD algorithm are intuitively clear. For each training vector x, its same-class NN, $y_i = y_x^{=}$, is moved towards x, while its different-class NN, $y_k = y_x^{\neq}$, is moved away from x. Similarly, the feature-dependent weights associated with $y_x^{=}$ are modified so as to make it appear closer to x in a feature-dependent manner, while those of $y_x^{\neq}$ are modified so that it will similarly appear farther from x. Since these update steps are weighted by the distance ratio, r(x), their importance depends upon the relative proximity of x to $y_x^{=}$ or $y_x^{\neq}$. This is further divided by the corresponding squared distance, thereby reducing the update importance for large distances. Finally, the resulting steps are windowed by the derivative of the sigmoid function applied to the distance ratio, r(x). This way, only those prototypes (and their weights) which are sufficiently close to the decision boundaries are actually updated.”).).
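For illustration only, one gradient step of the kind described in the Parades passages quoted above can be sketched in Python. The sketch assumes, rather than quotes, a per-prototype weighted squared distance d(x, y) = Σ_j w_j²(x_j − y_j)² and a sigmoid S_β(z) = 1/(1 + exp(β(1 − z))) applied to the distance ratio r(x) = d(x, y_x^=)/d(x, y_x^≠); the partial derivatives are derived from those assumed forms, and the step sizes mu and rho, the value of beta, and all function names are hypothetical. Consistent with the quoted description, each gradient is windowed by S'_β(r(x))·r(x) and divided by the corresponding squared distance, so the same-class prototype moves toward x and the different-class prototype moves away. The final line, which orders features by the magnitude of the gradient with respect to the same-class prototype, is a hypothetical illustration of a gradient-based feature ranking and is not attributed to Parades.

```python
import numpy as np


def sigmoid(z, beta):
    # Assumed smoothing function S_beta(z) = 1 / (1 + exp(beta * (1 - z))).
    return 1.0 / (1.0 + np.exp(beta * (1.0 - z)))


def weighted_sq_dist(x, y, w):
    # Assumed weighted distance d(x, y) = sum_j w_j^2 * (x_j - y_j)^2,
    # where w holds the feature weights attached to prototype y.
    return float(np.sum(w**2 * (x - y) ** 2))


def lpd_step(x, y_same, w_same, y_diff, w_diff, beta=10.0, mu=0.01, rho=0.01):
    """One gradient-descent step on S_beta(r(x)) for a single training vector x.

    y_same, w_same: position and weights of the same-class nearest prototype.
    y_diff, w_diff: position and weights of the different-class nearest prototype.
    Returns updated (y_same, w_same, y_diff, w_diff) plus the gradient with
    respect to y_same.
    """
    d_same = weighted_sq_dist(x, y_same, w_same)
    d_diff = weighted_sq_dist(x, y_diff, w_diff)
    r = d_same / d_diff                   # distance ratio r(x)
    s = sigmoid(r, beta)
    R = beta * s * (1.0 - s) * r          # S'_beta(r) * r: the windowing factor
    # Partial derivatives of S_beta(r(x)), each divided by its squared distance.
    g_y_same = 2.0 * R / d_same * w_same**2 * (y_same - x)
    g_y_diff = -2.0 * R / d_diff * w_diff**2 * (y_diff - x)
    g_w_same = 2.0 * R / d_same * w_same * (y_same - x) ** 2
    g_w_diff = -2.0 * R / d_diff * w_diff * (y_diff - x) ** 2
    # Descent moves y_same toward x and y_diff away from x, and adjusts the
    # weights so that y_same appears closer to x and y_diff appears farther.
    return (y_same - mu * g_y_same, w_same - rho * g_w_same,
            y_diff - mu * g_y_diff, w_diff - rho * g_w_diff, g_y_same)


# Toy usage with hypothetical values.
x = np.array([0.0, 0.0])
y_same, w_same = np.array([1.0, 0.2]), np.ones(2)
y_diff, w_diff = np.array([0.0, 1.5]), np.ones(2)
ys, ws, yd, wd, g = lpd_step(x, y_same, w_same, y_diff, w_diff)

# Hypothetical illustration: rank features by |gradient| for this data point.
ranking = np.argsort(-np.abs(g))
```

With these toy values the step slightly decreases the weighted distance from x to the same-class prototype and slightly increases the distance to the different-class prototype, matching the qualitative behavior described in the quoted passage.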
	Both Bien and Parades are analogous art since they both teach prototype selection and classification based on nearest-neighbor analysis.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the prototype vector machine taught in Bien and enhance it to incorporate the gradient descent technique demonstrated in the LPD algorithm taught in Parades as a way to generate a plurality of gradients using a plurality of data points (Parades p.183 col.1 3rd-4th paragraphs: “The attentive reader will find the above prototype update rules closely related to the so-called reward–punishment rules heuristically introduced in such popular procedures as LVQ1, LVQ2 and DSM [16–18]. … It is remarkable that an intuitive interpretation of the formally derived LPD prototype update rules is so closely related with popular heuristics which, without formal proof of their potential usefulness, have proved quite helpful to improve accuracy in many practical situations. Nevertheless, the advantages of LPD are clear: not only the update policy for prototype positions, but also for the associated metric weights, along with the corresponding smoothing and windowing terms, come from a mathematical derivation which guarantees convergence towards an (approximate) local minimum of the empirical NN error estimation.”).
Regarding amended Claim 3, Bien in view of Parades teaches
(Currently Amended) The computer-implemented method as recited in claim 1, wherein 
determining a number of prototypes in a first region and a number of prototypes in a second region of the feature space or the label space (Examiner’s note: Bien teaches a set of prototypes $P_1, \ldots, P_L$ (each corresponding to a prototype region, i.e., “a first region”, “a second region”), with each region consisting of a plurality of data points from a training set (Bien p.1 last paragraph – p.2 first paragraph: “Our goal is to choose a relatively small set of prototypes $P_l \subseteq \mathcal{Z}$ for each class $l$ in such a way that the collection $P_1, \ldots, P_L$ represents a summary or distillation of the training set…”). Bien further teaches using the set cover integer program to determine distances between data points in the feature space by analyzing a number of elements in $\mathcal{Z}$ that are within a distance $\epsilon$ of a given data point $x_i$, hence also defining a number of prototypes within each region defined by the distance $\epsilon$, thus corresponding to a method for “determining a number of prototypes in a first region and a number of prototypes in a second region of the feature space or the label space” (Bien p.3 1st paragraph, Section 1.1 The set cover integer program: “The goal is to find the smallest subset of points $\mathcal{P} \subseteq \mathcal{Z}$ such that every point $x_i \in \mathcal{X}$ is within $\epsilon$ of some point in $\mathcal{P}$ (i.e., there exists $z_j \in \mathcal{P}$ with $d(x_i, z_j) < \epsilon$). Let $B_\epsilon(x) = \{x' \in \mathbb{R}^p : d(x', x) < \epsilon\}$ denote the ball of radius $\epsilon$ centered at $x$. … From a machine learning point of view, set cover can be seen as a clustering problem in which we wish to find the smallest number of clusters such that every point is within $\epsilon$ of at least one cluster center.”).); and 
determining the impact of the features of the plurality of data points based on the number of prototypes in the first region and the second region (Examiner’s note: According to applicant’s specification paragraph [0032], the term “impact” is defined as “a measure of change to an output of a machine-learning model as a result of a feature input to the machine-learning model. … the model analysis system can determine impact using a variety of different measures, including … a number of prototypes within a label space”. Bien teaches that generating a set of prototypes provides interpretability by identifying a representative sample of data points for each class, and that a well-chosen set of prototypes captures the full spread of variation within a class while accounting for how that class differs from other classes (Bien p.2 2nd paragraph (Section 1. Introduction): “Having a well-selected set of prototypes $P_1, \ldots, P_L \subseteq \mathcal{Z}$ is advantageous for two main reasons: interpretability and classification. For domain specialists, examining a handful of representative examples of each class can be highly informative especially when n is large … a well-chosen set $P_l \subseteq \mathcal{Z}$ of prototypes for class l should capture the full spread of variation within this class while also taking into account how class l differs from other classes.”), where this interpretability enables domain specialists to further analyze and extract additional information from the prototypes, which provide a representative sample of data points for each class (corresponding to a number of prototypes within a label space), thus providing a method for “determining an impact of the features” for different sets of prototypes (corresponding to different regions), such that this method corresponds to a method for “determining the impact of the features of the plurality of data points based on the number of prototypes in the first region and the second region”.).  
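For illustration only, the $\epsilon$-ball relationship quoted from Bien for Claim 3 (a point $x_i$ is covered when some $z_j$ satisfies $d(x_i, z_j) < \epsilon$) can be used to count the prototypes associated with two regions of the feature space. The following is a minimal sketch, not Bien's implementation (Bien's examples use R packages) and not applicant's disclosed method: it assumes a Euclidean metric for $d$, and the helper names `in_ball`, `prototypes_in_region`, `is_eps_cover` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def in_ball(x: Point, z: Point, eps: float) -> bool:
    """True if z lies in the open ball B_eps(x) = {x' : d(x', x) < eps}."""
    # Assumption: d is the Euclidean distance (Bien allows a general metric d).
    return math.dist(x, z) < eps


def prototypes_in_region(region: Sequence[Point], prototypes: Sequence[Point], eps: float) -> Set[int]:
    """Indices of prototypes lying within eps of at least one point of the region."""
    return {j for j, z in enumerate(prototypes) if any(in_ball(x, z, eps) for x in region)}


def is_eps_cover(points: Sequence[Point], prototypes: Sequence[Point], eps: float) -> bool:
    """Set-cover feasibility: every point is within eps of some prototype."""
    return all(any(in_ball(x, z, eps) for z in prototypes) for x in points)


# Hypothetical toy data: two regions of the feature space and three candidate prototypes.
prototypes: List[Point] = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
first_region: List[Point] = [(0.2, 0.1), (0.4, -0.1)]
second_region: List[Point] = [(5.1, 4.9)]

n_first = len(prototypes_in_region(first_region, prototypes, eps=1.0))
n_second = len(prototypes_in_region(second_region, prototypes, eps=1.0))
print(n_first, n_second)
print(is_eps_cover(first_region + second_region, prototypes, eps=1.0))
```

With $\epsilon = 1.0$ the first hypothetical region is associated with two prototypes and the second with one; shrinking $\epsilon$ to 0.1 leaves the first region uncovered, which shows how the per-region count depends on the chosen radius.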
Regarding amended Claim 4, Bien in view of Parades teaches
(Currently Amended) The computer-implemented method as recited in claim 1, wherein 
determining, for a selected data point of the plurality of data points, a plurality of adjacent prototypes to the selected data point within the label space (Examiner’s note: This claim limitation is similar in scope to the combined scope of two claim limitations recited in independent Claim 1: “determining, by the at least one processor, distances between the plurality of data points based on the distances between the plurality of data points in the feature space and label space; and determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space”, where the end result is a set of prototypes (corresponding to “determining … a plurality of adjacent prototypes”) that are within an epsilon ball of radius 𝛜 (where this radius 𝛜 corresponds to “… for a selected data point of the plurality of data points, a plurality of adjacent prototypes to the selected data point within the label space”), and hence is ); 
analyzing the plurality of adjacent prototypes to determine a mean and a variance of the plurality of adjacent prototypes (Examiner’s note: As indicated earlier, Parades teaches a nearest-neighbor classification based method (named “learning prototypes and distances” or LPD) involving calculations of gradients based on applying a gradient descent procedure to an estimate of the nearest-neighbor error (Parades p.181 col.1 Section 2. Approach and p.181 col.2 Section 2.1 Learning the prototypes and their weights), shown in Parades p.181 equation (2) (where the nearest-neighbor error is based on a weighted distance from data points x to a prototype y, Parades p.181 equation (1), where the weight w represents a weight associated with a feature j for each prototype, and where the nearest-neighbor error is expressed in terms of the ratio of the distances from x to its same-class and different-class nearest-neighbor prototypes, Parades p.182 equation (5), resulting in these data points x and the same-class and different-class nearest neighbors of x corresponding to “the plurality of data points and corresponding adjacent prototypes of the set of prototypes”). 
As indicated earlier, Parades further teaches approximating the nearest-neighbor error and performing a differentiation to obtain derivatives in order to generate corresponding gradient descent update equations and the LPD algorithm shown in Parades p.182 Figure 1, where each training vector x in T is visited and the positions and weights of the same-class and different-class nearest neighbors of x are updated, eventually resulting in a reduced set of prototypes containing weighted data points (associated with a corresponding feature) that are close to decision boundaries around the given minimum error estimation, with each data point within the reduced set of prototypes reflecting an importance based on relative distances/proximities to same-class or different-class nearest neighbors (Parades p.183 col.1 2nd paragraph-last paragraph). Parades teaches that the LPD algorithm requires the application of learning step factors $\mu_{ij}$ and $v_{ij}$ that need to be learned as part of analyzing the set of prototypes in order to update/converge the set of prototypes to an optimal form, where these factors $\mu_{ij}$ and $v_{ij}$ correspond to “a mean and variance” for a given data point i and a feature j (Parades p.182 col.2 last paragraph: “Two sets of learning step factors, $\mu_{ij}$, $v_{ij}$, are needed by this algorithm. They can take just a fixed value for all i, j or may depend on i, j following simple rules; for instance, $\mu_{ij}$ may be inversely proportional to the variance of each feature j. In addition, for smoother (but slower) convergence, these values may be decreased along the successive iterations of the LPD while loop. Large values of $v_{ij}$ give more importance to the learning of the prototypes themselves while large values of $\mu_{ij}$ emphasize the learning of the distance associated to these prototypes.”) and that this LPD algorithm corresponds to similar heuristics found in LVQ algorithms (Parades p.183 col.1 3rd paragraph) as well as being analogous to well-known EM estimation of Gaussian mixtures involving mean and covariance matrices (Parades p.181 col.1 2nd paragraph), and as such, this LPD algorithm corresponds to a process for “analyzing the plurality of adjacent prototypes to determine a mean and a variance of the plurality of adjacent prototypes”.); and 
determining, based on the mean and the variance of the plurality of adjacent prototypes in the label space, the impact of the features of the plurality of data points within the machine-learning model (Examiner’s note: As indicated earlier, Parades teaches a nearest-neighbor classification based method (named “learning prototypes and distances” or LPD) involving calculating and applying derivatives representing the gradient descent update equations and the LPD algorithm as shown in Parades p.182 Figure 1, where each training vector x in T is visited and the positions and weights of the same-class and different-class nearest neighbors of x are updated, eventually resulting in a reduced set of prototypes containing weighted data points (associated with a corresponding feature) that are close to decision boundaries around the given minimum error estimation, with each data point within the reduced set of prototypes reflecting an importance based on relative distances/proximities to same-class or different-class nearest neighbors (Parades p.183 col.1 2nd paragraph-last paragraph), where this importance (representing an aspect of “determining an impact of the features within the machine-learning model”) varies based on the relative proximity of the data point x to $y_x^{=}$ or $y_x^{\neq}$ according to the effects of the gradient update equations (where the gradient update equations correspond to a mechanism that produces “locally sensitive directions of the features”). As indicated earlier, the gradient update equations used in the LPD algorithm require the application of learning step factors $\mu_{ij}$ and $v_{ij}$ that need to be learned as part of analyzing the set of prototypes in order to update/converge the set of prototypes to an optimal form, where these factors $\mu_{ij}$ and $v_{ij}$ correspond to “a mean and variance” for a given data point i and a feature j (Parades p.182 col.2 last paragraph). Hence, the usage of the LPD algorithm to determine a set of prototypes, where the set of prototypes contains data points reflecting an importance based on relative distances to same-class and different-class nearest neighbors, corresponds to a process for “determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients” (Parades p.183 col.1 2nd paragraph (Section 2.1 Learning the prototypes and their weights): “The effects of the update equations in the LPD algorithm are intuitively clear. For each training vector x, its same-class NN, $y_i = y_x^{=}$, is moved towards x, while its different-class NN, $y_k = y_x^{\neq}$, is moved away from x. Similarly, the feature-dependent weights associated with $y_x^{=}$ are modified so as to make it appear closer to x in a feature-dependent manner, while those of $y_x^{\neq}$ are modified so that it will similarly appear farther from x. Since these update steps are weighted by the distance ratio, r(x), their importance depends upon the relative proximity of x to $y_x^{=}$ or $y_x^{\neq}$. This is further divided by the corresponding squared distance, thereby reducing the update importance for large distances. Finally, the resulting steps are windowed by the derivative of the sigmoid function applied to the distance ratio, r(x). This way, only those prototypes (and their weights) which are sufficiently close to the decision boundaries are actually updated.”), where the determination of importances corresponds to an aspect of “determining … the impact of the features of the plurality of data points within the machine-learning model”.).  
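For illustration only, the qualitative behavior quoted from Parades p.183 for Claim 4 (the same-class NN is moved toward x, the different-class NN is moved away, each step is weighted by the distance ratio r(x), divided by the squared distance, and windowed by the derivative of a sigmoid of r(x)) can be sketched in Python. This is a minimal sketch, not the LPD algorithm of Parades Figure 1: the sigmoid form $S(r) = 1/(1 + e^{\beta(1 - r)})$, the feature-weighted Euclidean distance, the `step` size, and all function names and data are assumptions, and the feature-weight updates are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def weighted_distance(x: Vector, y: Vector, w: Vector) -> float:
    """Feature-weighted Euclidean distance (the exact weighting form is an assumption)."""
    return math.sqrt(sum((wj * (xj - yj)) ** 2 for xj, yj, wj in zip(x, y, w)))


def distance_ratio(x: Vector, x_label: int, prototypes: Sequence[Vector],
                   proto_labels: Sequence[int], weights: Sequence[Vector]) -> Tuple[float, int, int]:
    """Return r(x) = d(x, y=) / d(x, y!=) and the indices of y= and y!=.

    Requires at least one prototype of x's class and one of another class.
    """
    same, diff = [], []
    for i, (p, c, w) in enumerate(zip(prototypes, proto_labels, weights)):
        if c == x_label:
            same.append((weighted_distance(x, p, w), i))
        else:
            diff.append((weighted_distance(x, p, w), i))
    (d_same, i_same), (d_diff, i_diff) = min(same), min(diff)
    return d_same / d_diff, i_same, i_diff


def sigmoid_window(r: float, beta: float = 10.0) -> float:
    """Derivative of the assumed sigmoid S(r) = 1 / (1 + exp(beta * (1 - r))); peaks at r = 1."""
    s = 1.0 / (1.0 + math.exp(beta * (1.0 - r)))
    return beta * s * (1.0 - s)


def lpd_like_step(x, x_label, prototypes, proto_labels, weights, step=0.1, beta=10.0):
    """One simplified reward-punishment update of y= and y!= (weights are not updated)."""
    r, i_same, i_diff = distance_ratio(x, x_label, prototypes, proto_labels, weights)
    window = sigmoid_window(r, beta)
    updated: List[List[float]] = [list(p) for p in prototypes]
    for i, sign in ((i_same, 1.0), (i_diff, -1.0)):
        d = weighted_distance(x, prototypes[i], weights[i])
        # Step weighted by r(x), divided by the squared distance, windowed by S'(r):
        # sign +1 moves y= toward x, sign -1 moves y!= away from x.
        scale = sign * step * window * r / d ** 2
        updated[i] = [pj + scale * (xj - pj) for pj, xj in zip(prototypes[i], x)]
    return updated, r, window


# Hypothetical toy data: one class-0 and one class-1 prototype on a line.
prototypes = [(0.0, 0.0), (2.0, 0.0)]
proto_labels = [0, 1]
weights = [(1.0, 1.0), (1.0, 1.0)]

for x in [(0.9, 0.0), (0.1, 0.0)]:  # near vs. far from the boundary at x1 = 1
    updated, r, window = lpd_like_step(x, 0, prototypes, proto_labels, weights)
    print(f"x={x} r={r:.3f} window={window:.4f} updated={updated}")
```

With these toy values, the class-0 point near the boundary (r(x) close to 1) receives a much larger window value than the point far from it, consistent with the quoted statement that only prototypes sufficiently close to the decision boundaries are actually updated.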
Regarding amended Claim 14, Bien teaches
(Currently Amended) In a digital medium environment for machine-learning interpretation, a system for prototype selection and analysis to determine feature sensitivity comprising: 
at least one processor (Examiner’s note: Bien teaches a prototype selection method involving a prototype vector machine (PVM), where the PVM is used to perform analysis on several datasets and comparisons with other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to the “at least one processor” aspect of the claim limitation.); and 
a non-transitory computer memory (Examiner’s note: Bien teaches a prototype selection method involving a prototype vector machine (PVM), where the PVM is used to perform analysis on several datasets and comparisons with other prototype methods using various R packages (Bien pp.10-17 Section 6. Examples on simulated and real data), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to the “a non-transitory computer memory” aspect of the claim limitation.) comprising: 
a plurality of data points used to generate a plurality of outputs via a machine-learning model (Examiner’s note: Bien teaches a set of training points 𝒳={                        
                            
                                
                                    x
                                
                                
                                    1
                                
                            
                             
                        
                    , …,                         
                            
                                
                                    x
                                
                                
                                    n
                                
                            
                        
                    } and 𝒵={                        
                            
                                
                                    z
                                
                                
                                    1
                                
                            
                             
                        
                    , …,                         
                            
                                
                                    z
                                
                                
                                    m
                                
                            
                        
                    } (“plurality of data points”) where the data points are a subset of                         
                            
                                
                                    R
                                
                                
                                    p
                                
                            
                        
                    , with associated class labels                         
                            
                                
                                    y
                                
                                
                                    1
                                
                            
                             
                        
                    , …,                         
                            
                                
                                    y
                                
                                
                                    n
                                
                            
                        
                     (“plurality of outputs”) where the data points are elements in the set {1, …, L}, and where the identification of the set of training points (where these training points are used in Bien p.1 last paragraph-p.2 4th paragraph (Section 1. Introduction): “Suppose we are given a set of training set of points 𝒳={                        
                            
                                
                                    x
                                
                                
                                    1
                                
                            
                             
                        
                    , …,                         
                            
                                
$x_n\} \subset \mathbb{R}^p$ with corresponding class labels $y_1, \dots, y_n \in \{1, \dots, L\}$ and in addition, a set of unlabeled points $\mathcal{Z} = \{z_1, \dots, z_m\} \subset \mathbb{R}^p$. Our goal is to choose a relatively small set of prototypes $P_l \subseteq \mathcal{Z}$ for each class $l$ in such a way that the collection $P_1, \dots, P_L$ represents a summary or distillation of the training set (i.e., someone given only $P_1, \dots, P_L$ would have a good sense of the original training data, $\mathcal{X}$ and $y$) … In this paper, we introduce the prototype vector machine (PVM), which describes a particular choice for the sets $P_1, \dots, P_L$.”). Bien further teaches that this method involving the prototype vector machine (PVM) is used to analyze several datasets and to compare it with other prototype methods using various R packages and machine learning datasets (Bien pp.10-17 Section 6. Examples on simulated and real data, Section 6.4. UCI data sets), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory) that process the inputs and outputs associated with the machine learning datasets, thus corresponding to the “via a machine-learning model” aspect of the claim limitation.), and 
instructions that, when executed by the at least one processor (Examiner’s note: As indicated earlier, Bien further teaches that this method involving the prototype vector machine (PVM) is used to analyze several datasets and to compare it with other prototype methods using various R packages and machine learning datasets (Bien pp.10-17 Section 6. Examples on simulated and real data, Section 6.4. UCI data sets), where these R packages are code modules running on a computer (where a computer contains a processor and non-transitory memory), thus corresponding to “instructions that, when executed by the at least one processor”.), cause the system to: 
map features of the plurality of data points to a feature space and the plurality of outputs to a label space (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
identify a set of prototypes by: 
determining a subset of data points within a threshold distance relative to a first data point of the plurality of data points within the feature space (Examiner’s note: This claim limitation of “determining a subset of data points within a threshold distance relative to a first data point of the plurality of data points within the feature space” is similar in scope to the combined scope of two claim limitations recited in independent Claim 1: “determining, by the at least one processor, distances between the plurality of data points based on the distances between the plurality of data points in the feature space and label space; and determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space”, where the end result is a set of prototypes (corresponding to “a subset of data points”) that are within an epsilon ball of radius 𝛜 (where this radius 𝛜 corresponds to “a threshold distance relative to a first data point in the plurality of data points within the feature space”), and hence is rejected under similar rationale identified by those two claim limitations recited in independent Claim 1.);
adding the first data point to the set of prototypes based on distances between the first data point and the subset of data points in the label space (Examiner’s note: Bien teaches $C_l(j)$, representing a cost based on the distances captured by the ϵ-ball $B_\epsilon$ and on a parameter λ that controls the number of prototypes (corresponding to “determine a cost … based on the distances between the first data point and the subset of data points … and a total number of prototypes …”) (Bien p.5 2nd paragraph-p.6 2nd paragraph (Section 2.1 PVM as an integer program): “λ ≥ 0 is a parameter specifying the cost of adding a prototype. Its effect is to control the number of prototypes chosen … We generally choose λ = 1/n …where $C_l(j)$ is the cost of adding $z_j$ to $P_l$ … $C_l(j) = \lambda + |B_\epsilon(z_j) \cap (\mathcal{X} \setminus \mathcal{X}_l)|$.”). As indicated earlier, the set of prototypes can be established in both feature space and label space based on the data points included in 𝒵 (Bien p.9 Section 4.2 Prototypes not on training points, 2nd paragraph). Bien further teaches approximating the solution to the set cover integer program using a greedy algorithm that iteratively (see Bien p.8 algorithm, line 2 while loop) adds data points from 𝒵 (corresponding to a set containing “the first data point”), each represented by a feature-space/label-space pair $(z_j, l)$, that have the least ratio of cost to number of points newly covered (Bien p.8 equations for ∆ξ, ∆η, and ∆Obj), where in line 2 the data point z* is added into the set of prototypes, which thereby includes the first data point as a prototype (corresponding to “adding the first data point to the set of prototypes based on distances between the first data point and the subset of data points in the label space”) (Bien pp.7-8 Section 3.2 A greedy approach: “At each step, we add the prototype that has the least ratio of cost to number of points newly covered. … At each step we find the $z_j \in \mathcal{Z}$ and class $l$ for which adding $z_j$ to $P_l$ most decreases the objective function. That is, we find the $(z_j, l)$ pair with the best tradeoff of covering previously uncovered training points of class $l$ while avoiding covering points of other classes.”).);
determine, using the set of prototypes, an impact of the features within the machine-learning model (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …  
Bien p.10 3rd paragraph), Bien does not explicitly teach 
determine, using the set of prototypes, an impact of the features within the machine-learning model by:
generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes;
determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients.
	Parades teaches
determine, using the set of prototypes, an impact of the features within the machine-learning model by:
generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.);
determining rank orders for the features of the plurality of data points according to locally sensitive directions of the features based on the plurality of gradients (This claim limitation is similar in scope to a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).
	Both Bien and Parades are analogous art since they both teach prototype selection and classification based on nearest neighbor analysis.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the prototype vector machine taught in Bien and (Parades p.183 col.1 3rd-4th paragraphs: “The attentive reader will find the above prototype update rules closely related to the so-called reward–punishment rules heuristically introduced in such popular procedures as LVQ1, LVQ2 and DSM [16–18]. … It is remarkable that an intuitive interpretation of the formally derived LPD prototype update rules is so closely related with popular heuristics which, without formal proof of their potential usefulness, have proved quite helpful to improve accuracy in many practical situations. Nevertheless, the advantages of LPD are clear: not only the update policy for prototype positions, but also for the associated metric weights, along with the corresponding smoothing and windowing terms, come from a mathematical derivation which guarantees convergence towards an (approximate) local minimum of the empirical NN error estimation.”).
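For illustration only, the greedy prototype selection attributed to Bien in the rejection above can be sketched in Python. This is a minimal sketch under stated assumptions and is not Bien's implementation (Bien's experiments use R packages): it assumes Euclidean ϵ-balls, takes the candidate set 𝒵 as an array, and, following our reading of the ∆ξ, ∆η, and ∆Obj quantities cited above, adds at each step the $(z_j, l)$ pair that most decreases the objective, scored as newly covered class-$l$ points minus covered points of other classes minus λ, stopping when no addition decreases the objective. The function name `greedy_pvm` and its parameters are hypothetical.

```python
import numpy as np

def greedy_pvm(X, y, Z, eps, lam):
    """Hypothetical sketch of greedy PVM-style prototype selection.

    X: (n, p) training points; y: (n,) integer class labels;
    Z: (m, p) candidate prototype locations; eps: ball radius;
    lam: cost of adding one prototype (lambda).
    Returns a dict mapping each class label to a list of indices into Z.
    """
    X, y, Z = np.asarray(X, dtype=float), np.asarray(y), np.asarray(Z, dtype=float)
    classes = np.unique(y)
    # in_ball[j, i] is True when training point x_i lies in the eps-ball around z_j.
    in_ball = np.linalg.norm(Z[:, None, :] - X[None, :, :], axis=2) <= eps
    prototypes = {int(l): [] for l in classes}
    covered = np.zeros(len(X), dtype=bool)  # x_i already covered by a ball of its own class

    while True:
        best, best_gain = None, 0.0
        for l in classes:
            same = y == l
            for j in range(len(Z)):
                if j in prototypes[int(l)]:
                    continue
                d_xi = np.sum(in_ball[j] & same & ~covered)  # newly covered class-l points
                d_eta = np.sum(in_ball[j] & ~same)           # points of other classes covered
                gain = d_xi - d_eta - lam                    # decrease in the objective
                if gain > best_gain:
                    best, best_gain = (j, int(l)), gain
        if best is None:  # no remaining addition decreases the objective
            break
        j, l = best
        prototypes[l].append(j)
        covered |= in_ball[j] & (y == l)
    return prototypes
```

For example, with two well-separated pairs of points, 𝒵 equal to the training points, ϵ = 0.5, and λ = 1/n = 0.25, the sketch selects one prototype per class.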
Regarding original Claim 16, Bien in view of Parades teaches
(Original) The system as recited in claim 14, wherein the instructions that cause the system to determine the impact of the features within the machine-learning model cause the system to: 
determine a number of prototypes in a first region and a number of prototypes in a second region of the feature space or the label space (This claim element is similar in scope to a corresponding claim element in Claim 3, and hence is rejected under similar rationale.); and 
determine the impact of the features of the plurality of data points in the first region and the second region based on the number of prototypes in the first region and the second region (This claim element is similar in scope to a corresponding claim element in Claim 3, and hence is rejected under similar rationale.).  
Regarding original Claim 17, Bien in view of Parades teaches
(Original) The system as recited in claim 14, wherein the instructions that cause the system to determine the impact of the features within the machine-learning model cause the system to: 
determine, for a selected data point of the plurality of data points, a plurality of adjacent prototypes to the selected data point in the label space (This claim element is similar in scope to a corresponding claim element in Claim 4, and hence is rejected under similar rationale.); 
analyze the plurality of adjacent prototypes to determine a mean and a variance of the plurality of adjacent prototypes in the label space (This claim element is similar in scope to a corresponding claim element in Claim 4, and hence is rejected under similar rationale.); and 
determine, based on the mean and the variance of the plurality of adjacent prototypes in the label space, the impact of the features of the plurality of data points within the machine-learning model (This claim element is similar in scope to a corresponding claim element in Claim 4, and hence is rejected under similar rationale.).  
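For illustration only, the mean and variance recited in claim 17 can be sketched as follows. This is a hypothetical sketch and not the applicant's implementation: it assumes that the “adjacent” prototypes are the k prototypes nearest to the selected data point in feature space, that each prototype carries a numeric output vector in the label space, and that the variance is taken per label dimension. The second function computes the distance between the selected data point and the mean in the label space, as recited in claim 18. The names `adjacent_prototype_stats` and `label_space_bias` are hypothetical.

```python
import numpy as np

def adjacent_prototype_stats(x, proto_features, proto_labels, k):
    """Mean and variance, in the label space, of the k prototypes adjacent to x.

    Assumption: "adjacent" means the k prototypes nearest to x in feature space.
    proto_features: (m, p) array; proto_labels: (m, q) array of label-space outputs.
    """
    proto_features = np.asarray(proto_features, dtype=float)
    proto_labels = np.asarray(proto_labels, dtype=float)
    dists = np.linalg.norm(proto_features - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k adjacent prototypes
    labels = proto_labels[nearest]
    return labels.mean(axis=0), labels.var(axis=0)  # per-dimension mean and variance

def label_space_bias(x_label, mean_label):
    """Distance between the selected point's label-space output and the mean."""
    return float(np.linalg.norm(np.asarray(x_label, dtype=float) - mean_label))
```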
Regarding original Claim 18, Bien in view of Parades teaches
(Original) The system as recited in claim 17, wherein the instructions that cause the system to determine the impact of the features within the machine-learning model cause the system to: 
determine a bias of the selected data point by determining a distance between the selected data point and the mean in the label space (Examiner’s note: According to the applicant’s specification paragraph [0068], the term “bias” indicates a variance (computed as a difference or distance between a data point represented by g(x) and a selected data point, i.e., the center of the set of prototypes), and a mean squared bias determines an optimization for selecting an epsilon ball size (i.e., a set of prototypes with radius ϵ). Hence, in the context of the claims, the bias (or distance between a data point and a selected data point in a set of prototypes) is interpreted as a distance metric that is used to optimize the set of prototypes centered around an epsilon ball with radius ϵ (where the space around the center of a set of prototypes represents “a mean in the … space”). As indicated earlier, Parades teaches approximating the nearest-neighbor error and differentiating it to obtain derivatives in order to generate corresponding gradient descent update equations and the LPD algorithm shown in Parades p.182 Figure 1. As shown in Parades p.182 Figure 1, a conditional check involving a distance metric and a small constant ε (|λ’-λ| > ε) is used as a terminating condition that decides whether to continue or to terminate the LPD optimization (where termination yields an optimal set of prototypes), with the value λ updated during each iteration of the while loop toward identifying a smaller space dictated by the nearest neighbor error estimate J(P,W) (Parades p.182 equation (4), which corresponds to “a mean in the … space”). Hence, this distance metric |λ’-λ| corresponds to a step in which to “determine a bias of the selected data point by determining a distance between the selected data point and the mean in the … space”. 
When combined with the teachings of Bien, indicating that the set of data points can also include data points in the label space (Bien p.9 Section 4.2 Prototypes not on training points, 2nd paragraph), both Bien and Parades teach the limitations identified in this claim, which correspond to a process that “determine a bias of the selected data point by determining a distance between the selected data point and the mean in the label space”.); and 
determine the impact of the features of the plurality of data points based on the bias of the selected data point (Examiner’s note: As indicated earlier, Parades teaches the LPD algorithm as shown in Parades p.182 Figure 1, where each training vector x in T is visited and its same-class and different-class nearest-neighbor prototypes are updated based on their positions and weights relative to x, eventually resulting in a reduced set of prototypes containing weighted data points (associated with a corresponding feature) that are close to decision boundaries around the given minimum error estimation, with each data point within the reduced set of prototypes reflecting an importance based on relative distances/proximities to same-class or different-class nearest neighbors (Parades p.183 col.1 2nd paragraph-last paragraph), where this importance (representing an aspect of “determining an impact of the features within the machine-learning model”) varies based on the relative proximity of the data x to $y_x^{=}$ or $y_x^{\neq}$ according to the effects of the gradient update equations (where the gradient update equations correspond to a mechanism that produces “locally sensitive directions of the features”). As shown in Parades p.182 Figure 1, a conditional check involving a distance metric and a small constant ε (|λ’-λ| > ε) is used as a terminating condition that decides whether to continue or to terminate the LPD optimization (where termination yields an optimal set of prototypes), with the value λ updated during each iteration of the while loop toward identifying a smaller space dictated by the nearest neighbor error estimate J(P,W) (Parades p.182 equation (4), which corresponds to “a mean in the … space”). This distance metric |λ’-λ| corresponds to a step in which to “determine a bias of the selected data point by determining a distance between the selected data point and the mean in the … space”. When combined with the teachings of Bien, indicating that the set of data points can also include data points in the label space (Bien p.9 Section 4.2 Prototypes not on training points, 2nd paragraph), both Bien and Parades teach the limitations identified in this claim, which correspond to a process that “determine a bias of the selected data point by determining a distance between the selected data point and the mean in the label space”. Hence, the LPD algorithm is used to determine these importances (Parades p.183 col.1 2nd paragraph (Section 2.1 Learning the prototypes and their weights): “The effects of the update equations in the LPD algorithm are intuitively clear. For each training vector x, its same-class NN, $y_i = y_x^{=}$, is moved towards x, while its different-class NN, $y_k = y_x^{\neq}$, is moved away from x. Similarly, the feature-dependent weights associated with $y_x^{=}$ are modified so as to make it appear closer to x in a feature-dependent manner, while those of $y_x^{\neq}$ are modified so that it will similarly appear farther from x. Since these update steps are weighted by the distance ratio, r(x), their importance depends upon the relative proximity of x to $y_x^{=}$ or $y_x^{\neq}$. This is further divided by the corresponding squared distance, thereby reducing the update importance for large distances. Finally, the resulting steps are windowed by the derivative of the sigmoid function applied to the distance ratio, r(x). This way, only those prototypes (and their weights) which are sufficiently close to the decision boundaries are actually updated.”), where the determination of importances corresponds to an aspect of “determining … the impact of the features of the plurality of data points”, and where this determination is based on an LPD algorithm that determines a bias distance (corresponding to “based on the bias of the selected data point”).).  
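For illustration only, the prototype-position part of the LPD update described in the Parades passage quoted above can be sketched as follows. This is a minimal sketch under stated assumptions and is not Parades' implementation: it assumes a weighted Euclidean distance $d(x, y)^2 = \sum_j w_j^2 (x_j - y_j)^2$, a sigmoid window $S_\beta(r) = 1/(1 + e^{\beta(1 - r)})$ centered at r = 1, and a learning rate μ; it updates only the two prototype positions and omits the metric-weight updates. The names `lpd_step`, `mu`, and `beta` are hypothetical. The step sizes follow from differentiating $S_\beta(r(x))$ with $r(x) = d(x, y_x^{=}) / d(x, y_x^{\neq})$, which reproduces the weighting by r(x), the division by the squared distance, and the windowing by the sigmoid derivative described in the quote.

```python
import numpy as np

def sigmoid_window(r, beta):
    """Assumed smoothed step S_beta(r) = 1 / (1 + exp(beta * (1 - r)))."""
    return 1.0 / (1.0 + np.exp(beta * (1.0 - r)))

def lpd_step(x, y_same, y_diff, w_same, w_diff, mu=0.1, beta=10.0):
    """One gradient step on S_beta(r(x)) for a single training vector x.

    y_same / y_diff: nearest same-class / different-class prototypes of x.
    w_same / w_diff: per-feature weights of those prototypes (kept fixed here).
    Returns the updated positions (y_same, y_diff).
    """
    x, y_same, y_diff = (np.asarray(a, dtype=float) for a in (x, y_same, y_diff))
    w_same, w_diff = np.asarray(w_same, dtype=float), np.asarray(w_diff, dtype=float)
    d_same = np.sqrt(np.sum(w_same**2 * (x - y_same) ** 2))
    d_diff = np.sqrt(np.sum(w_diff**2 * (x - y_diff) ** 2))
    r = d_same / d_diff                 # distance ratio r(x)
    s = sigmoid_window(r, beta)
    window = beta * s * (1.0 - s)       # derivative S'_beta(r), largest near r = 1
    # Same-class prototype moves toward x, different-class prototype moves away;
    # both steps are scaled by r(x), divided by the squared distance, and windowed.
    new_same = y_same + mu * window * r / d_same**2 * w_same**2 * (x - y_same)
    new_diff = y_diff - mu * window * r / d_diff**2 * w_diff**2 * (x - y_diff)
    return new_same, new_diff
```

With x at the origin, a same-class prototype at (1, 0), and a different-class prototype at (0, 2), one step moves the first prototype slightly toward x and the second slightly away, consistent with the quoted description.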
Regarding original Claim 19, Bien in view of Parades teaches
(Original) The system as recited in claim 14, wherein the instructions that cause the system to identify the set of prototypes cause the system to: 
determine a cost based on the distances between the first data point and the subset of data points in the label space and a total number of prototypes (Examiner’s note: Bien teaches                         
                            
                                
                                    C
                                
                                
                                    l
                                
                            
                        
                    (j) representing a cost based on the distances                         
                            
                                
                                    B
                                
                                
                                    ϵ
                                
                            
                        
                     and a number of prototypes λ (corresponding  (Bien p.5 2nd paragraph-p.6 2nd paragraph (Section 2.1 PVM as an integer program): “λ ≥ 0 is a parameter specifying the cost of adding a prototype. Its effect is to control the number of prototypes chosen … We generally choose λ = 1/n …where                         
                            
                                
                                    C
                                
                                
                                    l
                                
                            
                        
(j) is the cost of adding $z_j$ to $P_l$ … $C_l(j) = \lambda + |B_\epsilon(z_j) \cap (X \setminus X_l)|$.”). As indicated earlier, the set of prototypes can be established in both feature space and label space based on the data points included in 𝒵 (Bien p.9 Section 4.2 Prototypes not on training points, 2nd paragraph). Bien further teaches approximating the solution to the set cover integer program using a greedy algorithm by iteratively (see Bien p.8 algorithm, line 2 while loop) adding data points from 𝒵 (corresponding to a set containing “the first data point”) represented by a feature-space/label-space pair $(z_j, l)$ that have the least ratio of cost to number of points newly covered (Bien p.8 equations for ∆ξ, ∆η, and ∆Obj), where in line 2 the data point z* is added into the set of prototypes which includes the first data point as a prototype (corresponding to “… between the first data point and the subset of data points …”) (Bien pp.7-8 Section 3.2 A greedy approach: “At each step, we add the prototype that has the least ratio of cost to number of points newly covered. … At each step we find the $z_j \in \mathcal{Z}$ and class $l$ for which adding $z_j$ to $P_l$ most decreases the objective function. That is, we find the $(z_j, l)$ pair with the best tradeoff of covering previously uncovered training points of class $l$ while avoiding covering points of other classes.”).); and
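The quoted cost $C_l(j) = \lambda + |B_\epsilon(z_j) \cap (X \setminus X_l)|$ can be illustrated with a short sketch. It assumes Euclidean distance in feature space, takes the candidate set 𝒵 to be the training points themselves, and uses the hypothetical helper names `epsilon_ball`, `pvm_cost`, and `newly_covered`. It computes the cost of a candidate and the number of class-$l$ points it would newly cover, whose ratio is the quantity the quoted set-cover greedy rule minimizes. This is an illustration of the quoted formula only, not Bien's code.

```python
import numpy as np


def epsilon_ball(X, j, eps):
    """Boolean mask of training points within Euclidean distance eps of candidate z_j = X[j]."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - X[j], axis=1) <= eps


def pvm_cost(X, y, j, label, eps, lam):
    """C_l(j) = lam + |B_eps(z_j) ∩ (X \\ X_l)|: the fixed per-prototype cost lam
    plus the number of points of other classes that z_j would cover."""
    in_ball = epsilon_ball(X, j, eps)
    return lam + int(np.sum(in_ball & (np.asarray(y) != label)))


def newly_covered(X, y, j, label, eps, covered):
    """Number of class-`label` points inside B_eps(z_j) not yet covered by P_l."""
    in_ball = epsilon_ball(X, j, eps)
    return int(np.sum(in_ball & (np.asarray(y) == label) & ~np.asarray(covered, dtype=bool)))


# Toy 1-D example (illustrative data only): two classes, eps = 0.6, lam = 0.25
X = [[0.0], [0.5], [1.0], [3.0]]
y = [0, 0, 1, 1]
covered = [False] * len(X)
for j in range(len(X)):
    cost = pvm_cost(X, y, j, y[j], eps=0.6, lam=0.25)
    gain = newly_covered(X, y, j, y[j], eps=0.6, covered=covered)
    print(f"z_{j}: C = {cost}, newly covered = {gain}, ratio = {cost / gain}")
```

In this toy data, candidate $z_1$ is penalized because its ε-ball also reaches the class-1 point at 1.0, which raises its cost above that of $z_0$ even though both cover the same two class-0 points.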
add the first data point to the set of prototypes based on the cost (Examiner’s note: Bien teaches $C_l(j)$ representing a cost based on the distances $B_\epsilon$ and a number of prototypes λ (Bien p.5 2nd paragraph-p.6 2nd paragraph (Section 2.1 PVM as an integer program): “λ ≥ 0 is a parameter specifying the cost of adding a prototype. Its effect is to control the number of prototypes chosen … We generally choose λ = 1/n … where $C_l(j)$ is the cost of adding $z_j$ to $P_l$ … $C_l(j) = \lambda + |B_\epsilon(z_j) \cap (X \setminus X_l)|$.”). Bien further teaches approximating the solution to the set cover integer program using a greedy algorithm by iteratively (see Bien p.8 algorithm, line 2 while loop) adding data points from 𝒵 (corresponding to “a first data point of the plurality of data points”) represented by a feature-space/label-space pair $(z_j, l)$ that have the least ratio of cost to number of points newly covered (Bien p.8 equations for ∆ξ, ∆η, and ∆Obj), where in Bien p.8 algorithm line 2 the data point z* is added into the set of prototypes, thus corresponding to “add the first data point to the set of prototypes based on the cost” (Bien pp.7-8 Section 3.2 A greedy approach: “At each step, we add the prototype that has the least ratio of cost to number of points newly covered. … At each step we find the $z_j \in \mathcal{Z}$ and class $l$ for which adding $z_j$ to $P_l$ most decreases the objective function. That is, we find the $(z_j, l)$ pair with the best tradeoff of covering previously uncovered training points of class $l$ while avoiding covering points of other classes.”).).
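The greedy loop described above can be sketched as follows. This is a reading of the quoted text, not Bien's published code: it assumes the objective decrease for a pair $(z_j, l)$ is the number of newly covered class-$l$ training points minus the quoted cost $C_l(j)$ (the "most decreases the objective function" rule rather than the ratio rule), Euclidean distance, and candidates 𝒵 restricted to the training points; the function name `greedy_prototypes` is hypothetical. Each pass after the first evaluates a new data point while earlier prototypes are already in the set, so points they cover no longer count toward the gain.

```python
import numpy as np


def greedy_prototypes(X, y, eps, lam):
    """Greedy set-cover style prototype selection (illustrative sketch).

    Repeatedly adds the (z_j, l) pair with the largest objective decrease
        newly covered class-l points - C_l(j),
    where C_l(j) = lam + |B_eps(z_j) ∩ (X \\ X_l)|, and stops when no pair
    decreases the objective. Returns a list of (index j, class l) pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    # ball[j, i] is True when training point x_i lies in the epsilon ball around z_j
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    ball = dist <= eps
    covered = np.zeros(len(X), dtype=bool)  # covered by a prototype of the point's own class
    prototypes = []
    while True:
        best, best_gain = None, 0.0
        for j in range(len(X)):
            for l in labels:
                gain = np.sum(ball[j] & (y == l) & ~covered)  # newly covered class-l points
                cost = lam + np.sum(ball[j] & (y != l))       # C_l(j)
                if gain - cost > best_gain:
                    best, best_gain = (j, int(l)), gain - cost
        if best is None:  # no remaining pair decreases the objective
            return prototypes
        j, l = best
        prototypes.append(best)
        covered |= ball[j] & (y == l)


# Toy 1-D example (illustrative data only)
X = [[0.0], [0.5], [1.0], [3.0]]
y = [0, 0, 1, 1]
print(greedy_prototypes(X, y, eps=0.6, lam=0.25))  # [(0, 0), (3, 1)]
```

The loop terminates because every accepted pair covers at least one previously uncovered point. In the toy run, the class-1 point at 1.0 is left uncovered: any prototype covering it would also cover a class-0 point, and that extra cost outweighs the single point gained.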
Regarding original Claim 20, Bien in view of Parades teaches
(Original) The system as recited in claim 19, wherein the instructions that cause the system to identify the set of prototypes cause the system to: 
identify, for a second data point of the plurality of data points, a second subset of data points within the threshold distance relative to the second data point within the feature space (Examiner’s note: This claim limitation is similar in scope to a corresponding claim limitation from independent Claim 14: “identify a set of prototypes by: determining a subset of data points within a threshold distance relative to a first data point of the plurality of data points within the feature space”, where the end result is a set of prototypes (corresponding to “a second subset of data points”) that are within an epsilon ball of radius 𝛜 (where this radius 𝛜 corresponds to “a threshold distance relative to the second data point within the feature space”), such that this claim as a whole is directed towards a mere iterative process of identifying a separate set of prototypes (a second subset of data points representing a second set of prototypes), and hence is rejected under similar rationale as indicated in independent Claim 14.); 
determine distances between the second data point and the second subset of data points within the label space (Examiner’s note: This claim limitation is similar in scope to a corresponding claim limitation from independent Claim 14: “identify a set of prototypes by: determining a subset of data points within a threshold distance relative to a first data point of the plurality of data points within the feature space”, which is of the same scope as the combined claim limitations recited in independent Claim 1: “determining, by the at least one processor, distances between the plurality of data points based on the distances between the plurality of data points in the feature space and label space; and determining, by the at least one processor, a set of prototypes from the plurality of data points based on the distances between the plurality of data points in the feature space and the label space”, where the end result is a set of prototypes (corresponding to “a second subset of data points”) that are within an epsilon ball of radius 𝛜 (where this radius 𝛜 corresponds to “a distance between the second data point and the second subset of data points within the label space”), such that this claim as a whole is directed towards a mere iterative process of identifying a separate set of prototypes (a second subset of data points representing a second set of prototypes), and hence is rejected under similar rationale as indicated in independent Claim 14 and the combined claim limitations recited in independent Claim 1.); and 
determine a cost associated with adding the second data point to the set of prototypes based on the distances between the second data point and the second subset of data points and the total number of prototypes including the first data point as a prototype within the set of prototypes (Examiner’s note: This claim limitation is similar in scope to the combined scope of the corresponding claim limitations from dependent Claim 19: “determine a cost based on the distances between the first data point and the subset of data points in the label space and a total number of prototypes; and add the first data point to the set of prototypes based on the cost”, such that this claim as a whole is directed towards a mere iterative process of running the same greedy algorithm based on a cost function to identify and add more prototypes to the set of prototypes, where this set of prototypes includes a prototype that was already added (a first data point as a prototype) in addition to a new prototype currently being added as part of this iteration (a second data point), and hence is rejected under similar rationale as identified for those two claim limitations recited in dependent Claim 19.).  

Allowable Subject Matter



Claims 5-9 and 11-13 are identified as allowable over the prior art.  
The following is a statement of reasons for the indication of allowable subject matter. Independent claim 5 recites the following newly amended claim limitation: 
… generating a plurality of gradients for the plurality of data points, the plurality of gradients comprising, for a selected data point of the plurality of data points, a gradient based on a first adjacent prototype with a first model prediction lower than a model prediction of the selected data point and a second adjacent prototype with a second model prediction higher than the model prediction of the selected data point; …
While the prior art teaches generating a plurality of gradients based on the plurality of data points and corresponding adjacent prototypes of the set of prototypes, the prior art does not teach generating a plurality of gradients based on two data points within the set of prototypes that are adjacent below and adjacent above a selected data point (i.e., with a lower and a higher model prediction, respectively), such that these two data points are not necessarily adjacent to each other, but only adjacent to the selected data point (i.e., a data point centered within the set of prototypes defined by a radius ϵ), thereby making this claim allowable over the prior art.
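To picture the distinction drawn above, the following is a purely hypothetical sketch of one way a gradient could be formed from a lower and a higher adjacent prototype. It is not the applicant's disclosed implementation and not a construction of the claim. It assumes "adjacent" means nearest in model-prediction order (the prototype with the largest prediction below the selected point's prediction, and the one with the smallest prediction above it), and it returns the minimum-norm vector $g$ satisfying $g \cdot (x_{hi} - x_{lo}) = p_{hi} - p_{lo}$. The function name `adjacent_prototype_gradient` is illustrative only.

```python
import numpy as np


def adjacent_prototype_gradient(pred_x, prototypes, proto_preds):
    """Hypothetical secant gradient for a selected point with model prediction pred_x.

    lo: prototype with the largest prediction below pred_x (first adjacent prototype)
    hi: prototype with the smallest prediction above pred_x (second adjacent prototype)
    Returns the minimum-norm g with g . (x_hi - x_lo) = p_hi - p_lo,
    or None if either adjacent prototype is missing or the two coincide.
    """
    P = np.asarray(prototypes, dtype=float)
    p = np.asarray(proto_preds, dtype=float)
    lower = np.flatnonzero(p < pred_x)
    upper = np.flatnonzero(p > pred_x)
    if lower.size == 0 or upper.size == 0:
        return None
    lo = lower[np.argmax(p[lower])]
    hi = upper[np.argmin(p[upper])]
    step = P[hi] - P[lo]
    norm_sq = float(step @ step)
    if norm_sq == 0.0:
        return None
    return (p[hi] - p[lo]) * step / norm_sq


# Illustrative prototypes and predictions; the selected point's prediction is 0.5
print(adjacent_prototype_gradient(0.5, [[0.0], [1.0], [2.0], [4.0]], [0.0, 0.25, 1.0, 2.0]))
```

Because the two prototypes are chosen by prediction order rather than by feature-space proximity, they need not lie near each other in feature space, which mirrors the point made in the statement of reasons above.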
Claims 6-9 and 11-13 are dependent claims based on independent Claim 5, therefore these dependent claims are also allowable over the prior art.
Claim 10 is a dependent claim based on independent Claim 5, and as such, is allowable over the prior art. However, Claim 10 contains an identified claim objection, and Claim 10 is therefore objected to due to the presence of this unresolved claim objection.
Claims 2 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Dependent Claims 2 and 15 both contain allowable subject matter that is of the same scope as the identified claim limitation recited in independent Claim 5 (see above statement of reasons for allowability of independent Claim 5), since both of these claims recite claim limitations directed to the selection of a first adjacent prototype lower than (i.e., below) a selected data point, the selection of a second adjacent prototype higher than (i.e., above) a selected data point, and the generation of a gradient using these first and second adjacent prototypes.
In an effort to advance prosecution of this case towards allowance, the examiner (in collaboration with the examiner’s supervisor and the applicant’s attorney) proposed examiner’s amendments (i.e., identifying the claim limitations from independent Claim 5 that are similar in scope to dependent Claims 2 and 15, and incorporating them into independent Claims 1 and 14 to make all independent claims allowable, as well as correcting the claim objection identified in Claim 10), and also suggested the alternative of incorporating Claims 2 and 15 into Claims 1 and 14, during an examiner’s interview held on December 1-3, 2021. However, according to the attorney’s response on December 3, 2021, both proposals were respectfully declined by the applicant, and hence no agreement was reached during this round of prosecution.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121