DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the application filed on 1/11/21.
	Claims 1-24 are pending. 

Claim Objections
Claim 8 is objected to for reciting an undefined acronym or term without a plain or normally accepted meaning:  “tfidf.” The applicant is requested to put on the record any meaning applied to this term (e.g., clear reference to the specification). 

Allowable Subject Matter
Claims 7, 8, 19, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 6, 9-16, 18, and 21-24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zeng et al. (US 20210173829, Herein “Zeng”) in view of Bent et al. (US 20200402098, Herein “Bent”) in view of Wheaton et al. (US 20210110527, Herein “Wheaton”).
Regarding claim 1, Zeng teaches A computer-executed method for producing a global explanation for a black box machine learning model that is used to generate predictions for textual documents (predicted corrections based on identified subset of tokens [0014]; compute (fig. 1) based on training [0024] and [0025]), comprising: 
tokenizing text data in a plurality of textual documents to produce a plurality of tokens (words of a user question [0014]); identifying a set of candidate important tokens from the plurality of tokens (a subset of tokens corresponding with important or confusion words or tokens [0014]); wherein the set of candidate important tokens is less all of the plurality of tokens (subset [0014]); 
for each candidate important token, of the set of candidate important tokens: 
determining a selectivity metric that indicates how selective said each candidate important token is for one or more predictions of a plurality of predictions of the black box machine learning model (probability metric for selecting replacement tokens [0033]), and responsive to the selectivity metric for said each candidate important token satisfying inclusion criteria, including said each candidate important token in a set of important explainer tokens (replacement tokens explaining or clarifying the confusion tokens based on the probability [0033]; score threshold replacement [0042]); and generating explanation information for the black box machine learning model based on the set of important explainer tokens (replaced/clarified text based on the token substitution as performed [0042]); wherein the method is performed by one or more computing devices (processor [0017]; fig. 1).

However, Zeng fails to specifically teach black box machine learning model.
Yet, in a related art, Bent discloses learning regression modeling ([0051] to [0053], [0062] to [0070], and [0090]). 
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the black box learning model of Bent with the token analysis of Zeng to have a black box machine learning model. The combination would allow for, according to the motivation of Bent, learning, after the fact, about the value of certain content such as tokens for presentation to users, so that the projected impact of a content item can be known before a respective campaign is launched, based on an analysis of targeted tokens [0003] and [0004].
Furthermore, Bent teaches:
a computer-executed method for producing a global explanation for a black box machine learning model that is used to generate predictions for textual documents (regression modeling with text strings as input [0053]), comprising: 
tokenizing text data in a plurality of textual documents to produce a plurality of tokens (based on assets tokens determined from, e.g., sentences [0004], [0017] and [0041]); identifying a set of candidate important tokens from the plurality of tokens (token subset, such as for inputting the subset into a model [0053]); wherein the set of candidate important tokens is less all of the plurality of tokens (subset teaches “less” [0053]); 
for each candidate important token, of the set of candidate important tokens: 
determining a selectivity metric that indicates how selective said each candidate important token is for one or more predictions of a plurality of predictions of the black box machine learning model (based on regression analysis, perform correlation between each token of each token subset to possible categories [0053] and [0070]), and responsive to the selectivity metric for said each candidate important token satisfying inclusion criteria, including said each candidate important token in a set of important explainer tokens (based on the regression analysis, determine the significant relationships between each of the tokens of each token subset and possible categories, so that those tokens that are determined to be significant, are included as significant predictors in determining categories to correspond with the respective sentence [0053]); and generating explanation information for the black box machine learning model based on the set of important explainer tokens (categorical coverage results, such as with respect to coverage of each text string ([0053] and [0057])); wherein the method is performed by one or more computing devices (processor [0011]).

	However, in an effort to advance prosecution, Wheaton makes abundantly clear the assets of Bent as, specifically, documents, as follows:  tokenization including determining a set of word tokens for each document [0005].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the tokenization of determining a set of word tokens for each document of Wheaton with the word token analysis of Bent to have documents. The combination would allow for, according to the motivation of Wheaton, analysis of word tokens among a plurality of documents, so that document-based analyses, such as transformations between documents, can be performed based on the determined word tokens [0005]; as such, summaries are produced based on the token analysis, such as a threshold occurrence of tokens among certain documents [0006]. 
Furthermore, Wheaton teaches:
a set of candidate important tokens from the plurality of tokens (word tokens of a given set of documents [0005]); wherein the set of candidate important tokens is less all of the plurality of tokens (word tokens of a given document cluster from among a larger set of documents [0005]); 
for each candidate important token, of the set of candidate important tokens: 
determining a selectivity metric that indicates how selective said each candidate important token is for one or more predictions of a plurality of predictions of the black box machine learning model (common word token determiner [0005]), and responsive to the selectivity metric for said each candidate important token satisfying inclusion criteria, including said each candidate important token in a set of important explainer tokens (produce common words for each determined occurrence of common words and further perform regression modeling based on a location threshold [0005]); and generating explanation information for the black box machine learning model based on the set of important explainer tokens (summary table of common words further determined based on regression modeling [0005]); wherein the method is performed by one or more computing devices (summary produced for user of a computer, such as for user feedback of the determined token analysis [0005]).
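
For illustration only, and not as a finding regarding the claims or any cited reference, the sequence of steps recited in claim 1 may be sketched as follows. The function names, the whitespace tokenizer, the document-frequency filter, the label-purity metric, and the 0.9 threshold are all assumptions chosen for the sketch; none of them appears in the application or in Zeng, Bent, or Wheaton.

```python
from collections import Counter

def tokenize(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return doc.lower().split()

def candidate_tokens(docs, min_docs=2):
    # Assumed filter: keep tokens that occur in at least `min_docs` documents,
    # which generally yields fewer than all of the tokens.
    doc_freq = Counter(tok for d in docs for tok in set(tokenize(d)))
    return {tok for tok, n in doc_freq.items() if n >= min_docs}

def selectivity(token, docs, predictions):
    # Assumed metric: among documents containing the token, the share that
    # carry the token's most frequent prediction.
    labels = [p for d, p in zip(docs, predictions) if token in tokenize(d)]
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)

def explainer_tokens(docs, predictions, threshold=0.9):
    # Inclusion criterion (assumed): selectivity at or above the threshold.
    return sorted(
        tok for tok in candidate_tokens(docs)
        if selectivity(tok, docs, predictions) >= threshold
    )
```

In this sketch the returned list plays the role of the set of important explainer tokens from which explanation information would be generated.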

Regarding claim 2, Zeng in view of Bent in view of Wheaton teaches the limitations of claim 1, as above.
Furthermore, Wheaton teaches The computer-executed method of Claim 1, wherein: each textual document, of the plurality of textual documents, is associated with an associated prediction of the plurality of predictions (e.g., image hashes [0005]); and the prediction associated with each textual document, of the plurality of textual documents, is one of: 
generated for said each textual document using the black box machine learning model, or based on metadata associated with said each textual document (generated based on word tokens such as common words and locations [0005] and further generated based on regression model [0005]).

Regarding claim 3, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1 and 2, as above.
Furthermore, Zeng teaches The computer-executed method of Claim 2, further comprising: 
generating a perturbed document based on a particular document of the plurality of textual documents by performing a perturbation operation over a particular candidate important token of the set of candidate important tokens; wherein the perturbation operation is one of: token omission, context elimination, or token insertion (token substitution [0023]).

Furthermore, Wheaton teaches:
generating a perturbed document based on a particular document of the plurality of textual documents by performing a perturbation operation over a particular candidate important token of the set of candidate important tokens (of the common word tokens, remove tokens such as those that exceed a threshold [0005]); 
wherein the perturbation operation is one of: token omission, context elimination, or token insertion (token removal such as based on a regression analysis, further involving removing context such as non-textual tokens [0005]).

Regarding claim 4, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1-3, as above.
Furthermore, Wheaton teaches The computer-executed method of Claim 3, wherein: said generating the perturbed document based on the particular document is implemented by performing token omission for the particular candidate important token, of the set of candidate important tokens, within the particular document; wherein performing token omission comprises removing, from the particular document, one or more instances of the particular candidate important token to generate the perturbed document (not only extracting words from the document to generate the common word document/report [0005], but also filtering during regression round(s) [0232], [0239]; [0301] to [0307]).

Regarding claim 6, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1-3, as above.
Furthermore, Zeng teaches The computer-executed method of Claim 3, wherein: said generating the perturbed document based on the particular document is implemented by performing token insertion for the particular candidate important token, of the set of candidate important tokens, within the particular document; wherein performing token insertion comprises inserting, into the particular document, one or more instances of the particular candidate important token to generate the perturbed document (inserting a corresponding instance such as inserting “country” for “code” [0023]).

Furthermore, Wheaton discloses inserting into the common words document (fig. 27B) based on the source document [0005] common words determined to be significant based on rounds of regression, causing perturbations to common words document report 2762 [0095], [0233], [0234], and [0302] to [0307].  
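
For illustration only, the token omission and token insertion operations recited in claims 4 and 6 may be sketched as below. The function names and the default of appending at the end of the document are assumptions for the sketch and are not drawn from the application or the cited references.

```python
def omit_token(tokens, target):
    # Token omission: drop every instance of the target token.
    return [t for t in tokens if t != target]

def insert_token(tokens, target, position=None):
    # Token insertion: add one instance of the target token.
    # Assumed default: append at the end when no position is given.
    pos = len(tokens) if position is None else position
    return tokens[:pos] + [target] + tokens[pos:]
```

Both helpers return a new list, so the base document remains available for comparison with the perturbed document.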

Regarding claim 9, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1 and 2, as above.
Furthermore, Wheaton teaches The computer-executed method of Claim 2, wherein identifying the set of candidate important tokens from the plurality of tokens comprises: using feature selection to identify one or more tokens, of the plurality of tokens, that satisfy an inclusion criterion; and responsive to determining that the one or more tokens satisfy the inclusion criterion, including the one or more tokens in the set of candidate important tokens (filtering of words for analysis [0239]; see also [0305] and [0307]; for instance, filtering for the first and last instance of each common word for use in black box regression analysis [0307]). 

Regarding claim 10, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1, 2, and 9, as above.
Furthermore, Wheaton teaches The computer-executed method of Claim 9, wherein using feature selection to identify the one or more tokens, of the plurality of tokens, that satisfy the inclusion criterion comprises, for each prediction of the plurality of predictions: for each token of the plurality of tokens, computing an association metric that indicates an association between said each token and said each prediction with respect to all other predictions of the plurality of predictions; and determining that particular one or more tokens satisfy the inclusion criterion by determining that the particular one or more tokens have computed association metrics that are among a top number of computed association metrics for said each prediction (metrics for determining inclusion such as which word tokens and corresponding locations are for inclusion in a document determination  (fig. 27B), further involving inclusion within a perturbed document model based on one or more linear regressions in one or more rounds [0302]). 

Regarding claim 11, Zeng in view of Bent in view of Wheaton teaches the limitations of claim 1, as above.
Furthermore, Wheaton teaches The computer-executed method of Claim 1, wherein determining a particular selectivity metric that indicates how selective a particular candidate important token, of the set of candidate important tokens, is for a particular prediction of the plurality of predictions comprises: 
generating a set of perturbed documents by performing one or more perturbation operations (rounds of regressions, each perturbing a document to produce a perturbed document structure, such as the table of common words 2762 (fig. 27B)), using the particular candidate important token, to perturb one or more documents of the plurality of textual documents, wherein each perturbed document, of the set of perturbed documents, is associated with a base document of the plurality of textual documents, for each perturbed document, of the set of perturbed documents, generating a prediction using the black box machine learning model to produce a first set of predictions corresponding to the set of perturbed documents (the perturbed document of Common words 2762 (fig. 27B) associated with base document (e.g., source document [0005])), and determining the particular selectivity metric for the particular candidate important token based, at least in part, on differences between the first set of predictions corresponding to the set of perturbed documents, and a second set of predictions associated with base documents for the set of perturbed documents (moving between different rounds of analysis, selectively determining included word tokens (e.g., common words) based on multiple rounds of linear regressions, in which common words may be excluded from subsequent rounds of regressions, each set of predictions associated with base documents (e.g., source document and, even further, candidate templates) [0005], [0095] and [0096]).
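
For illustration only, the limitation of claim 11 (a selectivity metric based on differences between predictions for perturbed documents and predictions for their base documents) may be sketched as below. The name `selectivity_by_perturbation`, the use of token omission as the perturbation, the flip-rate metric, and the `predict` callable standing in for the black box model are assumptions for the sketch, not content of the application or the cited references.

```python
def selectivity_by_perturbation(token, docs, predict):
    # `predict` is a hypothetical stand-in for the black box model: it maps
    # a list of tokens to a predicted label.
    changed = 0
    total = 0
    for doc in docs:
        tokens = doc.split()
        if token not in tokens:
            continue  # only documents containing the token are perturbed
        perturbed = [t for t in tokens if t != token]  # token omission
        total += 1
        # Compare the prediction for the perturbed document with the
        # prediction for its base document.
        if predict(perturbed) != predict(tokens):
            changed += 1
    # Assumed metric: fraction of perturbed documents whose prediction flips.
    return changed / total if total else 0.0
```

A token whose removal frequently changes the prediction scores near 1.0 under this assumed metric, while a token whose removal never changes it scores 0.0.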

Regarding claim 12, the claim recites similar limitations as claims 1 and 11; however, the following is made abundantly clear by Wheaton:
Wheaton teaches a computer-executed method for producing a global explanation for a black box machine learning model that is used to generate predictions for textual documents (computer apparatus for processing a base document including text (and image) data of documents [0005]), comprising: 
identifying a set of candidate important tokens from a plurality of tokens present in a plurality of textual documents (word tokens among a plurality of documents (e.g., images) [0005]); 
wherein the set of candidate important tokens is less all of the plurality of tokens (just words common among documents [0005]); wherein each textual document, of the plurality of textual documents, is associated with a prediction of a plurality of predictions of the black box machine learning model (for performing predictions (e.g., regression analyses) based on the documents and corresponding word tokens [0005]); 
for each candidate important token, of the set of candidate important tokens: 
generating a set of perturbed documents by performing one or more perturbation operations, using said each candidate important token, to perturb one or more documents of the plurality of textual documents, wherein each perturbed document, of the set of perturbed documents, is associated with a base document of the plurality of textual documents, for each perturbed document, of the set of perturbed documents, generating a prediction using the black box machine learning model to produce a first set of predictions corresponding to the set of perturbed documents (the perturbation consisting of an extracted set of common words [0005], for use in performing predictions based on regression analyses to generate regression analysis predictions corresponding to the perturbed document of a respective document set of common words [0094]),
determining a token importance metric for said each candidate important token based, at least in part, on differences between the first set of predictions corresponding to the set of perturbed documents, and a second set of predictions associated with base documents for the set of perturbed documents (a second round of regression based on a first regression corresponding with predictions based on the perturbed documents corresponding with determined common words of document and a second round of regression-based predictions based on the first regression round and further associated with the original, base document from which the perturbed document formed of the extracted common words was determined (fig. 27B, [0094])), and responsive to the token importance metric for said each candidate important token satisfying inclusion criteria, including said each candidate important token in a set of important explainer tokens; and generating explanation information for the black box machine learning model based on the set of important explainer tokens; wherein the method is performed by one or more computing devices (a determined grouping of significant common words for determining a set of template words that are likely part of document template and can be considered template words and/or metadata words [0308]).

Regarding claim 13, Zeng teaches One or more non-transitory computer-readable media storing one or more sequences of instructions for producing a global explanation for a black box machine learning model that is used to generate predictions for textual documents, wherein the one or more sequences of instructions comprise instructions that, when executed by one or more processors, cause (memory, processor, and instructions (fig. 1)):
The claim recites similar limitations as claim 1 – see above.
 
Regarding claim 14, the claim recites similar limitations as claim 2 – see above.

Regarding claim 15, the claim recites the same limitations as claim 3 – see above.

Regarding claim 16, the claim recites similar limitations as claim 4 – see above.

Regarding claim 18, the claim recites similar limitations as claim 6 – see above.

Regarding claim 21, the claim recites similar limitations as claim 9 – see above.

Regarding claim 22, the claim recites similar limitations as claim 10 – see above.

Regarding claim 23, the claim recites similar limitations as claim 11 – see above.

Regarding claim 24, the claim recites similar limitations as claims 1, 11, and 12; however, Wheaton makes abundantly clear the following:
one or more non-transitory computer-readable media storing one or more sequences of instructions for producing a global explanation for a black box machine learning model that is used to generate predictions for textual documents (regression rounds for processing a source document and generating perturbed document(s) [0005] and [0302]; executed on compute [0005] to [0020]), wherein the one or more sequences of instructions comprise instructions that, when executed by one or more processors, cause (processor and processing [0005]): 
identifying a set of candidate important tokens from a plurality of tokens present in a plurality of textual documents (word tokens such as each corresponding with words for each document image, each having a commonality [0005]); wherein the set of candidate important tokens is less all of the plurality of tokens (the common words only those that are common, from among all word tokens [0005]); wherein each textual document, of the plurality of textual documents, is associated with a prediction of a plurality of predictions of the black box machine learning model (based on documents, perform black box regression analysis including linear regressions based on locations of the set of common words [0005]); 
for each candidate important token, of the set of candidate important tokens: 
generating a set of perturbed documents by performing one or more perturbation operations (perturb by removing locations from the document table [0005]), using said each candidate important token, to perturb one or more documents of the plurality of textual documents (removing word tokens from the document table through at least a first round of regression analysis for perturbing the document common word table [0302]), wherein each perturbed document, of the set of perturbed documents, is associated with a base document of the plurality of textual documents (a base image document corresponding with an extraction of common words [0005]), for each perturbed document, of the set of perturbed documents, generating a prediction using the black box machine learning model to produce a first set of predictions corresponding to the set of perturbed documents (based on regression model, determine correspondence between perturbed common words and structure and candidate template in the set of candidate templates, thus identifying a transformation between the perturbance and the templates [0005]), determining a token importance metric for said each candidate important token based, at least in part, on differences between the first set of predictions corresponding to the set of perturbed documents, and a second set of predictions associated with base documents for the set of perturbed documents (determined statistical significance and further thresholds for determining correspondence between perturbed document and certain templates [0298], the second set of predictions being associated with word tokens and locations of respective base document (e.g., candidate template); in other words, the difference is a difference between word tokens and locations further for determining an importance metric including coefficients and significance values based on each word token (i.e., regression modeling) (fig. 27B)), and responsive to the token importance metric for said each candidate important token satisfying inclusion criteria, including said each candidate important token in a set of important explainer tokens (set of statistically significant word tokens and corresponding locations [0298]; based on the filtered set of words further based on common words [0301], perform regression analysis to determine significant common words such as based on location ([0302] to [0308])); and generating explanation information for the black box machine learning model based on the set of important explainer tokens (determined words in the same location for corresponding with a template matching documents [0308]). 


Claim(s) 5 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zeng in view of Bent in view of Wheaton, as above, and in view of Das et al. (US 20200265316, Herein “Das”).
Regarding claim 5, Zeng in view of Bent in view of Wheaton teaches the limitations of claims 1-3, as above.
However, Zeng in view of Bent in view of Wheaton fails to specifically teach The computer-executed method of Claim 3, wherein: said generating the perturbed document based on the particular document is implemented by performing context elimination for the particular candidate important token, of the set of candidate important tokens, within the particular document; wherein performing context elimination comprises replacing, within the particular document, all tokens, other than one or more instances of the particular candidate important token, with neutral tokens to generate the perturbed document.
	Yet, in a related art, Das discloses word tokens [0043] for replacing duplicate tokens such as by retaining the more important parent token [0026], such as replacing with a count increment [0027].
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the context elimination based on neutral replacement of Das with the token analysis based on documents and regression black box analysis of Zeng in view of Bent in view of Wheaton to have said generating the perturbed document based on the particular document is implemented by performing context elimination for the particular candidate important token, of the set of candidate important tokens, within the particular document; wherein performing context elimination comprises replacing, within the particular document, all tokens, other than one or more instances of the particular candidate important token, with neutral tokens to generate the perturbed document. The combination would allow for, according to the motivation of Das, replacing duplicate tokens with a counter that measures the tokens replaced using a counter with respect to the more important (e.g., parent) token, thus making more efficient the amount of data necessary to represent the same meaning of the text prior to the replacement ([0026] and [0027]); see especially reduced resource requirement [0026].
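
For illustration only, the context elimination operation recited in claim 5 (replacing all tokens other than instances of the particular candidate important token with neutral tokens) may be sketched as below. The function name and the `"<unk>"` placeholder are assumptions for the sketch; neither the application nor Das is asserted to use them.

```python
def eliminate_context(tokens, target, neutral="<unk>"):
    # Keep each instance of the target token in place and replace every
    # other token with an assumed neutral placeholder.
    return [t if t == target else neutral for t in tokens]
```

The perturbed document keeps the same length as the base document, so token positions are preserved.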

Regarding claim 17, the claim recites similar limitations as claim 5 – see above.

Conclusion
Other art:
Patel et al. (US 20200265059, Herein “Patel”)
Wang et al. (US 20170068655, Herein “Wang”)
Chowdhury et al. (US 20180276196, Herein “Chowdhury”)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman can be reached on 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.

	/JASON T EDWARDS/              Examiner, Art Unit 2144