DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is in response to claims filed 9 September 2022 for application 16/201,393. Claims 1, 4, 7, 10, 13, 16, and 20 have been amended. Currently claims 1-20 are pending and have been examined.
	
Response to Arguments
Applicant’s arguments, see pages 7-11, filed 9 September 2022, with respect to the feature “binary values indicative of whether the first machine learning model classified respective inputs in the first/second category” as recited in independent claim 1 (and similarly in independent claims 10 and 20) have been considered but are moot because the new ground of rejection (citing new reference Vasudev for teaching the new limitation) does not rely on any reference combination applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Furthermore, applicant's arguments, see pages 10 and 11, with respect to the rejection of the dependent claims under 35 USC § 103 have been fully considered but they are not persuasive because these claims depend from one of the independent claims 1, 10, or 20, and the cited combination of references teaches every element of the amended claims as shown below.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 5-11, 14-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro et al (“Why Should I Trust You?” Explaining the Predictions of Any Classifier, 2016) in view of Martens et al (US 20140229164 A1) and further in view of Vasudev (What is One Hot Encoding? Why And When do you have to use it?, 2017).
Regarding claim 1
Ribeiro teaches: A method comprising: training, using a first set of training data, to produce a first machine learning model that generates an output indicative of a classification of an input in one of a plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 (right side), we explain the predictions of a support vector machine with RBF kernel trained on uni-grams to differentiate “Christianity” from “Atheism” (on a subset of the 20 newsgroup dataset). Note: Also see Fig. 2 (left side). Note: Support vector machine with RBF kernel trained corresponds to training to produce a first machine learning model);
generating a particular output indicative of a particular input being classified in a first category of the plurality of categories (Note: Figure 2 (left box) shows Algorithm 1 prediction as Atheism. The Document shown below it corresponds to the input. Atheism corresponds to particular output indicative of a particular input being classified in a first category of the plurality of categories. Atheism and Christianity correspond to the plurality of categories);
generating a second set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in the first category of the plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 we explain the predictions of a support vector machine with RBF kernel trained. Note: Support vector machine with RBF kernel trained corresponds to a first machine learning model. Figure 2 (left box) shows Algorithm 1 classifying the magenta words classified as Atheism which corresponds to the first category);
generating a third set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in a second category of the plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 we explain the predictions of a support vector machine with RBF kernel trained. Note: Support vector machine with RBF kernel trained corresponds to a first machine learning model. Figure 2 shows Algorithm 1 classifying the green words classified as Christianity which corresponds to the second category);
training using the second and third sets of training data, to produce a second model that generates an explanation output matching the particular output for the particular input ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. [Page 3, Column 2, Section 3.2, Paragraph 2] Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. [Page 4, Column 1, Figure 3: Legend] The bold red cross is the instance being explained. Note: Learning an interpretable model locally around the prediction corresponds to training the second model and the explanation corresponds to the output);
and producing, using the second model, an explanation of the decision-making process of the machine learning model ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. Note: An interpretable model locally around the prediction corresponds to the second model).
However, Ribeiro does not explicitly disclose: receiving a query to explain a decision-making process of the machine learning model; binary values, responsive to the query; and in response to the query.
Martens teaches, in an analogous system: receiving a query to explain a decision-making process of the first machine learning model ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision or a developer is examining model performance on historical cases. Alternatively, a developer may be exploring decision-making performance by giving the system a set of theoretical test cases. In both scenarios, it is necessary for the system to provide explanations for specific individual cases. Note: Question corresponds to query); 
responsive to the query; and in response to the query ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision. Note: Question corresponds to query).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ribeiro to incorporate the teachings of Martens to receive a query to explain a decision-making process of the machine learning model. One would have been motivated to do this modification because doing so would give the benefit of making every decision that the system actually makes understood as taught by Martens paragraph [0027].
Vasudev teaches, in an analogous system: binary values ([Page 3, Paragraph 2] This is why we use one hot encoder to perform “binarization” of the category and include it as a feature to train the model. Note: One hot encoding teaches a way to encode labels in binary).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Ribeiro and Martens to incorporate the teachings of Vasudev to use binary values. One would have been motivated to do this modification because doing so would give the benefit of using one hot encoder to perform “binarization” of the category and include it as a feature to train the model as taught by Vasudev [Page 3, Paragraph 2].
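Note: For illustration only, the “binarization” of a category described by Vasudev can be sketched as follows. This is a minimal sketch and not code from any cited reference; the function name one_hot and the sample labels (borrowed from the Ribeiro Figure 2 categories) are assumptions chosen for the example.

```python
def one_hot(labels):
    """Map each categorical label to a row of 0/1 indicators, one per category."""
    # Fix a stable column order so every row uses the same layout.
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # exactly one indicator is set per input
        rows.append(row)
    return categories, rows


# Hypothetical classifier outputs for three inputs.
categories, encoded = one_hot(["Atheism", "Christianity", "Atheism"])
```

Each column of the encoded rows is a binary value indicating whether the corresponding input was placed in that category.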

Regarding claim 2
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 1, wherein the first set of training data comprises (as shown above).
However, Ribeiro does not explicitly disclose: a set of data elements, each element including a corresponding category label.
Martens teaches, in an analogous system: a set of data elements, each element including a corresponding category label ([0079] Classification models can be generated using a training set of labeled documents).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to use labels. One would have been motivated to do this modification because doing so would give the benefit of mapping it to a score representing the likelihood of belonging to the class as taught by Martens paragraph [0079].




Regarding claim 5
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 4, training to produce the second model further comprising:  from the second set of training data, from the first set of training data (as shown above).
However, Ribeiro does not explicitly disclose: comparing a generated category label to a category label.
Martens teaches, in an analogous system: comparing a generated category label to a category label ([0079] comparing the true label with the predicted label. Note:  predicted label corresponds to the generated category label and true label corresponds to category label from the first set of training data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to compare true label with predicted label. One would have been motivated to do this modification because doing so would give the benefit of assessing the performance of the model as taught by Martens paragraph [0079].
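Note: For illustration only, comparing the true label with the predicted label to assess model performance, as in Martens [0079], might look like the sketch below. The function name compare_labels and the sample labels are hypothetical and are not taken from Martens.

```python
def compare_labels(true_labels, predicted_labels):
    """Return (accuracy, indices where the predicted label differs from the true label)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    mismatches = [
        i
        for i, (true, predicted) in enumerate(zip(true_labels, predicted_labels))
        if true != predicted
    ]
    # Guard against division by zero for an empty data set.
    accuracy = 1 - len(mismatches) / len(true_labels) if true_labels else 0.0
    return accuracy, mismatches


# Hypothetical labels: the third prediction disagrees with the true label.
accuracy, mismatches = compare_labels(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```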

Regarding claim 6
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 5 (as shown above).
Ribeiro further teaches: re-training ([Page 7, Column 1, Section 5.4, Paragraph 1]  repeatedly training).  
However, Ribeiro does not explicitly disclose: further comprising: generating, in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element; and in response to generating a new set of training data, the second model using the new set of training data.
Martens further teaches: further comprising: generating, in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element ([0084] replace the given class labels of data instances with those provided (e.g., predicted) by the black box model. By applying a rule or tree induction technique on this new data set, the resulting model is a comprehensible tree or rule set that can explain the functioning of the black box model. Note: The given class labels correspond to the category label, provided/predicted label corresponds to the generated category label and new data set corresponds to new set of training data);
and in response to generating a new set of training data, the second model using the new set of training data (By applying a rule or tree induction technique on this new data set, the resulting model is a comprehensible tree or rule set that can explain the functioning of the black box model [0084]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ribeiro to incorporate the teachings of Martens to generate, in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element, and in response to generating a new set of training data, the second model using the new set of training data. One would have been motivated to do this modification because doing so would give the benefit of resulting in a model that is a comprehensible tree or rule set that can explain the functioning of the black box model as taught by Martens paragraph [0084].
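Note: For illustration only, the relabeling approach quoted from Martens [0084] (replacing the given class labels with those predicted by the black box model, then inducing a comprehensible rule on the new data set) can be sketched as below. The black_box function, the single numeric feature, and the one-threshold rule learner are all assumptions made for the example; Martens does not prescribe this particular learner.

```python
def relabel_with_black_box(instances, black_box):
    """Build a new data set whose labels are the black box model's predictions."""
    return [(x, black_box(x)) for x in instances]


def fit_stump(dataset):
    """Pick the threshold t minimizing errors of the rule: label 1 if x > t, else 0."""
    best_threshold, best_errors = None, None
    for t in sorted({x for x, _ in dataset}):
        errors = sum(1 for x, y in dataset if (1 if x > t else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def black_box(x):
    # Hypothetical opaque model standing in for the first machine learning model.
    return 1 if x * x > 10 else 0


instances = [1, 2, 3, 4, 5, 6]
surrogate_data = relabel_with_black_box(instances, black_box)
threshold = fit_stump(surrogate_data)
```

The learned rule (label 1 when the feature exceeds the threshold) is a one-node stand-in for the comprehensible tree or rule set that explains the black box model.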
Regarding claim 7
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 1 (as shown above).
Ribeiro further teaches: wherein the first machine learning model is a neural network ([Page 1, Column 1, Abstract, Paragraph 2] e.g. neural networks).

Regarding claim 8
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 1 (as shown above).
Ribeiro further teaches: wherein the second model is a decision tree ([Page 3, Column 2, Section 3.2, Paragraph 1] Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees).

Regarding claim 9
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 1 (as shown above).
Ribeiro further teaches: wherein the method is embodied in a computer program product comprising one or more computer-readable storage devices and computer-readable program instructions which are stored on the one or more computer-readable tangible storage devices and executed by one or more processors ([Page 4, Column 1, Paragraph 1] on a laptop).

Regarding claim 10
Ribeiro teaches: A computer usable program product for generating result explanations for neural networks, the computer usable program product comprising a computer-readable storage device, and program instructions stored on the storage device, the stored program instructions comprising: ([Page 4, Column 1, Paragraph 1] on a laptop. [Page 1, Column 1, Abstract, Paragraph 2] (e.g. neural networks). [Page 1, Column 1, Abstract, Paragraph 2] In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction):
program instructions to train, using a first set of training data, to produce a first machine learning model that generates an output indicative of a classification of an input in one of a plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 (right side), we explain the predictions of a support vector machine with RBF kernel trained on uni-grams to differentiate “Christianity” from “Atheism” (on a subset of the 20 newsgroup dataset). Note: Also see Fig. 2 (left side). Note: Support vector machine with RBF kernel trained corresponds to training to produce a first machine learning model);
generating a particular output indicative of a particular input being classified in a first category of the plurality of categories (Note: Figure 2 (left box) shows Algorithm 1 prediction as Atheism. The Document shown below it corresponds to the input. Atheism corresponds to particular output indicative of a particular input being classified in a first category of the plurality of categories. Atheism and Christianity correspond to the plurality of categories);
program instructions to generate a second set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in the first category of the plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 we explain the predictions of a support vector machine with RBF kernel trained. Note: Support vector machine with RBF kernel trained corresponds to a first machine learning model. Figure 2 (left box) shows Algorithm 1 classifying the magenta words classified as Atheism which corresponds to the first category);
program instructions to generate a third set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in a second category of the plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 we explain the predictions of a support vector machine with RBF kernel trained. Note: Support vector machine with RBF kernel trained corresponds to a first machine learning model. Figure 2 shows Algorithm 1 classifying the green words classified as Christianity which corresponds to the second category);
program instructions to train using the second and third sets of training data, to produce a second model that generates an explanation output matching the particular output for the particular input ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. [Page 3, Column 2, Section 3.2, Paragraph 2] Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. [Page 4, Column 1, Figure 3: Legend] The bold red cross is the instance being explained. Note: Learning an interpretable model locally around the prediction corresponds to training the second model and the explanation corresponds to the output);
and program instructions to produce, using the second model, an explanation of the decision-making process of the first machine learning model ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. Note: An interpretable model locally around the prediction corresponds to the second model).
However, Ribeiro does not explicitly disclose: program instructions to receive a query to explain a decision-making process resulting in the machine learning model; binary values; responsive to the query, and in response to the query.
Martens teaches, in an analogous system: program instructions to receive a query to explain a decision-making process resulting in the first machine learning model ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision or a developer is examining model performance on historical cases. Alternatively, a developer may be exploring decision-making performance by giving the system a set of theoretical test cases. In both scenarios, it is necessary for the system to provide explanations for specific individual cases. Note: Question corresponds to query).
responsive to the query; and in response to the query ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision. Note: Question corresponds to query).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ribeiro to incorporate the teachings of Martens to receive a query to explain a decision-making process of the machine learning model. One would have been motivated to do this modification because doing so would give the benefit of making every decision that the system actually makes understood as taught by Martens paragraph [0027].
Vasudev teaches, in an analogous system: binary values ([Page 3, Paragraph 2] This is why we use one hot encoder to perform “binarization” of the category and include it as a feature to train the model. Note: One hot encoding teaches a way to encode labels in binary).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Ribeiro and Martens to incorporate the teachings of Vasudev to use binary values. One would have been motivated to do this modification because doing so would give the benefit of using one hot encoder to perform “binarization” of the category and include it as a feature to train the model as taught by Vasudev [Page 3, Paragraph 2].

Regarding claim 11
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 10, wherein the first set of training data comprises (as shown above).
However, Ribeiro does not explicitly disclose: a set of data elements, each element including a corresponding category label.
Martens teaches, in an analogous system: a set of data elements, each element including a corresponding category label ([0079] Classification models can be generated using a training set of labeled documents).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens wherein in a set of data elements, each element includes a corresponding category label. One would have been motivated to do this modification because doing so would give the benefit of mapping it to a score representing the likelihood of belonging to the class as taught by Martens paragraph [0079].
Regarding claim 14
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 13, the stored program instructions further comprising: program instructions to, from the second set of training data, from the first set of training data (as shown above).
However, Ribeiro does not explicitly disclose: compare a generated category label to a category label.
Martens teaches, in an analogous system: compare a generated category label to a category label ([0079] comparing the true label with the predicted label. Note:  predicted label corresponds to the generated category label and true label corresponds to category label from the first set of training data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to compare a generated category label to a category label. One would have been motivated to do this modification because doing so would give the benefit of assessing the performance of the model as taught by Martens paragraph [0079].

Regarding claim 15
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 14, the stored program instructions further comprising: program instructions to generate (as shown above).
Ribeiro further teaches: and program instructions to re-train ([Page 7, Column 1, Section 5.4, Paragraph 1]  repeatedly training).
However, Ribeiro does not explicitly disclose: in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element; in response to generating a new set of training data, the second model using the new set of training data.
Martens teaches, in an analogous system: in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element ([0084] replace the given class labels of data instances with those provided (e.g., predicted) by the black box model. By applying a rule or tree induction technique on this new data set, the resulting model is a comprehensible tree or rule set that can explain the functioning of the black box model. Note: The given class labels corresponds to the category label, provided/predicted label corresponds to the generated category label and new data set corresponds to new set of training data); 
in response to generating a new set of training data, the second model using the new set of training data ([0084] By applying a rule or tree induction technique on this new data set, the resulting model is a comprehensible tree or rule set that can explain the functioning of the black box model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to generate, in response to the generated category label differing from the category label, a new set of training data, the new set of training data comprising the generated category label and the corresponding data element, and in response to generating a new set of training data, the second model using the new set of training data. One would have been motivated to do this modification because doing so would give the benefit of resulting in a model that is a comprehensible tree or rule set that can explain the functioning of the black box model as taught by Martens paragraph [0084].
Regarding claim 16
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 10 (as shown above).
Ribeiro further teaches: wherein the first machine learning model is a neural network ([Page 1, Column 1, Abstract, Paragraph 2] e.g. neural networks).

Regarding claim 17
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 10 (as shown above).
Ribeiro further teaches: wherein the second model is a decision tree ([Page 3, Column 2, Section 3.2, Paragraph 1] Formally, we define an explanation as a model g ∈ G, where G is a class of potentially interpretable models, such as linear models, decision trees).

Regarding claim 20
Ribeiro teaches: A computer system for generating result explanations for neural networks, the computer system comprising a processor, a computer-readable memory, and a computer-readable storage device, and program instructions stored on the storage device for execution by the processor via the memory, the stored program instructions comprising ([Page 4, Column 1, Paragraph 1] on a laptop. [Page 1, Column 1, Abstract, Paragraph 2] In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks).):
program instructions to train, using a first set of training data, to produce a first machine learning model that generates an output indicative of a classification of an input in one of a plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 (right side), we explain the predictions of a support vector machine with RBF kernel trained on uni-grams to differentiate “Christianity” from “Atheism” (on a subset of the 20 newsgroup dataset). Note: Also see Fig. 2 (left side). Note: Support vector machine with RBF kernel trained corresponds to training to produce a first machine learning model);
generating a particular output indicative of a particular input being classified in a first category of the plurality of categories (Note: Figure 2 (left box) shows Algorithm 1 prediction as Atheism. The Document shown below it corresponds to the input. Atheism corresponds to particular output indicative of a particular input being classified in a first category of the plurality of categories. Atheism and Christianity correspond to the plurality of categories);
program instructions to generate a second set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in the first category of the plurality of categories ([Page 4, Column 2, Section 3.5] In Figure 2 we explain the predictions of a support vector machine with RBF kernel trained. Note: Support vector machine with RBF kernel trained corresponds to a first machine learning model. Figure 2 (left box) shows Algorithm 1 classifying the magenta words classified as Atheism which corresponds to the first category);
program instructions to generate a third set of training data by modifying the first set of training data to include values indicative of whether the first machine learning model classified respective inputs in a second category of the plurality of categories (Note: Figure 2 shows Algorithm 1 classifying the green words classified as Christianity which corresponds to the second category);
program instructions to train using the second and third sets of training data, to produce a second model that generates an explanation output matching the particular output for the particular input ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. [Page 3, Column 2, Section 3.2, Paragraph 2] Let the model being explained be denoted f : R^d → R. In classification, f(x) is the probability (or a binary indicator) that x belongs to a certain class. [Page 4, Column 1, Figure 3: Legend] The bold red cross is the instance being explained. Note: Learning an interpretable model locally around the prediction corresponds to training the second model and the explanation corresponds to the output);
and program instructions to produce, using the second model, an explanation of the decision-making process of the first machine learning model ([Page 1, column 1, Abstract, Paragraph 2] we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. Note: An interpretable model locally around the prediction corresponds to the second model).
However, Ribeiro does not explicitly disclose: program instructions to receive a query to explain a decision-making process resulting in the machine learning model; binary values; responsive to the query; and in response to the query.
Martens teaches, in an analogous system: program instructions to receive a query to explain a decision-making process resulting in the first machine learning model ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision or a developer is examining model performance on historical cases. Alternatively, a developer may be exploring decision-making performance by giving the system a set of theoretical test cases. In both scenarios, it is necessary for the system to provide explanations for specific individual cases. Note: Question corresponds to query);
responsive to the query; in response to the query ([0026] Often, this need for individual case explanations can arise because particular decisions need to be justified after the fact, because (for example) a customer questions the decision. Note: Question corresponds to query).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Ribeiro to incorporate the teachings of Martens to receive a query to explain a decision-making process of the machine learning model. One would have been motivated to do this modification because doing so would give the benefit of making every decision that the system actually makes understandable, as taught by Martens paragraph [0027].
Vasudev teaches, in an analogous system: binary values ([Page 3, Paragraph 2] This is why we use one hot encoder to perform “binarization” of the category and include it as a feature to train the model. Note: One hot encoding teaches a way to encode labels in binary).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Ribeiro and Martens to incorporate the teachings of Vasudev to use binary values. One would have been motivated to do this modification because doing so would give the benefit of using one hot encoder to perform “binarization” of the category and include it as a feature to train the model as taught by Vasudev [Page 3, Paragraph 2].
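For illustration only, and not as a characterization of any cited reference or of the claimed invention, the binary-value limitation discussed above can be sketched as follows. The sketch assumes a toy keyword rule standing in for the first machine learning model; the names add_binary_indicator, first_model, is_atheism, and is_christianity are hypothetical and do not appear in Ribeiro, Martens, or Vasudev.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_binary_indicator(
    rows: List[Row],
    classify: Callable[[Row], str],
    category: str,
    column: str,
) -> List[Row]:
    """Return copies of the rows with a 0/1 column that records whether
    the classifier placed each row in the given category."""
    return [{**row, column: 1 if classify(row) == category else 0} for row in rows]


def first_model(row: Row) -> str:
    """Hypothetical stand-in for the first machine learning model (a keyword rule)."""
    return "christianity" if "church" in str(row["text"]).lower() else "atheism"


# First set of training data (toy examples, assumed for illustration).
first_set: List[Row] = [
    {"text": "The church service was on Sunday"},
    {"text": "There is no evidence for that claim"},
]

# Second set: first set plus a binary value for the first category.
second_set = add_binary_indicator(first_set, first_model, "atheism", "is_atheism")

# Third set: first set plus a binary value for the second category.
third_set = add_binary_indicator(first_set, first_model, "christianity", "is_christianity")
```

Each derived set leaves the first set unmodified and differs from it only by the added 0/1 column, which is the sense of "binarization" of a category used in the discussion of one hot encoding above.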


Claims 3, 4, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro et al (“Why Should I Trust You?” Explaining the Predictions of Any Classifier, 2016) in view of Martens et al (US 20140229164 A1) and Vasudev (What is One Hot Encoding? Why And When do you have to use it?, 2017) and further in view of Aslan et al (US 20170132528 A1).
Regarding claim 3
The system of Ribeiro, Martens, and Vasudev teaches: The method of claim 2 (as shown above).
Ribeiro further teaches: first set of training data ([Page 4, Column 2, Section 3.5] on a subset of the 20 newsgroup dataset).
However, the system of Ribeiro, Martens, and Vasudev does not explicitly disclose: further comprising: filtering the... training data to remove the corresponding category label from the set of data elements to produce a filtered set of training data.
Aslan teaches, in an analogous system: further comprising: filtering the... training data to remove the corresponding category label from the set of data elements to produce a filtered set of training data ([0032] the training data 104 can be utilized by “throwing away” labels, if necessary, and processing the unlabeled training data 104).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Aslan to filter the training data to remove the corresponding category label from the set of data elements to produce a filtered set of training data. One would have been motivated to do this modification because doing so would give the benefit of helping each model learn how the other model thinks, which factors into its own training as taught by Aslan paragraph [0032].

Regarding claim 4
The system of Ribeiro, Martens, Vasudev, and Aslan teaches: The method of claim 3 (as shown above).
Ribeiro further teaches: further comprising: generating, using the first machine learning model, the second set of training data, the second set of training data comprising the set of data elements, each element (as shown above).
However, Ribeiro does not explicitly disclose: based on the filtered set of training data and including a generated category label.
Martens teaches, in an analogous system: including a generated category label ([0079] the predicted label. Note: Predicted label corresponds to the generated category label).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to include a generated category label. One would have been motivated to do this modification because doing so would give the benefit of assessing the performance of the model by comparing the true label with the predicted label as taught by Martens paragraph [0079].
Aslan teaches, in an analogous system: based on the filtered set of training data ([0032] the training data 104 can be utilized by “throwing away” labels, if necessary, and processing the unlabeled training data 104).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Aslan to base the second set of training data on the filtered set of training data. One would have been motivated to do this modification because doing so would give the benefit of helping each model learn how the other model thinks, which factors into its own training as taught by Aslan paragraph [0032].
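For illustration only, and not as a characterization of Aslan, Martens, or the claimed invention, the two operations discussed for claims 3 and 4 (removing the category label to produce a filtered set, then attaching a label generated by the first model) can be sketched as follows. The function names filter_labels and add_generated_labels, the key names label and generated_label, and the keyword-rule first_model are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def filter_labels(rows: List[Row], label_key: str = "label") -> List[Row]:
    """Return copies of the rows with the category label removed."""
    return [{k: v for k, v in row.items() if k != label_key} for row in rows]


def add_generated_labels(
    rows: List[Row],
    model: Callable[[Row], str],
    label_key: str = "generated_label",
) -> List[Row]:
    """Return copies of the rows with a label produced by the model."""
    return [{**row, label_key: model(row)} for row in rows]


def first_model(row: Row) -> str:
    """Hypothetical stand-in for the first machine learning model (a keyword rule)."""
    return "christianity" if "church" in row["text"].lower() else "atheism"


# First set of training data with its original category labels (toy examples).
first_set: List[Row] = [
    {"text": "The church service was on Sunday", "label": "christianity"},
    {"text": "There is no evidence for that claim", "label": "atheism"},
]

# Filtered set: original labels are discarded.
filtered_set = filter_labels(first_set)

# Second set: filtered elements, each carrying a label generated by the first model.
second_set = add_generated_labels(filtered_set, first_model)
```

In this sketch the generated label is a function of the first model's output alone, so the second set reflects what the first model predicts rather than the original annotation.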

Regarding claim 12
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 11 (as shown above).
Ribeiro further teaches: the stored program instructions further comprising: program instructions ...the first set of training data ([Page 4, Column 2, Section 3.5] on a subset of the 20 newsgroup dataset).
However, the system of Ribeiro, Martens, and Vasudev does not explicitly disclose: to filter ... to remove the corresponding category label from the set of data elements to produce a filtered set of training data.
Aslan teaches, in an analogous system: to filter ... to remove the corresponding category label from the set of data elements to produce a filtered set of training data ([0032] the training data 104 can be utilized by “throwing away” labels, if necessary, and processing the unlabeled training data 104).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Aslan to filter the training data to remove the corresponding category label from the set of data elements to produce a filtered set of training data. One would have been motivated to do this modification because doing so would give the benefit of helping each model learn how the other model thinks, which factors into its own training as taught by Aslan paragraph [0032].

Regarding claim 13
The system of Ribeiro, Martens, Vasudev, and Aslan teaches: The computer usable program product of claim 12 (as shown above).
Ribeiro further teaches: the stored program instructions further comprising: program instructions to generate, using the first machine learning model, the second set of training data, the second set of training data comprising the set of data elements, each element (as shown above).
However, Ribeiro does not explicitly disclose: based on the filtered set of training data and including a generated category label.
Martens teaches, in an analogous system: including a generated category label ([0079] the predicted label. Note: Predicted label corresponds to the generated category label).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro to incorporate the teachings of Martens to include a generated category label. One would have been motivated to do this modification because doing so would give the benefit of assessing the performance of the model by comparing the true label with the predicted label as taught by Martens paragraph [0079].
Aslan teaches, in an analogous system: based on the filtered set of training data ([0032] the training data 104 can be utilized by “throwing away” labels, if necessary, and processing the unlabeled training data 104).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Aslan to base the second set of training data on the filtered set of training data. One would have been motivated to do this modification because doing so would give the benefit of helping each model learn how the other model thinks, which factors into its own training as taught by Aslan paragraph [0032].

Claims 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro et al (“Why Should I Trust You?” Explaining the Predictions of Any Classifier, 2016) in view of Martens et al (US 20140229164 A1) and Vasudev (What is One Hot Encoding? Why And When do you have to use it?, 2017) and further in view of Chandrasekaran et al (US 20180129704 A1).
Regarding claim 18
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 10, wherein the computer usable program product is stored in a computer readable storage device (as shown above).
However, the system of Ribeiro, Martens, and Vasudev does not explicitly disclose: in a data processing system and wherein the computer usable program product is transferred over a network from a remote data processing system.
Chandrasekaran teaches, in an analogous system: in a data processing system and wherein the computer usable program product is transferred over a network from a remote data processing system ([0082] Those instructions or code may be stored in a computer readable storage medium in a data processing system after being downloaded over a network from a remote data processing system).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Chandrasekaran wherein the computer usable program product is transferred over a network from a remote data processing system. One would have been motivated to do this modification because doing so would give the benefit of allowing the code to be used within the remote system as taught by Chandrasekaran paragraph [0082].

Regarding claim 19
The system of Ribeiro, Martens, and Vasudev teaches: The computer usable program product of claim 10, wherein the computer usable program product is stored in a computer readable storage device (as shown above).
However, the system of Ribeiro, Martens, and Vasudev does not explicitly disclose: in a server data processing system, and wherein the computer usable program product is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.
Chandrasekaran teaches, in an analogous system: in a server data processing system, and wherein the computer usable program product is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system ([0082] Or, those instructions or code may be stored in a computer readable storage medium in a server data processing system and adapted to be downloaded over a network to a remote data processing system for use in a computer readable storage medium within the remote system).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Ribeiro, Martens, and Vasudev to incorporate the teachings of Chandrasekaran wherein the computer usable program product is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system. One would have been motivated to do this modification because doing so would give the benefit of allowing the code to be used within the remote system as taught by Chandrasekaran paragraph [0082].


	
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
Guo et al (2016) discloses Entity Embeddings of Categorical Variables.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.R.J./Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128