DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are deemed acceptable for the purpose of examination.
Specification
The specification is deemed acceptable for the purpose of examination.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a computer-implemented method, comprising: receiving a single instance of training data; simplifying the single instance of training data to create a single instance of simplified training data; generating a plurality of training data variants, based on the single instance of simplified training data; and training a machine learning model, utilizing the plurality of training data variants.
The limitation of simplifying the single instance of training data to create a single instance of simplified training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind.
The limitation of generating a plurality of training data variants, based on the single instance of simplified training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user analyzing training data and splitting the training data into multiple parts, which is essentially “generating” training data variants.
If a claim limitation, under its broadest reasonable interpretation, covers performance
of the limitation in the mind but for the recitation of generic computer components, then it falls under the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an
abstract idea.
	The judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – training a machine learning model, utilizing the plurality of training data variants. Training a machine learning model is recited at a high level of generality (i.e. as a generic computer function of training a model) such that it amounts to no more than mere instructions to apply the exception using a generic computing function. Further, the claim recites the receiving step (receiving a single instance of training data). The receiving step is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity.
	The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of training a machine learning model amounts to no more than mere instructions to apply the exception using a generic computing component. Mere instructions to apply an exception using generic computing components cannot provide an inventive concept. Further, the receiving step is considered to be extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. The court decisions cited in MPEP 2106.05(d)(II) indicate that merely “Receiving or transmitting data over a network, e.g., using the Internet to gather data” is a well‐understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed receiving step is well-understood, routine, conventional activity is supported under Berkheimer.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 2 recites the computer-implemented method of Claim 1, wherein the training data includes textual data. This limitation, as drafted, is a process that, under
its broadest reasonable interpretation, covers performance of the limitation in the mind. That
is, nothing in the claim limitation precludes the step from practically being performed in the
mind.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 3 recites the computer-implemented method of Claim 1, wherein the training data has an associated label. This limitation, as drafted, is a process that, under
its broadest reasonable interpretation, covers performance of the limitation in the mind. That
is, nothing in the claim limitation precludes the step from practically being performed in the
mind.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
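This claim is not patent eligible under 35 U.S.C. 101.
	Claim 4 recites the computer-implemented method of Claim 1, wherein […]. This limitation, as drafted, is a process that, under
its broadest reasonable interpretation, covers performance of the limitation in the mind. That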
is, nothing in the claim limitation precludes the step from practically being performed in the
mind. For example, “simplifying”, in the context of the claim, encompasses a user analyzing training data and stemming the data items, that is, reducing each word to its base form. For example, a user can analyze the word “studying” and stem it to its base form, “study”.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
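For illustration only, the stemming example discussed above (reducing “studying” to “study”) can be sketched as a simple suffix-stripping routine. The function name `naive_stem` and the suffix list are hypothetical and are not taken from the application or the cited references; the sketch is not a characterization of the applicant's disclosed implementation.

```python
# Hypothetical suffix list, chosen only for this illustration.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    # No suffix matched, so the word is returned unchanged (lower-cased).
    return lowered


if __name__ == "__main__":
    print(naive_stem("studying"))
```

A rule set this small will mis-stem many words; it only shows the kind of step the example describes.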
	Claim 5 recites the computer-implemented method of Claim 1, wherein simplifying the single instance of training data includes replacing one or more terms within the single instance of training data with a genericized term. This limitation, as drafted, is a process that, under
its broadest reasonable interpretation, covers performance of the limitation in the mind. That
is, nothing in the claim limitation precludes the step from practically being performed in the
mind.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 6 recites the computer-implemented method of Claim 1, wherein simplifying the single instance of training data includes discarding one or more terms within the single instance of training data. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “simplifying”, in the context of the claim, encompasses a user analyzing data and discarding words he/she deems not relevant. For example, a user analyzes data and decides to throw away stop words such as “to” and “he”.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
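For illustration only, the stop-word example discussed above for Claim 6 (discarding words such as “to” and “he”) can be sketched as follows. The function name `discard_stop_words` and the two-word stop list are hypothetical; the list simply reuses the two words named in the example and is not drawn from the application.

```python
# Hypothetical stop-word list, limited to the two words named in the example.
STOP_WORDS = {"to", "he"}


def discard_stop_words(text: str) -> str:
    """Return the text with every stop word removed (case-insensitive match)."""
    kept = [word for word in text.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(discard_stop_words("he went to school"))
```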
	Claim 7 recites the computer-implemented method of Claim 1, wherein simplifying the single instance of training data includes adjusting a length of the single instance of training data. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “simplifying”, in the context of the claim, encompasses a user analyzing data and manually adjusting lengths of data. For example, a user may see a string “datadata” and adjust it so it now reads “data”.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 8 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes adjusting the single instance of training data in a 
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 9 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes changing an order of words within the single instance of simplified training data to create one of the plurality of training data variants. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user analyzing data and manually changing the order of words in a string. For 
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
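For illustration only, changing the order of words in a string to produce variants, as discussed above for Claim 9, can be sketched as follows. The function name `reorder_variants` and the `limit` parameter are hypothetical and are not taken from the application or the cited references.

```python
from itertools import permutations
from typing import List


def reorder_variants(text: str, limit: int = 5) -> List[str]:
    """Return up to `limit` distinct reorderings of the words in `text`."""
    words = text.split()
    original = " ".join(words)
    variants: List[str] = []
    for ordering in permutations(words):
        candidate = " ".join(ordering)
        # Skip the original ordering and any duplicate caused by repeated words.
        if candidate != original and candidate not in variants:
            variants.append(candidate)
        if len(variants) >= limit:
            break
    return variants


if __name__ == "__main__":
    print(reorder_variants("service good"))
```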
	Claim 10 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes substituting a first word within the single instance of simplified training data with a second word determined to be similar to the first word. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user analyzing data and substituting words with synonyms of the word. For example, a user may see the word “good” and replace it with the synonym “satisfactory”.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
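For illustration only, the synonym substitution discussed above for Claim 10 (replacing “good” with “satisfactory”) can be sketched with a lookup table. The table `SYNONYMS` and the function name `synonym_variants` are hypothetical; the single entry reuses the word pair from the example and does not represent how similarity is determined in the application.

```python
from typing import Dict, List

# Hypothetical synonym table holding only the pair named in the example.
SYNONYMS: Dict[str, List[str]] = {"good": ["satisfactory"]}


def synonym_variants(text: str) -> List[str]:
    """Return one variant per (word, synonym) pair found in the table."""
    words = text.split()
    variants: List[str] = []
    for index, word in enumerate(words):
        for synonym in SYNONYMS.get(word.lower(), []):
            # Replace only the word at this position; all other words are kept.
            variants.append(" ".join(words[:index] + [synonym] + words[index + 1:]))
    return variants


if __name__ == "__main__":
    print(synonym_variants("good service"))
```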
	Claim 11 recites the computer-implemented method of Claim 1, wherein each of the plurality of training data variants are given a same associated label as the single instance of training data. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user creating variants of data and manually giving each training data variant the same label as the original training data.
This judicial exception is not integrated into a practical application. In particular, the
claim does not recite any additional elements. Accordingly, this does not integrate the abstract
idea into a practical application because it does not impose any meaningful limits on practicing
the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. As discussed above with respect to the
integration of the abstract idea into a practical application, no additional elements are cited.
This claim is not patent eligible under 35 U.S.C. 101.
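For illustration only, giving each variant the same label as the original instance, as discussed above for Claim 11, can be sketched as follows. The function name `label_variants` and the sample label are hypothetical and are not taken from the application or the cited references.

```python
from typing import List, Tuple


def label_variants(variants: List[str], label: str) -> List[Tuple[str, str]]:
    """Pair every variant with the label of the instance it was derived from."""
    return [(variant, label) for variant in variants]


if __name__ == "__main__":
    # Hypothetical label used only for this illustration.
    print(label_variants(["good service", "satisfactory service"], "positive"))
```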
	Claim 12 recites the computer-implemented method of Claim 1, wherein each of the plurality of training data variants are input into the machine learning model to train the machine learning model.
This judicial exception is not integrated into a practical application. In particular, the claim does recite an additional element – training data variants are input into the machine learning model to train the machine learning model. Inputting is recited at a high level of generality (i.e. as a generic component) such that it amounts to no more than mere instructions to apply the exception using a generic component. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. In particular, the additional element of inputting amounts to no more than mere instructions to apply the exception using a generic component. 
Mere instructions to apply an exception using generic computing components cannot provide an inventive concept. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
This claim is not patent eligible under 35 U.S.C. 101.
	Claim 13 recites the computer-implemented method of Claim 1, wherein the machine learning model is an artificial neural network (ANN). This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. 

The claim does not include additional elements that are sufficient to amount to
significantly more than the judicial exception. In particular, the additional element of the machine learning model being an artificial neural network amounts to no more than mere instructions to apply the exception using a generic component. Additionally, limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include: generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). This claim limitation is merely suggesting a field of use or technological environment in which to apply the exception such that it amounts to no more than merely linking the technological environment to a neural network. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
This claim is not patent eligible under 35 U.S.C. 101.
	Claims 14-22 are rejected on the same grounds as claims 1-9, respectively.
	Claim 23 is rejected on the same grounds as claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 12-16, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20180189265 A1 to Chen, et al. (hereinafter, “Chen”), in view of U.S. Pub. No. US 20170061330 A1 to Kurata (hereinafter, “Kurata”).
As per claim 1, Chen teaches a computer-implemented method, comprising:
simplifying the single instance of training data to create a single instance of simplified training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing (Concurrence graphs and document anchors are simplified training data that have been pre-processed and simplified. Note that the training data to be simplified will come from Kurata, as discussed below))
generating a plurality of training data variants, based on the single instance of simplified training data (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” (vector embeddings are variants of the concurrence graphs which are simplified training data))
Chen fails to explicitly teach:
receiving a single instance of training data
training a machine learning model, utilizing the plurality of training data variants
However, Kurata (Kurata addresses the issue of training a classification based model) teaches:
receiving a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” (It is certainly implied that training data has to be received))
training a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data” (Note that the training data variants to be used for training come from Chen))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the training data pre-processing and generation of training data variants as disclosed by Chen to use the training data for training as disclosed by Kurata. The combination would have been obvious because a person of ordinary 

As per claim 2, the combination of Chen and Kurata teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein the training data includes textual data (Kurata, Para. [0031] discloses “In other optional embodiment according to the present invention, the training input is a text in a form of a natural sentence or representation of the natural sentence and each correct label is an attribute assigned for the text” and Para. [0097] discloses “In the describing embodiment, the training data 240 includes one or more instances of the training data, each of which has training input text such as news articles”)
Same motivation to combine Chen and Kurata as claim 1

As per claim 3, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein the training data has an associated label. (Kurata, Para. [0025] discloses “Each training data has a training input and one or more correct labels assigned to the training input.”)
Same motivation to combine Chen and Kurata as claim 1
As per claim 12, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein each of the plurality of training data variants are input into the machine learning model to train the machine learning model (Kurata, Para. [0025] discloses “Also the method includes training the classification model using the one or more training data” (Training implies data has to be input to the model. Note that training data to be input are the training data variants from Chen as shown in claim 1))
Same motivation to combine Chen and Kurata as claim 1

As per claim 13, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein the machine learning model is an artificial neural network (ANN). (Kurata, Para. [0048] discloses “Referring to FIG. 3, architecture 150 of the NLQ classification model 110 is depicted. In the describing embodiment, the NLQ classification model 110 is a neural network based classification model.” (an artificial neural network is a neural network))
Same motivation to combine Chen and Kurata as claim 1

As per claim 14, Chen teaches a computer program product for adjusting training data for a machine learning processor:
simplifying, by the processor, the single instance of training data to create a single instance of simplified training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing and Fig. 6 discloses processors 602 (Concurrence graphs and document anchors are simplified training data that have been pre-processed and simplified. Note that the training data to be simplified will come from Kurata, as discussed below))
generating, by the processor, a plurality of training data variants, based on the single instance of simplified training data (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data))
Chen fails to explicitly teach:
the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising
receiving, by the processor, a single instance of training data
training, by the processor, a machine learning model, utilizing the plurality of training data variants
However, Kurata teaches:
the computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising (Kurata, Para. [0168] discloses “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se” and Para. [0167] discloses “The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention”)
receiving, by the processor, a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” and Fig. 10 discloses processing unit 16 (It is certainly implied that training data has to be received))
training, by the processor, a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data” and Fig. 10 discloses processing unit 16 (Note that the training data variants to be used for training come from Chen))
Same motivation to combine Chen and Kurata as claim 1
As per claim 15, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Kurata further teaches:
wherein the training data includes textual data (Kurata, Para. [0031] discloses “In other optional embodiment according to the present invention, the training input is a text in a form of a natural sentence or representation of the natural sentence and each correct label is an attribute assigned for the text” and Para. [0097] discloses “In the describing embodiment, the training data 240 includes one or more instances of the training data, each of which has training input text such as news articles”)
Same motivation to combine Chen and Kurata as claim 1

As per claim 16, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Kurata further teaches:
wherein the training data has an associated label. (Kurata, Para. [0025] discloses “Each training data has a training input and one or more correct labels assigned to the training input.”)
Same motivation to combine Chen and Kurata as claim 1

As per claim 23, Chen teaches a system comprising:
a processor (Chen, Fig. 6 discloses processors 602)
simplify the single instance of training data to create a single instance of simplified training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Concurrence graphs, document anchors, and tokenized text sequences are simplified training data that have been pre-processed and simplified. Note that the training data to be simplified will come from Kurata, as discussed below))
generate a plurality of training data variants, based on the single instance of simplified training data (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data))
Chen fails to explicitly teach:
and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to
receive a single instance of training data
train a machine learning model, utilizing the plurality of training data variants
However, Kurata teaches:
and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to (Kurata, Para. [0069] discloses “…program codes according to the embodiment of the present invention are loaded on a memory and executed by a processer”):
receive a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” (It is certainly implied that training data has to be received))
train a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data” (Note that the training data variants to be used for training come from Chen))
Same motivation to combine Chen and Kurata as claim 1

As per claim 24, Chen teaches a computer-implemented method comprising:
simplifying the instance of data to create an instance of simplified data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Concurrence graphs, document anchors, and tokenized text sequences are simplified data that have been pre-processed and simplified. Note that the data to be simplified will come from Kurata, as discussed below))
Chen fails to explicitly teach:
receiving an instance of data
applying the instance of simplified data to a trained machine learning model
and receiving a label prediction for the instance of simplified data from the trained machine learning model
However, Kurata teaches:
receiving an instance of data (Kurata, Fig. 2 discloses an input query 112 (instance of data) and Para. [0047] discloses “As shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query”)
applying the instance of simplified data to a trained machine learning model (Kurata, Fig. 2 discloses an input query being fed into a trained model and Para. [0050] discloses “The NLQ classification model 110 may need to accept queries with variable length. The NLQ classification model 110 receives an input query in a form of natural sentence like “Where should I visit in Japan?” by the query input layer 152. Words in the input query are first subjected to appropriate pre-processing such as stop word removal, and then the processed words 154 are converted into distributed representation in the distributed representation layer 156. The convolutional layer 158 may have k kernels to produce k feature maps. Each feature map is then subsampled typically mean or max pooling. By applying convolution 158 and sub-sampling 160 over time, a fixed-length feature vectors are extracted from the distributed representation layer 156 into the top hidden layer 162. Then, the fixed-length feature vectors are then fed into the label prediction layer 164 to predict the one or more document labels 114 for the input query 112” (Note that although Kurata does some form of pre-processing before inputting the input data into a model, the pre-processed data is to come from Chen))
and receiving a label prediction for the instance of simplified data from the trained machine learning model (Kurata, Fig. 2 discloses a predicted document label and Para. [0051] discloses “The label prediction layer 164 has a plurality of units each corresponding to each predefined document label that is a document identifier identifying a document having an answer for the query. The document labels can be defined as labels appeared in the training data 140. The number of the units in the label prediction layer 164 may be same as the number of the document labels appeared in the training data 140. And as shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query 112 and outputs one or more predicted document labels 114” (Label prediction output to be produced for pre-processed data from Chen))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the data pre-processing technique disclosed by Chen to pre-process the input data before it is fed into a model as disclosed by Kurata. The combination would have been obvious because a person of ordinary skill in the art would be motivated to pre-process data to transform it into a simpler and cleaner format, thereby increasing the accuracy of a label output by the machine learning model. Text preprocessing is a vital step in data mining and natural language processing; thus it would be second nature for a person of ordinary skill in the art to perform such pre-processing operations to clean up any data that may be input into a model.

As per claim 25, Chen teaches:
simplifying the single instance of training data to create a single instance of simplified training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing (Concurrence graphs, document anchors, and tokenized text sequences are simplified training data that have been preprocessed and simplified. Note that the training data to be simplified will come from Kurata as disclosed later below))
generating a plurality of training data variants, based on the single instance of simplified training data (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data))
simplifying the instance of input data to create an instance of simplified input data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Concurrence graphs, document anchors, and tokenized text sequences are simplified data that have been preprocessed and simplified. Note that the data to be simplified will come from Kurata as disclosed later below))
Chen fails to explicitly teach:
receiving a single instance of training data
training a machine learning model, utilizing the plurality of training data variants
receiving an instance of input data
applying the instance of simplified input data into the trained machine learning model
and receiving a label prediction for the instance of simplified input data from the trained machine learning model
However, Kurata teaches:
receiving a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” (It is certainly implied that training data has to be received))
training a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data” (Note that training data variants to be used for training comes from Chen))
receiving an instance of input data (Kurata, Fig. 2 discloses an input query 112 (instance of data) and Para. [0047] discloses “As shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query”)
applying the instance of simplified input data into the trained machine learning model (Kurata, Fig. 2 discloses an input query being fed into a trained model and Para. [0050] discloses “The NLQ classification model 110 may need to accept queries with variable length. The NLQ classification model 110 receives an input query in a form of natural sentence like “Where should I visit in Japan?” by the query input layer 152. Words in the input query are first subjected to appropriate pre-processing such as stop word removal, and then the processed words 154 are converted into distributed representation in the distributed representation layer 156. The convolutional layer 158 may have k kernels to produce k feature maps. Each feature map is then subsampled typically mean or max pooling. By applying convolution 158 and sub-sampling 160 over time, a fixed-length feature vectors are extracted from the distributed representation layer 156 into the top hidden layer 162. Then, the fixed-length feature vectors are then fed into the label prediction layer 164 to predict the one or more document labels 114 for the input query 112” (Note that although Kurata does some form of pre-processing before inputting the input data into a model, the pre-processed data is to come from Chen))
and receiving a label prediction for the instance of simplified input data from the trained machine learning model (Kurata, Fig. 2 discloses a predicted document label and Para. [0051] discloses “The label prediction layer 164 has a plurality of units each corresponding to each predefined document label that is a document identifier identifying a document having an answer for the query. The document labels can be defined as labels appeared in the training data 140. The number of the units in the label prediction layer 164 may be same as the number of the document labels appeared in the training data 140. And as shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query 112 and outputs one or more predicted document labels 114” (Label prediction output to be produced for pre-processed data from Chen))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the pre-processing of data and generation of data variants as disclosed by Chen to use the input and training data and machine learning model as disclosed by Kurata. The combination would have been obvious because a person of ordinary skill in the art would be motivated to pre-process both training and input data into simpler and cleaner formats. Pre-processing both training and input data not only increases training accuracy of a machine learning model as variations in training data are .

Claims 4-8 and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Kurata, further in view of “All you need to know about text preprocessing for NLP and Machine Learning” (hereinafter, “Preprocessing”)
As per claim 4, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein simplifying the single instance of training data includes ((replacing one or more terms within)) the single instance of training data ((with a word stem)) (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
replacing one or more terms within ((the single instance of training data)) with a word stem
However, Preprocessing (Preprocessing addresses the plurality of ways to preprocess data) teaches:
replacing one or more terms within ((the single instance of training data)) with a word stem (Preprocessing, Stemming section teaches replacing terms with word stems (Training data to be stemmed comes from Kurata))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen as modified to use the stemming method as disclosed by Preprocessing. The combination would have been obvious because a person of ordinary skill in the art would be motivated to standardize text within a dataset, since different variations of words are present; standardizing them to the stem of a word would allow for better training of a classifier.
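For illustration only, the stemming operation referred to in the rejection of claim 4 may be sketched as follows. This is a minimal sketch under stated assumptions: the function names `naive_stem` and `simplify` and the suffix list are hypothetical, and the suffix-stripping rule is a simplified stand-in rather than the stemmer described in Preprocessing or anything disclosed by Chen or Kurata.

```python
# Hypothetical suffix list; a real stemmer (e.g. Porter) uses far richer rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Replace a term with a crude word stem by stripping one known suffix."""
    lower = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of the word remain.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def simplify(instance: str) -> str:
    """Simplify a single instance of text by stemming each whitespace token."""
    return " ".join(naive_stem(token) for token in instance.split())


if __name__ == "__main__":
    print(simplify("Connected users connecting"))
```

In this sketch, different surface forms of the same word collapse to one stem, which is the standardization effect the motivation statement relies on.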

As per claim 5, the combination of Chen and Kurata teaches the computer-implemented method of claim 1, Chen further teaches:
wherein simplifying the single instance of training data includes ((replacing one or more terms within)) the single instance of training data ((with a genericized term)) (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
replacing one or more terms within ((the single instance of training data)) with a genericized term 
However, Preprocessing teaches:
replacing one or more terms within ((the single instance of training data)) with a genericized term (Preprocessing, Normalization section discusses text normalization to a generic term (Preprocessing discusses a few examples of text normalization that are not fully exhaustive. Additional examples may include normalization of dates, etc. Training data comes from Kurata))
Same motivation to combine Chen, Kurata, and Preprocessing as claim 4

As per claim 6, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein simplifying the single instance of training data includes ((discarding one or more terms within)) the single instance of training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
discarding one or more terms within ((the single instance of training data))
However, Preprocessing teaches:
discarding one or more terms within ((the single instance of training data)) (Preprocessing, Stopword removal section teaches discarding one or more terms (Training data comes from Kurata))
Same motivation to combine Chen, Kurata, and Preprocessing as claim 4


As per claim 7, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein simplifying the single instance of training data includes ((adjusting a length of)) the single instance of training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
adjusting a length of ((the single instance of training data))
However, Preprocessing teaches:
adjusting a length of ((the single instance of training data))  (Preprocessing, Lemmatization and Stemming section (Lemmatization, much like stemming, maps a word to the root form of the word whereby the length of the word is adjusted due to the word now mapping to the root form of the word. Training data comes from Kurata))
Same motivation to combine Chen, Kurata, and Preprocessing as claim 4

As per claim 8, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein generating the plurality of training data variants includes ((adjusting)) the single instance of training data ((in a plurality of different ways, where each adjustment results in)) one of the plurality of training data variants (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
adjusting ((the single instance of training data)) in a plurality of different ways, where each adjustment results in ((one of the plurality of training data variants))
However, Preprocessing teaches:
adjusting ((the single instance of training data)) in a plurality of different ways, where each adjustment results in ((one of the plurality of training data variants)) (Preprocessing, Types of text preprocessing techniques discloses “There are different ways to preprocess your text” (There are a plurality of ways to preprocess text, each of which is different from the others. Further preprocessing of simplified training data will result in training data variants. Training data to be adjusted comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

As per claim 17, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein simplifying the single instance of training data includes ((replacing)), by the processor, ((one or more terms within)) the single instance of training data ((with a word stem)) (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence and Fig. 6 discloses processors 602 (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
replacing, ((by the processor)), one or more terms within ((the single instance of training data)) with a word stem
However, Preprocessing teaches:
replacing, ((by the processor)), one or more terms within ((the single instance of training data)) with a word stem (Preprocessing, Stemming section teaches replacing terms with word stems (Training data comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

As per claim 18, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein simplifying the single instance of training data includes ((replacing)), by the processor, ((one or more terms within)) the single instance of training data ((with a genericized term)) (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence and Fig. 6 discloses processors 602 (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
replacing, ((by the processor)), one or more terms within ((the single instance of training data)) with a genericized term 
However, Preprocessing teaches:
replacing, ((by the processor)), one or more terms within ((the single instance of training data)) with a genericized term (Preprocessing, Normalization section discusses text normalization to a generic term (Preprocessing discusses a few examples of text normalization that are not fully exhaustive. Additional examples may include normalization of dates, etc. Training data comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

As per claim 19, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein simplifying the single instance of training data includes ((discarding)), by the processor, ((one or more terms within)) the single instance of training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence and Fig. 6 discloses processors 602 (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
discarding, ((by the processor)), one or more terms within ((the single instance of training data))
However, Preprocessing teaches:
discarding, ((by the processor)), one or more terms within ((the single instance of training data)) (Preprocessing, Stopword removal section teaches discarding one or more terms (Training data comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

As per claim 20, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein simplifying the single instance of training data includes ((adjusting)), by the processor, ((a length of)) the single instance of training data (Chen, Para. [0004] discloses “a machine learning component deployed thereon and configured to pre-process training data to generate one or more concurrence graphs of named entities, words, and document anchors extracted from the training data” and Fig. 2 discloses pre-processing generating a tokenized text sequence and Fig. 6 discloses processors 602 (Note that the data to be simplified will come from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
adjusting, ((by the processor)), a length of ((the single instance of training data))
However, Preprocessing teaches:
adjusting, ((by the processor)), a length of ((the single instance of training data)) (Preprocessing, Lemmatization and Stemming section (Lemmatization, much like stemming, maps a word to the root form of the word whereby the length of the word is adjusted due to the word now mapping to the root form of the word. Training data comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

As per claim 21, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein generating the plurality of training data variants includes ((adjusting)), by the processor, the single instance of training data ((in a plurality of different ways, where each adjustment results in)) one of the plurality of training data variants (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
adjusting, ((by the processor, the single instance of training data)) in a plurality of different ways, where each adjustment results in ((one of the plurality of training data variants))
However, Preprocessing teaches:
adjusting, ((by the processor, the single instance of training data)) in a plurality of different ways, where each adjustment results in ((one of the plurality of training data variants)) (Preprocessing, Types of text preprocessing techniques discloses “There are different ways to preprocess your text” (There are a plurality of ways to preprocess text, each of which is different from the others. Further preprocessing of simplified training data will result in training data variants. Training data to adjust comes from Kurata))
Same motivation to combine Chen, Kurata and Preprocessing as claim 4

Claims 9 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Kurata, further in view of U.S. Pub. No. US 20160078361 A1 to Brueckner, et al. (hereinafter, “Brueckner”)
As per claim 9, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein generating the plurality of training data variants includes ((changing an order of words within)) the single instance of simplified training data to create one of the plurality of training data variants (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
changing an order of words within ((the single instance of simplified training data to create one of the plurality of training data variants))
However, Brueckner (Brueckner addresses the issue of training of machine learning models) teaches:
changing an order of words within ((the single instance of simplified training data to create one of the plurality of training data variants)) (Brueckner, Para. [0130] discloses “In order to train and evaluate a model, a number of filtering or input record rearrangement operations may sometimes have to be performed in a sequence on an input data set…Other input filtering operation types may include…shuffling (rearranging the order of the input data objects)”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen as modified to use the data shuffling as disclosed by Brueckner. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve training accuracy of a machine learning model as data modification techniques, such as shuffling (changing an order), are necessary “in order to train and evaluate a model” (Brueckner, Para. [0130]).
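For illustration only, the limitation of claim 9 (changing an order of words within a simplified instance to create a variant) may be sketched as follows. This is a minimal sketch under stated assumptions: the function name `reorder_variants` and its parameters are hypothetical, and the sketch operates on words within one instance, whereas the quoted passage of Brueckner describes rearranging the order of input data objects.

```python
import random
from typing import List


def reorder_variants(simplified: str, count: int, seed: int = 0) -> List[str]:
    """Create `count` variants of one simplified instance by shuffling its words."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    words = simplified.split()
    variants = []
    for _ in range(count):
        shuffled = words[:]  # copy so the original word order is preserved
        rng.shuffle(shuffled)
        variants.append(" ".join(shuffled))
    return variants


if __name__ == "__main__":
    for variant in reorder_variants("where visit japan spring", count=3):
        print(variant)
```

Each variant contains exactly the same words as the simplified instance; only the order differs. A shuffle may occasionally reproduce the original order, which a fuller implementation might filter out.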

As per claim 22, the combination of Chen and Kurata as shown above teaches the computer program product of claim 14, Chen further teaches:
wherein generating the plurality of training data variants includes ((changing)), by the processor, ((an order of words within)) the single instance of simplified training data to create one of the plurality of training data variants (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
changing, ((by the processor)), an order of words within ((the single instance of simplified training data to create one of the plurality of training data variants))
However, Brueckner teaches:
changing, ((by the processor)), an order of words within ((the single instance of simplified training data to create one of the plurality of training data variants)) (Brueckner, Para. [0130] discloses “In order to train and evaluate a model, a number of filtering or input record rearrangement operations may sometimes have to be performed in a sequence on an input data set…Other input filtering operation types may include…shuffling (rearranging the order of the input data objects)” (Training data comes from Kurata))
Same motivation to combine Chen, Kurata and Brueckner as claim 9

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Kurata, further in view of U.S. Patent No. US 8543381 B2 to Connor (hereinafter, “Connor”)
As per claim 10, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein generating the plurality of training data variants includes ((substituting a first word within)) the single instance of simplified training data ((with a second word determined to be similar to the first word)) (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
substituting a first word within ((the single instance of simplified training data)) with a second word determined to be similar to the first word
However, Connor (Connor addresses the issue of text morphing) teaches:
substituting a first word within ((the single instance of simplified training data)) with a second word determined to be similar to the first word (Connor, Methods to Modify a Single Document by Phrase Substitution section discloses “There are methods in the prior art to modify a single document by selectively substituting alternative phrases (single words or multiple word combinations) for the phrases that were originally used in the document. For example, the alternative phrases may be similar in meaning, but different in style or complexity, as compared to the original phrases used in the document.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen as modified to use the synonym word replacement as disclosed by Connor. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy of the training of the machine learning model, as using synonyms of words ensures that variations of words are not missed in training a natural language processing model.
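For illustration only, the limitation of claim 10 (substituting a first word with a second word determined to be similar) may be sketched as follows. This is a minimal sketch under stated assumptions: the function name `substitute_variants` and the small similarity table are hypothetical; neither Connor nor the other references is asserted to use this data structure, and how similarity is determined is left outside the sketch.

```python
from typing import Dict, List

# Hypothetical table of similar words; in practice this could come from a thesaurus.
SIMILAR_WORDS: Dict[str, List[str]] = {
    "visit": ["tour", "see"],
    "where": ["which place"],
}


def substitute_variants(simplified: str, similar: Dict[str, List[str]]) -> List[str]:
    """Create one variant per (word, similar word) pair, substituting a single word."""
    words = simplified.split()
    variants = []
    for index, word in enumerate(words):
        for alternative in similar.get(word, []):
            # Replace only the word at `index`; all other words are unchanged.
            variants.append(" ".join(words[:index] + [alternative] + words[index + 1:]))
    return variants


if __name__ == "__main__":
    for variant in substitute_variants("where visit japan", SIMILAR_WORDS):
        print(variant)
```

Each variant differs from the simplified instance by exactly one substituted word, so word variations absent from the original instance are still represented during training.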

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Kurata, further in view of U.S. Pub. No. US 20170132512 A1 to Ioffe (hereinafter, “Ioffe”)
As per claim 11, the combination of Chen and Kurata as shown above teaches the computer-implemented method of claim 1, Chen further teaches:
wherein each of the plurality of training data variants are ((given a same associated label as the single instance of training data)) (Chen, Para. [0005] discloses “a training component configured to generate vector embeddings of entities and words based on the one or more concurrence graphs” and Fig. 6 discloses processors 602 (vector embeddings are variants of the concurrence graphs which are simplified training data. Training data comes from Kurata))
The combination of Chen and Kurata fails to explicitly teach:
((wherein each of the plurality of training data variants are)) given a same associated label as the single instance of training data
However, Ioffe (Ioffe addresses the issue of regularizing training data) teaches:
((wherein each of the plurality of training data variants are)) given a same associated label as the single instance of training data (Ioffe, Para [0006] discloses “The action of modifying may include, for each training item, determining whether or not to modify the label associated with the training item…” (Same label is applied as the training item))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Chen as modified to use the same labels for training data variants as disclosed by Ioffe. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve accuracy of training 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.


/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123