DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims are pending in this office action: 1-25
The following claims are amended: 1-2, 4, 6-7, 14-15, and 17-25
The following claims are new: None
The following claims are cancelled: None
The following claims are rejected: 1-25
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/01/2021 has been entered.
Response to Arguments
Applicant’s arguments and amendments, filed on 11/01/2021 to address the 35 U.S.C. 101 rejection with respect to claims 1-23, have been considered. In response to Applicant’s arguments, the 35 U.S.C. 101 rejection still stands. Applicant argues that the claims provide “an improvement to the technical field of machine learning model training” (see Applicant’s remarks, pages 10-13). Examiner respectfully disagrees.
Applicant’s arguments and amendments filed on 11/01/2021 to address the 35 U.S.C. 102 and 35 U.S.C. 103 rejections with respect to claims 1-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a computer-implemented method, comprising: receiving a single instance of training data; adjusting a first length of the single instance of training data to create a single instance of simplified training data having a second length within a predetermined percentage of a predetermined length, where the single instance of training data includes a first string of text and the single instance of simplified training data includes a second string of text; generating a plurality of training data variants, based on the single instance of simplified training data, where each of the plurality of training data variants includes a respective string of text; and training a machine learning model, utilizing the plurality of training data variants.
The limitation of adjusting a first length of the single instance of training data to create a single instance of simplified training data having a second length within a predetermined percentage of a predetermined length, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “adjusting”, in the context of the claim, encompasses a user analyzing and preprocessing data, where the user's preprocessing essentially “adjusts” the data. An example of a user preprocessing data is a user analyzing data and modifying its length such that a percentage of the initial data is kept that is within a predetermined length.
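For illustration only, the kind of length adjustment discussed in the preceding paragraph can be sketched in Python as follows. The function names, the choice to drop trailing words, and the form of the tolerance test are assumptions introduced for this sketch; they are not taken from the claims, the specification, or any cited reference.

```python
def within_tolerance(length: int, target: int, pct: float) -> bool:
    """Return True when `length` lies within `pct` percent of `target`."""
    return abs(length - target) <= target * pct / 100.0


def trim_to_length(text: str, target: int, pct: float) -> str:
    """Drop trailing words until the word count is within tolerance of `target`.

    Assumption: "length" is measured in whitespace-separated words.
    Text that is already short enough is returned with its words unchanged.
    """
    words = text.split()
    while len(words) > target and not within_tolerance(len(words), target, pct):
        words.pop()  # hypothetical policy: discard the last word first
    return " ".join(words)


if __name__ == "__main__":
    # Ten words, target of five words, 20 percent tolerance (so six is acceptable).
    print(trim_to_length("a b c d e f g h i j", target=5, pct=20))
```

In this sketch a ten-word string is shortened until its word count falls inside the tolerance band around the target, which corresponds to keeping only part of the initial data.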
The limitation of generating a plurality of training data variants, based on the single instance of simplified training data, where each of the plurality of training data variants includes a respective string of text, as drafted, is a process that, under its broadest reasonable 
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls under the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
The judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – training a machine learning model utilizing the plurality of training data variants. Training a machine learning model is recited at a high level of generality (i.e., as a generic computer function of training a model) such that it amounts to no more than mere instructions to apply the exception using a generic computing function. Additionally, the additional element of training a machine learning model is considered to be an insignificant extra-solution activity. Further, the claim recites the receiving step (receiving a single instance of training data). The receiving step is recited at a high level of generality and amounts to mere data gathering, which is a form of insignificant extra-solution activity. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of Berkheimer. Additionally, the training of a machine learning model is considered to be an extra-solution activity in Step 2A Prong 2, and thus it is re-evaluated in Step 2B to determine if it is more than what is well-understood, routine, conventional activity in the field. Sirosh, et al. (U.S. Pub. No. US 20160148115 A1) discloses in Para. [0026] that “The trained machine learning model 106 can be a model that has been trained in any suitable fashion, as Berkheimer.
This claim is not patent eligible under 35 U.S.C. 101.
Claim 2 recites the computer-implemented method of Claim 1, replacing one or more terms within the single instance of training data with a word stem, replacing one or more terms within the single instance of training data with a genericized term, and discarding one or more terms within the single instance of training data.
The limitation of replacing one or more terms within the single instance of training data with a word stem, replacing one or more terms within the single instance of training data with a genericized term, and discarding one or more terms within the single instance of training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “replacing”, in the context of the claim, encompasses a user analyzing training data and replacing words with word stems and generic terms. Additionally, “discarding”, in the context of the claim, encompasses a user analyzing training data and removing terms.
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the 
Claim 3 recites the computer-implemented method of Claim 1, wherein the training data has an associated label. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind.
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 4 recites the computer-implemented method of Claim 1, comprising replacing one or more terms within the single instance of training data with a word stem. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “simplifying”, in the context of the claim, encompasses a user analyzing training data and stemming the data items, essentially reducing a word to its base form. For example, a user can analyze the word “studying” and stem it to its base form, “study”.
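For illustration only, the stemming example above (reducing “studying” to “study”) can be sketched in Python as follows. The suffix list and the minimum stem length are assumptions chosen for this sketch; this is a naive suffix stripper and not a recognized stemming algorithm such as the Porter stemmer.

```python
# Hypothetical suffix list, checked in order; not taken from the record.
SUFFIXES = ("ing", "ed", "s")
MIN_STEM_LENGTH = 3  # assumption: avoid stripping very short words such as "is"


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, provided a stem of reasonable length remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def stem_text(text: str) -> str:
    """Replace each whitespace-separated term with its naive stem."""
    return " ".join(naive_stem(word) for word in text.split())


if __name__ == "__main__":
    print(naive_stem("studying"))
```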

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. This claim is not patent eligible under 35 U.S.C. 101.
Claim 5 recites the computer-implemented method of Claim 1, wherein in response to determining that the single instance of training data includes a specific product name, the specific product name is replaced with a generic product name term, and in response to determining that the single instance of training data includes a specific date, the specific date is replaced with a generic date term. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “determining”, in the context of the claim, encompasses a user analyzing training data and replacing specific terms with genericized terms. For example, a user can normalize the date “12-03-2020” to the generic term “DATE”.
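For illustration only, the replacement of a specific date or product name with a generic term can be sketched in Python as follows. The date pattern, the product name “WidgetPro”, and the placeholder “PRODUCT” are hypothetical values introduced for this sketch; only the “DATE” placeholder and the date “12-03-2020” come from the example above.

```python
import re

# Assumption: dates look like 12-03-2020 (one or two digits, one or two digits, four digits).
DATE_PATTERN = re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b")

# Hypothetical list of specific product names to genericize.
PRODUCT_NAMES = ("WidgetPro",)


def genericize(text: str) -> str:
    """Replace specific dates with DATE and known product names with PRODUCT."""
    text = DATE_PATTERN.sub("DATE", text)
    for name in PRODUCT_NAMES:
        text = re.sub(rf"\b{re.escape(name)}\b", "PRODUCT", text)
    return text


if __name__ == "__main__":
    print(genericize("WidgetPro shipped on 12-03-2020"))
```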
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.

This claim is not patent eligible under 35 U.S.C. 101.
Claim 6 recites the computer-implemented method of Claim 1, comprising discarding one or more terms that appear more than a predetermined number of times within the single instance of training data. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “simplifying”, in the context of the claim, encompasses a user analyzing data and discarding words he/she deems not relevant. For example, a user analyzes data and decides to throw away stop words such as “to” and “he” upon determining that the stop words appear more than two times.
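For illustration only, discarding terms that appear more than a predetermined number of times can be sketched in Python as follows. The function name, the case-insensitive counting, and the sample sentence are assumptions introduced for this sketch.

```python
from collections import Counter


def discard_frequent_terms(text: str, max_count: int) -> str:
    """Remove every term that occurs more than `max_count` times in `text`.

    Assumption: terms are whitespace-separated and compared case-insensitively.
    """
    words = text.split()
    counts = Counter(word.lower() for word in words)
    kept = [word for word in words if counts[word.lower()] <= max_count]
    return " ".join(kept)


if __name__ == "__main__":
    # "to" appears three times, which exceeds the threshold of two.
    print(discard_frequent_terms("he went to the store to buy milk to drink", 2))
```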
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.

This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 8 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes: changing an order of words within the single instance of simplified training data in response to determining that a total number of words within the single instance of simplified training data is less than a predetermined threshold, 
The limitation of changing an order of words within the single instance of simplified training data in response to determining that a total number of words within the single instance of simplified training data is less than a predetermined threshold, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “changing”, in the context of the claim, encompasses a user analyzing data and swapping the order of the data upon determining that the data is not large enough.
The limitation of substituting a first word within the single instance of simplified training data with a second word determined to be similar to the first word, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “substituting”, in the context of the claim, encompasses a user analyzing data and substituting the data with synonyms.
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 

Claim 9 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes changing an order of words within the single instance of simplified training data to create one of the plurality of training data variants. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user analyzing data and manually changing the order of words in a string. For example, the user may see the string “catdogcow” and then decide to change the order to “cowdogcat”.
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 10 recites the computer-implemented method of Claim 1, wherein generating the plurality of training data variants includes: replacing a generic term within the single instance of simplified training data with a first specific term that correlates to the generic term to create a first training data variant, and replacing the generic term within the single instance of simplified 
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 11 recites the computer-implemented method of Claim 1, wherein each of the plurality of training data variants are given a same associated label as the single instance of training data. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “generating”, in the context of the claim, encompasses a user creating variants of data and manually giving each training data variant the same label as the training data.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 12 recites the computer-implemented method of Claim 1, wherein each of the plurality of training data variants are input into the machine learning model to train the machine learning model. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind.
This judicial exception is not integrated into a practical application. In particular, the claim does recite an additional element – training data variants are input into the machine learning model to train the machine learning model. Inputting is recited at a high level of generality (i.e. as a generic component) such that it amounts to no more than mere instructions to apply the exception using a generic component. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. In particular, the additional element of inputting 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 13 recites the computer-implemented method of Claim 1, wherein the machine learning model is an artificial neural network (ANN). This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind.
The judicial exception is not integrated into a practical application. In particular, the claim does recite an additional element – the machine learning model is an artificial neural network. The machine learning model is recited at a high level of generality (i.e. as a generic component) such that it amounts to no more than mere instructions to apply the exception using a generic component. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. In particular, the additional element of the machine learning model being an artificial neural network amounts to no more than mere instructions to apply the exception using a generic component. Additionally, limitations that the 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 20 recites the computer-implemented method of Claim 1, wherein: adjusting the first length of the single instance of training data includes repeating one or more words within the single instance of training data, and generating the plurality of training data variants includes: changing an order of words within the single instance of simplified training data in response to determining that a total number of words within the single instance of simplified training data is less than a predetermined threshold, replacing a generic term within the single instance of simplified training data with a first specific term that correlates to the generic term to create a first training data variant, and replacing the generic term within the single instance of simplified training data with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a second training data variant different from the first training data variant.
The limitation of adjusting the first length of the single instance of training data includes repeating one or more words within the single instance of training data, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the 
The limitation of generating the plurality of training data variants includes: changing an order of words within the single instance of simplified training data in response to determining that a total number of words within the single instance of simplified training data is less than a predetermined threshold, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “changing”, in the context of the claim, encompasses a user analyzing data and swapping the order of the data upon determining that the data is not large enough.
The limitation of replacing a generic term within the single instance of simplified training data with a first specific term that correlates to the generic term to create a first training data variant, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, “replacing”, in the context of the claim, encompasses a user analyzing data and replacing strings within the data with synonyms.
The limitation of replacing the generic term within the single instance of simplified training data with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a second training data variant different from the first training data variant, as drafted, is a process that, under its broadest 
This judicial exception is not integrated into a practical application. In particular, the claim does not recite any additional elements. Accordingly, this does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, no additional elements are cited. 
This claim is not patent eligible under 35 U.S.C. 101.
Claim 21 recites the computer program product of Claim 14, wherein generating the plurality of training data variants includes adjusting, by the one or more processors, the single instance of training data in a plurality of different ways, where each adjustment results in one of the plurality of training data variants. This limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, but for the “one or more processors”, nothing in the claim limitation precludes the step from practically being performed in the mind. For example, but for the “one or more processors” language, “generating” in the context of the claim encompasses a user analyzing and preprocessing data in a plurality of ways, where each preprocessing of the data results in a variant of the data.

This judicial exception is not integrated into a practical application. In particular, the claim recites one additional element – one or more processors. The one or more processors are recited at a high level of generality (i.e., as generic computer hardware performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computing hardware. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of one or more processors amounts to no more than mere instructions to apply the exception using a generic computing component. Mere instructions to apply an exception using generic computing components cannot provide an inventive concept.
This claim is not patent eligible under 35 U.S.C. 101.
Claims 14-19 and 22 are rejected on the same grounds as claims 1-6 and 9, respectively.
Claim 23 is rejected on the same grounds as claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 7, 9, 12-14, 16, 21-23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US 20170061330 A1 to Kurata (hereinafter, “Kurata”), in view of “EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks” to Wei, et al. (hereinafter, “Wei”).
As per claim 1, Kurata teaches a computer-implemented method, comprising:
receiving a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” (the data must necessarily be received))
generating a plurality of training data variants, based on the single instance of [[simplified]] training data, where each of the plurality of training data variants includes a respective string of text (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
training a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data”)
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]]
However, Wei teaches:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]] (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
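For illustration only, the Random Insertion operation and the formula n=αl quoted above from Wei can be sketched in Python as follows. The synonym table and stop-word list are hypothetical stand-ins for a real thesaurus, and flooring n at 1 is an assumption of this sketch rather than something stated in the quoted passage.

```python
import random

# Hypothetical data; a real system would consult a thesaurus and a stop-word list.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad"]}
STOP_WORDS = {"the", "a", "is", "to"}


def num_changes(sentence_length: int, alpha: float) -> int:
    """n = alpha * l, truncated to an integer; the floor of 1 is an assumption."""
    return max(1, int(alpha * sentence_length))


def random_insertion(words: list[str], n: int, rng: random.Random) -> list[str]:
    """Insert a synonym of a random non-stop word at a random position, n times."""
    result = list(words)
    candidates = [w for w in words if w not in STOP_WORDS and w in SYNONYMS]
    if not candidates:
        return result  # nothing to insert when no word has a known synonym
    for _ in range(n):
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        result.insert(rng.randint(0, len(result)), synonym)
    return result


if __name__ == "__main__":
    sentence = ["the", "quick", "dog"]
    print(random_insertion(sentence, num_changes(len(sentence), 0.5), random.Random(0)))
```

Because each insertion adds one word, the output length grows by exactly n, which is how the sentence length l and the parameter α together control the length of the augmented sentence in this sketch.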
st Para.)

As per claim 3, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches: 
wherein the training data has an associated label. (Kurata, Para. [0025] discloses “Each training data has a training input and one or more correct labels assigned to the training input.”) 

As per claim 7, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
Wei further teaches:
increasing a length of [[the single instance of training data]] by repeating one or more words within [[the single instance of training data]] so that the length of [[the single instance of training data]] matches the predetermined length (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1
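For illustration only, the Random Insertion operation and the n=αl relationship quoted from Wei above can be sketched as follows. This is a minimal sketch and not Wei's implementation: the synonym table, the stop-word list, the function names, and the floor of one change per sentence are all assumptions introduced here.

```python
import random

# Hypothetical synonym table and stop-word list (assumptions, not from Wei).
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}
STOP_WORDS = {"the", "a", "is", "of"}


def num_changes(sentence_length, alpha):
    """Number of words changed, n = alpha * l (rounded down, at least 1 by assumption)."""
    return max(1, int(alpha * sentence_length))


def random_insertion(words, alpha, rng):
    """Insert n synonyms of random non-stop words at random positions."""
    result = list(words)
    n = num_changes(len(words), alpha)
    # Only words that are not stop words and have a known synonym qualify.
    candidates = [w for w in words if w not in STOP_WORDS and w in SYNONYMS]
    if not candidates:
        return result
    for _ in range(n):
        source = rng.choice(candidates)
        synonym = rng.choice(SYNONYMS[source])
        # randint is inclusive, so the synonym may also be appended at the end.
        result.insert(rng.randint(0, len(result)), synonym)
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_insertion("the quick dog is happy".split(), 0.4, rng))
```

Under these assumptions, a five-word sentence with α = 0.4 receives two inserted words, so its length grows from five to seven.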

As per claim 9, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein generating the plurality of training data variants includes [[changing an order of words within the single instance of simplified training data]] to create one of the plurality of training data variants (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
Wei further teaches:
[[wherein generating the plurality of training data variants includes]] changing an order of words within the single instance of simplified training data [[to create one of the plurality of training data variants]] (Wei, EDA section discloses “Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1
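For illustration only, the Random Swap operation quoted from Wei above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the choice to pick two distinct positions per swap are introduced here and are not taken from the reference.

```python
import random


def random_swap(words, n, rng):
    """Swap the positions of two randomly chosen words, n times."""
    result = list(words)
    if len(result) < 2:
        return result  # nothing to swap
    for _ in range(n):
        # Choose two distinct positions (an assumption of this sketch).
        i, j = rng.sample(range(len(result)), 2)
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    print(random_swap("where should i visit in japan".split(), 2, rng))
```

The output contains the same words as the input in a different order, which is the sense in which the cited passage is read onto changing an order of words.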

As per claim 12, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein each of the plurality of training data variants are input into the machine learning model to train the machine learning model (Kurata, Para. [0025] discloses “Also the method includes training the classification model using the one or more training data”)

As per claim 13, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein the machine learning model is an artificial neural network (ANN). (Kurata, Para. [0048] discloses “Referring to FIG. 3, architecture 150 of the NLQ classification model 110 is depicted. In the describing embodiment, the NLQ classification model 110 is a neural network based classification model.” (an artificial neural network is a neural network))

	As per claim 14, Kurata teaches a computer program product comprising one or more computer readable storage media and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising instructions configured to cause the one or more processors to perform a method comprising
	receiving, by the processors, a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” )
generating, by the one or more processors, a plurality of training data variants, based on the single instance of [[simplified]] training data, where each of the plurality of training data variants includes a respective string of text (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
training, by the one or more processors, a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data”)
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:

adjusting, by the one or more processors, a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]]
However, Wei teaches:
adjusting, by the one or more processors, a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]] (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
	It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1

As per claim 16, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
wherein the training data has an associated label. (Kurata, Para. [0025] discloses “Each training data has a training input and one or more correct labels assigned to the training input.”)

As per claim 21, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
wherein generating the plurality of training data variants includes [[adjusting]], by the one or more processors, the single instance of training data [[in a plurality of different ways, where each adjustment results in]] one of the plurality of training data variants (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” and Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
Wei further teaches:
[[wherein generating the plurality of training data variants includes]] adjusting, [[by the one or more processors, the single instance of training data]] in a plurality of different ways, where each adjustment results in [[one of the plurality of training data variants]] (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times”, the EDA section additionally discloses more ways of adjusting a length in different ways)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1

As per claim 22, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
wherein generating the plurality of training data variants includes [[changing, by the one or more processors, an order of words within the single instance of simplified training data]] to create one of the plurality of training data variants (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
Wei further teaches:
[[wherein generating the plurality of training data variants includes]] changing, by the one or more processors, an order of words within the single instance of simplified training data [[to create one of the plurality of training data variants]] (Wei, EDA Section discloses “Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1

	As per claim 23, Kurata teaches a system comprising:
a processor; (Kurata, Para. [0009] discloses “The system includes: a memory; a processor communicatively coupled to the memory…”)
and logic integrated with the processor, executable by the processor, or integrated with and executable by the processor, the logic being configured to: (Kurata, Para. [0009] discloses “The system includes: a memory; a processor communicatively coupled to the memory…”)
receive a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” )
generate a plurality of training data variants, based on the single instance of [[simplified]] training data, where each of the plurality of training data variants includes a respective string of text (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
train a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data”)
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
adjust a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]]
	However, Wei teaches:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]] (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1

	As per claim 25, Kurata teaches a computer-implemented method comprising:
receiving a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72” (It is certain training data has to be received))
generating a plurality of training data variants, based on the single instance of [[simplified]] training data, where each of the plurality of training data variants includes a respective string of text (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
training a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data”)
receiving an instance of input data (Kurata, Fig. 2 discloses an input query 112 (instance of data) and Para. [0047] discloses “As shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query”)
simplifying the instance of input data to create an instance of simplified input data, where the instance of input data includes a third string of text and the instance of simplified input data includes a fourth string of text (Kurata, Fig. 2 discloses an input query being fed into a trained model and Para. [0050] discloses “The NLQ classification model 110 may need to accept queries with variable length. The NLQ classification model 110 receives an input query in a form of natural sentence like “Where should I visit in Japan?” by the query input layer 152. Words in the input query are first subjected to appropriate pre-processing such as stop word removal, and then the processed words 154 are converted into distributed representation in the distributed representation layer 156.”)
applying the instance of simplified input data into the trained machine learning model (Kurata, Fig. 2 discloses an input query being fed into a trained model and Para. [0050] discloses “The NLQ classification model 110 may need to accept queries with variable length. The NLQ classification model 110 receives an input query in a form of natural sentence like “Where should I visit in Japan?” by the query input layer 152. Words in the input query are first subjected to appropriate pre-processing such as stop word removal, and then the processed words 154 are converted into distributed representation in the distributed representation layer 156. The convolutional layer 158 may have k kernels to produce k feature maps. Each feature map is then subsampled typically mean or max pooling. By applying convolution 158 and sub-sampling 160 over time, a fixed-length feature vectors are extracted from the distributed representation layer 156 into the top hidden layer 162. Then, the fixed-length feature vectors are then fed into the label prediction layer 164 to predict the one or more document labels 114 for the input query 112”)
and receiving a label prediction for the instance of simplified input data from the trained machine learning model (Kurata, Fig. 2 discloses a predicted document label and Para. [0051] discloses “The label prediction layer 164 has a plurality of units each corresponding to each predefined document label that is a document identifier identifying a document having an answer for the query. The document labels can be defined as labels appeared in the training data 140. The number of the units in the label prediction layer 164 may be same as the number of the document labels appeared in the training data 140. And as shown in FIG. 2, the computer system 100 includes the NLQ classification model 110 that receives an input query 112 and outputs one or more predicted document labels 114”)
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]]
However, Wei teaches:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length, where [[the single instance of training data includes a first string of text and the single instance of]] simplified [[training data includes a second string of text]] (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1

Claims 2, 4, 6, 15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata, in view of Wei, and further in view of “All you need to know about text preprocessing for NLP and Machine Learning” to Preprocessing (hereinafter, “Preprocessing”)
As per claim 2, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach comprising:
replacing one or more terms within [[the single instance of training data]] with a word stem, 
replacing one or more terms within [[the single instance of training data]] with a genericized term, 
and discarding one or more terms within [[the single instance of training data]]
However, Preprocessing teaches:
replacing one or more terms within [[the single instance of training data]] with a word stem, (Preprocessing, Stemming section teaches replacing terms with word stems)
replacing one or more terms within [[the single instance of training data]] with a genericized term, (Preprocessing, Normalization section discusses text normalization to a generic term (Preprocessing discusses a few examples of text normalization that is not fully exhaustive. Additional examples may include normalization of dates, etc.))
and discarding one or more terms within [[the single instance of training data]] (Preprocessing, Stopword removal section teaches discarding one or more terms)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei to use the data preprocessing methods as disclosed by Preprocessing. The combination would have been obvious because a person of ordinary skill in the art would be motivated to standardize text within a dataset, as different variations of words are available; standardizing them would allow for better training of a classifier.
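For illustration only, the three preprocessing operations relied upon from Preprocessing (stemming, normalization to a generic term, and stop-word removal) can be sketched as follows. This is a minimal sketch and not the method of the reference: the suffix-stripping stemmer is a deliberately naive stand-in for a real stemmer, and the stop-word list, the `<NUM>` placeholder, and all function names are assumptions introduced here.

```python
import re

# Hypothetical stop-word list and suffix list (assumptions of this sketch).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "on"}
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(token):
    """Replace any token containing a digit with a generic placeholder."""
    return "<NUM>" if re.search(r"\d", token) else token


def preprocess(text):
    """Lowercase and tokenize, discard stop words, then stem and normalize."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [normalize(naive_stem(t)) for t in kept]


if __name__ == "__main__":
    print(preprocess("The servers crashed on 2021"))  # ['server', 'crash', '<NUM>']
```

In this sketch, differently inflected forms of a word collapse onto one token, which is the standardization effect the motivation statement above relies on.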

As per claim 4, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
comprising replacing one or more terms within [[the single instance of training data]] with a word stem
However, Preprocessing teaches:
comprising replacing one or more terms within [[the single instance of training data]] with a word stem (Preprocessing, Stemming section teaches replacing terms with word stems)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata as modified with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2

As per claim 6, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
comprising discarding one or more terms that appear more than a predetermined number of times within [[the single instance of training data]]
However, Preprocessing teaches:
comprising discarding one or more terms that appear more than a predetermined number of times within [[the single instance of training data]] (Preprocessing, Stopword removal section discloses “Some libraries (e.g. sklearn) allow you to remove words that appeared in X% of your documents, which can also give you a stop word removal effect.”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2
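For illustration only, discarding terms that appear more than a predetermined number of times within a single instance can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the `max_count` parameter are introduced here. It counts occurrences within one instance, whereas the quoted passage of Preprocessing describes a filter based on the percentage of documents in which a word appears.

```python
from collections import Counter


def drop_frequent_terms(words, max_count):
    """Discard every term that occurs more than max_count times in this instance."""
    counts = Counter(words)
    return [w for w in words if counts[w] <= max_count]


if __name__ == "__main__":
    print(drop_frequent_terms("a b a c a b".split(), 2))  # ['b', 'c', 'b']
```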

As per claim 15, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach comprising:
replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a word stem, 
replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a genericized term, 
and discarding, by the one or more processors, one or more terms within [[the single instance of training data]]
However, Preprocessing teaches:
replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a word stem, (Preprocessing, Stemming section teaches replacing terms with word stems)
replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a genericized term, (Preprocessing, Normalization section discusses text normalization to a generic term (Preprocessing discusses a few examples of text normalization that is not fully exhaustive. Additional examples may include normalization of dates, etc.))
and discarding, by the one or more processors, one or more terms within [[the single instance of training data]] (Preprocessing, Stopword removal section teaches discarding one or more terms)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2

As per claim 17, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
comprising replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a word stem
However, Preprocessing teaches:
comprising replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a word stem (Preprocessing, Stemming section teaches replacing terms with word stems)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2

As per claim 18, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
comprising replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a genericized term
However, Preprocessing teaches:
comprising replacing, by the one or more processors, one or more terms within [[the single instance of training data]] with a genericized term (Preprocessing, Normalization section discusses text normalization to a generic term (Preprocessing discusses a few examples of text normalization that is not fully exhaustive. Additional examples may include normalization of dates, etc.))
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2

As per claim 19, the combination of Kurata and Wei as shown above teaches the computer program product of claim 14, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
comprising discarding, by the one or more processors, one or more terms within [[the single instance of training data]]
However, Preprocessing teaches:
comprising discarding, by the one or more processors, one or more terms within [[the single instance of training data]] (Preprocessing, Stopword removal section teaches discarding one or more terms)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei with the teachings of Preprocessing for at least the same reasons as discussed above in claim 2

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Wei, further in view of “Text Preprocessing in Python: Steps, Tools, and Examples” to Monsters (hereinafter, “Monsters”)
As per claim 5, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
The combination of Kurata and Wei fails to explicitly teach:
wherein in response to determining that [[the single instance of training data]] includes a specific product name, the specific product name is replaced with a generic product name term, 
and in response to determining that [[the single instance of training data]] includes a specific date, the specific date is replaced with a generic date term
However, Monsters teaches:
wherein in response to determining that [[the single instance of training data]] includes a specific product name, the specific product name is replaced with a generic product name term, (Monsters, 2nd Para. discloses “After a text is obtained, we start with text normalization. Text normalization includes: converting numbers into words or removing numbers, text canonicalization” (Note that text canonicalization converts text into its standardized form))
and in response to determining that [[the single instance of training data]] includes a specific date, the specific date is replaced with a generic date term (Monsters, 2nd Para. discloses “After a text is obtained, we start with text normalization. Text normalization includes: converting numbers into words or removing numbers, text canonicalization” (Note that text canonicalization converts text into its standardized form))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei to use the data preprocessing methods as disclosed by Monsters. The combination would have been obvious because a person of ordinary skill in the art would be motivated to standardize text within a dataset, as different variations of words are available; standardizing them would allow for better training of a classifier.
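For illustration only, replacing a specific date and a specific product name with generic terms can be sketched as follows. This is a minimal sketch and not the method of Monsters: the date pattern (MM/DD/YYYY only), the product names, the `<DATE>` and `<PRODUCT>` placeholders, and the function name are all hypothetical and introduced here.

```python
import re

# Hypothetical date pattern (MM/DD/YYYY only) and product-name list.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
PRODUCT_NAMES = ("WidgetPro 3000", "GadgetMax")


def genericize(text):
    """Replace specific dates and known product names with generic terms."""
    text = DATE_PATTERN.sub("<DATE>", text)
    for name in PRODUCT_NAMES:
        text = text.replace(name, "<PRODUCT>")
    return text


if __name__ == "__main__":
    print(genericize("GadgetMax failed on 11/01/2021"))  # <PRODUCT> failed on <DATE>
```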

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Wei, further in view of U.S. Patent No. US8543381B2 to Connor (hereinafter, “Connor”)
As per claim 8, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
Wei further teaches
changing an order of words within [[the single instance of simplified training data]] in response to determining that a total number of words within [[the single instance of]] simplified [[training data is less than]] a predetermined threshold (Wei, EDA section discloses “Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed,…”)
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata with the teachings of Wei for at least the same reasons as discussed above in claim 1
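For illustration only, the Random Swap operation quoted from Wei above ("Randomly choose two words in the sentence and swap their positions. Do this n times.") can be sketched as follows. This is a minimal sketch and not code from Wei or from the application; the function name `random_swap` and the optional `rng` parameter are hypothetical.

```python
import random


def random_swap(words, n, rng=None):
    """Return a copy of `words` with two randomly chosen positions swapped, n times.

    `random_swap` and `rng` are illustrative names, not taken from any reference.
    """
    rng = rng or random.Random()
    out = list(words)
    if len(out) < 2:
        # Nothing to swap in a sentence of zero or one word.
        return out
    for _ in range(n):
        # Pick two distinct positions and exchange the words at them.
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    sentence = "please reset my account password".split()
    print(" ".join(random_swap(sentence, n=1, rng=random.Random(0))))
```

The sketch leaves the set of words unchanged and alters only their order.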
The combination of Kurata and Wei fails to explicitly teach:
and substituting a first word within [[the single instance of simplified training data]] with a second word determined to be similar to the first word
However, Connor teaches:
and substituting a first word within [[the single instance of simplified training data]] with a second word determined to be similar to the first word (Connor, Methods to Modify a Single Document by Phrase Substitution section discloses “There are methods in the prior art to modify a single document by selectively substituting alternative phrases (single words or multiple word combinations) for the phrases that were originally used in the document. For example, the alternative phrases may be similar in meaning, but different in style or complexity, as compared to the original phrases used in the document.”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei to use the synonym word replacement as disclosed by Connor. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy range of the training of the machine learning model as using synonyms of words ensures that variations of words are not missed in training a natural language processing model.

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Wei, further in view of “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” to Devlin, et al. (hereinafter, “Devlin”)
As per claim 10, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
first training data variant (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
The combination of Kurata and Wei fails to explicitly teach wherein generating the plurality of training data variants includes:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create a [[first training data variant]]
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a [[second training data variant different from the first training data variant]]
However, Devlin teaches:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create a [[first training data variant]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a [[second training data variant different from the first training data variant]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei as modified to use the synonym word replacement as disclosed by Devlin. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy range of the training of the machine learning model as using synonyms of words ensures that variations of words are not missed in training a natural language processing model.

As per claim 20, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
the single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
and generating the plurality of training data variants includes: (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
Wei further teaches:
adjusting the first length of [[the single instance of training data]] includes repeating one or more words within [[the single instance of training data]] (Wei, EDA section discloses random insertion of synonyms of words which adjusts a first length)
changing an order of words within [[the single instance of simplified training data]] in response to determining that a total number of words within [[the single instance of simplified training data]] is less than a predetermined threshold, (Wei, EDA section discloses “Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.” And “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed,…”)
The combination of Kurata and Wei fails to explicitly teach wherein:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create [[a first training data variant,]] 
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create [[a second training data variant different from the first training data variant]]
However, Devlin teaches:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create [[a first training data variant,]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create [[a second training data variant different from the first training data variant]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Wei, further in view of U.S. Pub. No. US 20170132512 A1 to Ioffe (hereinafter, “Ioffe”)
	As per claim 11, the combination of Kurata and Wei as shown above teaches the computer-implemented method of claim 1, Kurata further teaches:
wherein each of the plurality of training data variants are [[given a same associated label as the single instance of training data]] (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
The combination of Kurata and Wei fails to explicitly teach:
[[wherein each of the plurality of training data variants are]] given a same associated label as the single instance of training data
However, Ioffe teaches:
[[wherein each of the plurality of training data variants are]] given a same associated label as the single instance of training data (Ioffe, Para [0006] discloses “The action of modifying may include, for each training item, determining whether or not to modify the label associated with the training item…” (Same label is applied as the training item))
 of the machine learning model as ensuring that labels stay the same as the original training data ensures no volatility may occur in training of the machine learning model.

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Kurata in view of Wei, further in view of Monsters, and further in view of Devlin
As per claim 24, Kurata teaches a computer-implemented method, comprising:
receiving a single instance of training data (Kurata, Fig. 1 discloses training data 72 and Fig. 2 discloses training data 140 and Para. [0039] discloses “As shown in FIG. 1, some portions of the training data 72 may have multiple labels (or co-occurring labels) for a single instance of the training data 72”)
simplifying the single instance of training data to create a single instance of simplified training data, where the single instance of training data includes a first string of text and the single instance of simplified training data includes a second string of text, the simplifying including: (Kurata, Para. [0062] discloses “The training input query may be prepared in a form of a natural sentence or representation of the natural sentence depending on the architecture of the neural network based NLQ classification model.”)
generating a plurality of training data variants, based on the single instance of [[simplified]] training data, where each of the plurality of training data variants includes a (Kurata, Para. [0029] discloses “…the obtaining of the combination of the co-occurring labels includes listing a plurality of combinations of labels co-occurred in the one or more training data; and selecting a subset from among the plurality of the combinations based on a frequency of appearance relevant to each combination in the one or more training data”)
training a machine learning model, utilizing the plurality of training data variants (Kurata, Para. [0025] discloses “According to an embodiment of the present invention, there is provided a method for learning a classification model using one or more training data”)
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length
changing an order of words within [[the single instance of simplified training data]] in response to determining that a total number of words within [[the single instance of simplified training data is less than]] a predetermined threshold
However, Wei teaches:
adjusting a first length of [[the single instance of training data]] to create a single instance of simplified [[training data]] having a second length within a predetermined percentage of a predetermined length (Wei, EDA section discloses “Random Insertion (RI): Find a random synonym of a random word in the sentence that is not a stop word. Insert that synonym into a random position in the sentence. Do this n times” and “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed, n, for SR, RI, and RS based on the sentence length l with the formula n=αl, where α is a parameter that indicates the percent of the words in a sentence are changed (we use p=α for RD)”)
changing an order of words within [[the single instance of simplified training data]] in response to determining that a total number of words within [[the single instance of simplified training data is less than]] a predetermined threshold (Wei, EDA section discloses “Random Swap (RS): Randomly choose two words in the sentence and swap their positions. Do this n times.” And “Since long sentences have more words than short ones, they can absorb more noise while maintaining their original class label. To compensate, we vary the number of words changed,…”)
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the training data as disclosed by Kurata to use the data augmentation methods as disclosed by Wei. The combination would have been obvious because a person of ordinary skill in the art would be motivated to augment text within a dataset to “boost performance on text classification tasks” (Wei, Conclusion section, 1st Para.).
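For illustration only, the length-dependent rule quoted from Wei above (n=αl) and the Random Insertion operation can be sketched as follows. This is a minimal sketch under stated assumptions and not code from Wei or from the application: the names `num_changes` and `random_insertion` are hypothetical, the floor of one change for very short sentences is an assumption, and the small `synonyms` table stands in for whatever synonym source would actually be used (stop words are excluded simply by not listing them in the table).

```python
import random


def num_changes(sentence_length, alpha):
    """Number of words to change, n = alpha * l, truncated to an integer.

    The floor of 1 is an assumption so that short sentences are still modified.
    """
    return max(1, int(alpha * sentence_length))


def random_insertion(words, alpha, synonyms, rng=None):
    """Insert a synonym of a random eligible word at a random position, n times.

    `synonyms` is a hypothetical mapping from a word to a list of its synonyms.
    """
    rng = rng or random.Random()
    out = list(words)
    # Only words with a known synonym are eligible.
    candidates = [w for w in words if w in synonyms]
    if not candidates:
        return out
    for _ in range(num_changes(len(words), alpha)):
        source = rng.choice(candidates)
        synonym = rng.choice(synonyms[source])
        # randint is inclusive on both ends, so appending at the end is possible.
        out.insert(rng.randint(0, len(out)), synonym)
    return out


if __name__ == "__main__":
    table = {"quick": ["fast"], "car": ["auto"]}
    words = "the quick car stopped".split()
    print(" ".join(random_insertion(words, 0.5, table, random.Random(1))))
```

Under these assumptions a four-word sentence with α = 0.5 receives two inserted words, so the sentence length grows in proportion to its original length.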
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
replacing one or more terms within [[the single instance of training data]] with a word stem
wherein in response to determining that [[the single instance of training data]] includes a specific product name, the specific product name is replaced with a generic product name term, 
and in response to determining that [[the single instance of training data]] includes a specific date, the specific date is replaced with a generic date term
However, Monsters teaches:
replacing one or more terms within [[the single instance of training data]] with a word stem (Monsters, Page 6, 1st Para. discloses stemming)
wherein in response to determining that [[the single instance of training data]] includes a specific product name, the specific product name is replaced with a generic product name term, (Monsters, 2nd Para. discloses “After a text is obtained, we start with text normalization. Text normalization includes: converting numbers into words or removing numbers, text canonicalization” (Note that text canonicalization converts text into its standardized form))
and in response to determining that [[the single instance of training data]] includes a specific date, the specific date is replaced with a generic date term (Monsters, 2nd Para. discloses “After a text is obtained, we start with text normalization. Text normalization includes: converting numbers into words or removing numbers, text canonicalization” (Note that text canonicalization converts text into its standardized form))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei to use the data preprocessing methods as disclosed by Monsters. The combination would have been obvious 
While Kurata teaches the single instance of training data and the training data in general (see Para. [0039] of Kurata), Kurata fails to explicitly teach:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create a [[first training data variant]]
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a [[second training data variant different from the first training data variant]]
However, Devlin teaches:
replacing a generic term within [[the single instance of simplified training data]] with a first specific term that correlates to the generic term to create a [[first training data variant]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))
and replacing the generic term within [[the single instance of simplified training data]] with a second specific term different from the first specific term, where the second specific term correlates to the generic term to create a [[second training data variant different from the first training data variant]] (Devlin, 2nd Para, 1st Para. discloses “The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked word based only on its context” (BERT masks a word and attempts to find original words that fit the context of the sentence thus generating variants through mask predictions))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Kurata/Wei as modified to use the synonym word replacement as disclosed by Devlin. The combination would have been obvious because a person of ordinary skill in the art would be motivated to improve the accuracy range of the training of the machine learning model as using synonyms of words ensures that variations of words are not missed in training a natural language processing model.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Vijayarani, et al. (“Preprocessing Techniques for Text Mining - An Overview”) discloses preprocessing techniques used in data mining
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is (571)272-8833. The examiner can normally be reached M-TR 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/H.R.M./Examiner, Art Unit 2123
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145