Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4, 7, 8, 11, 14, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Le et al. (US Patent Pub. No. 2016/0117316), hereinafter Le, in view of Bojar et al. (US Patent Pub. No. 2020/0210772), hereinafter Bojar, and further in view of Song et al. (US Patent Pub. No. 2017/0060855), hereinafter Song.

Regarding claim 1, Le teaches a method of training a neural network (Le [0006] This specification describes how a system implemented as one or more computer programs on one or more computers can be trained to perform, and perform, natural language translations using neural network translation models and rare word post-processing), 
comprising: 
generating, 
by one or more processors of a processing system (Le [0006] This specification describes how a system implemented as one or more computer programs on one or more computers can be trained to perform, and perform, natural language translations using neural network translation models and rare word post-processing), 
a plurality of synthetic sentence pairs (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110), 
each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110)
and a modified passage of text (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110).
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
Bojar teaches
generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
a first training signal of a plurality of training signals (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D [first training signal maps to sentences identified as from corpus D], which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2)
based on whether the given synthetic sentence pair was generated using backtranslation (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D [corpus D is backtranslated], which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2);
and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used), 
a grade allocated by a human grader to the given human-graded sentence pair (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for back-translating and comparing to human translations. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
Bojar does not teach
and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair.
Song teaches
and one or more second training signals of the plurality of training signals 
based on a prediction from a backtranslation prediction model (Song [0131] After extraction of the translation rules, the computing device may extract features of the translation rules. The features of the translation rules may include: a forward translation probability, reverse translation probability, positive vocabulary probability, and reverse vocabulary probability. In these instances, the forward translation probabilities of phrases refer to a translation probability of a translation of a phrase from a source language to a target language. The reverse translation probabilities of phrases refer to a translation probability of a translation of a phrase from a target language to a source language. The positive vocabulary probability refers to a translation probability of a word from a source language to a target language. The reverse vocabulary probability refers to a translation probability of a translation of a word from a target language to a source language)
regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair (Song [0131] After extraction of the translation rules, the computing device may extract features of the translation rules. The features of the translation rules may include: a forward translation probability, reverse translation probability, positive vocabulary probability, and reverse vocabulary probability. In these instances, the forward translation probabilities of phrases refer to a translation probability of a translation of a phrase from a source language to a target language. The reverse translation probabilities of phrases refer to a translation probability of a translation of a phrase from a target language to a source language. The positive vocabulary probability refers to a translation probability of a word from a source language to a target language. The reverse vocabulary probability refers to a translation probability of a translation of a word from a target language to a source language);
pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language), 
the plurality of training signals for the given synthetic sentence pair (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language).
Song is considered to be analogous to the claimed invention because it is in the same field of neural networks used for bilingual prediction models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Song to allow for predicting the likelihood of back-translation and training the neural network. Doing so would resolve issues relating to semantic inconsistency of candidate translations.
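For illustration only, the two kinds of training signal recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from the application, Le, Bojar, or Song: the names SentencePair, training_signals, and toy_likelihood are hypothetical, and the token-overlap scorer merely stands in for a backtranslation prediction model, which in practice would be a trained translation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SentencePair:
    """Hypothetical container for one synthetic sentence pair."""
    original: str
    modified: str
    from_backtranslation: bool  # recorded when the pair is generated


def toy_likelihood(source: str, candidate: str) -> float:
    """Assumed stand-in for a backtranslation prediction model.

    Returns the fraction of candidate tokens that also occur in the source.
    """
    source_tokens = set(source.lower().split())
    candidate_tokens = candidate.lower().split()
    if not candidate_tokens:
        return 0.0
    hits = sum(tok in source_tokens for tok in candidate_tokens)
    return hits / len(candidate_tokens)


def training_signals(
    pair: SentencePair,
    likelihood_model: Callable[[str, str], float],
) -> Dict[str, float]:
    """Build the first signal (a flag) and the second signals (likelihoods)."""
    return {
        # First training signal: was the pair generated using backtranslation?
        "is_backtranslated": 1.0 if pair.from_backtranslation else 0.0,
        # Second training signals: model-predicted likelihood in each direction.
        "p_modified_given_original": likelihood_model(pair.original, pair.modified),
        "p_original_given_modified": likelihood_model(pair.modified, pair.original),
    }


pair = SentencePair("the cat sat", "the cat slept", from_backtranslation=True)
signals = training_signals(pair, toy_likelihood)
```

In this sketch the dictionary returned for each pair would serve as the set of pretraining targets; how those targets are weighted or combined is left open.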

Regarding claim 4, Le in view of Bojar in view of Song teaches the method of claim 1.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein generating the plurality of synthetic sentence pairs comprises, 
for each given synthetic sentence pair of a first subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2): 
translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2),
to create a translated passage of text (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2);
and translating, by the one or more processors, the translated passage of text from the second language into the first language (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2),
to create the modified passage of text of the given synthetic sentence pair (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for translating to a second language. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
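For illustration only, the round-trip translation recited in claim 4 (first language to second language, then back to the first language) can be sketched as below. This is a minimal sketch under stated assumptions: the word-level lookup tables EN_TO_FR and FR_TO_EN and the functions translate and backtranslate_pair are hypothetical toy stand-ins for real translation models and do not come from the application or from any cited reference.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy lexicons standing in for trained translation systems.
EN_TO_FR: Dict[str, str] = {"the": "le", "cat": "chat", "sleeps": "dort", "dozes": "dort"}
FR_TO_EN: Dict[str, str] = {"le": "the", "chat": "cat", "dort": "sleeps"}


def translate(text: str, table: Dict[str, str]) -> str:
    """Word-by-word lookup; unknown tokens are passed through unchanged."""
    return " ".join(table.get(tok, tok) for tok in text.lower().split())


def backtranslate_pair(
    original: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (original passage, modified passage) via a round trip."""
    translated = forward(original)   # first language -> second language
    modified = backward(translated)  # second language -> first language
    return original, modified


result = backtranslate_pair(
    "the cat dozes",
    lambda text: translate(text, EN_TO_FR),
    lambda text: translate(text, FR_TO_EN),
)
```

Because both "sleeps" and "dozes" map to the same word in the toy forward lexicon, the round trip returns a passage that differs in wording from the original, which is the property that makes the resulting pair usable as an original/modified pair.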

Regarding claim 7, Le in view of Bojar in view of Song teaches the method of claim 1.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
Bojar teaches further comprising 
generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more third training signals of the plurality of training signals (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used)
based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics. (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for using an automatic validation metric such as BLEU. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
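For illustration only, a score produced by comparing the original passage to the modified passage with an automatic metric, as recited in claim 7, can be sketched as below. This is a minimal sketch under stated assumptions: it computes clipped unigram precision, which is only one simplified ingredient of BLEU-style metrics (it omits higher-order n-grams and the brevity penalty), and the function name clipped_unigram_precision is hypothetical.

```python
from collections import Counter


def clipped_unigram_precision(reference: str, candidate: str) -> float:
    """Fraction of candidate tokens matched in the reference.

    Each candidate token is credited at most as many times as it appears in
    the reference, so repeating a matching word does not inflate the score.
    """
    reference_counts = Counter(reference.lower().split())
    candidate_counts = Counter(candidate.lower().split())
    total = sum(candidate_counts.values())
    if total == 0:
        return 0.0
    matched = sum(
        min(count, reference_counts[token])
        for token, count in candidate_counts.items()
    )
    return matched / total


# Original passage as reference, modified passage as candidate.
score = clipped_unigram_precision("the cat sat on the mat", "the the the cat")
```

Here the candidate has four tokens, of which two occurrences of "the" and one of "cat" are credited, giving a score of 0.75. A score of this kind could be attached to each synthetic sentence pair as an additional training signal.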

Regarding claim 8, Le in view of Bojar in view of Song teaches the method of claim 7.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more automatic metrics includes at least one of the BLEU metric, the ROUGE metric, or the BERTscore metric.
Bojar teaches wherein the one or more automatic metrics includes at least one of 
the BLEU metric, the ROUGE metric, or the BERTscore metric (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for using an automatic validation metric such as BLEU. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.

Regarding claim 11, Le teaches a processing system (Le [0006] This specification describes how a system implemented as one or more computer programs on one or more computers can be trained to perform, and perform, natural language translations using neural network translation models and rare word post-processing)
comprising: 
a memory (Le [0006] This specification describes how a system implemented as one or more computer programs on one or more computers can be trained to perform, and perform, natural language translations using neural network translation models and rare word post-processing); 
and one or more processors coupled to the memory (Le [0006] This specification describes how a system implemented as one or more computer programs on one or more computers can be trained to perform, and perform, natural language translations using neural network translation models and rare word post-processing)
and configured to: 
generate a plurality of synthetic sentence pairs (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110), 
each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110)
and a modified passage of text (Le [0014] The translation system 100 receives sentences in a source natural language, e.g., a source language sentence 110, and translates the source natural language sentences into target sentences in a target natural language, e.g., a target language sentence 150 for the source language sentence 110).
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretrain the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tune the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
Bojar teaches
generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
a first training signal of a plurality of training signals (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2)
based on whether the given synthetic sentence pair was generated using backtranslation (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2);
and fine-tune the neural network to predict (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used), 
for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for back-translating and comparing to human translations. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
Bojar does not teach
and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretrain the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair.
Song teaches
and one or more second training signals of the plurality of training signals (Song [0131] After extraction of the translation rules, the computing device may extract features of the translation rules. The features of the translation rules may include: a forward translation probability, reverse translation probability, positive vocabulary probability, and reverse vocabulary probability. In these instances, the forward translation probabilities of phrases refer to a translation probability of a translation of a phrase from a source language to a target language. The reverse translation probabilities of phrases refer to a translation probability of a translation of a phrase from a target language to a source language. The positive vocabulary probability refers to a translation probability of a word from a source language to a target language. The reverse vocabulary probability refers to a translation probability of a translation of a word from a target language to a source language)
based on a prediction from a backtranslation prediction model (Song [0131] After extraction of the translation rules, the computing device may extract features of the translation rules. The features of the translation rules may include: a forward translation probability, reverse translation probability, positive vocabulary probability, and reverse vocabulary probability. In these instances, the forward translation probabilities of phrases refer to a translation probability of a translation of a phrase from a source language to a target language. The reverse translation probabilities of phrases refer to a translation probability of a translation of a phrase from a target language to a source language. The positive vocabulary probability refers to a translation probability of a word from a source language to a target language. The reverse vocabulary probability refers to a translation probability of a translation of a word from a target language to a source language)
regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair (Song [0131] After extraction of the translation rules, the computing device may extract features of the translation rules. The features of the translation rules may include: a forward translation probability, reverse translation probability, positive vocabulary probability, and reverse vocabulary probability. In these instances, the forward translation probabilities of phrases refer to a translation probability of a translation of a phrase from a source language to a target language. The reverse translation probabilities of phrases refer to a translation probability of a translation of a phrase from a target language to a source language. The positive vocabulary probability refers to a translation probability of a word from a source language to a target language. The reverse vocabulary probability refers to a translation probability of a translation of a word from a target language to a source language);
pretrain the neural network to predict (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language), 
for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language).
Song is considered to be analogous to the claimed invention because it is in the same field of neural networks used for bilingual prediction models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Song to allow for predicting the likelihood of back-translation and training the neural network. Doing so would resolve issues relating to semantic inconsistency of candidate translations.

Regarding claim 14, Le in view of Bojar in view of Song teaches the system of claim 11.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs comprises being configured to, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translate the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translate the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs comprises 
being configured to, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2): 
translate the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2); 
and translate the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for translating to a second language. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
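For illustration only, and not as a characterization of what Bojar or the claims disclose, the round-trip translation recited above (first language to second language, then back to the first language) can be sketched as follows. The word-level lookup tables, the English/French example, and the names `translate` and `make_backtranslated_pairs` are hypothetical stand-ins for trained translation systems.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lookup tables standing in for trained translation models.
EN_TO_FR: Dict[str, str] = {"the": "le", "cat": "chat", "sleeps": "dort"}
FR_TO_EN: Dict[str, str] = {"le": "the", "chat": "cat", "dort": "is sleeping"}


def translate(text: str, table: Dict[str, str]) -> str:
    """Translate word by word; unknown words are passed through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def make_backtranslated_pairs(
    originals: List[str],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (original, modified) pairs via a round trip through a second language."""
    pairs = []
    for original in originals:
        translated = forward(original)    # first language -> second language
        modified = backward(translated)   # second language -> first language
        pairs.append((original, modified))
    return pairs


pairs = make_backtranslated_pairs(
    ["the cat sleeps"],
    lambda s: translate(s, EN_TO_FR),
    lambda s: translate(s, FR_TO_EN),
)
print(pairs)
```

In this sketch the modified passage differs from the original only because the toy reverse table maps "dort" to "is sleeping"; a real round trip through trained models would introduce its own variations.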

Regarding claim 17, Le in view of Bojar in view of Song teaches the system of claim 11.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
Bojar teaches wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more third training signals of the plurality of training signals (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used)
based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for using an automatic validation metric such as BLEU. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
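For illustration only, a score obtained by comparing an original passage with a modified passage using an automatic metric can be sketched as follows. The function below computes a clipped unigram precision, which is only one ingredient of BLEU (no higher-order n-grams and no brevity penalty); it is a simplification assumed here, not the metric as used in Bojar or in the claims, and the name `clipped_unigram_precision` is hypothetical.

```python
from collections import Counter


def clipped_unigram_precision(reference: str, candidate: str) -> float:
    """Fraction of candidate tokens that also occur in the reference.

    Each candidate token is credited at most as many times as it appears
    in the reference (the "clipping" used in BLEU-style precision).
    """
    candidate_tokens = candidate.split()
    if not candidate_tokens:
        return 0.0
    reference_counts = Counter(reference.split())
    candidate_counts = Counter(candidate_tokens)
    overlap = sum(
        min(count, reference_counts[token])
        for token, count in candidate_counts.items()
    )
    return overlap / len(candidate_tokens)


# Original passage as the reference, modified passage as the candidate.
score = clipped_unigram_precision("the cat sleeps", "the cat is sleeping")
print(score)
```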

Regarding claim 18, Le in view of Bojar in view of Song teaches the system of claim 17.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more automatic metrics includes at least one of the BLEU metric, the ROUGE metric, or the BERTscore metric.
Bojar teaches wherein the one or more automatic metrics includes at least one of 
the BLEU metric, the ROUGE metric, or the BERTscore metric (Bojar [0044] Training a translation model with the corpus FINAL is performed using well known approaches, such as tensor2tensor transformer and RNN—Recurrent neural network architectures. Automatic validation could be then performed using BLEU metric, Meteor, CHRF3 or other suitable automatic metric on both noisy and clean validation corpora. Alternatively as a “translation quality metric score” any scoring algorithm for evaluating the quality of translated text based on existing human translations could be used).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for using an automatic validation metric such as BLEU. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.

Claims 2, 3, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Le in view of Bojar in view of Song in further view of Kashihara et al. (US Patent Pub. No. 2021/0365837), hereinafter Kashihara.

Regarding claim 2, Le in view of Bojar in view of Song teaches the method of claim 1.
Le teaches one or more processors. However, Le does not teach
further comprising: pretraining, by the one or more processors, the neural network to predict a mask token in each of a plurality of masked language modeling tasks; and pretraining, by the one or more processors, the neural network to predict, for each given next-sentence prediction task of a plurality of next-sentence prediction tasks, whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task.
Song teaches further comprising: 
pretraining, by the one or more processors, the neural network to predict (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language).
Song is considered to be analogous to the claimed invention because it is in the same field of neural networks used for bilingual prediction models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Song to allow for predicting the likelihood of back-translation and training the neural network. Doing so would resolve issues relating to semantic inconsistency of candidate translations.

Song teaches pretraining to predict. However, Song does not teach
pretraining, by the one or more processors, the neural network to predict a mask token in each of a plurality of masked language modeling tasks; and pretraining, by the one or more processors, the neural network to predict, for each given next-sentence prediction task of a plurality of next-sentence prediction tasks, whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task.
Kashihara teaches
a mask token in each of a plurality of masked language modeling tasks (Kashihara [0027] BERT pre-trains the following two tasks with the raw corpus: Task 1: Masked Language Modeling (LM); [0032] Then, this sentence is applied Transformer and the model is trained to predict [MASK] part's token correctly);
and pretraining, by the one or more processors, the neural network to predict (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document),
for each given next-sentence prediction task of a plurality of next-sentence prediction tasks (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document), 
whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document).
Kashihara is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song further in view of Kashihara to allow for pretraining on masked language modeling and next sentence prediction tasks. Doing so would provide a method which performs better than previous methods since the replies can be considered thematically related.

Regarding claim 3, Le in view of Bojar in view of Song in view of Kashihara teaches the method of claim 2.
Le teaches one or more processors. However, Le does not teach
further comprising: generating, by the one or more processors, the plurality of masked language modeling tasks; and generating, by the one or more processors, the plurality of next-sentence prediction tasks.
Kashihara teaches further comprising: 
generating, by the one or more processors, the plurality of masked language modeling tasks (Kashihara [0027] BERT pre-trains the following two tasks with the raw corpus: Task 1: Masked Language Modeling (LM)); 
and generating, by the one or more processors, the plurality of next-sentence prediction tasks (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document).
Kashihara is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song further in view of Kashihara to allow for pretraining on masked language modeling and next sentence prediction tasks. Doing so would provide a method which performs better than previous methods since the replies can be considered thematically related.
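For illustration only, generating a masked language modeling task and a set of next-sentence prediction tasks of the kind described in the quoted Kashihara paragraphs can be sketched as follows. Masking exactly one token per task, the 50% sampling rate for negative pairs, and the names `make_mlm_task` and `make_nsp_tasks` are assumptions made for this sketch; they are not taken from Kashihara or from the claims.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def make_mlm_task(tokens: List[str], rng: random.Random) -> Tuple[List[str], int, str]:
    """Replace one randomly chosen token with [MASK]; return (masked, position, label)."""
    position = rng.randrange(len(tokens))
    masked = list(tokens)
    label = masked[position]
    masked[position] = MASK
    return masked, position, label


def make_nsp_tasks(
    sentences: List[str], rng: random.Random
) -> List[Tuple[str, str, bool]]:
    """For each sentence except the last, emit (first, second, is_next).

    About half of the tasks pair the sentence with its true successor;
    the rest pair it with a different sentence from the same document.
    """
    tasks = []
    for i in range(len(sentences) - 1):
        # Candidate negatives: any sentence other than the current one or its successor.
        negatives = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        if negatives and rng.random() < 0.5:
            tasks.append((sentences[i], rng.choice(negatives), False))
        else:
            tasks.append((sentences[i], sentences[i + 1], True))
    return tasks


rng = random.Random(0)  # fixed seed so the sketch is reproducible
example_tokens = ["the", "cat", "sleeps"]
masked_tokens, masked_position, masked_label = make_mlm_task(example_tokens, rng)
example_sentences = ["It rained.", "The road was wet.", "Cars slowed down.", "Nobody was hurt."]
nsp_tasks = make_nsp_tasks(example_sentences, rng)
```

The negative sampling above assumes the sentences in a document are distinct; with repeated sentences a sampled "negative" could coincide with the true successor.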

Regarding claim 12, Le in view of Bojar in view of Song teaches the system of claim 11.
Le teaches one or more processors. However, Le does not teach
wherein the one or more processors are further configured to: pretrain the neural network to predict a mask token in each of a plurality of masked language modeling tasks; and pretrain the neural network to predict, for each given next-sentence prediction task of a plurality of next-sentence prediction tasks, whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task.
Song teaches wherein the one or more processors are further configured to: 
pretrain the neural network to predict (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language);
and pretrain the neural network to predict (Song [0051] In implementations, the predetermined text vector prediction models of the target language and the source language are generated by reading a pre-stored parallel corpus, setting a training goal as to maximize average translation probabilities of sentences in the parallel corpus between the target language and the corresponding source language as background, training a predetermined bilingual encoding and decoding model for text vectors, designating an encoding part of the bilingual encoding and decoding model for text vectors after training as the predetermined text vector prediction model of the source language, and by designating a reverse model of the encoding part of the trained bilingual encoding and decoding model for text vectors as the predetermined text vector prediction model of the target language).
Song is considered to be analogous to the claimed invention because it is in the same field of neural networks used for bilingual prediction models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Song to allow for predicting the likelihood of back-translation and training the neural network. Doing so would resolve issues relating to semantic inconsistency of candidate translations.
Song teaches pretraining to predict. However, Song does not teach
pretrain the neural network to predict a mask token in each of a plurality of masked language modeling tasks; and pretrain the neural network to predict, for each given next-sentence prediction task of a plurality of next-sentence prediction tasks, whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task.
Kashihara teaches 
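a mask token in each of a plurality of masked language modeling tasks (Kashihara [0027] BERT pre-trains the following two tasks with the raw corpus: Task 1: Masked Language Modeling (LM); [0032] Then, this sentence is applied Transformer and the model is trained to predict [MASK] part's token correctly);
and pretrain the neural network to predict (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document),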
for each given next-sentence prediction task of a plurality of next-sentence prediction tasks (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document),
whether a second passage of text of the given next-sentence prediction task directly follows a first passage of text of the given next-sentence prediction task (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document).
Kashihara is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song further in view of Kashihara to allow for pretraining on masked language modeling and next sentence prediction tasks. Doing so would provide a method which performs better than previous methods since the replies can be considered thematically related.

Regarding claim 13, Le in view of Bojar in view of Song in view of Kashihara teaches the system of claim 12.
Le teaches one or more processors. However, Le does not teach
wherein the one or more processors are further configured to: generate the plurality of masked language modeling tasks; and generate the plurality of next-sentence prediction tasks.
Kashihara teaches wherein the one or more processors are further configured to: 
generate the plurality of masked language modeling tasks (Kashihara [0027] BERT pre-trains the following two tasks with the raw corpus: Task 1: Masked Language Modeling (LM));
and generate the plurality of next-sentence prediction tasks (Kashihara [0033] It is important to capture the relationship between two sentences in the tasks such as Question Answering and Textual Entailment Recognition. Then, Next Sentence Prediction task pre-trains the model. The model receives pairs of sentences as input and learns to predict if the second sentence in the pair is the subsequent sentence in the original document).
Kashihara is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song further in view of Kashihara to allow for pretraining on masked language modeling and next sentence prediction tasks. Doing so would provide a method which performs better than previous methods since the replies can be considered thematically related.

Claims 5, 6, 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Le in view of Bojar in view of Song in further view of Roth et al. (US Patent Pub. No. 2012/0101804), hereinafter Roth.

Regarding claim 5, Le in view of Bojar in view of Song teaches the method of claim 4.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein generating the plurality of synthetic sentence pairs comprises, 
for each given synthetic sentence pair of a second subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for multiple subsets of synthetic pairs. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
Bojar teaches a second subset. However, Bojar does not teach
substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Roth teaches
substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair (Roth [0078] The sampling component 45 produces a set of neighbor translations by using a set of operators which are designed to perturb the current target sentence slightly, by making small changes to it, for example by performing one or more of inserting words, removing words, inserting words, reordering the words, and replacing words).
Roth is considered to be analogous to the claimed invention because it is in the same field of natural language processing involving training of translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Roth to allow for substituting one or more words of a text passage. Doing so would allow for generating a plurality of translation neighbors which in turn provides improved training for the translation model.

Regarding claim 6, Le in view of Bojar in view of Song in view of Roth teaches the method of claim 5.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein generating the plurality of synthetic sentence pairs further comprises, 
for each given synthetic sentence pair of a third subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for multiple subsets of synthetic pairs. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
Bojar teaches a third subset. However, Bojar does not teach
removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Roth teaches 
removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair (Roth [0078] The sampling component 45 produces a set of neighbor translations by using a set of operators which are designed to perturb the current target sentence slightly, by making small changes to it, for example by performing one or more of inserting words, removing words, inserting words, reordering the words, and replacing words).
Roth is considered to be analogous to the claimed invention because it is in the same field of natural language processing involving training of translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Roth to allow for removing one or more words of a text passage. Doing so would allow for generating a plurality of translation neighbors which in turn provides improved training for the translation model.
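For illustration only, the word substitution and word removal operations discussed for claims 5 and 6 can be sketched as simple token-level perturbations of an original passage. The function names `substitute_word` and `remove_word`, and the choice of position and replacement word, are hypothetical; Roth's sampling component and the claimed method are not limited to, or necessarily implemented as, this sketch.

```python
from typing import List, Tuple


def substitute_word(tokens: List[str], position: int, replacement: str) -> List[str]:
    """Return a copy of tokens with the word at `position` replaced."""
    return tokens[:position] + [replacement] + tokens[position + 1:]


def remove_word(tokens: List[str], position: int) -> List[str]:
    """Return a copy of tokens with the word at `position` removed."""
    return tokens[:position] + tokens[position + 1:]


def make_pair(original: List[str], modified: List[str]) -> Tuple[str, str]:
    """Join token lists back into an (original, modified) sentence pair."""
    return " ".join(original), " ".join(modified)


original_tokens = "the cat sleeps on the mat".split()

# Substitution: swap one word for another.
substituted_pair = make_pair(original_tokens, substitute_word(original_tokens, 1, "dog"))

# Removal: drop one word.
removed_pair = make_pair(original_tokens, remove_word(original_tokens, 2))

print(substituted_pair)
print(removed_pair)
```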

Regarding claim 15, Le in view of Bojar in view of Song teaches the system of claim 14.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs further comprises being configured to, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substitute one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs further comprises being configured to, 
for each given synthetic sentence pair of a second subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le further in view of Bojar to allow for multiple subsets of synthetic pairs. Doing so would provide a system which is tolerant to noisy inputs and improve translation accuracy even for low resource language pairs.
Bojar teaches a second subset. However, Bojar does not teach
substitute one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Roth teaches
substitute one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair (Roth [0078] The sampling component 45 produces a set of neighbor translations by using a set of operators which are designed to perturb the current target sentence slightly, by making small changes to it, for example by performing one or more of inserting words, removing words, inserting words, reordering the words, and replacing words).
Roth is considered to be analogous to the claimed invention because it is in the same field of natural language processing involving training of translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar further in view of Roth to allow for substituting one or more words of a text passage. Doing so would allow for generating a plurality of translation neighbors which in turn provides improved training for the translation model.

Regarding claim 16, Le in view of Bojar in view of Song in view of Roth teaches the system of claim 15.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs further comprises being configured to, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, remove one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Bojar teaches wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs further comprises being configured to, 
for each given synthetic sentence pair of a third subset of the synthetic sentence pairs (Bojar [0026] the next step, in which a first auxiliary translation system is trained on the corpus C, said trained first auxiliary translation system is then used to translate the corpus B from third language to source language resulting in a back-translated corpus D, which is further filtered to keep only similar sentences to those contained in the noisy corpus G resulting in a synthetic parallel corpus D2).
Bojar is considered to be analogous to the claimed invention because it is in the same field of neural network based training for translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar to allow for multiple subsets of synthetic pairs. Doing so would provide a system which is tolerant to noisy inputs and improves translation accuracy even for low resource language pairs.
Bojar teaches a third subset. However, Bojar does not teach
remove one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Roth teaches
remove one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair (Roth [0078] The sampling component 45 produces a set of neighbor translations by using a set of operators which are designed to perturb the current target sentence slightly, by making small changes to it, for example by performing one or more of inserting words, removing words, reordering the words, and replacing words).
Roth is considered to be analogous to the claimed invention because it is in the same field of natural language processing involving training of translation models. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song, further in view of Roth, to allow for removing one or more words of a text passage. Doing so would allow for generating a plurality of translation neighbors, which in turn provides improved training for the translation model.

Claims 9, 10, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Le in view of Bojar in view of Song and further in view of Yin et al. (US Patent Pub. No. 2021/0174204), hereinafter Yin.

Regarding claim 9, Le in view of Bojar in view of Song teaches the method of claim 7.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Yin teaches further comprising 
generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more fourth training signals of the plurality of training signals (Yin [0023] According to some embodiments, computing device 100 implements an architecture or framework whereby textual entailment module 130 is developed or trained as a textual entailment predictor, and then applied to one or more downstream NLP tasks)
based on a prediction from a textual entailment model (Yin [0023] According to some embodiments, computing device 100 implements an architecture or framework whereby textual entailment module 130 is developed or trained as a textual entailment predictor, and then applied to one or more downstream NLP tasks)
regarding a likelihood that the modified passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
entails (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
or contradicts the original passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment)).
Yin is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song, further in view of Yin, to allow for using a textual entailment model. Doing so would allow for further optimization of the neural network model.

Regarding claim 10, Le in view of Bojar in view of Song teaches the method of claim 1.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Yin teaches further comprising 
generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more fourth training signals of the plurality of training signals (Yin [0023] According to some embodiments, computing device 100 implements an architecture or framework whereby textual entailment module 130 is developed or trained as a textual entailment predictor, and then applied to one or more downstream NLP tasks)
based on a prediction from a textual entailment model (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
regarding a likelihood that the modified passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
entails (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
or contradicts the original passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment)).
Yin is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song, further in view of Yin, to allow for using a textual entailment model. Doing so would allow for further optimization of the neural network model.

Regarding claim 19, Le in view of Bojar in view of Song teaches the system of claim 17.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Yin teaches wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more fourth training signals of the plurality of training signals (Yin [0023] According to some embodiments, computing device 100 implements an architecture or framework whereby textual entailment module 130 is developed or trained as a textual entailment predictor, and then applied to one or more downstream NLP tasks) 
based on a prediction from a textual entailment model (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
regarding a likelihood that the modified passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
entails (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
or contradicts the original passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment)).
Yin is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song, further in view of Yin, to allow for using a textual entailment model. Doing so would allow for further optimization of the neural network model.

Regarding claim 20, Le in view of Bojar in view of Song teaches the system of claim 11.
Le teaches generating the plurality of synthetic sentence pairs. However, Le does not teach
wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Yin teaches wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: 
one or more fourth training signals of the plurality of training signals (Yin [0023] According to some embodiments, computing device 100 implements an architecture or framework whereby textual entailment module 130 is developed or trained as a textual entailment predictor, and then applied to one or more downstream NLP tasks)
based on a prediction from a textual entailment model (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
regarding a likelihood that the modified passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
entails (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment))
or contradicts the original passage of text of the given synthetic sentence pair (Yin [0044] In an example, the entailment classifier is a three-way entailment classifier providing a prediction from three classes including “e” (entailment), “n” (neutral), and “c” (contradiction). In another example, the entailment classifier is a two-way entailment classifier providing a prediction from two classes including “e” (entailment), “ne” (non-entailment)).
Yin is considered to be analogous to the claimed invention because it is in the same field of natural language processing using neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Le in view of Bojar in view of Song, further in view of Yin, to allow for using a textual entailment model. Doing so would allow for further optimization of the neural network model.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 7:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PAUL J MUELLER/ Examiner, Art Unit 2657

/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657