DETAILED ACTION

Introduction
This office action is in response to Applicant’s submission filed on 04 December 2020. Claims 1-20 are pending in the application. As such, claims 1-20 have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings filed on 04 December 2020 have been accepted and considered by the Examiner.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 17/003,572 in view of Lai et al. (US Patent Pub. No. 2021/0182662), hereinafter Lai. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims recite training a neural network on synthetic sentence pairs using a plurality of training signals, including a signal indicating whether a pair was generated using backtranslation, and further training the neural network on human-graded sentence pairs.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Claim 1 of the instant application is similar to claim 1 of 17/003,572. Claim 1 of 17/003,572 teaches all the elements of claim 1 of the instant application, except the student network, as shown in the table below.
Claim 2 of the instant application is similar to a combination of claims 4 and 1 of 17/003,572.
Claim 3 of the instant application is similar to a combination of claims 5 and 1 of 17/003,572.
Claim 4 of the instant application is similar to a combination of claims 6 and 1 of 17/003,572.
Claim 5 of the instant application is similar to a combination of claims 4 and 1 of 17/003,572.
Claim 6 of the instant application is similar to a combination of claims 5 and 6 of 17/003,572.
Claim 7 of the instant application is similar to claim 4 of 17/003,572.
Claim 8 of the instant application is similar to claim 7 of 17/003,572.
Claim 9 of the instant application is similar to claim 9 of 17/003,572.
Claim 10 of the instant application is similar to claim 1 of 17/003,572.
Claim 11 of the instant application is similar to claim 1 of 17/003,572. Claim 1 of 17/003,572 teaches all the elements of claim 11 of the instant application, except the student network, as shown in the table below.
Claim 12 of the instant application is similar to claim 5 of 17/003,572.
Claim 13 of the instant application is similar to claim 6 of 17/003,572.
Claim 14 of the instant application is similar to claim 4 of 17/003,572.
Claim 15 of the instant application is similar to claim 1 of 17/003,572.
Claim 16 of the instant application is similar to a combination of claims 5 and 6 of 17/003,572.
Claim 17 of the instant application is similar to claim 4 of 17/003,572.
Claim 18 of the instant application is similar to claim 7 of 17/003,572.
Claim 19 of the instant application is similar to claim 9 of 17/003,572.
Claim 20 of the instant application is similar to claim 1 of 17/003,572.

Instant Application 17/112,285
Copending Application 17/003,572
Claim 1

1. A method of training a neural network, 
comprising: generating, by one or more processors of a processing system, for each given synthetic sentence pair of a plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text: 
1. A method of training a neural network, 
comprising: generating, by one or more processors of a processing system, a plurality of synthetic sentence pairs, each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text;
a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair;

a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals 
based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair;
training, by the one or more processors, a neural network based on each given synthetic sentence pair of the plurality of synthetic sentence pairs and the plurality of training signals, 
pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, 
and further based on a plurality of human-graded sentence pairs;
and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs,
generating, by the one or more processors, a plurality of graded sentence pairs, 
each graded sentence pair of the plurality of graded sentence pairs comprising
an original passage of text 
and a modified passage of text 
and a grade generated by the neural network 
based on the original passage of text and the modified passage of text;
and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs,
a grade allocated by a human grader to the given human-graded sentence pair
and training, by the one or more processors, a student network to predict, 


for each given graded sentence pair in a plurality of graded sentence pairs, the grade generated by the neural network
for each given human-graded sentence pair of a plurality of human-graded sentence pairs, 
a grade allocated by a human grader to the given human-graded sentence pair.
Claim 2

2. The method of claim 1, wherein the plurality of synthetic sentence pairs comprises text in a plurality of different languages, 

4. The method of claim 1, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
and the plurality of graded sentence pairs comprises text in only a subset of the plurality of different languages.
1. … and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, …
Claim 3

3. The method of claim 1, wherein generating the plurality of graded sentence pairs comprises, for each given graded sentence pair of a first subset of the graded sentence pairs, substituting one or more words of the original passage of text of the given graded sentence pair to create the modified passage of text of the given graded sentence pair.
5. The method of claim 4, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.

1. … and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, …
Claim 4

4. The method of claim 3, wherein generating the plurality of graded sentence pairs further comprises, for each given graded sentence pair of a second subset of the graded sentence pairs, removing one or more words of the original passage of text of the given graded sentence pair to create the modified passage of text the modified passage of text of the given graded sentence pair.
6. The method of claim 5, wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.

1. … and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, …
Claim 5

5. The method of claim 4, wherein generating the plurality of graded sentence pairs further comprises, for each given graded sentence pair of a third subset of the graded sentence pairs: translating, by the one or more processors, the original passage of text of the given graded sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given graded sentence pair.
4. The method of claim 1, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.

1. … and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, …
Claim 6

6. The method of claim 1, further comprising: generating, by the one or more processors, the plurality of synthetic sentence pairs; and wherein generating the plurality of synthetic sentence pairs comprises: for each given synthetic sentence pair of a first subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair; 
5. The method of claim 4, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
and for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
6. The method of claim 5, wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Claim 7

7. The method of claim 6, wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
4. The method of claim 1, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
Claim 8

8. The method of claim 1, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
7. The method of claim 1, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
Claim 9

9. The method of claim 8, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
9. The method of claim 7, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Claim 10

10. The method of claim 1, wherein training the neural network based on each given synthetic sentence pair of the plurality of synthetic sentence pairs and the plurality of training signals, and further based on a plurality of human-graded sentence pairs comprises: 
pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
1. A method of training a neural network, comprising: generating, by one or more processors of a processing system, a plurality of synthetic sentence pairs, each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text; generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
Claim 11

11. A processing system comprising: 
a memory; and one or more processors coupled to the memory and configured to: 
generate, for each given synthetic sentence pair of a plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text: 
1. A method of training a neural network, 
comprising: generating, by one or more processors of a processing system, a plurality of synthetic sentence pairs, each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text;
a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals 
based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; 
a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals 
based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair;
train a neural network based on each given synthetic sentence pair of the plurality of synthetic sentence pairs and the plurality of training signals, 
pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, 
and further based on a plurality of human-graded sentence pairs; generate a plurality of graded sentence pairs, each graded sentence pair of the plurality of graded sentence pairs comprising an original passage of text and a modified passage of text and a grade generated by the neural network based on the original passage of text and the modified passage of text; 
and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs,
and train a student network to predict,

for each given graded sentence pair in a plurality of graded sentence pairs, the grade generated by the neural network.
for each given human-graded sentence pair of a plurality of human-graded sentence pairs, 
a grade allocated by a human grader to the given human-graded sentence pair.
Claim 12 

12. The system of claim 11, wherein the one or more processors being configured to generate the plurality of graded sentence pairs comprises being configured to, for each given graded sentence pair of a first subset of the graded sentence pairs, substitute one or more words of the original passage of text of the given graded sentence pair to create the modified passage of text of the given graded sentence pair.
5. The method of claim 4, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Claim 13

13. The system of claim 12, wherein the one or more processors being configured to generate the plurality of graded sentence pairs further comprises being configured to, for each given graded sentence pair of a second subset of the graded sentence pairs, remove one or more words of the original passage of text of the given graded sentence pair to create the modified passage of text the modified passage of text of the given graded sentence pair.
6. The method of claim 5, wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Claim 14

14. The system of claim 13, wherein the one or more processors being configured to generate the plurality of graded sentence pairs further comprises being configured to, for each given graded sentence pair of a third subset of the graded sentence pairs: translate the original passage of text of the given graded sentence pair from a first language into a second language, to create a translated passage of text; and translate the translated passage of text from the second language into the first language, to create the modified passage of text of the given graded sentence pair.
4. The method of claim 1, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
Claim 15

15. The system of claim 11, wherein the one or more processors are further configured to generate the plurality of synthetic sentence pairs.
1. A method of training a neural network, comprising: generating, by one or more processors of a processing system, a plurality of synthetic sentence pairs, …
Claim 16

16. The system of claim 15, wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs comprises being configured to: for each given synthetic sentence pair of a first subset of the synthetic sentence pairs, substitute one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair; 
5. The method of claim 4, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, substituting one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
and for each given synthetic sentence pair of a second subset of the synthetic sentence pairs, remove one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
6. The method of claim 5, wherein generating the plurality of synthetic sentence pairs further comprises, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs, removing one or more words of the original passage of text of the given synthetic sentence pair to create the modified passage of text of the given synthetic sentence pair.
Claim 17

17. The system of claim 16, wherein the one or more processors being configured to generate the plurality of synthetic sentence pairs further comprises being configured to, for each given synthetic sentence pair of a third subset of the synthetic sentence pairs: translate the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translate the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
4. The method of claim 1, wherein generating the plurality of synthetic sentence pairs comprises, for each given synthetic sentence pair of a first subset of the synthetic sentence pairs: translating, by the one or more processors, the original passage of text of the given synthetic sentence pair from a first language into a second language, to create a translated passage of text; and translating, by the one or more processors, the translated passage of text from the second language into the first language, to create the modified passage of text of the given synthetic sentence pair.
Claim 18

18. The system of claim 11, wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
7. The method of claim 1, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more third training signals of the plurality of training signals based on one or more scores generated by comparing the original passage of text of the given synthetic sentence pair to the modified passage of text of the given synthetic sentence pair using one or more automatic metrics.
Claim 19

19. The system of claim 18, wherein the one or more processors are further configured to generate, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
9. The method of claim 7, further comprising generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: one or more fourth training signals of the plurality of training signals based on a prediction from a textual entailment model regarding a likelihood that the modified passage of text of the given synthetic sentence pair entails or contradicts the original passage of text of the given synthetic sentence pair.
Claim 20

20. The system of claim 11, wherein the one or more processors being configured to train the neural network based on each given synthetic sentence pair of the plurality of synthetic sentence pairs and the plurality of training signals, and further based on a plurality of human-graded sentence pairs comprises being configured to: 
pretrain the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tune the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
1. A method of training a neural network, comprising: generating, by one or more processors of a processing system, a plurality of synthetic sentence pairs, each synthetic sentence pair of the plurality of synthetic sentence pairs comprising an original passage of text and a modified passage of text; generating, by the one or more processors, for each given synthetic sentence pair of the plurality of synthetic sentence pairs: a first training signal of a plurality of training signals based on whether the given synthetic sentence pair was generated using backtranslation; and one or more second training signals of the plurality of training signals based on a prediction from a backtranslation prediction model regarding a likelihood that one of the original passage of text or the modified passage of text of the given synthetic sentence pair could have been generated by backtranslating the other one of the original passage of text or the modified passage of text of the given synthetic sentence pair; pretraining, by the one or more processors, the neural network to predict, for each given synthetic sentence pair of the plurality of synthetic sentence pairs, the plurality of training signals for the given synthetic sentence pair; and fine-tuning, by the one or more processors, the neural network to predict, for each given human-graded sentence pair of a plurality of human-graded sentence pairs, a grade allocated by a human grader to the given human-graded sentence pair.
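For illustration only, the pair-generation steps recited in the compared claims (substituting words, removing words, and translating a passage into a second language and back into the first), together with the first training signal indicating whether a pair was generated using backtranslation, may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the toy word-for-word dictionaries standing in for machine translation, and the random choice of which words to substitute or remove are hypothetical and are not drawn from either application. The second training signals (from a backtranslation prediction model) are not modeled.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SyntheticPair:
    original: str
    modified: str
    used_backtranslation: bool  # basis for the (hypothetical) first training signal


def substitute_words(text: str, vocab: List[str], rng: random.Random, n: int = 1) -> str:
    """Replace up to n randomly chosen words with words drawn from vocab."""
    words = text.split()
    for i in rng.sample(range(len(words)), k=min(n, len(words))):
        words[i] = rng.choice(vocab)
    return " ".join(words)


def remove_words(text: str, rng: random.Random, n: int = 1) -> str:
    """Delete up to n randomly chosen words, always keeping at least one word."""
    words = text.split()
    k = max(0, min(n, len(words) - 1))
    drop = set(rng.sample(range(len(words)), k=k))
    return " ".join(w for i, w in enumerate(words) if i not in drop)


def make_translator(table: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical toy word-for-word translator; unknown words pass through."""
    return lambda text: " ".join(table.get(w, w) for w in text.split())


def backtranslate(text: str, to_second: Callable[[str], str],
                  to_first: Callable[[str], str]) -> str:
    """Round trip: first language -> second language -> first language."""
    return to_first(to_second(text))


def make_pairs(sentences: List[str], vocab: List[str], to_second: Callable[[str], str],
               to_first: Callable[[str], str], seed: int = 0) -> List[SyntheticPair]:
    """Apply substitution, removal, and backtranslation to every sentence."""
    rng = random.Random(seed)
    pairs = []
    for s in sentences:
        pairs.append(SyntheticPair(s, substitute_words(s, vocab, rng), False))
        pairs.append(SyntheticPair(s, remove_words(s, rng), False))
        pairs.append(SyntheticPair(s, backtranslate(s, to_second, to_first), True))
    return pairs


def first_training_signal(pair: SyntheticPair) -> float:
    """1.0 if the pair was generated using backtranslation, else 0.0."""
    return 1.0 if pair.used_backtranslation else 0.0


# Hypothetical lossy dictionaries: "quickly" round-trips to "fast".
en_fr = make_translator({"the": "le", "cat": "chat", "sat": "assis", "quickly": "vite"})
fr_en = make_translator({"le": "the", "chat": "cat", "assis": "sat", "vite": "fast"})
pairs = make_pairs(["the cat sat quickly"], ["dog", "ran"], en_fr, fr_en)
for p in pairs:
    print(first_training_signal(p), "|", p.original, "->", p.modified)
```

Because the sketch applies every technique to every sentence, it does not model the separate first, second, and third subsets recited in the claims; a closer sketch would partition the input sentences before generating pairs.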


As illustrated in the table, the claims of the instant application are similar to the claims of copending Application No. 17/003,572 and match them in all aspects except for the recited student network. However, Lai teaches:
student network (Lai [0010] Techniques are disclosed for training a reduced scale neural network based natural language processing (NLP) model using a full-scale NN based NLP model. The techniques are particularly well-suited for training transformer-based neural network models, such as BERT. In an embodiment, a dense knowledge distillation approach is used to train the reduced scale model. In this manner, the dense knowledge distillation can be used to effectively transfer knowledge acquired in the full-scale model to the reduced scale model. The full-scale model acts as a teacher model, and the reduced scale model acts as a student model. In more detail, and according to some such embodiments, training data used to train the student model comprises both masked tokens and unmasked tokens. A masked token comprises one or more words that are masked or hidden in the training data. Thus, the teacher and the student models have to predict the words corresponding to the masked token. An unmasked token includes one or more words that are explicitly mentioned in the training data. For purposes of training the student model, the teacher and the student models may be configured to predict the unmasked tokens as well (as if the tokens were masked tokens). So, for instance, the student model can be trained using a pre-trained teacher model as follows. Training data is input to both the student and teacher models. The training data includes a plurality of masked tokens and a plurality of unmasked tokens. The student model generates a first prediction and a second prediction, and the teacher model generates a third prediction and a fourth prediction. The first and third predictions are associated with a masked token of the training data, and the second and fourth predictions are associated with an unmasked token of the training data. The student model can then be trained based at least in part on the first, second, third, and fourth predictions. 
In some embodiments, the training of the student model uses loss functions associated with masked tokens of the training data, as well as loss functions associated with unmasked tokens of the training data. For instance, in an embodiment, a first loss function is generated based at least in part on a comparison of the first prediction and the third prediction (with respect to the masked token), and a second loss function is generated based at least in part on a comparison of the second prediction and the fourth prediction (with respect to the unmasked token). The student model is then trained based at least in part on the first and second loss functions).
Lai is considered to be analogous to the claimed invention because it is in the same field of training neural-network-based natural language processing models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the claims of 17/003,572 in view of Lai to use a student model. Doing so would allow knowledge acquired in the full-scale model to be effectively transferred to the reduced scale model (the student model).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL J. MUELLER whose telephone number is (571)272-1875. The examiner can normally be reached M-F 8:30am-5:30pm (Eastern).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached on 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL J. MUELLER/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657