DETAILED ACTION

Note: This action supersedes the Non-final Rejection of 07/13/2022.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Claim Objections
Claims 4, 9, and 14 are objected to because of the following informality: each claim recites “fine-turning” in its preamble, which should read “fine-tuning”. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The subject matter eligibility test from page 74621 of the Federal Register Notice titled “2014 Interim Guidance on Patent Subject Matter Eligibility” is a two-step process.
Under Step 1, the claims are analyzed to determine whether each claim is directed to a process, machine, article of manufacture, or composition of matter. In this case, claims 1-5 are directed to a method, which is a process; claims 6-10 are directed to a device, which is a machine or an article of manufacture; and claims 11-15 are directed to a computer readable medium, which is a machine or an article of manufacture.
Step 2A (part 1 of the Mayo test), using the guidance from pages 50-57 of the Federal Register Vol. 84 No. 4 from Monday, January 7, 2019, requires applying a two-prong inquiry. In Prong One, examiners evaluate whether the claim recites a judicial exception, that is, a law of nature, a natural phenomenon, or an abstract idea. In this case, claim 1 recites pre-training, fine-tuning, and determining models, which are mental processes. In Prong Two, examiners evaluate whether the judicial exception is integrated into a practical application that imposes a meaningful limit on the judicial exception. In this case, the claims do not contain additional limitations that would integrate the abstract idea into a practical application.
Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.  

Regarding claims 1, 6, and 11, pre-training models, fine-tuning the models, and determining a model are mental processes, which are abstract ideas. Additional limitations such as a processor, a memory, and a computer readable medium do not integrate the abstract ideas into a practical application and do not amount to significantly more.

Regarding claims 2, 7, and 12, the limitations are further clarifications of the above abstract ideas, without integration into a practical application and without significantly more.

Regarding claims 3, 8, and 13, performing deep pre-training is a mental process, which is an abstract idea, and the other limitation is a further clarification of the abstract ideas without integration into a practical application and without significantly more.

Regarding claims 4, 9, and 14, selecting a task and updating parameters are mental processes, which are abstract ideas, and the other limitation is a further clarification of the abstract ideas without integration into a practical application and without significantly more.

Regarding claims 5, 10, and 15, combining models is a mental process, which is an abstract idea, without integration into a practical application and without significantly more.

The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56, Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014), Benson, 409 U.S. at 63.

See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).

	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Ruder (Sebastian Ruder, "The State of Transfer Learning in NLP," August 18, 2019, 19 pages, https://ruder.io/state-of-transfer-learning-in-nlp/, XP055797-79), in view of Cohen et al. (US 11,262,978 B1), hereinafter referred to as Cohen.

Regarding claim 1, Ruder teaches:
A method for obtaining a question-answer reading comprehension model, wherein the method comprises: 
pre-training N models with different structures respectively with unsupervised training data to obtain N pre-trained models (Page 5 figure at the top, "LM pre-training", where different models are used, and page 4, where unlabeled corpus corresponds to unsupervised training), different models respectively corresponding to different pre-training tasks, N being a positive integer greater than one (page 16 "Ensembling", where models are trained on different target tasks);
fine-tuning the pre-trained models with supervised training data by taking a task as a primary task and taking predetermined other natural language processing tasks as secondary tasks, respectively, to obtain N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks); and
determining the model according to the N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks).  
Ruder does not teach:
question-answer reading comprehension
Cohen teaches:
a question-answer reading comprehension task (col. 18 lines 22-50, where question answering is performed)
determining the question-answer reading comprehension model (col. 18 lines 22-50, where question answering is performed by a model)
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (NLP) with other components (Question answering); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Regarding claim 2, Ruder in view of Cohen teaches:
The method according to claim 1, wherein the pre-training with unsupervised training data respectively comprises: 
pre-training any model with unsupervised training data from at least two different predetermined fields, respectively (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 3, Ruder in view of Cohen teaches:
The method according to claim 1, wherein the method further comprises: 
for any pre-trained model, performing deep pre-training for the pre-trained model with unsupervised training data from at least one predetermined field according to a training task corresponding to the pre-trained model to obtain an enhanced pre-trained model (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages), 
wherein the unsupervised training data used upon the deep pre-training and the unsupervised training data used upon the pre-training come from different fields (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 4, Ruder in view of Cohen teaches:
The method according to claim 1, wherein the fine-turning comprises: 
for any pre-trained model, in each step of the fine-tuning, selecting a task from the primary task and the secondary tasks for training, and updating the model parameters (Ruder page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks, where training or tuning updates the parameters), 
wherein the primary task is selected more times than any of the secondary tasks (Ruder pages 15-16 "Multi-task fine-tuning" where the task ratio is annealed to deemphasize the auxiliary task towards the end of training).  
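
For illustration only, the task-selection limitation mapped above can be sketched as a per-step weighted draw in which the primary task carries the largest selection probability. This is a minimal sketch under stated assumptions, not a characterization of Ruder's or applicant's implementation: the function name sample_task_schedule, the task names, the fixed weight of 0.6, and the seed are all hypothetical, and the parameter update performed at each step is omitted.

```python
import random
from collections import Counter


def sample_task_schedule(primary, secondaries, steps, primary_weight=0.6, seed=0):
    """Pick one task per fine-tuning step; the primary task gets the largest share.

    All names and the default weight are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Split the remaining probability mass evenly across the secondary tasks.
    secondary_weight = (1.0 - primary_weight) / len(secondaries)
    tasks = [primary] + list(secondaries)
    weights = [primary_weight] + [secondary_weight] * len(secondaries)
    # One draw per step; a real loop would run a parameter update for the drawn task.
    return [rng.choices(tasks, weights=weights, k=1)[0] for _ in range(steps)]


schedule = sample_task_schedule(
    "reading_comprehension", ["classification", "matching"], steps=1000
)
counts = Counter(schedule)
```

With these assumed weights, the primary task is drawn more often than either secondary task over the schedule. An annealed ratio of the kind Ruder describes would vary primary_weight over the steps rather than hold it fixed.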

Regarding claim 5, Ruder in view of Cohen teaches:
The method according to claim 1, wherein the determining the question-answer reading comprehension model according to the N fine-tuned models comprises: 
using a knowledge distillation technique to compress the N fine-tuned models into a single model, and taking the single model as the question-answer reading comprehension model (Ruder page 16 "Distilling", where the models are distilled into a single, smaller model, and Cohen col. 18 lines 22-50, where question answering is performed).
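
For illustration only, compressing N fine-tuned models into a single model by knowledge distillation can be sketched as training a student against the averaged, temperature-softened outputs of the N teachers. This is a minimal sketch under stated assumptions, not the method of Ruder, Cohen, or applicant: the function names, the temperature of 2.0, and the example logits are hypothetical, and only the soft-target loss is shown, not the optimization loop.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=2.0):
    """Average the temperature-softened distributions of the N teacher models."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Cross-entropy between the teachers' soft targets and the student's output."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))


# Two hypothetical teachers scoring three candidate answers for one example.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
targets = ensemble_soft_targets(teachers)
loss_close = distillation_loss([1.75, 0.75, -0.75], targets)
loss_far = distillation_loss([-1.0, 0.5, 2.0], targets)
```

A student whose outputs track the averaged teachers incurs a lower loss than one that ranks the candidates in the opposite order, which is the signal a distillation step would minimize.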

Regarding claim 6, Ruder teaches:
An electronic device, comprising: 
at least one processor (Page 6 "Why does language modelling work so well", where computation is performed); and 
a memory communicatively connected with the at least one processor (Page 14 "Trade-offs and practical considerations", where the model is stored in memory); wherein, 
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform a method for obtaining a question-answer reading comprehension model (Page 14 "Trade-offs and practical considerations", where the model is stored in memory), wherein the method comprises: 
pre-training N models with different structures respectively with unsupervised training data to obtain N pre-trained models (Page 5 figure at the top, "LM pre-training", where different models are used, and page 4, where unlabeled corpus corresponds to unsupervised training), different models respectively corresponding to different pre-training tasks, N being a positive integer greater than one (page 16 "Ensembling", where models are trained on different target tasks);
fine-tuning the pre-trained models with supervised training data by taking a task as a primary task and taking predetermined other natural language processing tasks as secondary tasks, respectively, to obtain N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks); and
determining the model according to the N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks).  
Ruder does not teach:
question-answer reading comprehension
Cohen teaches:
a question-answer reading comprehension task (col. 18 lines 22-50, where question answering is performed)
determining the question-answer reading comprehension model (col. 18 lines 22-50, where question answering is performed by a model)
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (NLP) with other components (Question answering); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Regarding claim 7, Ruder in view of Cohen teaches:
The electronic device according to claim 6, wherein the pre-training with unsupervised training data respectively comprises: 
pre-training any model with unsupervised training data from at least two different predetermined fields, respectively (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 8, Ruder in view of Cohen teaches:
The electronic device according to claim 6, wherein the method further comprises:   
for any pre-trained model, performing deep pre-training for the pre-trained model with unsupervised training data from at least one predetermined field according to a training task corresponding to the pre-trained model to obtain an enhanced pre-trained model (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages), 
wherein the unsupervised training data used upon the deep pre-training and the unsupervised training data used upon the pre-training come from different fields (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 9, Ruder in view of Cohen teaches:
The electronic device according to claim 6, wherein the fine-turning comprises: 
for any pre-trained model, in each step of the fine-tuning, selecting a task from the primary task and the secondary tasks for training, and updating the model parameters (Ruder page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks, where training or tuning updates the parameters), 
wherein the primary task is selected more times than any of the secondary tasks (Ruder pages 15-16 "Multi-task fine-tuning" where the task ratio is annealed to deemphasize the auxiliary task towards the end of training).  

Regarding claim 10, Ruder in view of Cohen teaches:
The electronic device according to claim 6, wherein the determining the question-answer reading comprehension model according to the N fine-tuned models comprises: 
using a knowledge distillation technique to compress the N fine-tuned models into a single model, and taking the single model as the question-answer reading comprehension model (Ruder page 16 "Distilling", where the models are distilled into a single, smaller model, and Cohen col. 18 lines 22-50, where question answering is performed).

Regarding claim 11, Ruder teaches:
A non-transitory computer-readable storage medium storing computer instructions therein (Page 14 "Trade-offs and practical considerations", where the model is stored in memory), wherein the computer instructions cause the computer to perform a method for obtaining a question-answer reading comprehension model, wherein the method comprises: 
pre-training N models with different structures respectively with unsupervised training data to obtain N pre-trained models (Page 5 figure at the top, "LM pre-training", where different models are used, and page 4, where unlabeled corpus corresponds to unsupervised training), different models respectively corresponding to different pre-training tasks, N being a positive integer greater than one (page 16 "Ensembling", where models are trained on different target tasks);
fine-tuning the pre-trained models with supervised training data by taking a task as a primary task and taking predetermined other natural language processing tasks as secondary tasks, respectively, to obtain N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks); and
determining the model according to the N fine-tuned models (Pages 4-5, where pre-trained representations are adapted to a supervised target task using labelled data, and page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks).  
Ruder does not teach:
question-answer reading comprehension
Cohen teaches:
a question-answer reading comprehension task (col. 18 lines 22-50, where question answering is performed)
determining the question-answer reading comprehension model (col. 18 lines 22-50, where question answering is performed by a model)
The prior art contained a device (method, product, etc.) which differed from the claimed device by the substitution of some components (NLP) with other components (Question answering); the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable.

Regarding claim 12, Ruder in view of Cohen teaches:
The non-transitory computer-readable storage medium according to claim 11, wherein the pre-training with unsupervised training data respectively comprises: 
pre-training any model with unsupervised training data from at least two different predetermined fields, respectively (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 13, Ruder in view of Cohen teaches:
The non-transitory computer-readable storage medium according to claim 11, wherein the method further comprises: 
for any pre-trained model, performing deep pre-training for the pre-trained model with unsupervised training data from at least one predetermined field according to a training task corresponding to the pre-trained model to obtain an enhanced pre-trained model (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages), 
wherein the unsupervised training data used upon the deep pre-training and the unsupervised training data used upon the pre-training come from different fields (Ruder pages 7-8, where the different fields are the different amounts of data, for example one field is the data in 562MB to 1.1B and the next field is from 1.1B to 2.25B, or alternatively, the fields are different languages).  

Regarding claim 14, Ruder in view of Cohen teaches:
The non-transitory computer-readable storage medium according to claim 11, wherein the fine-turning comprises: 
for any pre-trained model, in each step of the fine-tuning, selecting a task from the primary task and the secondary tasks for training, and updating the model parameters (Ruder page 15 "Sequential adaptation", where fine-tuning is performed with both target and related tasks, where training or tuning updates the parameters), 
wherein the primary task is selected more times than any of the secondary tasks (Ruder pages 15-16 "Multi-task fine-tuning" where the task ratio is annealed to deemphasize the auxiliary task towards the end of training).  

Regarding claim 15, Ruder in view of Cohen teaches:
The non-transitory computer-readable storage medium according to claim 11, wherein the determining the question-answer reading comprehension model according to the N fine-tuned models comprises: 
using a knowledge distillation technique to compress the N fine-tuned models into a single model, and taking the single model as the question-answer reading comprehension model (Ruder page 16 "Distilling", where the models are distilled into a single, smaller model, and Cohen col. 18 lines 22-50, where question answering is performed).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2020/0349464 A1 para [0031] teaches question answering using a machine learning model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658