Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
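3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.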

Claims 1-3, 8, 10-15, 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Devlin et al. (provided by Examiner, copyright 10/11/2018 and hereinafter Bert) further in view of MASS: Masked Sequence to Sequence Pre-training for Language Generation by Kaitao Song et al. (provided by Applicant in IDS filed 3/29/21, copyright 7/13/19 and hereinafter MASS).
Regarding claims 1, 15, 18
Bert teaches:
A computer-implemented system, method and medium bearing coded instructions operable to train a machine-learned language encoder model, the method comprising: for each of one or more training iterations comprising: 
obtaining, by the computing system, an original language input that comprises a plurality of original input tokens (Bert: Ch 3: system takes a text sentence(s) as input); 
selecting, by the computing system, one or more of the plurality of original input tokens to serve as one or more masked tokens (Bert: Ch 3, 3.3: words in the input sentence masked at random); 
generating, by the computing system, one or more replacement tokens (Bert: Ch 3.3, 3.4: a masked language model created by replacing masked words with a random word),
wherein the one or more replacement tokens comprise alternative natural language tokens (Bert: Ch 3-3.4: a masked language model created by replacing masked words with a random word; the replacements comprising alternate data including replacing the original input token with a random word); 
,
the plurality of updated input tokens comprising a mixture of the one or more replacement tokens and the original input tokens that were not selected to serve as masked tokens (Bert: Ch 3-3.4: the system updates an output comprising a mixture of original input tokens and replacement tokens); 
processing, by the computing system, the noised language input with the machine-learned language encoder model to produce a respective prediction for the masked updated input token included in the plurality of updated input tokens (Bert: Ch 3.3, 3.4, 4: model used to generate predictions based on language understanding benchmarks),
wherein the prediction produced by the machine-learned language encoder model for each updated input token predicts whether such updated input token is one of the original input tokens or one of the replacement input tokens (Bert: Ch 1, 5: masked language model scored based on ability to predict appropriate token among the set of possible tokens); and 
training, by the computing system, the machine-learned language encoder model based at least in part on a loss function that evaluates the plurality of predictions produced by the machine-learned language encoder model (Bert: Ch 3: masked language model trained for a plurality of epochs using a mean likelihood to determine training loss, costs; classification loss, costs; etc.).
Bert strongly suggests (Bert: Ch 3-3.4: a downside of the model is that only a portion of the input tokens are predicted) but does not explicitly teach generating a respective prediction for each updated input token included in the plurality of updated input tokens.
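For illustration only, the training steps recited in the claim language above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Applicant, Bert, or MASS: the function names (corrupt, replaced_labels, detection_loss), the 15% selection rate, the uniform random choice of replacement tokens, and the binary cross-entropy loss are all hypothetical choices made for the example.

```python
import math
import random


def corrupt(tokens, vocab, mask_rate=0.15, rng=None):
    """Select positions to mask and swap in randomly drawn vocabulary tokens.

    Hypothetical helper: the rate and the uniform draw are assumptions.
    """
    rng = rng or random.Random(0)
    n_masked = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    noised = list(tokens)
    for i in positions:
        noised[i] = rng.choice(vocab)  # replacement token
    return noised, positions


def replaced_labels(original, noised):
    """One target per updated token: 1 if it differs from the original, else 0."""
    return [int(o != n) for o, n in zip(original, noised)]


def detection_loss(probs_replaced, labels):
    """Mean binary cross-entropy over every token position (assumed loss)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs_replaced, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "meal", "ate", "dish"]
noised, positions = corrupt(tokens, vocab)
labels = replaced_labels(tokens, noised)
# A stand-in for the encoder's outputs: an uninformed 0.5 for every token.
loss = detection_loss([0.5] * len(labels), labels)
```

The point of the sketch is that a target and a prediction exist for every updated input token, not only for the positions that were selected for masking.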


Regarding claims 2, 19
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions, wherein generating, by the computing system, the one or more replacement tokens comprises generating, by the computing system, the one or more replacement tokens using a machine-learned language generator model (Bert: Ch 3, 4: selected word is masked with a random replacement word); (MASS: Ch 3: when k=m all tokens are predicted).

Regarding claims 3, 20
Bert in view of MASS teaches or suggests:



Regarding claim 8
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions further comprising: training, by the computing system, the machine-learned language generator model in a reinforcement learning scheme based on a second objective function that evaluates the predictions produced by the machine-learned language encoder model for the replacement tokens generated by the machine-learned language generator model. Examiner has taken official notice which Applicant has failed to timely and explicitly traverse and it is thus accepted as Applicant’s Admitted Prior Art (AAPA: please see MPEP 2144.03) that reinforcement learning would have comprised an obvious inclusion for evaluating the replacement token scores of the Bert and MASS system and method.

Regarding claim 10
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions wherein generating, by the computing system, the one or more replacement tokens comprises sampling, by the computing system, the one or more replacement tokens from a noise distribution. Examiner takes official notice that sampling based on a noise distribution was well 

Regarding claim 11
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions wherein the machine-learned language encoder model comprises a transformer network text encoder (Bert: Ch 3, 4: system utilizes transformer encoder/decoder); (MASS: Ch 3: system uses a transformer neural network text encoding/decoding).

Regarding claim 12
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions wherein, when one of the replacement tokens is equal to the original token it replaces, the loss function evaluates such replacement token as if it was included in the original input tokens (Bert: Ch 3: a replacement token is kept unchanged and evaluated). Bert and MASS do not explicitly discuss ‘replacement’ of a selected token with itself; however, Examiner takes official notice that replacement of a selected token with itself was well known and would have comprised an obvious inclusion.
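For context, the "kept unchanged" case cited from Bert can be illustrated with a short sketch of a BERT-style masking routine. The 15% selection rate and the 80/10/10 split between a mask symbol, a random token, and the unchanged original follow the procedure as described in the BERT paper; the function name bert_style_mask, the seeded random generator, and the toy vocabulary are assumptions made for the example.

```python
import random

MASK = "[MASK]"


def bert_style_mask(tokens, vocab, select_rate=0.15, rng=None):
    """Return the corrupted tokens and the indices selected for prediction."""
    rng = rng or random.Random(0)
    out = list(tokens)
    selected = []
    for i in range(len(tokens)):
        if rng.random() >= select_rate:
            continue  # position not selected for prediction
        selected.append(i)
        r = rng.random()
        if r < 0.8:
            out[i] = MASK               # most selected tokens become the mask symbol
        elif r < 0.9:
            out[i] = rng.choice(vocab)  # some become a random vocabulary token
        # otherwise the selected token is deliberately left unchanged
    return out, selected


corrupted, selected = bert_style_mask(["a"] * 1000, ["b"])
unchanged = [i for i in selected if corrupted[i] == "a"]
```

In the sketch, the indices collected in unchanged are positions that were selected yet still hold their original token, which is the situation the claim language addresses.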

Regarding claim 13
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions wherein: the one or more training iterations comprise one or more pre-training iterations; and the 

Regarding claim 14
Bert in view of MASS teaches or suggests:
The computer-implemented system, method and medium bearing coded instructions wherein the plurality of original input tokens comprise a plurality of original words (Bert: Ch 3: input comprises sentences); (MASS: Abstract: a sentence with masked tokens is taken as input).

Claims 4-7, 9, 16, 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bert in view of MASS as applied to claims 1-3, 8, 10-15, 18-20 supra and further in view of Song: 20200097820.

Regarding claim 4
Bert in view of MASS teaches or suggests:
The computer-implemented method of claim 2, further comprising: training, by the computing system, the machine-learned language generator model based at least in part on a second loss function that evaluates a difference between the one or more replacement tokens and the one or more original tokens selected to serve as masked tokens.
Bert and MASS do not explicitly discuss comparison of the first and second loss models to evaluate the difference between the loss functions.
In a related field of endeavor, Song teaches the utility of plural deep neural networks for generating first and second loss functions based on a difference between classification values

Regarding claim 5
Bert in view of MASS in view of Song teaches or suggests:
The computer-implemented method of claim 4, wherein the second loss function comprises a maximum likelihood estimation function. Bert and MASS teach the benefit of maximizing the values of parameters within the model, including using mean MLM likelihood, prediction likelihood, and/or log-likelihood (Bert: Ch 3.4); (MASS: Ch 3-4), but do not explicitly discuss the utility of maximum likelihood algorithms in this regard. However, Examiner has taken official notice which Applicant has failed to timely and explicitly traverse and it is thus accepted as Applicant’s Admitted Prior Art (AAPA: please see MPEP 2144.03) that a maximum likelihood estimation would have comprised an obvious inclusion. The average skilled practitioner would have been motivated to do so for the purpose of using gradient ascent, cross entropy loss, etc. to determine optimal parameters for the model or portions thereof.
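As a brief illustration of the relationship noted above between maximum likelihood estimation and cross entropy loss, the following sketch computes an average negative log-likelihood of the original tokens under a model's predicted distributions; minimizing this quantity is equivalent to maximizing the likelihood of the original tokens. The function name negative_log_likelihood and the toy distributions are hypothetical and are not taken from Bert, MASS, or Song.

```python
import math


def negative_log_likelihood(predicted_dists, original_tokens):
    """Average negative log-probability the model assigns to the original tokens."""
    total = 0.0
    for dist, tok in zip(predicted_dists, original_tokens):
        total -= math.log(dist[tok])
    return total / len(original_tokens)


# Toy predicted distributions at two masked positions (assumed values).
dists = [
    {"cooked": 0.5, "ate": 0.5},
    {"meal": 0.25, "dish": 0.75},
]
originals = ["cooked", "meal"]
nll = negative_log_likelihood(dists, originals)

# A model that puts more probability on the original tokens scores lower.
sharper = [
    {"cooked": 0.9, "ate": 0.1},
    {"meal": 0.8, "dish": 0.2},
]
better_nll = negative_log_likelihood(sharper, originals)
```

Here nll evaluates to 1.5 times the natural logarithm of 2, and better_nll is smaller because the second set of distributions assigns higher probability to the original tokens.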

Regarding claim 6
Bert in view of MASS in view of Song teaches or suggests:


Regarding claim 7
Bert in view of MASS in view of Song teaches or suggests:
The computer-implemented method of claim 4, wherein the method further comprises, prior to the one or more training iterations: individually training, by the computing system, the machine-learned language generator model on the second loss function; and after individually training, by the computing system, the machine-learned language generator model: initializing, by the computing system, the machine-learned language encoder model with weight values based on the machine-learned language generator model (Song: ¶ 103-109; Fig 6: learning applied based on weights applied to losses of first, second, etc. model).

Regarding claim 9
Bert in view of MASS in view of Song teaches or suggests:
The computer-implemented method of claim 2, wherein one or more weights are shared between the machine-learned language generator model and the machine-learned language encoder model. Examiner has taken official notice which Applicant has failed to timely and explicitly traverse and it is thus accepted as Applicant’s Admitted Prior Art (AAPA: please see MPEP 2144.03) that sharing weights among one or more models would have comprised an obvious inclusion for curating a plurality of models in the Zweig in view of Roberta in view of

Regarding claims 16, 17
Bert in view of MASS in view of Song teaches or suggests:
The computer-implemented system, method, instructions, wherein the one or more non-transitory computer-readable media further store the machine-learned language generator and/or encoder model. Bert and MASS teach the utility of a computer operation comprising a language generator and encoder model which would be obvious to implement on the well-known structures of a processor in concert with media storing coded instructions (see at least Song: ¶ 108-121; Fig 2).

Response to Arguments
Applicant’s arguments in concert with amended claims, see Remarks and Claims filed 3/9/21, with respect to the rejection(s) of claim(s) 1-20 under 35 USC 102 over Bert and/or 35 USC 103 over Zweig in view of Roberta and/or Zweig in view of Roberta in view of Song have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Bert in view of MASS and Bert in view of MASS in view of Song.

Conclusion 
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  


Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached on 7:30-6:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, VIVIAN CHIN, can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/PAUL C MCCORD/Primary Examiner, Art Unit 2654