DETAILED ACTION
Introduction
This office action is in response to applicant’s claims filed 6/2/2020. Claims 1-20 are currently pending and have been examined. Applicant’s IDS have been considered. There is no claim to foreign priority.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-9 and 16-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter because the claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea. The claims are directed to the abstract idea of collecting and comparing information. 
More specifically, in claims 1-9, and 16-20, "training a language model" appears to be a mathematical correlation used to manipulate information (Digitech), wherein the mathematical result seeks to tie up the judicial exception, such that there is no clear practical application or search area with respect to the body of the claim. The Examiner interprets the result of a trained language model as a mathematical formula (such as a hypothesis function with trained/tuned parameters applied to a feature set), usable across extensive and separate classifications. The above independent and dependent claims further define the mathematical correlation for defining the trained model. Therefore, within the body of the claim, the Examiner notes the applicant must integrate the trained language model into a practical application beyond generally linking the use to a particular technological environment.
The use of a general purpose computer to implement the abstract idea does not render the claims statutory. The additional elements or combination of elements in the claims, other than the abstract idea per se, amount to no more than a recitation of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry, with respect to claims 8 and 16. Regarding the dependent claims above, the additional limitations, which are an extension of the mathematical formula, only further extend the abstract idea. Viewed as a whole, these additional claim elements do not provide meaningful limitations that transform the abstract idea into a patent eligible application of the abstract idea as discussed above; the claims therefore neither integrate the judicial exception into a practical application nor amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter (see Federal Register, Vol. 79, No. 241, December 16, 2014, Page 74624; Alice Corp., 134 S. Ct. at 2359; Digitech Image Tech. LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014) (organizing and manipulating information through mathematical correlations); Electric Power Group (collecting, analyzing and displaying certain results); and Classen (collecting and comparing known information)).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Goldberg et al. (word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding method).
As per claim 1, Goldberg teaches a method for training a language model using negative data, the method comprising: 
accessing a first training corpus comprising positive training data (page 1, 2-his positive corpus of training data); 
accessing a second training corpus comprising negative training data (ibid-his negative sampled data from a different corpus); and 
training a first language model using at least the first training corpus, the second training corpus, and a maximum likelihood function, wherein the maximum likelihood function maximizes a likelihood of the first language model predicting the positive training data while minimizing a likelihood of the first language model predicting the negative training data (ibid, pages 3-4-his Maximum Likelihood function, which maximizes the likelihood of predicting the positive training data while minimizing the likelihood of predicting the negative training data, wherein the minimization of predicting the negative training data is 1 minus the probability of the negative sampled data, see his arg max functions, page 3).
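For reference, the objective relied upon above can be sketched as follows. This is a restatement in the notation commonly used for negative sampling, offered as an assumption about the cited passage rather than a quotation of it: D denotes the set of positive word-context pairs, D' the set of negative pairs, and θ the model parameters.

```latex
\arg\max_{\theta}
  \prod_{(w,c)\in D} p(D=1 \mid w,c;\theta)
  \prod_{(w,c)\in D'} \bigl(1 - p(D=1 \mid w,c;\theta)\bigr)
```

Under this reading, maximizing the second product is what the rejection treats as minimizing the likelihood of the model predicting the negative training data.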
As per claim 2, Goldberg teaches the method of claim 1, wherein minimizing the likelihood of the first language model predicting the negative training data comprises: 
maximizing 1 minus the likelihood of the first language model predicting the negative training data (ibid-see his arg max for 1 minus “P(D…) of D’, wherein D’ is the negative training data sample set).
As per claim 3, Goldberg teaches the method of claim 2, wherein the maximum likelihood function maximizes the likelihood of 1 minus the likelihood of the first language model predicting the negative training data by: 
maximizing a lower bound on the likelihood of 1 minus the likelihood of the first language model predicting the negative training data (ibid-see page 3, arg max function, including maximizing the product of “1 minus the likelihood” for the negative data).
As per claim 4, Goldberg teaches the method of claim 3, wherein the lower bound comprises a product of 1 minus a probability of the first language model predicting each word in the second training corpus (ibid-see his 1-P(D’), as applied to each word w).
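The relationship between "1 minus the likelihood" and a product of per-word "1 minus probability" terms, as recited in claims 3 and 4, can be illustrated numerically. The sketch below is only one possible reading: it assumes the likelihood of a negative example factorizes into per-word probabilities, and the function names and probability values are hypothetical, not taken from Goldberg or from the application.

```python
import math

def sequence_likelihood(word_probs):
    """Likelihood of the whole sequence, assuming it factorizes per word."""
    return math.prod(word_probs)

def one_minus_product_bound(word_probs):
    """Product of (1 - p) over every word; never exceeds 1 - likelihood."""
    return math.prod(1.0 - p for p in word_probs)

# Hypothetical per-word probabilities for a three-word negative example.
probs = [0.2, 0.5, 0.1]
target = 1.0 - sequence_likelihood(probs)  # quantity to be maximized
bound = one_minus_product_bound(probs)     # per-word surrogate (lower bound)
```

For probabilities in [0, 1] the per-word product is never larger than 1 minus the sequence likelihood, so raising the product pushes the bounded quantity upward as well.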
As per claim 5, Goldberg teaches the method of claim 1, wherein the likelihood of the first language model predicting the positive training data is calculated using a likelihood function that accepts the positive training data and a plurality of weights for the first language model as inputs (ibid, pages 2 and 3-see his parameter values as the weights, and above positive/negative training data discussion).
As per claim 6, Goldberg teaches the method of claim 1, wherein the likelihood of the first language model predicting the negative training data is calculated using a likelihood function that accepts the negative training data and a plurality of weights for the first language model as inputs (ibid, pages 2 and 3-see his parameter values as the weights, and above positive/negative training data discussion).
As per claim 7, Goldberg teaches the method of claim 6, wherein the likelihood function optimizes values for the plurality of weights (ibid-page 3, see his optimization discussion, as applied to the parameters/features of the function). 
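The likelihood functions discussed for claims 5-7, which take training data and model weights as inputs, can be sketched in code. This is a minimal illustration assuming the sigmoid-of-dot-product parameterization commonly associated with word2vec; the function and variable names (`neg_sampling_log_likelihood`, `word_vecs`, `ctx_vecs`) are hypothetical and do not come from the cited reference.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def neg_sampling_log_likelihood(positive_pairs, negative_pairs, word_vecs, ctx_vecs):
    """Log of: prod over positives of p, times prod over negatives of (1 - p).

    word_vecs and ctx_vecs play the role of the model weights (assumption).
    """
    total = 0.0
    for w, c in positive_pairs:
        # Reward the model for assigning high probability to observed pairs.
        total += math.log(sigmoid(dot(word_vecs[w], ctx_vecs[c])))
    for w, c in negative_pairs:
        # Reward the model for assigning low probability to negative pairs.
        total += math.log(1.0 - sigmoid(dot(word_vecs[w], ctx_vecs[c])))
    return total
```

An optimizer would adjust the weight vectors to increase the returned value; the optimizer itself is omitted here.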
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 8, 9, 16, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Goldberg in view of Yuan et al. (Yuan, US 2022/0036890).
As per claim 8, Goldberg teaches [a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations] comprising: accessing a first training corpus comprising positive training data (ibid-see claim 1, corresponding and similar limitation); accessing a second training corpus comprising negative training data (ibid); and training a first language model using at least the first training corpus, the second training corpus, and a maximum likelihood function, wherein the maximum likelihood function maximizes a likelihood of the first language model predicting the positive training data while minimizing a likelihood of the first language model predicting the negative training data (ibid). 
Goldberg does not explicitly teach the computer-readable medium; however, Yuan teaches a non-transitory computer-readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations (paragraphs [0022-0025]).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Yuan to combine the prior art element of training a language model as taught by Goldberg with a computer-readable embodiment for implementing a method as taught by Yuan as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be having computer based implemented method using stored instructions (ibid-Yuan). 
As per claim 9, Goldberg further makes obvious the non-transitory computer-readable medium of claim 8, wherein training the first language model using at least the first training corpus, the second training corpus, and the maximum likelihood function removes negative n-gram statistics from the first language model (ibid-pages 3-5-his decrease in quantity of words in context, n-grams, for negative n-gram statistics from the language model).
As per claim 16, claim 16 sets forth limitations similar to claim 8 and is thus rejected under similar reasons and rationale, wherein the system is deemed to embody the operations, such that Goldberg with Yuan make obvious a system comprising: one or more processors (Yuan, paragraphs [0022-0025]); and one or more memory devices comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform operations (ibid) comprising: accessing a first training corpus comprising positive training data (ibid-see claim 8, corresponding and similar limitation); accessing a second training corpus comprising negative training data (ibid); and training a first language model using at least the first training corpus, the second training corpus, and a maximum likelihood function, wherein the maximum likelihood function maximizes a likelihood of the first language model predicting the positive training data while minimizing a likelihood of the first language model predicting the negative training data (ibid).
As per claim 17, Goldberg further makes obvious the system of claim 16, wherein the first language model comprises a neural language model (pages 2-3, see also section 1.1-his neural network language model, as applied to the first language model).
As per claim 19, Goldberg further makes obvious the system of claim 16, wherein the first training corpus does not include the second training corpus (ibid-Goldberg, pages 2-3-see his incorrect sample set, not from the correct sample set training corpus).
Claims 10-12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Goldberg in view of Yuan, as applied to claim 8 above, and further in view of Nguyen et al. (Nguyen, US 2021/0049236).
As per claim 10, Goldberg further makes obvious the non-transitory computer-readable medium of claim 8, including training the first language model using at least the first training corpus, the second training corpus, and the maximum likelihood function, but lacks teaching that which Nguyen teaches, wherein the training decreases an error rate for subject-verb agreement (paragraphs [0073-0074]-his SVA accuracy task).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Nguyen to combine the prior art element of training a language model based on words and context agreement as taught by Goldberg with the context agreement as a subject verb agreement feature as taught by Nguyen as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be having a language model accurately modeling SVA as a feature (ibid-Nguyen).
As per claim 11, Goldberg further makes obvious the non-transitory computer-readable medium of claim 8, further comprising: accessing a second language model, wherein the second language model is configured to generate outputs that are less [grammatical] than outputs generated by the first language model (ibid-page 3-his generated negative sample set, wherein the sample set are incorrect); generating output text from the second language model; and using the output text from the second language model as the second training corpus comprising the negative training data (ibid-see his joint optimization model using the second language model and the negative sample comprising the negative training data). 
Goldberg lacks explicitly teaching the incorrect negative sample with respect to grammatical rules, however, Nguyen teaches a text structure including features based on grammatical rules. 
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Nguyen to combine the prior art element of training a language model based on words and context agreement, wherein one training set provides positive examples and the other training set provides incorrect examples as taught by Goldberg, with the context agreement as a subject verb agreement feature or grammatical rules as taught by Nguyen as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be having a language model accurately modeling SVA as a feature, based on the joint optimization model (ibid-Nguyen, see Goldberg joint optimization discussion).
As per claim 12, Goldberg further makes obvious the non-transitory computer-readable medium of claim 11, wherein the second language model comprises an n-gram model (ibid-see also page 4, unigram and corresponding context as his n-gram model for the negative training sample set).
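The unigram-based generation of a negative sample set referenced above can be sketched as follows. This is a simplified illustration under stated assumptions: contexts are drawn in proportion to raw token counts, no smoothing is applied, pairs that also occur in the positive data are not filtered out, and the function name `unigram_negative_samples` is hypothetical.

```python
import random
from collections import Counter

def unigram_negative_samples(corpus_tokens, word, k, seed=0):
    """Pair `word` with k contexts drawn from the unigram distribution."""
    counts = Counter(corpus_tokens)
    vocab = list(counts)
    weights = [counts[t] for t in vocab]  # raw unigram counts as weights
    rng = random.Random(seed)             # seeded for repeatability
    contexts = rng.choices(vocab, weights=weights, k=k)
    return [(word, c) for c in contexts]

tokens = "the cat sat on the mat".split()
negatives = unigram_negative_samples(tokens, "cat", k=5)
```

Each returned pair combines the given word with a context chosen without regard to where the word actually occurs, which is the sense in which the sample set is treated as negative data.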
As per claim 20, Goldberg with Yuan make obvious the system of claim 16, but lack teaching that which Nguyen teaches wherein the first training corpus and the second training corpus are both subsets of a larger training corpus (paragraphs [0178-0180], Fig. 21-his positive and negative training corpus generated from a larger training corpus). 
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Nguyen to combine the prior art element of training a language model based on a generated negative sample set as taught by Goldberg with the training corpus as taught by Nguyen as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be the joint optimization of positive and negative data, wherein each corpus is generated from a larger corpus of information, allowing the optimization of the training samples from a main corpus of information (ibid-Goldberg, ibid-Nguyen).
Claims 13-15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Goldberg in view of Yuan in view of Nguyen, as applied to claim 11 above, and further in view of Irie et al. (Irie, Training language models for long-span cross-sentence evaluation).
As per claim 13, Goldberg further makes obvious the non-transitory computer-readable medium of claim 11, but lacks teaching that which Irie teaches, wherein the second language model comprises a neural language model that is inhibited (ibid-page 423-his transformer model, which is inhibited via removing word positional encoding).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Irie to combine the prior art element of training a language model based on a generated negative sample set as taught by Goldberg with inhibiting a neural language model, via a transformer model that withholds the positional encoding of the words, as taught by Irie as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be the joint optimization of positive and negative data (ibid-Goldberg), wherein negative data is generated from a transformer-based model (ibid-Irie). 
As per claim 14, Goldberg with Yuan with Nguyen with Irie further makes obvious the non-transitory computer-readable medium of claim 13, wherein the second language model is inhibited such that the second language model does not consider word position (ibid-see claim 13, word position removal from encodings discussion).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Irie to combine the prior art element of training a language model based on a generated negative sample set as taught by Goldberg with inhibiting a neural language model, via a transformer model that withholds the positional encoding of the words, as taught by Irie as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be the joint optimization of positive and negative data (ibid-Goldberg), wherein negative data is generated from a transformer-based model (ibid-Irie).
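The statement that a transformer without positional encoding does not consider word position can be illustrated with a toy example. The sketch below is not Irie's architecture; it is a hypothetical single-head self-attention layer with identity projections, used only to show that, absent positional information, reordering the input words merely reorders the outputs.

```python
import math

def self_attention(xs):
    """Single-head self-attention, identity projections, no positional encoding."""
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in xs]
        m = max(scores)                       # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]       # softmax over all words
        out.append([sum(w * v[d] for w, v in zip(weights, xs))
                    for d in range(len(q))])
    return out

# Hypothetical two-dimensional word vectors for a three-word input.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perm = [2, 0, 1]
out = self_attention(xs)
out_shuffled = self_attention([xs[i] for i in perm])
```

Each word's output vector is the same in both calls, only appearing at its new index, so a model built solely from such layers cannot distinguish one word order from another.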
As per claim 15, Goldberg with Yuan with Nguyen further makes obvious the non-transitory computer-readable medium of claim 11, Irie teaching that which the above combination lacks, wherein the second language model comprises a transformer-based model with word-location identifiers removed (ibid-page 423-his transformer model, which is inhibited via removing word positional encoding). 
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Irie to combine the prior art element of training a language model based on a generated negative sample set as taught by Goldberg with inhibiting a neural language model, via a transformer model that withholds the positional encoding of the words, as taught by Irie as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be the joint optimization of positive and negative data (ibid-Goldberg), wherein negative data is generated from a transformer-based model (ibid-Irie). 
As per claim 18, Goldberg with Irie further makes obvious the system of claim 16, wherein the first language model comprises a transformer-based language model (ibid-Irie, pages 421-423-his transformer-based language model).
Thus, it would have been obvious to one of ordinary skill in the linguistics art, before the effective filing date of the invention, as all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods (computer implemented techniques and algorithms combining processes and steps in natural language processing), in view of the teachings of Goldberg and Irie to combine the prior art element of training a language model based on a generated negative sample set as taught by Goldberg with a transformer neural language model as taught by Irie as each element performs the same function as it does separately, as the combination would yield predictable results, KSR International Co. v. Teleflex Inc., 550 US. -- 82 USPQ2nd 1385 (2007), wherein the predictable result would be using a neural network for language modeling including the joint optimization of positive and negative data, the neural network as a well-known transformer model for modeling language (ibid-Goldberg, Irie). 
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure (See PTO-892). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LAMONT M SPOONER whose telephone number is (571)272-7613. The examiner can normally be reached 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LAMONT M SPOONER/
Primary Examiner, Art Unit 2657
8/12/2022
lms