DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) were submitted on 02/09/2021 and 05/18/2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception of an abstract idea without significantly more. 
The independent Claims 1, 15 and 16 similarly recite “A method for validating language models, the method comprising: at an electronic device with one or more processors and memory”, “obtaining a first set of data corresponding to one or more tokens predicted based on one or more previous tokens”; “determining a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process”; “in accordance with a determination that the probability is within a predetermined range”; “determining that the one or more tokens correspond to a prediction associated with the user privacy preserving training process” and “outputting a predicted token sequence including the one or more tokens and the one or more previous tokens”.
The limitations “determining a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process”; “in accordance with a determination that the probability is within a predetermined range”; and “determining that the one or more tokens correspond to a prediction associated with the user privacy preserving training process”, as drafted, recite a process that covers performance of the limitations by the use of mathematical concepts, but for the recitation of a generic computing device. That is, other than reciting the generic computing device, nothing precludes the steps from practically being performed with pen and paper.
This judicial exception is not integrated into a practical application. Claims 1, 15 and 16 recite “obtaining a first set of data corresponding to one or more tokens predicted based on one or more previous tokens” and “outputting a predicted token sequence including the one or more tokens and the one or more previous tokens”. All of these limitations are directed toward using a generic computing device to perform the method and do not impose any meaningful limits on practicing the abstract idea. Claims 1, 15 and 16 do not contain any additional limitations.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a generic computing device to perform the determining steps amounts to no more than mere instructions to apply the exception using a generic computer. Mere instructions to apply an exception using a generic computing device cannot provide an inventive concept. Therefore, claims 1, 15 and 16 are not patent eligible.
 	
Claims 2-14 are rejected as being directed to an abstract idea without significantly more under a rationale similar to that applied to claims 1, 15 and 16. These claims either depend on or require the limitations of claims 1, 15 and 16. The additional limitations recited in these claims can likewise be performed mentally and neither add "significantly more" to the abstract idea nor integrate the abstract idea into a practical application. Therefore, these claims are directed to a judicial exception as well.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 5-10 and 12-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Thomson et al. (US 2020/0243094 A1) (hereinafter Thomson).

Regarding claim 1: Thomson discloses an electronic device, comprising:
one or more processors (Thomson ¶0302, “…may run on one or more processors”); 
a memory (Thomson ¶1712, “…a memory”);
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions (Thomson ¶1714 and ¶0302, in ¶1714 Thomson states “… processor 9110 may be configured to interpret and/or execute program instructions and/or process data stored in the memory”);
obtaining a first set of data corresponding to one or more tokens predicted based on one or more previous tokens (Thomson discloses a language model using multiple input hypotheses which include one or more future (predicted) tokens, based on one or more previous tokens (Thomson ¶0394, 0384 and 0263, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”));
determining a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process (Thomson discloses determining a probability that the first input hypothesis with one or more previous/future tokens was generated by a first language model that is trained to preserve user privacy via encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
in accordance with a determination that the probability is within a predetermined range (Thomson discloses the probability being above a selected threshold (Thomson ¶1286,  “In some embodiments, the size of a fake n-gram table may be managed by including entries with a probability or count above a selected threshold.”));
determining that the one or more tokens correspond to a prediction associated with the user privacy preserving training process (Thomson discloses determining that the one or more previous/future (predicted) tokens generated by a first language model are associated with privacy preserving techniques such as encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
outputting a predicted token sequence including the one or more tokens and the one or more previous tokens (Thomson discloses outputting resulting tokens to a device (Thomson ¶0394-0395 and 0384, in ¶0395 Thomson states “may output tokens based on the best available information at a point in time. In these and other embodiments, the voting process 1408 may provide corrections if future inputs or input changes trigger a change in tokens already output”)).
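
For illustration only, the sequence of steps mapped above for claim 1 may be sketched as follows. This is a minimal sketch under stated assumptions and is neither the applicant's nor Thomson's implementation: the names `validate_prediction` and `toy_discriminator`, the range bounds, and the entropy-based scoring rule are all hypothetical and were chosen only so that the example runs.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: the "first set of data" is modeled here as a probability
# distribution over a vocabulary for the predicted token.
Distribution = Dict[str, float]


def toy_discriminator(dist: Distribution) -> float:
    """Hypothetical stand-in for the probability-determining step.

    Returns a value in [0, 1]. The rule used (normalized entropy of the
    distribution) is an assumption made purely for illustration; it is not
    taken from the application or from Thomson.
    """
    if len(dist) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0.0)
    return entropy / math.log(len(dist))


def validate_prediction(
    previous_tokens: List[str],
    dist: Distribution,
    discriminator: Callable[[Distribution], float],
    valid_range: Tuple[float, float] = (0.5, 1.0),  # assumed bounds
) -> Optional[List[str]]:
    """Walk through the recited steps in order."""
    # Obtain the first set of data (passed in as `dist`) and pick the
    # most likely predicted token from it.
    predicted_token = max(dist, key=dist.get)
    # Determine the probability for that data.
    probability = discriminator(dist)
    # Check whether the probability is within the predetermined range.
    low, high = valid_range
    if low <= probability <= high:
        # Treat the token as associated with the privacy preserving training
        # process and output the previous tokens plus the predicted token.
        return previous_tokens + [predicted_token]
    return None
```

Calling `validate_prediction(["the"], {"cat": 0.4, "dog": 0.35, "fish": 0.25}, toy_discriminator)` returns the combined sequence, while a sharply peaked distribution falls outside the assumed range and returns `None`.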

Regarding Claim 2: Thomson further discloses the electronic device of claim 1, wherein the first set of data includes a probability distribution for the one or more predicted tokens over a vocabulary associated with the first language model (Thomson ¶0263 and 0384).

Regarding Claim 3: Thomson further discloses the electronic device of claim 1, the one or more programs further including instructions for: generating, using the first language model, the first set of data based on the one or more previous tokens (Thomson ¶0394 and 0384).

Regarding Claim 5: Thomson further discloses the electronic device of claim 1, wherein the predicted token sequence represents an emerging vocabulary (Thomson ¶0323 and 0263).

Regarding Claim 6: Thomson further discloses the electronic device of claim 1, wherein the probability is relative to a probability that the first set of data corresponds to a prediction generated by a second language model not trained using the user privacy preserving training process (Thomson ¶0263, 1075 and 1165).

Regarding Claim 7: Thomson further discloses the electronic device of claim 1, wherein the probability is relative to a probability that the first set of data corresponds to a prediction generated by a second language model not trained using the user privacy preserving training process (Thomson ¶0706).

Regarding Claim 8: Thomson further discloses the electronic device of claim 7, the one or more programs further including instructions for: generating, using the first language model, first data corresponding to a first plurality of predicted tokens (Thomson ¶0842); generating, using a third language model, second data corresponding to a second plurality of predicted tokens (Thomson ¶0843); the third language model is not trained using the user privacy preserving training process (Thomson ¶0843, 1075 and 1165); the first data and the second data are generated based on a same plurality of previous tokens (Thomson ¶0394 and 0428); and training the discriminator using the first data and the second data (Thomson ¶0394 and 0704-0706).

Regarding Claim 9: Thomson further discloses the electronic device of claim 1, the one or more programs further including instructions for: in accordance with the determination that the probability is within the predetermined range (Thomson ¶1286); in accordance with a determination that the predicted token sequence corresponds to a first type of prediction, modifying the first language model to prevent prediction of the predicted token sequence (Thomson ¶0270-0271); and in accordance with a determination that the predicted token sequence corresponds to a second type of prediction, training the first language model using training data to reduce a prediction frequency of the predicted token sequence, the training data being selected based on the predicted token sequence to reduce a frequency of predictions of the second type (Thomson ¶1285-1286).

Regarding Claim 10: Thomson further discloses the electronic device of claim 9, wherein the first type of prediction includes a prediction determined to be objectionable (Thomson ¶0270-0271).

Regarding Claim 12: Thomson further discloses the electronic device of claim 1, the one or more programs further including instructions for: in accordance with a determination that the probability is not within the predetermined range and is within a second predetermined range, determining that the one or more tokens do not correspond to a prediction associated with the user privacy preserving training process (Thomson ¶0880).

Regarding Claim 13: Thomson further discloses the electronic device of claim 1, wherein the one or more predicted tokens include any of one or more characters or one or more words (Thomson ¶0343).

Regarding Claim 14: Thomson further discloses the electronic device of claim 1, wherein the one or more previous tokens include any of one or more characters or one or more words (Thomson ¶0343 and 0394).

Regarding claim 15: Thomson discloses a non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device (Thomson ¶1715);
obtain a first set of data corresponding to one or more tokens predicted based on one or more previous tokens (Thomson discloses a language model using multiple input hypotheses which include one or more future (predicted) tokens, based on one or more previous tokens (Thomson ¶0394, 0384 and 0263, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”));
determine a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process (Thomson discloses determining a probability that the first input hypothesis with one or more previous/future tokens was generated by a first language model that is trained to preserve user privacy via encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
in accordance with a determination that the probability is within a predetermined range (Thomson discloses the probability being above a selected threshold (Thomson ¶1286,  “In some embodiments, the size of a fake n-gram table may be managed by including entries with a probability or count above a selected threshold.”));
determine that the one or more tokens correspond to a prediction associated with the user privacy preserving training process (Thomson discloses determining that the one or more previous/future (predicted) tokens generated by a first language model are associated with privacy preserving techniques such as encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
output a predicted token sequence including the one or more tokens and the one or more previous tokens (Thomson discloses outputting resulting tokens to a device (Thomson ¶0394-0395 and 0384, in ¶0395 Thomson states “may output tokens based on the best available information at a point in time. In these and other embodiments, the voting process 1408 may provide corrections if future inputs or input changes trigger a change in tokens already output”)).

Regarding claim 16: Thomson discloses a method for validating language models, the method comprising:
at an electronic device with one or more processors and memory (Thomson ¶0302 and 1712);
obtaining a first set of data corresponding to one or more tokens predicted based on one or more previous tokens (Thomson discloses a language model using multiple input hypotheses which include one or more future (predicted) tokens, based on one or more previous tokens (Thomson ¶0394, 0384 and 0263, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”));
determining a probability that the first set of data corresponds to a prediction generated by a first language model trained using a user privacy preserving training process (Thomson discloses determining a probability that the first input hypothesis with one or more previous/future tokens was generated by a first language model that is trained to preserve user privacy via encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
in accordance with a determination that the probability is within a predetermined range (Thomson discloses the probability being above a selected threshold (Thomson ¶1286,  “In some embodiments, the size of a fake n-gram table may be managed by including entries with a probability or count above a selected threshold.”));
determining that the one or more tokens correspond to a prediction associated with the user privacy preserving training process (Thomson discloses determining that the one or more previous/future (predicted) tokens generated by a first language model are associated with privacy preserving techniques such as encryption and anonymization (Thomson ¶0384, 0263 and 1180, in ¶0394 Thomson states “A language model probability used for fusion may also be conditioned on contexts from multiple input hypotheses. For example, with two inputs, a word probability may be expressed as P (word|context 1, context 2), where context 1 is one or more previous tokens from a first input hypothesis and context 2 is one or more previous tokens in a second input hypothesis. Context 1 may further include one or more future tokens from a first input hypothesis. Context 2 may further include one or more future tokens from a second input hypothesis.”, in ¶1180 Thomson further states “…To preserve privacy, this process may use encryption and may anonymize, i.e., discard information related to the speaker's identity or personal information”));
outputting a predicted token sequence including the one or more tokens and the one or more previous tokens (Thomson discloses outputting resulting tokens to a device (Thomson ¶0394-0395 and 0384, in ¶0395 Thomson states “may output tokens based on the best available information at a point in time. In these and other embodiments, the voting process 1408 may provide corrections if future inputs or input changes trigger a change in tokens already output”)).
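
For readability, the two quantities quoted from Thomson in the mappings above may be written in compact notation. The symbols w, c_1, c_2, e and \tau are introduced here only as shorthand and do not appear in Thomson or in the claims; the second and third lines restate the threshold language of ¶1286 and the elementary fact that a probability above a threshold lies in a bounded range.

```latex
% Word probability conditioned on contexts from two input hypotheses (Thomson ¶0394):
P(w \mid c_1, c_2)

% Entry e is kept in the n-gram table when its probability exceeds a selected
% threshold \tau (Thomson ¶1286):
P(e) > \tau

% A probability above \tau necessarily lies in the range (\tau, 1]:
P(e) > \tau \;\Longleftrightarrow\; P(e) \in (\tau, 1]
```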

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Thomson et al. (US 2020/0243094 A1) (hereinafter Thomson), in view of Xu et al. (US 2021/0143987 A1) (hereinafter Xu).

Regarding Claim 4: Thomson discloses the electronic device of claim 1. However, Thomson fails to explicitly disclose the claimed:
wherein the user privacy preserving training process includes a private federated learning process.

However, in an analogous art, Xu discloses: 
wherein the user privacy preserving training process includes a private federated learning process (Xu discloses that federated learning has emerged as a promising approach for collaborative and privacy-preserving learning (Xu ¶0025)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the disclosed teaching of Xu into the device of Thomson, because this approach would address privacy concerns regarding training data by allowing participants to keep their training data private when training a model (Xu ¶0025).
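
As background for the rationale above, the general idea that participants keep their training data private during federated learning may be sketched as a single round of generic federated averaging. This is an illustrative sketch only and is not the method disclosed by Xu, Thomson, or the applicant: the function names, the toy one-parameter model, and the learning rate are hypothetical, and the sketch omits any noise addition or secure aggregation that a "private" federated learning process might include.

```python
from typing import List, Sequence, Tuple


def client_update(
    global_w: float, private_data: Sequence[float], lr: float = 0.5
) -> Tuple[float, int]:
    """Run one local gradient step on a toy model (estimating a mean).

    Only the updated weight and the sample count leave the client; the
    private data itself is never returned.
    """
    n = len(private_data)
    # Gradient of 0.5 * mean((w - x)^2) with respect to w.
    grad = sum(global_w - x for x in private_data) / n
    return global_w - lr * grad, n


def server_aggregate(updates: List[Tuple[float, int]]) -> float:
    """Average client weights, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def federated_round(global_w: float, clients: List[Sequence[float]]) -> float:
    """One round: each client trains locally, then the server averages."""
    return server_aggregate([client_update(global_w, data) for data in clients])
```

In this sketch the server only ever sees `(weight, count)` pairs, which is the property the cited rationale relies on.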

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Thomson et al. (US 2020/0243094 A1) (hereinafter Thomson), in view of Pickover et al. (US 2019/0349333 A1) (hereinafter Pickover).

Regarding Claim 11: Thomson discloses the electronic device of claim 1. However, Thomson fails to explicitly disclose the claimed:
wherein the second type of prediction includes a prediction representing systemic bias in the first language model.

However, in an analogous art, Pickover discloses: 
wherein the second type of prediction includes a prediction representing systemic bias in the first language model (Pickover discloses filtering the potential/estimated negative types within the Artificial Intelligence (AI) model which include considerations of bias, mean-spirited entities, racism, bigotry, misogyny, cultural insensitivity or the like (Pickover ¶0047)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the disclosed teaching of Pickover into the device of Thomson, because the filtering of such words in the AI model would decrease the degree to which an AI entity learns bad behavior and culturally insensitive information (Pickover ¶0047).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Please see attached form PTO-892.
Gysel et al., (US 2018/0182376 A1) relates to processing speech or text using rank-reduced token representation. In one example process, speech input is received. A sequence of candidate words corresponding to the speech input is determined. The sequence of candidate words includes a current word and one or more previous words. A vector representation of the current word is determined from a set of trained parameters. A number of parameters in the set of trained parameters varies as a function of one or more linguistic characteristics of the current word. Using the vector representation of the current word, a probability of a next word given the current word and the one or more previous words is determined. A text representation of the speech input is displayed based on the determined probability.
Beaufays et al., (US 2021/0327410 A1) discloses that processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DERRICK SCOTT JEFFERIES whose telephone number is (571)272-0923. The examiner can normally be reached 7:30a-4:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DERRICK SCOTT JEFFERIES/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658