DETAILED ACTION
Introduction
1.	This office action is in response to Applicant’s submission filed on 11/3/2020.   Claims 1-24 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
2.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
3.	The drawings filed on 11/3/2020 have been accepted and considered by the Examiner.

Information Disclosure Statement
4.	The information disclosure statements (IDSs) submitted on June 29, 2021 and May 9, 2022 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Objections
5.	Claim 2, and therefore claim 3 which depends therefrom, are objected to because of the following informalities:  Claim 2 recites “the RCL metric.”  There is no antecedent basis for this term.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
6.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.	Claims 1, 4, 8, 9, 12, 16, 17, 20, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over US Pat. App. Pub. No. 20200110915 (Long et al., hereinafter “Long”) in view of Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" arXiv preprint arXiv:1810.04805 (herein “Devlin”).
With regard to Claim 1, Long describes:
A method, comprising:
receiving, at a task-oriented dialogue (TOD) language model, a TOD dataset including a plurality of dialogues, each dialogue of the plurality of dialogues including a plurality of user utterances and a plurality of system responses; (Paragraph 44 describes a task-oriented dialogue system 100.  Paragraph 81 describes that system 100 can be trained using a plurality of sets of dialogues including utterances and responses.)
Long does not describe:
“generating a model input sequence by:
prefixing a first token to each user utterance of the plurality of user utterances and a second token to each system response of the plurality of system responses, and
concatenating each of the prefixed user utterances and each of the prefixed system responses; 
randomly replacing the first token or the second token from the model input sequence with a mask token to generate a masked training sequence;
inputting the masked training sequence to the TOD language model;
computing a masked language modeling (MLM) loss based on a first output distribution from the TOD language model corresponding to the masked training sequence; and
updating the TOD language model based on the MLM loss.”
However, Devlin describes:
generating a model input sequence by:
prefixing a first token to each user utterance of the plurality of user utterances and a second token to each system response of the plurality of system responses, (Section 3 describes adding a special token to the beginning of every sentence) and

[Image: excerpt quoted from Section 3 of Devlin (media_image1.png)]

concatenating each of the prefixed user utterances and each of the prefixed system responses; (Section 3 (quoted above) describes that sentence pairs are concatenated together.)
randomly replacing the first token or the second token from the model input sequence with a mask token to generate a masked training sequence; (Section 3 describes randomly selecting tokens to replace with masks.)

[Image: excerpt quoted from Section 3 of Devlin (media_image2.png)]
 
inputting the masked training sequence to the TOD language model;  (Section 3 describes using the masked sentences to train the model (quoted above))
computing a masked language modeling (MLM) loss based on a first output distribution from the TOD language model corresponding to the masked training sequence; (Section 3 describes computing the MLM loss.) and

[Image: excerpt quoted from Section 3 of Devlin (media_image3.png)]

updating the TOD language model based on the MLM loss.
(Section 3 describes updating the model (“predicting the original token”) based on the MLM loss (quoted above).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the model training process as described by Devlin into the system of Long to allow predicting a masked word based only on its context, thus providing trained models that can perform a wide range of tasks without substantial task-specific architecture modifications, as described in the Abstract and Section 1 of Devlin.
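For illustration only, the input-construction, masking, and MLM-loss steps discussed above for Claim 1 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not code taken from Long, Devlin, or the application: the token strings `[USR]`, `[SYS]`, and `[MASK]`, the function names, whitespace tokenization, and the 15% default masking rate are all hypothetical choices made for the example, and the sketch masks any token at random rather than reproducing any particular reference's masking scheme.

```python
import math
import random

# Hypothetical special tokens; the office action does not name them.
USR, SYS, MASK = "[USR]", "[SYS]", "[MASK]"


def build_input_sequence(dialogue):
    """Prefix each turn with a role token and concatenate all turns.

    dialogue: list of (speaker, text) pairs, speaker is "user" or "system".
    """
    tokens = []
    for speaker, text in dialogue:
        tokens.append(USR if speaker == "user" else SYS)
        tokens.extend(text.split())  # whitespace tokenization (assumption)
    return tokens


def mask_sequence(tokens, mask_prob=0.15, rng=None):
    """Randomly replace tokens with MASK; labels keep the original tokens."""
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # target the model should predict
        else:
            masked.append(tok)
            labels.append(None)  # position does not contribute to the loss
    return masked, labels


def mlm_loss(predicted, labels):
    """Mean negative log-likelihood of the original token at masked positions.

    predicted: one dict per position mapping token -> predicted probability.
    """
    losses = [
        -math.log(dist[label])
        for dist, label in zip(predicted, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    dialogue = [("user", "book a table"), ("system", "for how many people")]
    sequence = build_input_sequence(dialogue)
    masked, labels = mask_sequence(sequence, rng=random.Random(0))
    print(sequence)
    print(masked)
```

In a real training loop, the value returned by `mlm_loss` would be back-propagated to update the model parameters; that step depends on the model framework and is omitted here.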

[Images: excerpts quoted from Devlin (media_image4.png, media_image5.png)]

With regard to Claim 4, Long does not describe “the TOD language model is built using a bidirectional encoder representations from transformers (BERT)-based language representation model.”
However, Section 1 of Devlin describes using a BERT model.

[Image: excerpt quoted from Section 1 of Devlin (media_image6.png)]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the model training process as described by Devlin into the system of Long to allow predicting a masked word based only on its context, as described at Section 1 of Devlin.
With regard to Claim 8, Long does not describe “selecting, using the TOD language model and for a user utterance from the plurality of user utterances, a system response from the plurality of system responses that is responsive to the user utterance.”
However, Section 3 of Devlin describes selecting a responsive system response by predicting a missing token. (quoted above)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the model training process as described by Devlin into the system of Long to allow predicting a masked word based only on its context, as described at Section 1 of Devlin.
With respect to Claims 9, 12, and 16, system Claims 9, 12, and 16 and method Claims 1, 4, and 8 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a memory (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claims 9, 12, and 16 are similarly rejected under the same rationale as applied above with respect to Claims 1, 4, and 8.
With respect to Claims 17, 20, and 24, computer readable medium Claims 17, 20, and 24 and method Claims 1, 4, and 8 are related as a computer readable medium programmed to perform a method, and the same method, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a computer readable medium (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claims 17, 20, and 24 are similarly rejected under the same rationale as applied above with respect to Claims 1, 4, and 8.

8.	Claims 2, 3, 10, 11, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Long in view of Devlin and US Pat. App. Pub. No. 20170011738 (Senior et al., hereinafter “Senior”).
With regard to Claim 2, Long describes:
“selecting a first set of dialogues from the plurality of dialogues;” (Paragraph 81 describes using sets of dialogues.  One may be selected.)
“splitting each dialogue of the first set of dialogues at a random turn into a first part of that dialogue and the second part of that dialogue to generate a second set of dialogues and a third set of dialogues, the second set of dialogues including the first part of each dialogue of the first set of dialogues and the third set of dialogues including the second part of each dialogue of the first set of dialogues;” (Paragraphs 81 and 84 describe using multiple sets of dialogues, which may include a larger dialogue split into two.)
inputting the second set of dialogues and the third set of dialogues to the TOD language model; (Paragraph 81 describes that the model is trained by inputting sets of dialogues into the model.)
Long does not explicitly describe:
“computing a response contrastive loss (RCL) based on a second output distribution from the TOD language model corresponding to the second set of dialogues and the third set of dialogues,
wherein: updating the TOD language model based on the MLM loss includes updating the TOD language model based on a combination of the MLM loss and the RCL metric.”
However, Devlin describes:
“computing a response contrastive loss (RCL) based on a second output distribution from the TOD language model corresponding to the second set of dialogues and the third set of dialogues”
Section 3 of Devlin describes computing a loss based on the data entered into the model.

[Image: excerpt quoted from Section 3 of Devlin (media_image3.png)]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the model training process as described by Devlin into the system of Long to allow predicting a masked word based only on its context, as described at Section 1 of Devlin.
Long in view of Devlin does not explicitly describe:
“wherein: updating the TOD language model based on the MLM loss includes updating the TOD language model based on a combination of the MLM loss and the RCL metric.”
However, paragraphs 18-19 of Senior describe updating a language model based on a loss that is a weighted sum of two different losses.  (Devlin describes an MLM loss, and the RCL is broadly interpreted as a generic loss, as there are no claimed details of the RCL.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the loss combination as described by Senior into the system of Long in view of Devlin to improve stability and speed of convergence, as described at paragraph 71 of Senior.
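For illustration only, the step of splitting each dialogue at a random turn, as recited in Claim 2, can be sketched as follows. This is a minimal Python sketch; the function names and the representation of a dialogue as a list of (speaker, text) turns are hypothetical and are not drawn from Long, Devlin, Senior, or the application.

```python
import random


def split_at_random_turn(dialogue, rng=None):
    """Split one dialogue into a first part and a second part.

    The cut point is chosen uniformly so that both parts are non-empty.
    """
    if rng is None:
        rng = random.Random()
    if len(dialogue) < 2:
        raise ValueError("a dialogue needs at least two turns to be split")
    cut = rng.randint(1, len(dialogue) - 1)  # inclusive bounds
    return dialogue[:cut], dialogue[cut:]


def split_dialogues(dialogues, rng=None):
    """Build the "second set" (first parts) and "third set" (second parts)."""
    if rng is None:
        rng = random.Random()
    first_parts, second_parts = [], []
    for dialogue in dialogues:
        head, tail = split_at_random_turn(dialogue, rng)
        first_parts.append(head)
        second_parts.append(tail)
    return first_parts, second_parts


if __name__ == "__main__":
    dialogue = [
        ("user", "book a table"),
        ("system", "for how many people"),
        ("user", "four"),
        ("system", "done"),
    ]
    heads, tails = split_dialogues([dialogue], random.Random(0))
    print(heads[0])
    print(tails[0])
```

Both resulting sets would then be supplied to the model as paired inputs; how they are encoded is outside the scope of this sketch.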
With regard to Claim 3, Long in view of Devlin does not explicitly describe “the combination of the MLM loss and the RCL is a weighted sum of the MLM loss and the RCL.”
However, paragraph 19 of Senior describes updating a language model based on a loss that is a weighted sum of two different losses.  (Devlin describes an MLM loss, and the RCL is broadly interpreted as a generic loss, as there are no claimed details of the RCL.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the loss combination as described by Senior into the system of Long in view of Devlin to improve stability and speed of convergence, as described at paragraph 71 of Senior.
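For illustration only, a weighted sum of two losses of the kind discussed for Claims 2 and 3 can be sketched as follows. This is a minimal Python sketch under stated assumptions: because the claims give no details of the RCL, the `contrastive_loss` function below uses one common in-batch formulation (cross-entropy over dot-product similarities, with the matching response as the positive) purely as a stand-in, and the function names, vector representation, and default weights are hypothetical.

```python
import math


def contrastive_loss(contexts, responses):
    """In-batch contrastive loss (stand-in for the RCL; an assumption).

    contexts[i] and responses[i] are equal-length vectors; responses[i] is
    the positive for contexts[i], and all other responses are negatives.
    """
    total = 0.0
    for i, context in enumerate(contexts):
        scores = [
            sum(x * y for x, y in zip(context, response))
            for response in responses
        ]
        peak = max(scores)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[i]  # negative log-softmax of the positive
    return total / len(contexts)


def combined_loss(mlm, rcl, mlm_weight=1.0, rcl_weight=1.0):
    """Weighted sum of the two losses; the weights are hypothetical."""
    return mlm_weight * mlm + rcl_weight * rcl


if __name__ == "__main__":
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    responses = [[1.0, 0.0], [0.0, 1.0]]
    rcl = contrastive_loss(contexts, responses)
    print(combined_loss(mlm=0.7, rcl=rcl, mlm_weight=1.0, rcl_weight=0.5))
```

The model update would then be driven by the single scalar returned by `combined_loss` rather than by either loss alone.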
With respect to Claims 10 and 11, system Claims 10 and 11 and method Claims 2 and 3 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a memory (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claims 10 and 11 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 3.
With respect to Claims 18 and 19, computer readable medium Claims 18 and 19 and method Claims 2 and 3 are related as a computer readable medium programmed to perform a method, and the same method, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a computer readable medium (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claims 18 and 19 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 3.

9.	Claims 5, 13, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Long in view of Devlin and US Pat. App. Pub. No. 20200043480 (Shen et al., hereinafter “Shen”).
With regard to Claim 5, Long in view of Devlin does not explicitly describe “identifying, using the TOD language model, an intent class of a user utterance of the plurality of user utterances.”
However, paragraph 49 of Shen describes identifying and outputting an intent class based on an utterance.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include identifying an intent class as described by Shen into the system of Long in view of Devlin to allow for personalization of the device for different users, as described at paragraph 50 of Shen.
With respect to Claim 13, system Claim 13 and method Claim 5 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a memory (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 13 is similarly rejected under the same rationale as applied above with respect to Claim 5.
With respect to Claim 21, computer readable medium Claim 21 and method Claim 5 are related as a computer readable medium programmed to perform a method, and the same method, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a computer readable medium (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 21 is similarly rejected under the same rationale as applied above with respect to Claim 5.

10.	Claims 6, 14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Long in view of Devlin and US Pat. App. Pub. No. 20200152182 (Steedman Henderson et al., hereinafter “Steed”).
With regard to Claim 6, Long in view of Devlin does not explicitly describe “determining, using the TOD language model, a belief state of a dialogue of the plurality of dialogues.”
However, paragraph 20 of Steed describes updating a belief state, which is equivalent to determining a belief state.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include updating a belief state as described by Steed into the system of Long in view of Devlin to allow updating world state information, as described at paragraph 23 of Steed.
With respect to Claim 14, system Claim 14 and method Claim 6 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a memory (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 14 is similarly rejected under the same rationale as applied above with respect to Claim 6.
With respect to Claim 22, computer readable medium Claim 22 and method Claim 6 are related as a computer readable medium programmed to perform a method, and the same method, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a computer readable medium (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 22 is similarly rejected under the same rationale as applied above with respect to Claim 6.

11.	Claims 7, 15, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Long in view of Devlin and US Pat. App. Pub. No. 20100131274 (Stent et al., hereinafter “Stent”).
With regard to Claim 7, Long in view of Devlin does not explicitly describe “predicting, using the TOD language model, a dialogue act of a dialogue of the plurality of dialogues.”
However, paragraphs 32 and 42 of Stent describe a system that predicts the likely dialog act of the next user utterance.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include predicting a dialogue act as described by Stent into the system of Long in view of Devlin to allow selection of a language model with the dialogue act, as described at paragraph 32 of Stent.
With respect to Claim 15, system Claim 15 and method Claim 7 are related as system and the method of using same, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a memory (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 15 is similarly rejected under the same rationale as applied above with respect to Claim 7.
With respect to Claim 23, computer readable medium Claim 23 and method Claim 7 are related as a computer readable medium programmed to perform a method, and the same method, with each claimed element's function corresponding to the claimed method step. Further, Long teaches storage 150 may be a computer readable medium (paragraph 41) and processing engine 112 may be a processor (paragraph 35).  Accordingly, Claim 23 is similarly rejected under the same rationale as applied above with respect to Claim 7.

Conclusion
12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Patent App. Pub. No. 20220084510 (Peng et al.) describes a device that masks inputs to a task-oriented dialogue data model.
13.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571)272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./ Examiner, Art Unit 2656

/MICHELLE M KOETH/ Primary Examiner, Art Unit 2656