DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/11/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner. 

Response to Amendments and Arguments
After reviewing the amendment filed on 04/12/2022 and performing an updated search, the examiner discovered a reference to Vaswani et al. (“Attention is all you need”, 2017, which is included in the IDS submitted on 05/11/2022). The examiner discussed the Vaswani reference with Mr. Christopher Glembocki (Reg. 38,800). Fig. 1 in Vaswani is the same drawing as Fig. 3 of the instant application. Mr. Glembocki also mentioned another reference, Devlin et al. (“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, submitted in an IDS filed on 05/22/2020). Fig. 2 in Devlin is the same drawing as Fig. 4 of the instant application.

Mr. Glembocki sent a proposed amendment. After further discussion, Mr. Glembocki sent a revised version of the proposed amendment. The examiner agreed that the revised version of the proposed amendment would be sufficient to distinguish the claims over the prior art of record. Mr. Glembocki authorized the examiner to enter the proposed amendment. All rejections have been withdrawn.

Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in a telephone interview with Mr. Christopher R. Glembocki (Reg. 38,800) on 05/13/2022. 

Please replace all prior versions and listings of claims in the application with the listing of claims below:

 1.	(Currently Amended)	A computer-implemented method, comprising:
initializing a model having a sequence-to-sequence network architecture, wherein the sequence-to-sequence network architecture comprises:
an encoder; and
a decoder;
training, , the model for a first task and for a second task,  the training set comprising a plurality of encoder sequences and a plurality of decoder sequences, a portion of the plurality of encoder sequences having an encoder sequence label and a portion of the plurality of decoder sequences having a decoder sequence label, wherein each of the encoder sequences and decoder sequences include one or more elements and relationships between the one or more elements and a plurality of categories, and wherein training the model comprises:
generating, for each element, a vector representation identifying relationships between each element and the plurality of categories;
generating, based on the vector representation of each element in each sequence, an encoding of each encoder sequence in the training set, each encoding comprising an encoder attention weight calculated based on the encoder sequence label corresponding to the encoder sequence, wherein an encoding for an encoder sequence not having a corresponding encoder sequence label comprises an encoder attention weight of zero, and wherein the encoder attention weight is further calculated based on a feed-forward analysis using a position of each element in the encoder sequence;
generating an encoding of each decoder sequence in the training set, each encoding comprising a decoder attention weight calculated based on the decoder sequence label corresponding to the decoder sequence, wherein an encoding for a decoder sequence not having a corresponding decoder sequence label comprises a decoder attention weight of zero, and wherein the decoder attention weight is further calculated based on one of the encoder attention weight or based on a feed-forward analysis using a position of each element in the encoder sequence or the decoder sequence;
prepending a start of sequence token to each of the encodings of the decoder sequences;
appending an end of sequence token to each of the encodings of the decoder sequences; [[and]]
applying, for the first task or the second task, loss masking to: 
the encoder sequences; and 
the decoder sequences;
for each encoding of the encoder sequences, training the encoder using: 
the encoding of the encoder sequence; [[and]] 
the encoder attention weight
the loss masking of the first task or the loss masking of the second task;
for each encoding of the decoder sequences, training the decoder using: 
the encoding of the decoder sequences; [[and]] 
the decoder attention weight
the loss masking of the first task or the loss masking of the second task; and
generating, using the trained model having been trained for the first task and for the second task, a prediction based on an input data set. 
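A minimal Python sketch of the data-preparation and loss-masking steps recited in claim 1 follows. It is illustrative only and rests on a simplified reading: the label-dependent attention weight is reduced to a per-sequence scalar (1.0 when a label is present, 0.0 when it is absent), each sequence carries a hypothetical task tag, and per-token losses are assumed to be already computed. The token names `<sos>`/`<eos>` and every function name are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical special-token names; the claims only recite "start of sequence"
# and "end of sequence" tokens.
SOS, EOS = "<sos>", "<eos>"


def wrap_decoder_sequence(tokens: Sequence[str]) -> List[str]:
    """Prepend a start-of-sequence token and append an end-of-sequence token."""
    return [SOS, *tokens, EOS]


def sequence_weight(label: Optional[str]) -> float:
    """Simplified label-dependent weight: zero for a sequence with no label."""
    return 0.0 if label is None else 1.0


def task_mask(seq_task: str, active_task: str) -> float:
    """Loss mask: keep only sequences that belong to the task being trained."""
    return 1.0 if seq_task == active_task else 0.0


def masked_loss(
    token_losses: Sequence[Sequence[float]],
    labels: Sequence[Optional[str]],
    tasks: Sequence[str],
    active_task: str,
) -> float:
    """Weighted mean of per-sequence losses; masked or unlabeled sequences add nothing."""
    total, norm = 0.0, 0.0
    for losses, label, task in zip(token_losses, labels, tasks):
        w = sequence_weight(label) * task_mask(task, active_task)
        if w == 0.0 or not losses:
            continue  # skip masked, unlabeled, or empty sequences
        total += w * sum(losses) / len(losses)
        norm += w
    return total / norm if norm else 0.0
```

For example, with per-token losses `[[1.0, 3.0], [5.0], [4.0, 2.0]]`, labels `["intent_a", None, "intent_b"]`, and tasks `["first", "first", "second"]`, training the `"first"` task uses only the first sequence (mean loss 2.0), while training the `"second"` task uses only the third (mean loss 3.0).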

2.	(Previously Presented)	The computer-implemented method of claim 1, wherein generating the encoding of the encoder sequence further comprises appending a separator token to the encoding of the encoder sequence.

3.	(Original)	The computer-implemented method of claim 1, wherein generating the prediction based on the input data further comprises calculating a confidence metric indicating a likelihood that the generated prediction is correct.
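Claim 3 does not state how the confidence metric is computed. One common choice, assumed here purely for illustration, is the geometric mean of the probabilities the model assigned to the tokens it actually emitted; the function names below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def prediction_confidence(
    step_logits: Sequence[Sequence[float]], chosen: Sequence[int]
) -> float:
    """Geometric mean of the probabilities of the chosen tokens, in (0, 1]."""
    if not chosen or len(chosen) != len(step_logits):
        raise ValueError("need one chosen token index per decoding step")
    log_sum = sum(
        math.log(softmax(logits)[idx]) for logits, idx in zip(step_logits, chosen)
    )
    return math.exp(log_sum / len(chosen))
```

Using the geometric mean rather than the raw product keeps the score comparable across outputs of different lengths, so a longer prediction is not penalized merely for having more decoding steps.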

4.	(Currently Amended)	The computer-implemented method of claim 1, wherein training the encoder comprises skipping any encoder sequence that has been masked. 

5.	(Currently Amended)	The computer-implemented method of claim 1, wherein training the decoder comprises skipping any decoder sequence that has been masked. 

6.	(Previously Presented)	The computer-implemented method of claim 1, wherein the encoding of a sample comprises a vector representation of the sample.

7.	(Original)	The computer-implemented method of claim 1, wherein generating the prediction comprises:
generating an input encoding of the input data;
generating an output sequence comprising a start of sequence token;
completing the output sequence by:
generating a next output sequence token by providing the input encoding to the trained model;
appending the next output sequence token to the output sequence;
iteratively generating next output sequence tokens by providing the input encoding to the trained model and appending each generated next output sequence token to the output sequence until the generated subsequent next output sequence token comprises an end of sequence token; and
generating the prediction based on the output sequence.
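The iterative generation recited in claim 7 can be sketched in Python as follows. The sketch assumes the trained model is any callable that takes the input encoding and the tokens generated so far and returns the next token; that interface, the token names, the `max_len` safety cap, and the function names are hypothetical and are not taken from the claims.

```python
from typing import Callable, List, Sequence

# Hypothetical special-token names.
SOS, EOS = "<sos>", "<eos>"

# Assumed model interface: (input encoding, tokens so far) -> next token.
NextToken = Callable[[Sequence[float], List[str]], str]


def generate_output_sequence(
    input_encoding: Sequence[float], model: NextToken, max_len: int = 64
) -> List[str]:
    """Start from SOS and append predicted tokens until the model emits EOS."""
    output = [SOS]
    for _ in range(max_len):  # safety cap, not recited in the claim
        token = model(input_encoding, output)
        output.append(token)
        if token == EOS:
            break
    return output


def prediction_from(output: List[str]) -> List[str]:
    """Strip the SOS/EOS markers to obtain the prediction."""
    body = output[1:]
    if body and body[-1] == EOS:
        body = body[:-1]
    return body
```

For instance, a toy model that replays the tokens `"book_flight"`, `"destination"`, and `EOS` produces the output sequence `['<sos>', 'book_flight', 'destination', '<eos>']`, from which `prediction_from` returns `['book_flight', 'destination']`.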

8.	(Original)	The computer-implemented method of claim 1, wherein the encoder sequences comprise a set of dialog intents.

9.	(Original)	The computer-implemented method of claim 1, wherein the decoder sequences comprise a set of dialog entities.

10.	(Original)	The computer-implemented method of claim 1, wherein:
the training set comprises a vocabulary; and
the encoding for a sample comprises one hundred percent coverage for the vocabulary.

11.	(Currently Amended)	The computer-implemented method of claim 1, wherein the trained model is configured to generate intent labels for a named entity.

12.	(Currently Amended)	A computing device, comprising:
a processor; and
a memory in communication with the processor and storing instructions that, when executed by the processor, cause the computing device to:
initialize a model having a sequence-to-sequence network architecture, wherein the sequence-to-sequence network architecture comprises:
an encoder; and
a decoder;
train, , the model for a first task and for a second task, the training set comprising a plurality of encoder sequences and a plurality of decoder sequences, a portion of the plurality of encoder sequences having an encoder sequence label and a portion of the plurality of decoder sequences having a decoder sequence label, wherein each of the encoder sequences and decoder sequences include one or more elements and relationships between the one or more elements and a plurality of categories, and wherein training the model comprises:
generating, for each element, a vector representation identifying relationships between each element and the plurality of categories;
generating, based on the vector representation of each element in each sequence, an encoding of each encoder sequence in the training set, each encoding comprising an encoder attention weight calculated based on the encoder sequence label corresponding to the encoder sequence and, when no encoder sequence label corresponds to the encoder sequence, setting the encoder attention weight to zero, and wherein the encoder attention weight is further calculated based on a feed-forward analysis using a position of each element in the encoder sequence;
generating an encoding of each decoder sequence in the training set, each encoding comprising [[an ]] a decoder attention weight calculated based on the decoder sequence label corresponding to the decoder sequence and, when no decoder sequence label corresponds to the decoder sequence, setting the decoder attention weight to zero, and wherein the decoder attention weight is further calculated based on one of the encoder attention weight or based on a feed-forward analysis using a position of each element in the encoder sequence or the decoder sequence;
prepending a start of sequence token to each of the encodings of the decoder sequences;
appending an end of sequence token to each of the encodings of the decoder sequences; [[and]]
applying, for the first task or the second task, loss masking to: 
the encoder sequences; and 
the decoder sequences;
for each encoding of the encoder sequences, training the encoder using: 
the encoding of the encoder sequence; [[and]]
the encoder attention weight
the loss masking of the first task or the loss masking of the second task; and
for each encoding of the decoder sequences, training the decoder using: 
the encoding of the decoder sequences; [[and]]
the decoder attention weight and
the loss masking of the first task or the loss masking of the second task; 
generate, using the trained model having been trained for the first task and the second task, a prediction based on an input data set; and 
calculate a confidence metric indicating a likelihood that the generated prediction is correct.

13.	(Currently Amended)	The computing device of claim 12, wherein training the encoder comprises skipping any encoder sequence that has been masked. 

14.	(Currently Amended)	The computing device of claim 12, wherein training the decoder comprises skipping any decoder sequence that has been masked. 

15.	(Previously Presented)	The computing device of claim 12, wherein the encoding of a sample comprises a vector representation of the sample.

16.	(Original)	The computing device of claim 12, wherein generating the prediction comprises:
generating an input encoding of the input data;
generating an output sequence comprising a start of sequence token;
completing the output sequence by:
generating a next output sequence token by providing the input encoding to the trained model;
appending the next output sequence token to the output sequence;
iteratively generating next output sequence tokens by providing the input encoding to the trained model and appending each generated next output sequence token to the output sequence until the generated subsequent next output sequence token comprises an end of sequence token; and
generating the prediction based on the output sequence.

17.	(Original)	The computing device of claim 12, wherein the encoder sequences comprise a set of dialog intents.

18.	(Original)	The computing device of claim 12, wherein the decoder sequences comprise a set of dialog entities.

19.	(Currently Amended)	A computer-implemented method, comprising:
initializing a model having a sequence-to-sequence network architecture, wherein the sequence-to-sequence network architecture comprises:
an encoder; and
a decoder;
training, , the model for a first task and for a second task, the training set comprising a plurality of encoder sequences and a plurality of decoder sequences, a portion of the plurality of encoder sequences having an encoder sequence label and a portion of the plurality of decoder sequences having a decoder sequence label, wherein each of the encoder sequences and decoder sequences include one or more elements and relationships between the one or more elements and a plurality of categories, and wherein training the model comprises:
generating, for each element, a vector representation identifying relationships between each element and the plurality of categories;
generating, based on the vector representation of each element in each sequence, an encoding of each encoder sequence in the training set, each encoding comprising an encoder attention weight calculated based on the encoder sequence label corresponding to the encoder sequence and, when no encoder sequence label corresponds to the encoder sequence, setting the encoder attention weight to zero, and wherein the encoder attention weight is further calculated based on a feed-forward analysis using a position of each element in the encoder sequence;
generating an encoding of each decoder sequence in the training set, each encoding comprising [[an]] a decoder attention weight calculated based on the decoder sequence label corresponding to the decoder sequence and, when no decoder sequence label corresponds to the decoder sequence, setting the decoder attention weight to zero, and wherein the decoder attention weight is further calculated based on one of the encoder attention weight or based on a feed-forward analysis using a position of each element in the encoder sequence or the decoder sequence;
prepending a start of sequence token to each of the encodings of the decoder sequences;
appending an end of sequence token to each of the encodings of the decoder sequences; [[and]]
applying, for the first task or the second task, loss masking to: 
the encoder sequences; and 
the decoder sequences; 
for each encoding of the encoder sequences, training the encoder using: 
the encoding of the encoder sequence; [[and]]
the encoder attention weight 
the loss masking of the first task or the loss masking of the second task; and
for each encoding of the decoder sequences, training the decoder using: 
the encoding of the decoder sequences; [[and]] 
the decoder attention weight
the loss masking of the first task or the loss masking of the second task;
generating, using the trained model having been trained for the first task and for the second task, a prediction based on an input data set by: 
generating an input encoding of the input data;
generating an output sequence comprising a start of sequence token;
completing the output sequence by:
generating a next output sequence token by providing the input encoding to the trained model;
appending the next output sequence token to the output sequence;
iteratively generating next output sequence tokens by providing the input encoding to the trained model and appending each generated next output sequence token to the output sequence until the generated subsequent next output sequence token comprises an end of sequence token;
generating the prediction based on the output sequence; and
calculating a confidence metric indicating a likelihood that the generated prediction is correct.

20.	(Original)	The computer-implemented method of claim 19, wherein:
the training set comprises a vocabulary; and
the encoding for a sample comprises one hundred percent coverage for the vocabulary.

Allowable Subject Matter
Claims 1-20 are allowed. 


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JIALONG HE/Primary Examiner, Art Unit 2659