DETAILED ACTION
This action is in response to the application filed 03/25/2019 which claims priority to PRO 62/647585 filed 03/23/2018. Claims 1-20 are pending and have been considered. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/26/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claims 5 and 20 are objected to because of the following informalities:  
Regarding claim 5, "claim1" appears to be missing a space and should read "claim 1".  
Regarding claim 20, it is suggested that Applicant amend “The computer-readable storage media” to “The non-transitory computer-readable storage media” to avoid an antecedent basis rejection.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al. ("Semi-supervised Sequence Learning", cited by Applicant in the IDS filed on 08/26/2019, hereinafter "Dai") in view of Vinyals et al. ("A Neural Conversational Model", cited by Applicant in the IDS filed on 08/26/2019, hereinafter "Vinyals").



Regarding claim 1, Dai teaches A method comprising: 
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data]);
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein:
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters  (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However, Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns;
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output.
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, § 3. Model, ¶1; target output would correspond to the output sequence]).
	Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
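For illustration, the parameter flow that the rejection of claim 1 maps onto Dai (initial encoder values, then updated values from unsupervised pretraining, then trained values from supervised fine-tuning) can be sketched in plain Python. This is a minimal sketch under stated assumptions and is not code from Dai or Vinyals: a one-unit linear encoder with tied decoder weights stands in for Dai's LSTM sequence autoencoder, each snippet is assumed to already be a fixed-length feature vector, and all names and toy data (`pretrain_autoencoder`, `fine_tune`, `unlabeled`, `labeled`) are hypothetical.

```python
import math

def encode(w, x):
    """Encoder: map an input vector x to a scalar code h = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def reconstruction_loss(w, data):
    """Squared error between each x and its reconstruction h * w (tied weights)."""
    total = 0.0
    for x in data:
        h = encode(w, x)
        total += sum((h * wi - xi) ** 2 for wi, xi in zip(w, x))
    return total

def pretrain_autoencoder(w_init, unlabeled, lr=0.05, epochs=200):
    """Unsupervised stage: initial encoder values -> updated encoder values."""
    w = list(w_init)
    for _ in range(epochs):
        for x in unlabeled:
            h = encode(w, x)
            r = [h * wi - xi for wi, xi in zip(w, x)]  # reconstruction residual
            rw = sum(ri * wi for ri, wi in zip(r, w))
            grad = [2.0 * (rw * xi + h * ri) for xi, ri in zip(x, r)]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def predict(w, head, x):
    """Supervised network: the shared encoder followed by a logistic head."""
    a, b = head
    return 1.0 / (1.0 + math.exp(-(a * encode(w, x) + b)))

def fine_tune(w_pretrained, labeled, lr=0.1, epochs=200):
    """Supervised stage: updated encoder values -> trained encoder values."""
    w = list(w_pretrained)  # start from the pretrained values, not from scratch
    a, b = 0.0, 0.0         # the prediction head itself has no prior training
    for _ in range(epochs):
        for x, t in labeled:
            h = encode(w, x)
            d = predict(w, (a, b), x) - t  # cross-entropy gradient w.r.t. the logit
            grad_w = [d * a * xi for xi in x]
            a, b = a - lr * d * h, b - lr * d
            w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w, (a, b)

# Hypothetical toy data: unlabeled vectors plus a smaller labeled set.
unlabeled = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
labeled = [([1.0, 0.1], 1), ([0.1, 1.0], 0)]

w_initial = [0.1, 0.1]
w_updated = pretrain_autoencoder(w_initial, unlabeled)
w_trained, head = fine_tune(w_updated, labeled)
```

In this sketch the encoder weights are the only values carried from the first stage into the second, and the supervised stage adjusts both the encoder and the new head, which mirrors the fine-tuning described in the passages quoted from Dai.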

	Regarding claim 2, the combination of Dai and Vinyals teaches The method of claim 1, where Dai further teaches wherein the turn prediction task is to auto-encode the input snippet, and wherein the turn prediction is a predicted reconstruction of the input snippet (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Vinyals teaches the input snippet as cited above in claim 1.]).
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]


	Regarding claim 3, the combination of Dai and Vinyals teaches The method of claim 1, where Vinyals further teaches wherein the turn prediction task is to predict one or more turns that follow the input snippet in a dialogue transcript (“Conversational modeling can directly benefit from this formulation because it requires mapping between queries and responses. Due to the complexity of this mapping, conversational modeling has previously been designed to be very narrow in domain, with a major undertaking on feature engineering. In this work, we experiment with the conversation modeling task by casting it to a task of predicting the next sequence given the previous sequence or sequences using recurrent networks. We find that this approach can do surprisingly well on generating fluent and accurate replies to conversations.” [pg. 1, § 1 Introduction, ¶2]), and wherein the turn prediction is a prediction of a turn that follows the input snippet in the dialogue transcript in which the input snippet is found (“As turn taking is not clearly indicated, we treated consecutive sentences assuming they were uttered by different characters. We trained our model to predict the next sentence given the previous one, and we did this for every sentence (noting that this doubles our dataset size, as each sentence is used both for context and as target).” [pg. 3, § 4.2. OpenSubtitles dataset, ¶1]).  
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
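The two turn prediction tasks addressed for claims 2 and 3 differ only in how a training target is derived from an unlabeled transcript, which the following Python sketch makes concrete. It is illustrative only: the function names, the `snippet_len` parameter, and the toy transcript are hypothetical and do not come from Dai or Vinyals.

```python
def autoencode_pairs(transcript, snippet_len=1):
    """Auto-encoding task: the target is the input snippet itself."""
    return [
        (transcript[i:i + snippet_len], transcript[i:i + snippet_len])
        for i in range(len(transcript) - snippet_len + 1)
    ]

def next_turn_pairs(transcript, snippet_len=1):
    """Next-turn task: the target is the turn that follows the snippet."""
    return [
        (transcript[i:i + snippet_len], transcript[i + snippet_len])
        for i in range(len(transcript) - snippet_len)
    ]

# Hypothetical four-turn transcript.
transcript = ["hi", "hello, how can I help?", "my login fails", "let me check"]

reconstruction_examples = autoencode_pairs(transcript)
next_turn_examples = next_turn_pairs(transcript)
```

With `snippet_len=1`, every turn except the first and last appears once as an input and once as a target in `next_turn_examples`, consistent with the observation quoted from Vinyals that each sentence is used both for context and as target. Neither function needs labels, so both tasks can be trained on unlabeled transcripts.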

Regarding claim 5, the combination of Dai and Vinyals teaches The method of claim 1, where Dai further teaches wherein the prediction neural network has a set of prediction parameters (“The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [pg. 1, Abstract]), and wherein training the supervised prediction neural network to perform the supervised prediction task comprises training the prediction neural network jointly with the encoder neural network to determine trained values of the prediction network parameters from initial values of the prediction network parameters (“In most of our experiments our output layer predicts the document label from the LSTM output at the last timestep. We also experiment with the approach of putting the label at every timestep and linearly increasing the weights of the prediction objectives from 0 to 1. This way we can inject gradients to earlier steps in the recurrent networks. We call this approach linear label gain. Lastly, we also experiment with the method of jointly training the supervised learning task with the sequence autoencoder and call this method joint training.” [pg. 2-3, § 3. Overview of baselines, ¶2]).

Regarding claim 6, the combination of Dai and Vinyals teaches The method of claim 5, where Dai further teaches wherein the prediction neural network has not been previously trained on any other task before the supervised prediction neural network is trained on the supervised prediction task (“We also find that a simple pretraining step can significantly stabilize the training of LSTMs. A simple pretraining method is to use a recurrent language model as a starting point of the supervised network” [pg. 1, § 1 Introduction, ¶2; Examiner is interpreting a pre-training method to be equivalent to a prediction neural network that has not been previously trained on any task. Pre-training would occur before training of the supervised model on the supervised task.]).

Regarding claim 7, the combination of Dai and Vinyals teaches The method of claim 1, where Dai further teaches wherein the encoder neural network is a recurrent neural network that is configured to process each turn in the snippet to generate the encoded representation (“Our approach to sequence autoencoding is inspired by the work in sequence to sequence learning (also known as seq2seq) by Sutskever et al., which has been successfully used for machine translation, text parsing, image captioning, video analysis, speech recognition and conversational modeling. Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2. Sequence autoencoders and recurrent language models, ¶1]).

Regarding claim 8, the combination of Dai and Vinyals teaches The method of claim 1, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Regarding claim 9, the combination of Dai and Vinyals teaches The method of claim 1, where Dai further teaches further comprising: providing the supervised prediction neural network for use in performing the supervised prediction task (“We find that the weights obtained from the sequence autoencoder can be used as an initialization of another supervised network, one which tries to classify the sequence. We hypothesize that this is because the network can already memorize the input sequence. This reason, and the fact that the gradients have shortcuts, are our hypothesis of why the sequence autoencoder is a good and stable approach in initializing recurrent networks.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶3; See further: “After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase.” [pg. 3, § 4 Experiments, ¶3]]).

Regarding claim 10, Dai teaches A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers (“To speed up performance and reduce GPU memory usage, we perform truncated backpropagation up to 400 timesteps from the end of the sequence.” [pg. 3, § 4 Experiments, ¶2; GPU memory implies use of computers.]), cause the one or more computers to perform operations comprising:
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data]);
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein:
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters  (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However, Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns;
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output.
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, § 3. Model, ¶1; target output would correspond to the output sequence]).
	Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
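The “greedy” inference approach quoted from Vinyals in the rejection of claim 10 feeds each predicted token back in as the input for the next prediction. A minimal Python sketch of that loop follows. It is illustrative only: the bigram table `TOY_TABLE` and the function names are hypothetical stand-ins for a trained recurrent network, and the beam search alternative mentioned in the quotation is not shown.

```python
# Hypothetical next-token distributions, keyed by the previous token.
TOY_TABLE = {
    "<s>": {"i": 0.6, "you": 0.4},
    "i": {"am": 0.7, "</s>": 0.3},
    "am": {"fine": 0.9, "</s>": 0.1},
    "fine": {"</s>": 1.0},
}

def toy_model(prev_token):
    """Stand-in for the network: return a distribution over the next token."""
    return TOY_TABLE[prev_token]

def greedy_decode(model, start="<s>", end="</s>", max_len=10):
    """Pick the most probable token at each step and feed it back as input."""
    output = []
    prev = start
    for _ in range(max_len):
        probs = model(prev)
        token = max(probs, key=probs.get)  # greedy choice, no alternatives kept
        if token == end:
            break
        output.append(token)
        prev = token
    return output

reply = greedy_decode(toy_model)
```

During training the true output sequence is available, so the feedback step above applies only at inference time, as the quoted passage states.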

Regarding claim 11, the combination of Dai and Vinyals teaches The system of claim 10, where Dai further teaches wherein the turn prediction task is to auto-encode the input snippet, and wherein the turn prediction is a predicted reconstruction of the input snippet (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Vinyals teaches the input snippet as cited above in claim 1.]).
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

	Regarding claim 12, the combination of Dai and Vinyals teaches The system of claim 10, where Vinyals further teaches wherein the turn prediction task is to predict one or more turns that follow the input snippet in a dialogue transcript (“Conversational modeling can directly benefit from this formulation because it requires mapping between queries and responses. Due to the complexity of this mapping, conversational modeling has previously been designed to be very narrow in domain, with a major undertaking on feature engineering. In this work, we experiment with the conversation modeling task by casting it to a task of predicting the next sequence given the previous sequence or sequences using recurrent networks. We find that this approach can do surprisingly well on generating fluent and accurate replies to conversations.” [pg. 1, § 1 Introduction, ¶2]), and wherein the turn prediction is a prediction of a turn that follows the input snippet in the dialogue transcript in which the input snippet is found (“As turn taking is not clearly indicated, we treated consecutive sentences assuming they were uttered by different characters. We trained our model to predict the next sentence given the previous one, and we did this for every sentence (noting that this doubles our dataset size, as each sentence is used both for context and as target).” [pg. 3, § 4.2. OpenSubtitles dataset, ¶1]).  
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
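By way of illustration only, the two turn prediction tasks addressed for claims 11 and 12 can be sketched as training pairs built from a dialogue transcript. This is a minimal sketch under stated assumptions and is not code from Dai or Vinyals: the function names `autoencode_pairs` and `next_turn_pairs`, the single-turn snippet length, and the sample transcript are all hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[List[str], List[str]]  # (input snippet, target turns)


def autoencode_pairs(transcript: List[str]) -> List[Pair]:
    """Auto-encoding task: the target is the input snippet itself."""
    return [([turn], [turn]) for turn in transcript]


def next_turn_pairs(transcript: List[str]) -> List[Pair]:
    """Next-turn task: the target is the turn that follows the snippet."""
    return [([transcript[i]], [transcript[i + 1]])
            for i in range(len(transcript) - 1)]


# Hypothetical transcript; consecutive lines are treated as alternating turns.
transcript = ["hi", "hello", "how are you?", "fine, thanks"]
auto_pairs = autoencode_pairs(transcript)
next_pairs = next_turn_pairs(transcript)
```

In the next-turn pairs, every interior turn appears once as an input and once as a target, which matches the quoted observation that each sentence is used both for context and as target.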

Regarding claim 14, the combination of Dai and Vinyals teaches The system of claim 10, where Dai further teaches wherein the prediction neural network has a set of prediction parameters (“The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [pg. 1, Abstract]), and wherein training the supervised prediction neural network to perform the supervised prediction task comprises training the prediction neural network jointly with the encoder neural network to determine trained values of the prediction network parameters from initial values of the prediction network parameters (“In most of our experiments our output layer predicts the document label from the LSTM output at the last timestep. We also experiment with the approach of putting the label at every timestep and linearly increasing the weights of the prediction objectives from 0 to 1. This way we can inject gradients to earlier steps in the recurrent networks. We call this approach linear label gain. Lastly, we also experiment with the method of jointly training the supervised learning task with the sequence autoencoder and call this method joint training.” [pg. 2-3, § 3. Overview of baselines, ¶2]).

Regarding claim 15, the combination of Dai and Vinyals teaches The system of claim 14, where Dai further teaches wherein the prediction neural network has not been previously trained on any other task before the supervised prediction neural network is trained on the supervised prediction task (“We also find that a simple pretraining step can significantly stabilize the training of LSTMs. A simple pretraining method is to use a recurrent language model as a starting point of the supervised network” [pg. 1, § 1 Introduction, ¶2; Examiner is interpreting a pre-training method to be equivalent to a prediction neural network that has not been previously trained on any task. Pre-training would occur before training of the supervised model on the supervised task.]).

Regarding claim 16, the combination of Dai and Vinyals teaches The system of claim 10, where Dai further teaches wherein the encoder neural network is a recurrent neural network that is configured to process each turn in the snippet to generate the encoded representation (“Our approach to sequence autoencoding is inspired by the work in sequence to sequence learning (also known as seq2seq) by Sutskever et al., which has been successfully used for machine translation, text parsing, image captioning, video analysis, speech recognition and conversational modeling. Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2. Sequence autoencoders and recurrent language models, ¶1]).

Regarding claim 17, the combination of Dai and Vinyals teaches The system of claim 10, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Regarding claim 18, the combination of Dai and Vinyals teaches The system of claim 10, where Dai further teaches the operations further comprising: providing the supervised prediction neural network for use in performing the supervised prediction task (“We find that the weights obtained from the sequence autoencoder can be used as an initialization of another supervised network, one which tries to classify the sequence. We hypothesize that this is because the network can already memorize the input sequence. This reason, and the fact that the gradients have shortcuts, are our hypothesis of why the sequence autoencoder is a good and stable approach in initializing recurrent networks.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶3; See further: “After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase.” [pg. 3, § 4 Experiments, ¶3]]).

Regarding claim 19, Dai teaches One or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers (“To speed up performance and reduce GPU memory usage, we perform truncated backpropagation up to 400 timesteps from the end of the sequence.” [pg. 3, § 4 Experiments, ¶2; GPU memory implies use of computers.]), cause the one or more computers to perform operations comprising:
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data]);
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein: 
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However, Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns;
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output.
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, § 3. Model, ¶1; target output would correspond to the output sequence]).
	Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
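For illustration only, the flow of encoder parameters recited in claim 19 (initial values, then updated values from unsupervised training, then trained values from supervised training) can be sketched with a scalar toy model. This is a minimal sketch under stated assumptions, not Dai's LSTM implementation: the encoder is a single hypothetical weight `w`, the decoder reuses `w` (echoing Dai's statement that encoder and decoder weights are the same), the prediction head is a single hypothetical weight `u`, and the data and the labeling rule `y = 3x` are invented for the example.

```python
from typing import List, Tuple


def pretrain_encoder(w: float, xs: List[float], lr: float = 0.05,
                     steps: int = 200) -> float:
    """Unsupervised stage: tied-weight reconstruction x_hat = w * (w * x)."""
    for _ in range(steps):
        # Gradient of mean((w*w*x - x)**2) with respect to w.
        grad = sum(2 * (w * w * x - x) * (2 * w * x) for x in xs) / len(xs)
        w -= lr * grad
    return w


def finetune(w: float, u: float, data: List[Tuple[float, float]],
             lr: float = 0.02, steps: int = 2000) -> Tuple[float, float]:
    """Supervised stage: prediction y_hat = u * (w * x); w and u train jointly."""
    for _ in range(steps):
        # Gradients of mean((u*w*x - y)**2) with respect to w and u.
        grad_w = sum(2 * (u * w * x - y) * (u * x) for x, y in data) / len(data)
        grad_u = sum(2 * (u * w * x - y) * (w * x) for x, y in data) / len(data)
        w, u = w - lr * grad_w, u - lr * grad_u
    return w, u


xs = [1.0, 2.0, -1.0, 0.5]            # hypothetical unlabeled inputs
labeled = [(x, 3.0 * x) for x in xs]  # hypothetical labeled pairs

w_initial = 0.5                                   # initial encoder value
w_updated = pretrain_encoder(w_initial, xs)       # updated by pretraining
w_trained, u_trained = finetune(w_updated, 0.0, labeled)  # fine-tuned jointly
```

The supervised stage starts from `w_updated` rather than `w_initial`, and the prediction weight `u` starts from a fresh value, so the encoder parameter is further adjusted while the head is trained from scratch.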

Regarding claim 20, the combination of Dai and Vinyals teaches The computer-readable storage media of claim 19, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Dai in view of Vinyals and further in view of Lowe et al. ("The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", hereinafter "Lowe").

Regarding claim 4, the combination of Dai and Vinyals teaches The method of claim 1, but fails to explicitly teach wherein the turn prediction task is to predict the turns that are at one or more predetermined positions relative to the input snippet in a dialogue transcript, and wherein the turn prediction is a prediction of the turns that are at the one or more predetermined positions relative to the input snippet in the dialogue transcript in which the input snippet is found.
Lowe teaches wherein the turn prediction task is to predict the turns that are at one or more predetermined positions (“Since we want to learn to predict all parts of a conversation, as opposed to only the closing statement, we consider various portions of context for the conversations in the test set.” [pg. 5, § 3.4 Test Set Generation, ¶2; note: Examiner is interpreting all parts of the conversation to be equivalent to one or more predetermined positions (i.e. first sentence or closing statement, etc.)]) relative to the input snippet in a dialogue transcript (Lowe further discloses: “Compared to the rest of the corpus, this test set has been further processed to extract a pair of (context, response, flag) triples from each dialogue. The flag is a Boolean variable indicating whether or not the response was the actual next utterance after the given context. The response is a target (output) utterance which we aim to correctly identify. The context consists of the sequence of utterances appearing in dialogue prior to the response. We create a pair of triples, where one triple contains the correct response (i.e. the actual next utterance in the dialogue), and the other triple contains a false response, sampled randomly from elsewhere within the test set.” [pg. 5, § 3.4 Test Set Generation, ¶1; Examiner is interpreting “relative” to the input snippet to correspond to knowing the sequence in a dialogue and predicting the best response based on context.]), and wherein the turn prediction is a prediction of the turns that are at the one or more predetermined positions relative to the input snippet in the dialogue transcript in which the input snippet is found (“Here, C denotes the maximum desired context size, which we set to C = 20. The last term is the desired minimum context size, which we set to be 2. Parameter t is the actual length of that dialogue (thus the constraint that c ≤ t − 1), and n is a random number corresponding to the randomly sampled context length, that is selected to be inversely proportional to C. In practice, this leads to short test dialogues having short contexts, while longer dialogues are often broken into short or medium-length segments, with the occasional long context of 10 or more turns.” [pg. 5, § 3.4 Test Set Generation, ¶2-3]).
Dai, Vinyals, and Lowe are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Lowe discloses a multi-turn dialogue model to find the next best response in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s and Vinyals’ teachings to further implement a prediction method for finding the best response in all parts of conversation as taught by Lowe. One would have been motivated to make this modification in order to select the best response in any turn of a conversation. [pg. 1-2, § 1 Introduction, ¶4, Lowe]
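For illustration only, the pair of (context, response, flag) triples described in the quoted passage of Lowe can be sketched as follows. This is a minimal sketch under stated assumptions, not Lowe's code: the function name `make_triple_pair`, the sample dialogue, and the distractor pool are hypothetical, and the context length is supplied by the caller here, whereas the quoted passage samples it at random using a formula that is not reproduced in this action.

```python
import random
from typing import List, Tuple

Triple = Tuple[List[str], str, bool]  # (context, response, flag)


def make_triple_pair(dialogue: List[str], context_len: int,
                     pool: List[str], rng: random.Random) -> Tuple[Triple, Triple]:
    """Build one correct and one false (context, response, flag) triple."""
    if not 1 <= context_len < len(dialogue):
        raise ValueError("context_len must leave at least one turn as the response")
    context = dialogue[:context_len]
    true_response = dialogue[context_len]  # the actual next utterance
    # False response: drawn at random from the pool, excluding the true response.
    candidates = [u for u in pool if u != true_response]
    false_response = rng.choice(candidates)
    return (context, true_response, True), (context, false_response, False)


# Hypothetical dialogue and distractor pool.
dialogue = ["my wifi is down", "which driver?", "iwlwifi", "try reloading it"]
pool = ["reboot first", "check dmesg", "try reloading it"]
positive, negative = make_triple_pair(dialogue, 3, pool, random.Random(0))
```

The response in the correct triple sits at a fixed position relative to the context, namely immediately after its last turn, which is the relationship the interpretation above relies on.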

Regarding claim 13, the combination of Dai and Vinyals teaches The system of claim 10, but fails to explicitly teach wherein the turn prediction task is to predict the turns that are at one or more predetermined positions relative to the input snippet in a dialogue transcript, and wherein the turn prediction is a prediction of the turns that are at the one or more predetermined positions relative to the input snippet in the dialogue transcript in which the input snippet is found.
Lowe teaches wherein the turn prediction task is to predict the turns that are at one or more predetermined positions (“Since we want to learn to predict all parts of a conversation, as opposed to only the closing statement, we consider various portions of context for the conversations in the test set.” [pg. 5, § 3.4 Test Set Generation, ¶2; note: Examiner is interpreting all parts of the conversation to be equivalent to one or more predetermined positions (i.e. first sentence or closing statement, etc.)]) relative to the input snippet in a dialogue transcript (Lowe further discloses: “Compared to the rest of the corpus, this test set has been further processed to extract a pair of (context, response, flag) triples from each dialogue. The flag is a Boolean variable indicating whether or not the response was the actual next utterance after the given context. The response is a target (output) utterance which we aim to correctly identify. The context consists of the sequence of utterances appearing in dialogue prior to the response. We create a pair of triples, where one triple contains the correct response (i.e. the actual next utterance in the dialogue), and the other triple contains a false response, sampled randomly from elsewhere within the test set.” [pg. 5, § 3.4 Test Set Generation, ¶1; Examiner is interpreting “relative” to the input snippet to correspond to knowing the sequence in a dialogue and predicting the best response based on context.]), and wherein the turn prediction is a prediction of the turns that are at the one or more predetermined positions relative to the input snippet in the dialogue transcript in which the input snippet is found (“Here, C denotes the maximum desired context size, which we set to C = 20. The last term is the desired minimum context size, which we set to be 2. Parameter t is the actual length of that dialogue (thus the constraint that c ≤ t − 1), and n is a random number corresponding to the randomly sampled context length, that is selected to be inversely proportional to C. In practice, this leads to short test dialogues having short contexts, while longer dialogues are often broken into short or medium-length segments, with the occasional long context of 10 or more turns.” [pg. 5, § 3.4 Test Set Generation, ¶2-3]).
Dai, Vinyals, and Lowe are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Lowe discloses a multi-turn dialogue model to find the next best response in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s and Vinyals’ teachings to further implement a prediction method for finding the best response in all parts of conversation as taught by Lowe. One would have been motivated to make this modification in order to select the best response in any turn of a conversation. [pg. 1-2, § 1 Introduction, ¶4, Lowe]

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122