DETAILED ACTION
This action is in response to the claims filed 12/16/2021 for application 16/363,891. Claims 1, 5, 10, 19, and 20 are amended, claims 2-4 and 11-13 are canceled, and claims 21-23 are new. Claims 1, 5-10, and 14-23 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 5-10, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al. ("Semi-supervised Sequence Learning", cited by Applicant in the IDS filed on 08/26/2019, hereinafter "Dai") in view of Vinyals et al. ("A Neural Conversational Model", cited by Applicant in the IDS filed on 08/26/2019, hereinafter "Vinyals") and further in view of Kiros et al. ("Skip-Thought Vectors", cited by Applicant in the IDS filed on 08/26/2019, hereinafter "Kiros").

Regarding claim 1, Dai teaches A method comprising: 
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data]);
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein: 
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters  (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output;
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence and one or more additional conversational turns at other positions in the sequence;
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, 3. Model, ¶1; target output would correspond to the output sequence])
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
	Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
	However Dai/Vinyals fails to explicitly teach and one or more additional conversational turns at other positions in the sequence;
	Kiros teaches and one or more additional conversational turns at other positions in the sequence (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
	Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’ teachings to implement the skip-thought model as taught by Kiros. One would have been motivated to make this modification in order to predict nearby sentences that share semantic and syntactic properties. [Abstract, Kiros]
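For illustration only, the combined prediction task discussed above (an input at position t, a target at position t+1, and an additional target at another position) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `make_turn_prediction_examples` and the tuple layout are hypothetical and are not drawn from Dai, Vinyals, Kiros, or the application; the sample sentences are the triplet quoted from the Kiros Figure 1 caption.

```python
from typing import List, Tuple


def make_turn_prediction_examples(transcript: List[str]) -> List[Tuple[str, str, str]]:
    """Build (input_turn, previous_turn, next_turn) triples.

    Hypothetical helper: for each interior position t, the turn at t is the
    input, and the turns at t-1 and t+1 are the targets to be predicted.
    """
    examples = []
    for t in range(1, len(transcript) - 1):
        examples.append((transcript[t], transcript[t - 1], transcript[t + 1]))
    return examples


# Sentence triplet quoted in the Kiros Figure 1 caption.
transcript = [
    "I got back home.",
    "I could see the cat on the steps.",
    "This was strange.",
]
examples = make_turn_prediction_examples(transcript)
```

With a three-turn sequence, only the middle turn has both a preceding and a following turn, so the sketch yields a single training example.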

Regarding claim 5, Dai/Vinyals/Kiros teaches The method of claim 1, where Dai further teaches wherein the prediction neural network has a set of prediction parameters (“The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [pg. 1, Abstract]), and wherein training the supervised prediction neural network to perform the supervised prediction task comprises training the prediction neural network jointly with the encoder neural network to determine trained values of the prediction network parameters from initial values of the prediction network parameters (“In most of our experiments our output layer predicts the document label from the LSTM output at the last timestep. We also experiment with the approach of putting the label at every timestep and linearly increasing the weights of the prediction objectives from 0 to 1. This way we can inject gradients to earlier steps in the recurrent networks. We call this approach linear label gain. Lastly, we also experiment with the method of jointly training the supervised learning task with the sequence autoencoder and call this method joint training.” [pg. 2-3, § 3. Overview of baselines, ¶2]).

Regarding claim 6, Dai/Vinyals/Kiros teaches The method of claim 5, where Dai further teaches wherein the prediction neural network has not been previously trained on any other task before the supervised prediction neural network is trained on the supervised prediction task (“We also find that a simple pretraining step can significantly stabilize the training of LSTMs. A simple pretraining method is to use a recurrent language model as a starting point of the supervised network” [pg. 1, § 1 Introduction, ¶2; Examiner is interpreting a pre-training method to be equivalent to a prediction neural network that has not been previously trained on any task. Pre-training would occur before training of the supervised model on the supervised task.]).
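
For illustration only, the parameter flow addressed in claims 5 and 6 (encoder parameters carried over from pretraining, prediction network parameters starting from fresh initial values) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the parameter names, the `init_params` helper, and the constant offset standing in for pretraining are all hypothetical and do not represent Dai's implementation or the application's.

```python
import random
from typing import Dict, List


def init_params(names: List[str], seed: int) -> Dict[str, float]:
    """Hypothetical helper: draw small random initial values for named parameters."""
    rng = random.Random(seed)
    return {name: rng.uniform(-0.1, 0.1) for name in names}


# Stage 1 (unsupervised): encoder parameters move from initial to updated values.
encoder_initial = init_params(["enc_w", "enc_b"], seed=0)
# Stand-in for pretraining: a fixed offset, NOT a real training procedure.
encoder_updated = {name: value + 0.01 for name, value in encoder_initial.items()}

# Stage 2 (supervised): the encoder starts from the updated values, while the
# prediction network starts from its own initial values (no prior training).
prediction_initial = init_params(["head_w", "head_b"], seed=1)
supervised_params = {**encoder_updated, **prediction_initial}
```

The sketch shows only where each set of starting values comes from; the supervised training that would then adjust both sets jointly is omitted.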

Regarding claim 7, Dai/Vinyals/Kiros teaches The method of claim 1, where Dai further teaches wherein the encoder neural network is a recurrent neural network that is configured to process each turn in the snippet to generate the encoded representation (“Our approach to sequence autoencoding is inspired by the work in sequence to sequence learning (also known as seq2seq) by Sutskever et al., which has been successfully used for machine translation, text parsing, image captioning, video analysis, speech recognition and conversational modeling. Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2. Sequence autoencoders and recurrent language models, ¶1]).

Regarding claim 8, Dai/Vinyals/Kiros teaches The method of claim 1, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Kiros’ teachings to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Regarding claim 9, Dai/Vinyals/Kiros teaches The method of claim 1, where Dai further teaches further comprising: providing the supervised prediction neural network for use in performing the supervised prediction task (“We find that the weights obtained from the sequence autoencoder can be used as an initialization of another supervised network, one which tries to classify the sequence. We hypothesize that this is because the network can already memorize the input sequence. This reason, and the fact that the gradients have shortcuts, are our hypothesis of why the sequence autoencoder is a good and stable approach in initializing recurrent networks.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶3; See further: “After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase.” [pg. 3, § 4 Experiments, ¶3]]).

Regarding claim 10, Dai teaches A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers (“To speed up performance and reduce GPU memory usage, we perform truncated backpropagation up to 400 timesteps from the end of the sequence.” [pg. 3, § 4 Experiments, ¶2; GPU memory usage implies the use of computers.]), cause the one or more computers to perform operations comprising: 
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data]);
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein: 
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters  (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output;
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence and one or more additional conversational turns at other positions in the sequence;
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, 3. Model, ¶1; target output would correspond to the output sequence])
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
However Dai/Vinyals fails to explicitly teach and one or more additional conversational turns at other positions in the sequence;
	Kiros teaches and one or more additional conversational turns at other positions in the sequence (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
	Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’ teachings to implement the skip-thought model method as taught by Kiros. One would have been motivated to make this modification in order to predict nearby sentences that share semantic and syntactic properties. [Abstract, Kiros]

Regarding claim 14, Dai/Vinyals/Kiros teaches The system of claim 10, where Dai further teaches wherein the prediction neural network has a set of prediction parameters (“The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [pg. 1, Abstract]), and wherein training the supervised prediction neural network to perform the supervised prediction task comprises training the prediction neural network jointly with the encoder neural network to determine trained values of the prediction network parameters from initial values of the prediction network parameters (“In most of our experiments our output layer predicts the document label from the LSTM output at the last timestep. We also experiment with the approach of putting the label at every timestep and linearly increasing the weights of the prediction objectives from 0 to 1. This way we can inject gradients to earlier steps in the recurrent networks. We call this approach linear label gain. Lastly, we also experiment with the method of jointly training the supervised learning task with the sequence autoencoder and call this method joint training.” [pg. 2-3, § 3. Overview of baselines, ¶2]).

Regarding claim 15, Dai/Vinyals/Kiros teaches The system of claim 14, where Dai further teaches wherein the prediction neural network has not been previously trained on any other task before the supervised prediction neural network is trained on the supervised prediction task (“We also find that a simple pretraining step can significantly stabilize the training of LSTMs. A simple pretraining method is to use a recurrent language model as a starting point of the supervised network” [pg. 1, § 1 Introduction, ¶2; Examiner is interpreting a pre-training method to be equivalent to a prediction neural network that has not been previously trained on any task. Pre-training would occur before training of the supervised model on the supervised task.]).

Regarding claim 16, Dai/Vinyals/Kiros teaches The system of claim 10, where Dai further teaches wherein the encoder neural network is a recurrent neural network that is configured to process each turn in the snippet to generate the encoded representation (“Our approach to sequence autoencoding is inspired by the work in sequence to sequence learning (also known as seq2seq) by Sutskever et al., which has been successfully used for machine translation, text parsing, image captioning, video analysis, speech recognition and conversational modeling. Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2. Sequence autoencoders and recurrent language models, ¶1]).

Regarding claim 17, Dai/Vinyals/Kiros teaches The system of claim 10, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Kiros’ teachings to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Regarding claim 18, Dai/Vinyals/Kiros teaches The system of claim 10, where Dai further teaches the operations further comprising: providing the supervised prediction neural network for use in performing the supervised prediction task (“We find that the weights obtained from the sequence autoencoder can be used as an initialization of another supervised network, one which tries to classify the sequence. We hypothesize that this is because the network can already memorize the input sequence. This reason, and the fact that the gradients have shortcuts, are our hypothesis of why the sequence autoencoder is a good and stable approach in initializing recurrent networks.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶3; See further: “After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase.” [pg. 3, § 4 Experiments, ¶3]]).
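The initialization step quoted from Dai above, in which pretrained weights serve as the starting point of the supervised network, can be sketched as follows. This is a minimal illustration only; the parameter names (`embedding`, `lstm`, `head`), the placeholder values, and the function name are hypothetical and do not appear in Dai.

```python
import copy


def init_supervised_from_pretrained(pretrained_encoder, num_labels):
    """Build a supervised model whose encoder starts from pretrained values.

    `pretrained_encoder` is a hypothetical dict of parameter lists standing in
    for the embedding and LSTM weights obtained from unsupervised pretraining.
    """
    return {
        # Copy the values so later fine-tuning does not alter the pretrained set.
        "encoder": copy.deepcopy(pretrained_encoder),
        # The prediction head has no pretrained values; it starts at zeros here.
        "head": [0.0] * num_labels,
    }


# Placeholder numbers, assumed purely for illustration.
pretrained = {"embedding": [0.1, -0.2], "lstm": [0.3, 0.4]}
model = init_supervised_from_pretrained(pretrained, num_labels=2)
```

In this sketch the encoder values are copied rather than shared, so the pretrained values remain available as the "updated values" from which fine-tuning begins.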
Regarding claim 19, Dai teaches One or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers (“To speed up performance and reduce GPU memory usage, we perform truncated backpropagation up to 400 timesteps from the end of the sequence.” [pg. 3, § 4 Experiments, ¶2, GPU memory implies use of computers.]), cause the one or more computers to perform operations comprising:
obtaining unsupervised training data (“For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning” [pg. 1, § 1 Introduction, ¶4; Unlabeled data used for unsupervised learning would correspond to unsupervised training data])
training a turn prediction neural network to perform a turn prediction task (“A simple pretraining method is to use a recurrent language model as a starting point of the supervised network. A slightly better method is to use a sequence autoencoder, which uses a RNN to read a long input sequence into a single vector. This vector will then be used to reconstruct the original sequence” [pg. 1, § 1. Introduction, ¶2; note: Examiner is interpreting the predicted reconstruction of the original sequence to be equivalent to a “turn prediction”, thus the RNN disclosed by Dai would be equivalent to a “turn prediction” neural network.]) on the unsupervised training data using unsupervised learning (“A significant property of the sequence autoencoder is that it is unsupervised, and thus can be trained with large quantities of unlabeled data to improve its quality. Our result is that additional unlabeled data can improve the generalization ability of recurrent networks. This is especially useful for tasks that have limited labeled data” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶4]), wherein:
the turn prediction neural network comprises (i) a turn encoder neural network that is configured to receive and to generate an encoded representation of the input snippet (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet comprising one or more input conversational turns as cited below]) in accordance with a set of encoder network parameters  (Dai discloses a set of encoder network parameters: “In our sequence autoencoders, the weights for the decoder network and the encoder network are the same (see Figure 1).” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶2])
and (ii) a turn decoder neural network that is configured to receive the encoded representation of the input snippet and to process the encoded representation to generate a turn prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a turn prediction.]), and
training the turn prediction neural network to perform the turn prediction task comprises training the turn encoder neural network to determine updated values of the encoder network parameters from initial values of the encoder network parameters (“The weights obtained from pretraining can then be used as an initialization for the standard LSTM RNNs. We believe that this semi-supervised approach is superior to other unsupervised sequence learning methods, e.g., Paragraph Vectors, because it can allow for easy fine-tuning.” [pg. 1, § 1 Introduction, ¶2; weights obtained from pretraining would be equivalent to determining updated values of the encoder network parameters.]); 
obtaining supervised training data (“In this first set of experiments, we benchmark our methods on the IMDB movie sentiment dataset, proposed by Maas et al. There are 25,000 labeled and 50,000 unlabeled documents in the training set and 25,000 in the test set. We use 15% of the labeled training documents as a validation set.” [pg. 3, § 4.1 Sentiment analysis experiments with IMDB, ¶1]); and 
training a supervised prediction neural network to perform a supervised prediction task on the supervised training data using supervised learning (“In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.” [Abstract]), wherein: 
the supervised prediction neural network comprises (i) the turn encoder neural network (See Figure 1 on pg. 2) and (ii) a prediction neural network that is configured to receive the encoded representation of the input snippet generated by the turn encoder neural network (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; This corresponds to generating an encoded representation of the input. note: Vinyals teaches an input snippet as cited below]) and to process the respective encoded representations to generate a supervised prediction (“Key to their approach is the use of a recurrent network as an encoder to read in an input sequence into a hidden state, which is the input to a decoder recurrent network that predicts the output sequence.” [pg. 2, § 2 Sequence autoencoders and recurrent language models, ¶1; Examiner is interpreting predicting an output sequence to be equivalent to generating a supervised prediction.]), and
training the supervised prediction neural network to perform the supervised prediction task comprises training the turn prediction neural network to determine trained values of the encoder network parameters from the updated values of the encoder network parameters that were determined by training the turn prediction neural network on the turn prediction task (“After training the recurrent language model or the sequence autoencoder for roughly 500K steps with a batch size of 128, we use both the word embedding parameters and the LSTM weights to initialize the LSTM for the supervised task. We then train on that task while fine tuning both the embedding parameters and the weights and use early stopping when the validation error starts to increase. We choose the dropout parameters based on a validation set.” [pg. 3, § 4 Experiments, ¶3; embedding parameters from pre-training would be equivalent to encoder network parameters from the updated values of the encoder network. See further: “These two algorithms can be used as a “pretraining” algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models.” [Abstract; the two algorithms disclosed by Dai are interpreted as unsupervised models used to train another supervised model.]]).
However, Dai fails to explicitly teach comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns;
an input snippet comprising one or more input conversational turns
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output;
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence and one or more additional conversational turns at other positions in the sequence;
Vinyals teaches comprising a plurality of dialogue transcripts, each dialogue transcript comprising a sequence of conversational turns (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
an input snippet comprising one or more input conversational turns (See pg. 3, § 5. Experiments for input snippet);
comprising a plurality of snippets of one or more conversational turns and, for each snippet, a respective target output (“The model is based on a recurrent neural network which reads the input sequence one token at a time, and predicts the output sequence, also one token at a time. During training, the true output sequence is given to the model, so learning can be done by backpropagation. The model is trained to maximize the cross entropy of the correct sequence given its context. During inference, given that the true output sequence is not observed, we simply feed the predicted output token as input to predict the next output. This is a “greedy” inference approach. A less greedy approach would be to use beam search, and feed several candidates at the previous step to the next step. The predicted sequence can be selected based on the probability of the sequence.” [pg. 2, § 3. Model, ¶1; target output would correspond to the output sequence]);
wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, a next conversational turn at position t+1 in the sequence (“Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation.” [Abstract; See further pg. 3, § 5. Experiments for dialogue transcripts]);
	Dai and Vinyals are both in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s semi-supervised sequence learning method to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]
However, Dai/Vinyals fails to explicitly teach and one or more additional conversational turns at other positions in the sequence;
	Kiros teaches and one or more additional conversational turns at other positions in the sequence (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
	Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’ teachings to implement the skip-thought model method as taught by Kiros. One would have been motivated to make this modification in order to predict nearby sentences that share semantic and syntactic properties. [Abstract, Kiros]

Regarding claim 20, Dai/Vinyals/Kiros teaches The non-transitory computer-readable storage media of claim 19, where Dai further teaches wherein the conversational turns in the supervised training data are a proper subset of the conversational turns in the unsupervised training data (“Another important result from our experiments is that it is possible to use unlabeled data from related tasks to improve the generalization of a subsequent supervised model. For example, using unlabeled data from Amazon reviews to pretrain the sequence autoencoders can improve classification accuracy on Rotten Tomatoes from 79.0% to 83.3%, an equivalence of adding substantially more labeled data. This evidence supports the thesis that it is possible to use unsupervised learning with more unlabeled data to improve supervised learning. With sequence autoencoders, and outside unlabeled data, LSTMs are able to match or surpass previously reported results.” [pg. 1, § 1 Introduction, ¶4; note: Examiner is interpreting a subset of supervised training data to be equivalent to using more unlabeled data from related tasks (i.e. unsupervised training data).  Vinyals discloses conversational turns.]).
Dai, Vinyals, and Kiros are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Kiros’ teachings to substitute Dai’s training data with the conversational modeling data as taught by Vinyals. One would have been motivated to make this modification in order to generate fluent and accurate replies to conversations. [pg. 1, § 1. Introduction, ¶1-3, Vinyals]

Claims 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Dai in view of Vinyals and Kiros and further in view of Mikolov et al. ("Efficient Estimation of Word Representations in Vector Space" cited by Applicant in the IDS filed 08/26/2019, hereinafter "Mikolov").

	Regarding claim 21, Dai/Vinyals/Kiros teaches The method of claim 1, where Kiros further teaches wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, each conversational turn, wherein k is an integer greater than or equal to 1. (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
However, Dai/Vinyals/Kiros fails to explicitly teach that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence.
Mikolov teaches that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence (“The second architecture is similar to CBOW, but instead of predicting the current word based on the context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, we use each current word as an input to a log-linear classifier with continuous projection layer, and predict words within a certain range before and after the current word. We found that increasing the range improves quality of the resulting word vectors, but it also increases the computational complexity. Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples.” [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1; See further Figure 1.])
Dai, Vinyals, Kiros, and Mikolov are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. Mikolov teaches a method to estimate word representations in a vector space. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’/Kiros’ teachings to substitute the word vectors and implement the skip-gram model as taught by Mikolov to predict additional turns. One would have been motivated to make this modification in order to predict additional turns both before and after a sentence, thereby improving the quality of resulting sentence predictions. [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1, Mikolov]
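The windowed prediction targets recited in claim 21 (each turn that is k or fewer positions before or after position t) can be sketched as follows. This is a minimal illustration under assumed names; the function `window_targets` and the sample turns are hypothetical and are not taken from Kiros, Mikolov, or the application.

```python
def window_targets(turns, t, k):
    """Return the turns that are k or fewer positions before or after position t.

    `turns` is a list of conversational turns, `t` is a 0-based position,
    and `k` is an integer greater than or equal to 1.
    """
    if k < 1:
        raise ValueError("k must be an integer greater than or equal to 1")
    # Clip the window at the start and end of the sequence.
    start = max(0, t - k)
    end = min(len(turns), t + k + 1)
    # Exclude the input position itself; only neighboring turns are targets.
    return [turns[i] for i in range(start, end) if i != t]


# Hypothetical five-turn dialogue.
turns = ["a", "b", "c", "d", "e"]
print(window_targets(turns, t=2, k=1))  # ['b', 'd']
```

With k = 1 the targets reduce to the previous and next turns, which matches the triplet arrangement described in the Kiros figure caption; larger k widens the range in the manner of the skip-gram passage quoted from Mikolov.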

Regarding claim 22, Dai/Vinyals/Kiros teaches The system of claim 10, where Kiros further teaches wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, each conversational turn, wherein k is an integer greater than or equal to 1. (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
However, Dai/Vinyals/Kiros fails to explicitly teach that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence.
Mikolov teaches that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence (“The second architecture is similar to CBOW, but instead of predicting the current word based on the context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, we use each current word as an input to a log-linear classifier with continuous projection layer, and predict words within a certain range before and after the current word. We found that increasing the range improves quality of the resulting word vectors, but it also increases the computational complexity. Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples.” [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1; See further Figure 1.])
Dai, Vinyals, Kiros, and Mikolov are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. Mikolov teaches a method to estimate word representations in a vector space. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’/Kiros’ teachings to substitute the word vectors and implement the skip-gram model as taught by Mikolov to predict additional turns. One would have been motivated to make this modification in order to predict additional turns both before and after a sentence, thereby improving the quality of resulting sentence predictions. [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1, Mikolov]

Regarding claim 23, Dai/Vinyals/Kiros teaches The non-transitory computer-readable storage media of claim 19, where Kiros further teaches wherein the turn prediction task is to predict, from an input snippet xt at position t in a sequence, each conversational turn, wherein k is an integer greater than or equal to 1. (“The skip-thoughts model. Given a tuple (si−1, si, si+1) of contiguous sentences, with si the i-th sentence of a book, the sentence si is encoded and tries to reconstruct the previous sentence si−1 and next sentence si+1. In this example, the input is the sentence triplet I got back home. I could see the cat on the steps. This was strange. Unattached arrows are connected to the encoder output. Colors indicate which components share parameters. <eos> is the end of sentence token.” [pg. 2, Figure 1 Caption])
However, Dai/Vinyals/Kiros fails to explicitly teach that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence.
Mikolov teaches that is k or fewer positions before the input snippet xt at position t in the sequence and each conversational turn that is k or fewer positions after the input snippet xt at position t in the sequence (“The second architecture is similar to CBOW, but instead of predicting the current word based on the context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, we use each current word as an input to a log-linear classifier with continuous projection layer, and predict words within a certain range before and after the current word. We found that increasing the range improves quality of the resulting word vectors, but it also increases the computational complexity. Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples.” [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1; See further Figure 1.])
Dai, Vinyals, Kiros, and Mikolov are all in the same field of endeavor of sequence to sequence learning. Dai discloses a semi-supervised sequence learning method using a pre-trained unsupervised model to further train other supervised models. Vinyals discloses a neural conversational model that predicts the next sentence in a conversation. Kiros teaches a skip-thought model to predict the next and previous sentences. Mikolov teaches a method to estimate word representations in a vector space. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Dai’s/Vinyals’/Kiros’ teachings to substitute the word vectors and implement the skip-gram model as taught by Mikolov to predict additional turns. One would have been motivated to make this modification in order to predict additional turns both before and after a sentence, thereby improving the quality of resulting sentence predictions. [pg. 4, § 3.2 Continuous Skip-gram Model, ¶1, Mikolov]
Response to Arguments
Regarding the 35 U.S.C. §103 rejections:
Applicant’s arguments with regard to independent claims 1, 10, and 19 have been considered but are moot because the amended limitation, in particular “one or more additional conversational turns at other positions in the sequence,” is now taught by the newly cited reference Kiros. Please see the updated 103 rejection above.

Applicant’s arguments regarding new dependent claims 21, 22, and 23 have been considered but are moot because the new claims are taught by the newly cited references Kiros and Mikolov. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122