DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/19/2021 has been entered.

Response to Amendment
The amendment filed 01/19/2021 has been entered. Claims 1-19 remain pending in the application. 

Response to Arguments
Applicant’s arguments, filed 01/19/2021, with respect to the rejections of claims 1, 5 and 12 under 35 U.S.C. 103 have been fully considered and are persuasive in view of the amendments. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Al-Rfou et al. (Conversational Contextual Cues: The Case of Personalization and History for Response Ranking) in view of Prakash et al. (Emulating Human Conversations using Convolutional Neural Network-based IR) and further in view of Bengio et al. (US Patent 8,131,786). 


Information Disclosure Statement
The examiner has considered the information disclosure statements (IDS) submitted on 12/08/2020.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Rfou et al. (Conversational Contextual Cues: The Case of Personalization and History for Response Ranking) in view of Prakash et al. (Emulating Human Conversations using Convolutional Neural Network-based IR) and further in view of Bengio et al. (US Patent 8,131,786). 
As per claim 1, Al-Rfou teaches a method implemented by one or more processors, comprising:
	identifying a plurality of positive training instances that each include input features and response features [Figs. 3-4, page 2, Col. 1, 2nd paragraph, “train a deep neural network as a binary classifier to learn the difference between positive, real examples of input / response pairs, and negative, random examples of input / response pairs. The classifier’s probabilities are used as scores to rank the candidates. This ranker will choose the response with the highest score”; page 5, Col. 1, last paragraph, “To construct the training dataset, we form pairs of features and responses … features {I, C, A} (Input message, Context, Author)”], wherein for each of the positive training instances: 
the input features are based on content of an electronic communication [Figs. 3-4, page 5, “Input Message, Context, …”], wherein the input features comprise groups of input features [Input message, Context, Author] including at least a first group of input features [Fig. 4 shows Input Message going to hidden layers h1] and a second group of input features [Fig. 4 shows Context going to hidden layers h2], each of the groups of input features being based on unique features or unique combinations of features of the electronic communication [page 4, Col. 2, section 4, “message Mi is a sequence of a variable number of words Mi = (wi1, wi2, … wil)”], and 
the response features are based on a reply electronic communication that is a reply to the electronic communication [abstract, “predicting the next response in a conversation”; page 3, Col. 1, 1st paragraph, “retrieve the most suitable response for any input message”]; 
training a scoring model based on the positive training instances [page 5, Col. 1, last paragraph, “To construct the training dataset, we form pairs of features and responses … features {I, C, A} (Input message, Context, Author)”], wherein training the scoring model based on a given instance of the positive training instances comprises [abstract, “train deep neural networks on a large conversational dataset … evaluate our models on the task of predicting the next response in a conversation”; Figs. 3-4, page 2, Col. 1, 2nd paragraph, “train a deep neural network as a binary classifier to learn the difference between positive, real examples of input / response pairs, and negative, random examples of input / response pairs”]: 
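“…” applying the first group of input features to a first parallel input upstream layer of an input neural network model of the scoring model [Fig. 4 shows applying input features into hidden layers h1 which is parallel to hidden layers h2; page 5, section 4.2, “Figure 3 shows a network that concatenates the previous features into one input vector”];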
“…” applying the response features to a first parallel upstream layer of a response neural network model of the scoring model [Fig. 4 shows applying response features into hidden layers h1 which is parallel to hidden layers h2];
Because h1 and h2 denote hidden layers, hidden layers h1 are parallel to hidden layers h2, and both the input features and the response features are input into h1, h1 comprises a first parallel input upstream layer and a first parallel response upstream layer. 
determining a first response score [page 2, Col. 1, 2nd paragraph, “The classifier’s probabilities are used as scores to rank the candidates. This ranker will choose the response with the highest score”];  
 “…” applying the second group of input features to a second parallel input upstream layer of the input neural network model of the scoring model [Fig. 4 shows applying context features (second group of input features) into hidden layers h2 which is parallel to hidden layers h1];
“…” applying the response features to a second parallel response upstream layer of the response neural network model of the scoring model [Fig. 4 shows applying response features into hidden layers h2 which is parallel to hidden layers h1]; 
determining a second response score [page 2, Col. 1, 2nd paragraph, “The classifier’s probabilities are used as scores to rank the candidates. This ranker will choose the response with the highest score”].
Al-Rfou does not explicitly teach
	input upstream layer;
response upstream layer;
generating a first input vector based on applying the first group of input features to a first parallel input upstream layer of an input neural network model of the scoring model (emphasis added);
generating a first response vector based on applying the response features to a first parallel response upstream layer of a response neural network model of the scoring model (emphasis added);
determining a first response score based on comparison of the first input vector and the first response vector (emphasis added);
generating a second input vector based on applying the second group of input features to a second parallel input upstream layer of the input neural network model of the scoring model (emphasis added);
generating a second response vector based on applying the response features to a second parallel response upstream layer of the response neural network model of the scoring model (emphasis added);
determining a second response score based on comparison of the second input vector and the second response vector (emphasis added); and 
updating the first parallel input upstream layer and the first parallel response upstream layer based on comparison, by an error engine, of the first response score to a given response score indicated by the given instance, the given response score indicated by the given instance being a positive response score; Page 2 of 19Patent Application No. 15/476,292 Attorney Docket No. ZS202-17807 Response to 11/19/2020 Office Action
updating, by the error engine, the second parallel input upstream layer and the second parallel response upstream layer based on comparison of the second response score to the given response score indicated by the given instance.  
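For illustration only, the claimed training step listed above (a separate input vector and response vector generated by each pair of parallel upstream layers, a response score for each pair, and an update of that pair of layers toward the positive response score of the given instance) can be sketched as follows. This is a minimal sketch under stated assumptions and is not presented as the applicant's implementation or as the method of any cited reference: the single linear layers, the sigmoid of a dot product used as the "comparison", the logistic-loss gradient standing in for the error engine, and all names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Apply a linear layer W (a list of rows) to the feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(W_in, W_resp, input_groups, response_feats, target=1.0, lr=0.1):
    """One update on a single training instance (target=1.0 for a positive one).

    W_in[i] is a hypothetical i-th parallel input upstream layer and W_resp[i]
    the i-th parallel response upstream layer. Returns the response score of
    each (input vector, response vector) pair computed before the update."""
    scores = []
    for i, x in enumerate(input_groups):
        u = matvec(W_in[i], x)                 # i-th input vector
        v = matvec(W_resp[i], response_feats)  # i-th response vector
        s = sigmoid(dot(u, v))                 # score from comparing u and v
        scores.append(s)
        err = s - target  # assumed error: logistic-loss gradient w.r.t. dot(u, v)
        # Only the i-th pair of layers is updated, using u and v from before the update.
        for r in range(len(W_in[i])):
            for c in range(len(x)):
                W_in[i][r][c] -= lr * err * v[r] * x[c]
        for r in range(len(W_resp[i])):
            for c in range(len(response_feats)):
                W_resp[i][r][c] -= lr * err * u[r] * response_feats[c]
    return scores


def init_layer(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Usage: two input feature groups paired with the same response features;
# repeated updates on the positive instance raise both pair scores toward 1.0.
random.seed(0)
W_in = [init_layer(2, 3), init_layer(2, 3)]
W_resp = [init_layer(2, 3), init_layer(2, 3)]
groups = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.5]]
response = [0.5, 1.0, 0.0]
first_scores = train_step(W_in, W_resp, groups, response)
for _ in range(50):
    last_scores = train_step(W_in, W_resp, groups, response)
```

Each pair of layers receives its own score and its own update, which mirrors the claim's separate first and second updating steps rather than a single update of a merged network.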
Prakash teaches
input upstream layer [Fig. 3];
response upstream layer [Fig. 3];
generating a first input vector based on applying the first group of input features to a first parallel input upstream layer of an input neural network model of the scoring model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Al-Rfou teaches applying input features into first and second input upstream layers that are parallel (Fig. 4), and Prakash teaches that the input vector is generated based on applying the input to the input upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
generating a first response vector based on applying the response features to a first parallel response upstream layer of a response neural network model of the scoring model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Al-Rfou teaches applying response features into first and second response upstream layers that are parallel (Fig. 4), and Prakash teaches that the response vector is generated based on applying the response to the response upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
determining a first response score based on comparison of the first input vector and the first response vector [section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”];
updating the first parallel input upstream layer and the first parallel response upstream layer [Fig. 3, section 3.3, “a Message model (M-Model) and a Response model (R-Model), that have been obtained after training on M-R pairs”; the models are trained (or retrained) based on the messages, responses and scores, and Fig. 3 shows that the message model and the response model each contain multiple layers (including an input layer and a response layer, respectively); therefore, updating the models based on the messages, responses and scores updates the layers (including the input and response layers) included in the models. Also, Al-Rfou teaches that both the input and response features are input into h1, where h1 comprises a first parallel input upstream layer and a first parallel response upstream layer; therefore, the combination of Al-Rfou and Prakash teaches “updating the first parallel input upstream layer and the first parallel response upstream layer”];
generating a second input vector based on applying the second group of input features to a second parallel input upstream layer of the input neural network model of the scoring model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Al-Rfou teaches applying input features into first and second input upstream layers that are parallel (Fig. 4), and Prakash teaches that the input vector is generated based on applying the input to the input upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
generating a second response vector based on applying the response features to a second parallel response upstream layer of the response neural network model of the scoring model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Al-Rfou teaches applying response features into first and second response upstream layers that are parallel (Fig. 4), and Prakash teaches that the response vector is generated based on applying the response to the response upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
determining a second response score based on comparison of the second input vector and the second response vector [section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”]; and 
updating the second parallel input upstream layer and the second parallel response upstream layer [Fig. 3, section 3.3, “a Message model (M-Model) and a Response model (R-Model), that have been obtained after training on M-R pairs”; the models are trained (or retrained) based on the messages, responses and scores, and Fig. 3 shows that the message model and the response model each contain multiple layers (including an input layer and a response layer, respectively); therefore, updating the models based on the messages, responses and scores updates the layers (including the input and response layers) included in the models. Also, Al-Rfou teaches that both the context features (second group of input features) and the response are input into hidden layers h2, which are parallel to hidden layers h1; therefore, the combination of Al-Rfou and Prakash teaches “updating the second parallel input upstream layer and the second parallel response upstream layer”].   
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Prakash's process of generating a first input vector and a first response vector by applying the first group of input features and the response features to the upstream layers of a neural network model of the scoring model, and of calculating the response scores, into the method of predicting the next response in a conversation of Al-Rfou. Doing so would help the system learn the semantic relevance between a message and its response based on the generated input vector and response vector, and output one of the highest scoring responses (Prakash, page 3, col. 1).
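For illustration only, the vector comparison relied on from Prakash (a relevance score SemRel(X, Y) between a message-model vector and a response-model vector, with one of the highest scoring candidates output) can be sketched as follows. The passage quoted in this action does not state the scoring formula, so cosine similarity is assumed here; the function names and candidate texts are hypothetical.

```python
import math


def cosine(a, b):
    """Assumed relevance score: cosine similarity of two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_responses(message_vec, response_vecs):
    """Return candidate responses sorted by descending relevance score."""
    scored = {text: cosine(message_vec, vec) for text, vec in response_vecs.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Usage: the first element of the ranking is the highest scoring candidate.
candidates = {
    "See you then.": [1.0, 0.1],
    "What time?": [0.0, 1.0],
    "No way.": [-1.0, 0.0],
}
print(rank_responses([1.0, 0.0], candidates))
```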
Al-Rfou and Prakash do not teach
updating the first parallel input upstream layer and the first parallel response upstream layer based on comparison, by an error engine, of the first response score to a given response score indicated by the given instance, the given response score indicated by the given instance being a positive response score (emphasis added);
updating, by the error engine, the second parallel input upstream layer and the second parallel response upstream layer based on comparison of the second response score to the given response score indicated by the given instance (emphasis added).  
	Bengio teaches
updating the first parallel input upstream layer and the first parallel response upstream layer based on comparison, by an error engine [the scoring model modifier 312], of the first response score [the candidate image having the highest score] to a given response score indicated by the given instance [score for the first image], the given response score indicated by the given instance being a positive response score [Col. 2, lines 5-12 to Col. 6, lines 1-10, “in response to receiving the query through a search interface, identifying a plurality of images responsive to the query; applying the scoring model to each of the plurality of images to determine a respective score for each image; and presenting images … wherein the images are presented in an order according to the respective score for each image”; Col. 2, lines 65-66, “Images include, for example, still images, video, and other visual content”; Col. 5, lines 20-51, “The system selects a first image from … the positive group of images … and applies a scoring model for the query to the first image to determine a score for the first image … The system selects candidate images from the other group of images … applies the scoring model to each of the candidate images, and then selects the candidate image having the highest score … The system then subtracts the score of the image selected from the negative group of images from the score of the image selected from the positive group of images, and compares the difference to a threshold … if the difference does not exceed the threshold, the scoring model needs to be updated”; Col. 9, lines 23-24, “instructs the scoring model modifier 312 to update the scoring model”; since Al-Rfou (as modified) teaches updating the model, which includes the parallel input and response layers, and Bengio teaches that the scoring model is updated based on comparing the scores of the images responsive to the query, the combination of Al-Rfou and Bengio reads on the above limitation];
updating, by the error engine [the scoring model modifier 312], the second parallel input upstream layer and the second parallel response upstream layer based on comparison of the second response score to the given response score indicated by the given instance [Col. 2, lines 5-12 to Col. 6, lines 1-10, “in response to receiving the query through a search interface, identifying a plurality of images responsive to the query; applying the scoring model to each of the plurality of images to determine a respective score for each image; and presenting images … wherein the images are presented in an order according to the respective score for each image”; Col. 2, lines 65-66, “Images include, for example, still images, video, and other visual content”; Col. 5, lines 20-51, “The system selects a first image from … the positive group of images … and applies a scoring model for the query to the first image to determine a score for the first image … The system selects candidate images from the other group of images … applies the scoring model to each of the candidate images, and then selects the candidate image having the highest score … The system then subtracts the score of the image selected from the negative group of images from the score of the image selected from the positive group of images, and compares the difference to a threshold … if the difference does not exceed the threshold, the scoring model needs to be updated”; Col. 9, lines 23-24, “instructs the scoring model modifier 312 to update the scoring model”; since Al-Rfou (as modified) teaches updating the model, which includes the second parallel input and response layers, and Bengio teaches that the scoring model is updated based on comparing the scores of the images responsive to the query, the combination of Al-Rfou and Bengio reads on the above limitation].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Bengio's process of updating the scoring model based on comparison, by an error engine, of the first response score to a given response score into the method of predicting the next response in a conversation of Al-Rfou. Doing so would help optimize the scoring model during training so that the scoring model is most accurate at scoring the most highly ranked images (Bengio, Col. 4, lines 15-17).
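For illustration only, the threshold comparison quoted from Bengio (Col. 5, lines 20-51) can be sketched as follows. The function name and the use of plain floating-point scores are assumptions; the sketch covers only the decision whether the scoring model needs to be updated, not the update performed by the scoring model modifier.

```python
def needs_update(positive_score, negative_scores, threshold):
    """Return True when the scoring model should be updated.

    Assumed reading of the quoted passage: select the highest-scoring
    candidate from the negative group, subtract its score from the score
    of the image from the positive group, and update when that difference
    does not exceed the threshold."""
    hardest_negative = max(negative_scores)
    difference = positive_score - hardest_negative
    return difference <= threshold


# Usage: a margin of 0.4 clears a 0.3 threshold; a margin of about 0.1 does not.
print(needs_update(0.9, [0.2, 0.5], 0.3))
print(needs_update(0.6, [0.2, 0.5], 0.3))
```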

As per claim 19, Al-Rfou, Prakash and Bengio teach the method of claim 1.
Al-Rfou further teaches
generating an overall input vector based on application of the first and second input vectors to input downstream layers of the input neural network model of the scoring model [Fig. 4 and section 4.3 disclose applying the input features into hidden layers h1, which are parallel to hidden layers h2, applying context features (second group of input features) into hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second inputs to input downstream layers); since Al-Rfou teaches applying the first and second inputs to input downstream layers h4, and Prakash teaches generating an input vector based on applying the input features to the input layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation]; 
generating an overall response vector based on application of the first and second input vectors to response downstream layers of the input neural network model of the scoring model [Fig. 4 and section 4.3 disclose applying the response features into parallel hidden layers h1 and hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second responses to response downstream layers); since Al-Rfou teaches applying the first and second responses to response downstream layers h4, and Prakash teaches generating a response vector based on applying the response features to a response layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation]; 
Prakash teaches
determining a response score based on comparison of the overall input vector and the overall response vector [section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”]; and 
Bengio teaches
updating the input downstream layers and the response downstream layers based on comparison of the response score to the given response score indicated by the given instance [Col. 2, lines 5-12 to Col. 6, lines 1-10, “in response to receiving the query through a search interface, identifying a plurality of images responsive to the query; applying the scoring model to each of the plurality of images to determine a respective score for each image; and presenting images … wherein the images are presented in an order according to the respective score for each image”; Col. 2, lines 65-66, “Images include, for example, still images, video, and other visual content”; Col. 5, lines 20-51, “The system selects a first image from … the positive group of images … and applies a scoring model for the query to the first image to determine a score for the first image … The system selects candidate images from the other group of images … applies the scoring model to each of the candidate images, and then selects the candidate image having the highest score … The system then subtracts the score of the image selected from the negative group of images from the score of the image selected from the positive group of images, and compares the difference to a threshold … if the difference does not exceed the threshold, the scoring model needs to be updated”; Col. 9, lines 23-24, “instructs the scoring model modifier 312 to update the scoring model”; since Al-Rfou (as modified) teaches updating the model, which includes the parallel input and response layers, and Bengio teaches that the scoring model is updated based on comparing the scores of the images responsive to the query, the combination of Al-Rfou and Bengio reads on the above limitation].
Claim 19 is rejected using the same rationale as claim 1.
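For illustration only, the downstream combination relied on for claim 19 (per-group vectors concatenated, as in Al-Rfou's [h1; h2; h3] feeding h4, then passed through downstream layers to give an overall input vector and an overall response vector that are compared) can be sketched as follows. The single linear downstream layer, the dot-product comparison and all names are assumptions, not the applicant's or Al-Rfou's implementation.

```python
def matvec(W, x):
    """Apply a linear layer W (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def overall_vector(per_group_vectors, W_down):
    """Concatenate the per-group vectors (cf. [h1; h2; h3]) and apply a
    hypothetical downstream linear layer W_down."""
    concatenated = [value for vec in per_group_vectors for value in vec]
    return matvec(W_down, concatenated)


def overall_score(input_vectors, response_vectors, W_in_down, W_resp_down):
    """Response score from comparing the overall input and response vectors."""
    u = overall_vector(input_vectors, W_in_down)
    v = overall_vector(response_vectors, W_resp_down)
    return sum(a * b for a, b in zip(u, v))


# Usage with a downstream layer that sums matching positions of the two groups.
W_down = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(overall_vector([[1, 2], [3, 4]], W_down))  # [4, 6]
print(overall_score([[1, 2], [3, 4]], [[1, 0], [0, 1]], W_down, W_down))  # 10
```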

Claims 2, 3, 5, 7 and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Rfou et al. in view of Prakash et al. in view of Bengio et al. and further in view of Yamaguchi et al. (US Pub. 2018/0166077).
As per claim 2, Al-Rfou, Prakash and Bengio teach the method of claim 1.
Prakash further teaches
identifying an additional response [Fig. 3]; 
[section 3.3, "Fig. 3 shows two models, a Message model (M-Model) and a Response model (R-Model), that have been obtained after training on M-R pairs. When a text X is forward propagated through either of the models, it is vectorized in the space defined by that model" teaches that an additional response can be forward propagated through the response model of Fig. 3 to generate an additional response vector]; 
Al-Rfou, Prakash and Bengio do not teach
storing, in one or more computer readable media, an association of the additional response vector to the additional response.  
Yamaguchi teaches 
storing, in one or more computer readable media, an association of the additional response vector to the additional response [paragraph 0008, "employ the language accumulated in the dialog log database as response data in a response database that stores language to be used for a response to the speaker's spoken language" teaches response vectors stored in a response database].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Yamaguchi's process of storing an association of the additional response vector to the additional response into the method of presenting human-like responses in an ongoing conversation with a user of Prakash. Doing so would help in quickly retrieving response vectors.
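For illustration only, the storing step of claim 2 (an association of an additional response vector to the additional response, held in one or more computer readable media) might look like the following sketch. The JSON file, the toy encoder and the function names are assumptions made only for this example; they do not represent Yamaguchi's response database.

```python
import json
import os
import tempfile


def store_response_vectors(responses, encode, path):
    """Precompute a vector for each response and persist the association."""
    table = {text: encode(text) for text in responses}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)
    return table


def load_response_vectors(path):
    """Read the stored response -> vector association back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def toy_encode(text):
    """Hypothetical stand-in for the response neural network model."""
    return [len(text.split()), len(text)]


# Usage: store two responses, then reload the association from the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "response_vectors.json")
    stored = store_response_vectors(["Sounds good!", "No thanks."], toy_encode, path)
    reloaded = load_response_vectors(path)
```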

As per claim 3, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 2.
Prakash further teaches
receiving new input features of a new electronic communication, the new electronic communication directed to a user and generated subsequent to the storing; generating a new input vector [Fig. 3 teaches a model for receiving an input and generating an input vector]; 
generating a response score that indicates a likelihood that the additional response is an appropriate response for the new electronic communication, wherein generating the response score is based on comparison of the new input vector to the additional response vector stored in association with the additional response [section 3.3 "We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X" teaches a score based on the response vector and input vector]; and 
based on the response score, providing to a client device of the user a suggestion to include the additional response in a reply to the new electronic communication, wherein the additional response is provided to the client device based on the new electronic communication being directed to the user [section 3 "the system retrieves and ranks the candidates by relevance and outputs one of the highest scoring responses" teaches ranking and providing responses based on the response score].
Claim 3 is rejected using the same rationale as claim 1.
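For illustration only, the scoring and suggestion steps of claim 3 (comparing a new input vector with response vectors stored before the new communication arrives, then suggesting the best response to the user's client device) can be sketched as follows. The dot-product comparison, the minimum-score cutoff and the names are assumptions for this sketch only.

```python
def suggest_reply(new_input_vec, stored, min_score=0.0):
    """Score each pre-stored response vector against the new input vector
    (dot product assumed) and return the best response, or None if no
    score exceeds min_score."""
    best_text, best_score = None, min_score
    for text, vec in stored.items():
        score = sum(a * b for a, b in zip(new_input_vec, vec))
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# Usage: response vectors computed and stored before the new message arrives.
stored = {"Sounds good!": [1.0, 0.0], "No thanks.": [0.0, 1.0]}
print(suggest_reply([0.9, 0.1], stored))    # Sounds good!
print(suggest_reply([-1.0, -1.0], stored))  # None
```

Because the response side is computed ahead of time, only the input side has to be run when the new communication is received.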

As per claim 5, Al-Rfou teaches a method implemented by one or more processors, comprising: 
	identifying input features of a transmitted electronic communication directed to a user [Figs. 3-4, page 5, “Input Message, Context, …”; page 4, Col. 2, section 4, “message Mi is a sequence of a variable number of words Mi = (wi1, wi2, … wil)”]; 
	applying the input features to parallel input upstream layers of a trained input neural network model [Figs. 3-4 show applying input features into the model; abstract, “train deep neural networks on a large conversational dataset … evaluate our models on the task of predicting the next response in a conversation”]; 
[Fig. 4 shows applying input features into hidden layers h1 which is parallel to hidden layers h2; page 5, section 4.2, “Figure 3 shows a network that concatenates the previous features into one input vector”],
wherein the trained input neural network model is trained based on backpropagation that is based on errors during training by [section 5.2, "The derivatives are estimated using the backpropagation algorithm" teaches using backpropagation to train the models]: 
for each of the parallel input upstream layers, applying training input features of a group of training input features associated with the input upstream layer to the associated input upstream layer [Fig. 4 shows applying input features into hidden layers h1 which is parallel to hidden layers h2, and applying context features (second group of input features) into hidden layers h2]; 
for each of a plurality of parallel response upstream layers of a response neural network model, applying response features to the associated response upstream layer [Fig. 4 shows applying response features into the parallel hidden layers h2 and h1]; 
determining a first response score [page 2, Col. 1, 2nd paragraph, “The classifier’s probabilities are used as scores to rank the candidates. This ranker will choose the response with the highest score”];
determining a response to provide for inclusion in a reply electronic communication that is a reply by the user to the electronic communication [abstract, “evaluate our models on the task of predicting the next response in a conversation”; page 5, section 4.1, “our task is to select the best response out of a pool of random candidates”];   
Al-Rfou does not explicitly teach
	input upstream layer;
response upstream layer;
generating an input vector over the trained input neural network model based on applying the input features to the parallel input upstream layers of the trained input neural network model (emphasis added);
for each of the parallel input upstream layers, generating a training input vector based on applying training input features of a group of training input features associated with the input upstream layer to the associated input upstream layer (emphasis added);
for each of a plurality of parallel response upstream layers of a response neural network model, generating a response vector based on applying response features to the associated response upstream layer (emphasis added);
for each of multiple pairs each comprising a corresponding one of the input vectors and a corresponding one of the response vectors, determining a response score based on comparison of the corresponding one of the input vectors and the corresponding one of the response vectors; and 
for each of the multiple pairs, updating both the input upstream layer and the response upstream layer used to generate the pair, the updating based on comparison, by an error engine, of the response score for the pair to a given response score indicated by the given instance the given response score indicated by the given instance being a positive response score; and 
wherein determining the response is based on comparison of the input vector to a pre-stored value stored in association with the response prior to transmission of the electronic communication, the pre-stored value being generated based on applying response features of the response to the response neural network model.  
Prakash teaches
input upstream layer [Fig. 3];
response upstream layer [Fig. 3];
generating an input vector over the trained input neural network model based on applying the input features to the parallel input upstream layers of the trained input neural network model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Prakash teaches that the input vector is generated based on applying the input to the input upstream layer, and Al-Rfou teaches that the first and second input layers are parallel (Fig. 4), the combination of Al-Rfou and Prakash reads on the above limitation];
for each of the parallel input upstream layers, generating a training input vector based on applying training input features of a group of training input features associated with the input upstream layer to the associated input upstream layer [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RX when it is vectorized using the R-Model”; since Al-Rfou teaches applying input features into first and second input upstream layers that are parallel (Fig. 4), and Prakash teaches that the input vector is generated based on applying the input to the input upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
for each of a plurality of parallel response upstream layers of a response neural network model, generating a response vector based on applying response features to the associated response upstream layer [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RY when it is vectorized using the R-Model”; since Al-Rfou teaches applying response features into first and second response upstream layers that are parallel (Fig. 4), and Prakash teaches that the response vector is generated based on applying the response to the response upstream layer, the combination of Al-Rfou and Prakash reads on the above limitation];
for each of multiple pairs each comprising a corresponding one of the input vectors and a corresponding one of the response vectors, determining a response score based on comparison of the corresponding one of the input vectors and the corresponding one of the response vectors [Fig. 3, section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”];
for each of the multiple pairs, updating both the input upstream layer and the response upstream layer used to generate the pair [Fig. 3, section 3.3, “a Message model (M-Model) and a Response model (R-Model), that have been obtained after training on M-R pairs”; it can be seen that the models are trained (or retrained) based on the message, the response and the scores. Fig. 3 also shows that the message model and the response model each contain multiple layers (including the input layer and the response layer, respectively); therefore, updating the models based on the message, response and scores updates the layers (including the input and response layers) included in the models. Also, Al-Rfou teaches that both the input and response features are input into h1, where h1 comprises a first parallel input upstream layer and a first parallel response upstream layer; therefore, the combination of Al-Rfou and Prakash teaches “updating the first parallel input upstream layer and the first parallel response upstream layer”];  
determining the response is based on comparison of the input vector to a pre-stored value “…”, the pre-stored value being generated based on applying response features of the response to the response neural network model [section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”].  
same rationale as claim 1.
Al-Rfou and Prakash do not teach
updating both the input upstream layer and the response upstream layer used to generate the pair, the updating based on comparison, by an error engine, of the response score for the pair to a given response score indicated by the given instance the given response score indicated by the given instance being a positive response score;

Yamaguchi teaches 
pre-stored value stored in association with the response prior to transmission of the electronic communication [paragraph 0008, “employ the language accumulated in the dialog log database as response data in a response database that stores language to be used for a response to the speaker's spoken language” teaches response vectors stored in a response database];
same rationale as claim 2.
Al-Rfou, Prakash and Yamaguchi do not teach
updating both the input upstream layer and the response upstream layer used to generate the pair, the updating based on comparison, by an error engine, of the response score for the pair to a given response score indicated by the given instance the given response score indicated by the given instance being a positive response score;
Bengio teaches
updating both the input upstream layer and the response upstream layer used to generate the pair, the updating based on comparison, by an error engine [the scoring model modifier 312], of the response score [the candidate image having the highest score] for the pair to a given response score indicated by the given instance [score for the first image] the given response score indicated by the given instance being a positive response score [Col. 2, lines 5-12 to Col. 6, lines 1-10, “in response to receiving the query through a search interface, identifying a plurality of images responsive to the query; applying the scoring model to each of the plurality of images to determine a respective score for each image; and presenting images … wherein the images are presented in an order according to the respective score for each image”; Col. 2, lines 65-66, “Images include, for example, still images, video, and other visual content”; Col. 5, lines 20-51, “The system selects a first image from … the positive group of images … and applies a scoring model for the query to the first image to determine a score for the first image … The system selects candidate images from the other group of images … applies the scoring model to each of the candidate images, and then selects the candidate image having the highest score … The system then subtracts the score of the image selected from the negative group of images from the score of the image selected from the positive group of images, and compares the difference to a threshold … if the difference does not exceed the threshold, the scoring model needs to be updated”; Col. 9, lines 23-24, “instructs the scoring model modifier 312 to update the scoring model”; because Al-Rfou (as modified) teaches updating the model, which includes the parallel input and response layers, and Bengio teaches that the scoring model is updated based on comparing the scores of the images responsive to the query, the combination of Al-Rfou and Bengio reads on the above limitation];
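For illustration only, the pairwise comparison quoted from Bengio (score an image from the positive group, select the highest-scoring candidate from the negative group, subtract the two scores, and compare the difference to a threshold) can be sketched as below. The linear scoring model score(x) = w · x, the function names, and the additive update rule are assumptions made for this sketch; the quoted passages state only that the scoring model is updated when the difference does not exceed the threshold.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_update(weights: Vector, positive: Sequence[float],
                    negatives: Sequence[Sequence[float]],
                    threshold: float = 1.0, learning_rate: float = 0.1) -> bool:
    """One pairwise ranking step under an assumed linear scoring model.

    `negatives` must be non-empty. Returns True if `weights` was modified in place.
    """
    positive_score = dot(weights, positive)
    # Select the negative candidate that the current model scores highest.
    hardest = max(negatives, key=lambda x: dot(weights, x))
    difference = positive_score - dot(weights, hardest)
    if difference > threshold:
        return False  # the difference exceeds the threshold, so no update is needed
    # Assumed update rule (not quoted from Bengio): move the weights toward the
    # positive example and away from the highest-scoring negative candidate.
    for i in range(len(weights)):
        weights[i] += learning_rate * (positive[i] - hardest[i])
    return True
```

Starting from all-zero weights, every candidate scores 0, the difference (0) does not exceed the threshold, and the weights are updated; once the positive example outscores the best negative by more than the threshold, the call returns False and leaves the weights unchanged.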
Claim 5 is rejected using the same rationale as claim 1.

As per claim 7, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 5.
Prakash further teaches
the pre-stored value is a response vector generated based on applying the response features to the response neural network model [section 3.3 "We use the notation MX for the vector when X is vectorized using the M-model and RX when it is vectorized using the R-Model" teaches applying the response and generating a response vector]. 
Claim 7 is rejected using the same rationale as claim 1.

As per claim 9, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 5.
Al-Rfou further teaches
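applying a first group of the input features to first input layers of the trained input neural network model [Fig. 4 shows applying input features into hidden layers h1]; and 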
applying a second group of the input features to second input layers of the trained input neural network model [Fig. 4 shows applying context features (second group of input features) into hidden layers h2].  

As per claim 10, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 9.
Al-Rfou further teaches
generating a first input vector based on applying the first group of the input features to the first input layers [Fig. 4 shows applying input features into hidden layers h1]; 
generating a second input vector based on applying the second group of the input features to the second input layers [Fig. 4 shows applying context features (second group of input features) into hidden layers h2; Al-Rfou teaches applying the second group of input features to a second input layer, while Prakash teaches generating an input vector based on applying the input features to the input upstream layer (Fig. 3, section 3.3); therefore, the combination of Prakash and Al-Rfou reads on the above limitation]; 
applying, to downstream input layers of the input neural network model, a combined input vector that is based on the first input vector and the second input vector [Fig. 4 and section 4.3 disclose applying the input features into hidden layers h1, which are parallel to hidden layers h2, applying context features (second group of input features) into hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second input to input downstream layers); because Al-Rfou teaches applying the first and second input to input downstream layers h4, while Prakash teaches generating an input vector based on applying the input features to the input layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation]; and 
generating the input vector over the downstream input layers based on the combined input vector [Fig. 4 and section 4.3 disclose applying the input features into hidden layers h1, which are parallel to hidden layers h2, applying context features (second group of input features) into hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second input to input downstream layers); because Al-Rfou teaches applying the first and second input to input downstream layers h4, while Prakash teaches generating an input vector based on applying the input features to the input layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation].  
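As a non-authoritative sketch of the structure relied upon from Al-Rfou Fig. 4, two parallel upstream layers can each encode one group of input features, with their outputs concatenated ([h1; h2]) and passed through a downstream layer to produce the input vector. The tanh activation, the omitted biases, the layer sizes, and the function names are assumptions made for illustration; the third branch h3 is omitted for brevity.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense_tanh(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Fully connected layer with tanh activation, tanh(W x); biases omitted."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_input(first_group: Sequence[float], second_group: Sequence[float],
                 w_first: Matrix, w_second: Matrix, w_downstream: Matrix) -> List[float]:
    """Hypothetical two-branch input encoder mirroring the h1/h2 -> h4 structure."""
    h1 = dense_tanh(w_first, first_group)      # first parallel upstream layer (first group only)
    h2 = dense_tanh(w_second, second_group)    # second parallel upstream layer (second group only)
    combined = h1 + h2                         # list concatenation, i.e. [h1; h2]
    return dense_tanh(w_downstream, combined)  # downstream layer yields the input vector


if __name__ == "__main__":
    w1 = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]]  # 3 units over a 2-feature first group
    w2 = [[0.7], [-0.6]]                        # 2 units over a 1-feature second group
    wd = [[0.2] * 5, [-0.1] * 5]                # 2 downstream units over the 5-unit concatenation
    print(encode_input([1.0, 2.0], [0.5], w1, w2, wd))
```

Because each upstream layer sees only its own feature group, the sketch also reflects the separation of the two groups discussed for claim 18.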

As per claim 11, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 9.
Prakash further teaches
generating the pre-stored value based on applying response features of the response to the response neural network model [section 3.3 "We use the notation MX for the vector when X is vectorized using the M-model and RX when it is vectorized using the R-Model" teaches applying the response and generating a response vector];
Yamaguchi further teaches
storing, in one or more computer readable media, the pre-stored value in association with the response [Paragraph 0008, "employ the language accumulated in the dialog log database as response data in a response database that stores language to be used for a response to the speaker's spoken language" teaches response vectors stored in a response database].  
Claim 11 is rejected using the same rationale as claim 2.

Claims 4, 6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Al-Rfou et al. in view of Prakash et al. in view of Bengio et al. in view of Yamaguchi et al. and further in view of Zhang et al. (US Pub. 2015/0169991).
As per claim 4, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 3.
Al-Rfou, Prakash, Bengio and Yamaguchi do not teach
the comparison of the new input vector to the response vector is a dot product of the new input vector and the response vector.
Zhang teaches
the comparison of the new input vector to the response vector is a dot product of the new input vector and the response vector [paragraph 0022, “Search results 111 can be ranked based on scores related to the resources 105 identified by the search results 111, such as information retrieval ("IR") scores … the IR scores are computed from dot products of a feature vector corresponding to a search query 109 and feature vectors of resources 105”];
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have included the comparison of the new input vector to the response vector as a dot product of the new input vector and the response vector of Zhang into the method of presenting human-like responses in ongoing conversation with a user of Prakash. Doing so would help in determining the search result scores.
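A minimal sketch of the dot-product comparison discussed above, assuming the response vectors have already been generated and stored in association with their responses (a plain dictionary here). The function names, example responses, and vector values are hypothetical and are not drawn from Zhang or Prakash.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_responses(input_vector: Sequence[float],
                   stored_vectors: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Score each stored response by its dot product with the input vector; keep the top_k."""
    scored = [(response, dot(input_vector, vector))
              for response, vector in stored_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_k]


if __name__ == "__main__":
    stored = {"Sounds good": [0.9, 0.1], "No thanks": [0.2, 0.8], "See you then": [0.6, 0.4]}
    print(rank_responses([1.0, 0.0], stored, top_k=2))
```

With the input vector [1.0, 0.0], only the first component contributes to each score, so "Sounds good" ranks first and "See you then" second.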

As per claim 6, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 5.
Prakash teaches in section 2 “form a candidate set by taking their corresponding replies and rank them to display an appropriate response”;
Al-Rfou, Prakash, Bengio and Yamaguchi do not explicitly teach
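providing the response for display in an interface rendered by a client device of the user, the interface enabling selection of the response for inclusion in the reply electronic communication.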
Zhang teaches
providing the response for display in an interface rendered by a client device of the user, the interface enabling selection of the response for inclusion in the reply electronic communication [paragraph 0021, “The user devices 106 can receive the search results 111, e.g., in the form of one or more web pages, and render the web pages for presentation to users. In response to the user selecting a link in a search result 111 at a user device 106, the user device 106 requests the resource 105 identified by the link”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of providing the response for display in an interface rendered by a client device of the user, the interface enabling selection of the response for inclusion in the reply electronic communication of Zhang into the method of presenting human-like responses in ongoing conversation with a user of Prakash. Doing so would help the user in selecting a search result from the set of generated search results provided to the user device.

As per claim 8, Al-Rfou, Prakash, Bengio and Yamaguchi teach the method of claim 7.
Al-Rfou, Prakash, Bengio and Yamaguchi do not teach
the comparison of the input vector to the pre-stored value is a dot product of the input vector and the response vector.  
Zhang teaches
the comparison of the input vector to the pre-stored value is a dot product of the input vector and the response vector [paragraph 0022, “Search results 111 can be ranked based on scores related to the resources 105 identified by the search results 111, such as information retrieval ("IR") scores … the IR scores are computed from dot products of a feature vector corresponding to a search query 109 and feature vectors of resources 105”];
Claim 8 is rejected using the same rationale as claim 4.

Claims 12-15 are rejected under 35 U.S.C. 103 as being unpatentable over Prakash et al. (Emulating Human Conversations using Convolutional Neural Network-based IR) in view of Yamaguchi et al. (US Pub. 2018/0166077) and further in view of Zhang et al. (US Pub. 2015/0169991).
As per claim 12, Prakash teaches a method implemented by one or more processors, comprising:
identifying input features of a transmitted electronic communication directed to a user from another user, the transmitted electronic communication being an email, a text message, a chat message, or an instant message [section 3.1.1, “We constructed a dataset of 17.62 million tweet conversational pairs (tweets and their responses)”; it can be seen that the dataset (training data) includes tweets (input features), which are electronic communications; section 3.3, “utilizing M-R pairs from Twitter (described in 3.1.1 above) as training data, where the user message is treated as query”]; 
applying the input features to a trained input neural network model [Fig. 3 shows applying the input message X into M-model]; 
generating an input vector over the trained input neural network model based on applying the input features to the trained input neural network model [Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RX when it is vectorized using the R-Model”];
determining a response score for a candidate response to the transmitted electronic communication [Fig. 3, section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”], wherein determining the response score comprises: 
determining the response score “…” of the input vector and a response vector stored in association with the candidate response [Fig. 3, section 3.3, “Semantic Relevance Score(X,Y): We denote this score as SemRel(X, Y). This is the confidence of semantic relevance of Y as a response to X”]; 
determining, based on the response score, to provide the candidate response for inclusion in a reply electronic communication that is a reply by the user to the electronic communication [section 3, “We model the task of providing appropriate chat responses … where for a given user message M and context C, the system retrieves and ranks the candidates by relevance and outputs one of the highest scoring responses”];
Prakash does not teach
determining the response score based on a dot product of the input vector and a response vector (emphasis added).
providing, before the user has provided any user interface input to indicate a desire to reply to the transmitted electronic communication, the candidate response and at least one additional candidate response for display in a user interface rendered by a client device of the user, the user interface enabling selection of any one of the candidate responses for inclusion in the reply electronic communication.  
the response vector being stored in association with the candidate response prior to transmission of the electronic communication; 
Yamaguchi teaches 
the response vector being stored in association with the candidate response prior to transmission of the electronic communication [paragraph 0008, "employ the language accumulated in the dialog log database as response data in a response database that stores language to be used for a response to the speaker's spoken language" teaches response vectors stored in a response database].
same rationale as claim 5.
Prakash and Yamaguchi do not teach
determining the response score based on a dot product of the input vector and a response vector;
providing, before the user has provided any user interface input to indicate a desire to reply to the transmitted electronic communication, the candidate response and at least one additional candidate response for display in a user interface rendered by a client device of the user, the user interface enabling selection of any one of the candidate responses for inclusion in the reply electronic communication.  
Zhang teaches
determining the response score based on a dot product of the input vector and a response vector [paragraph 0022, “Search results 111 can be ranked based on scores related to the resources 105 identified by the search results 111, such as information retrieval ("IR") scores … the IR scores are computed from dot products of a feature vector corresponding to a search query 109 and feature vectors of resources 105”];
providing, before the user has provided any user interface input to indicate a desire to reply to the transmitted electronic communication, the candidate response and at least one additional candidate response for display in a user interface rendered by a client device of the user, the user interface enabling selection of any one of the candidate responses for inclusion in the reply electronic communication [paragraph 0021, “The user devices 106 can receive the search results 111, e.g., in the form of one or more web pages, and render the web pages for presentation to users. In response to the user selecting a link in a search result 111 at a user device 106, the user device 106 requests the resource 105 identified by the link”].  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the process of determining the response score based on a dot product of the input vector and a response vector, and providing the candidate response and at least one additional candidate response for display in a user interface rendered by a client device of the user, the user interface enabling selection of any one of the candidate responses of Zhang into the method of presenting human-like responses in ongoing conversation with a user of Prakash. Doing so would help the user in selecting a search result from the set of generated search results provided to the user device.

As per claim 13, Prakash, Yamaguchi and Zhang teach the method of claim 12.
Zhang further teaches
providing the candidate response for display in an interface rendered by a client device of the user, the interface enabling selection of the response for inclusion in the reply electronic communication [paragraph 0021, “The user devices 106 can receive the search results 111, e.g., in the form of one or more web pages, and render the web pages for presentation to users. In response to the user selecting a link in a search result 111 at a user device 106, the user device 106 requests the resource 105 identified by the link”].  
Claim 13 is rejected using the same rationale as claim 12.

As per claim 14, Prakash, Yamaguchi and Zhang teach the method of claim 13.
Prakash further teaches
[Fig. 3, section 3.3, “We use the notation MX for the vector when X is vectorized using the M-model and RX when it is vectorized using the R-Model”]; 

As per claim 15, Prakash, Yamaguchi and Zhang teach the method of claim 14.
Prakash further teaches
the response neural network model is separate from the trained input neural network model, but was trained cooperatively with the trained input neural network model based on errors that were a function of both models [Fig. 3, section 3.3 teaches a separate input and response model trained cooperatively].

Claims 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Prakash et al. in view of Yamaguchi et al. in view of Zhang et al. and further in view of Al-Rfou et al. (Conversational Contextual Cues: The Case of Personalization and History for Response Ranking).
As per claim 16, Prakash, Yamaguchi and Zhang teach the method of claim 12.
Prakash, Yamaguchi and Zhang do not teach
applying a first group of the input features to first input layers of the trained input neural network model; and 
applying a second group of the input features to second input layers of the trained input neural network model.  
Al-Rfou teaches
applying a first group of the input features to first input layers of the trained input neural network model [Fig. 4 shows applying input features into hidden layers h1 which is parallel to hidden layers h2]; and 
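applying a second group of the input features to second input layers of the trained input neural network model [Fig. 4 shows applying context features (second group of input features) into hidden layers h2, which are parallel to hidden layers h1]. 
Claim 16 is rejected using the same rationale as claim 1.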

As per claim 17, Prakash, Yamaguchi, Zhang and Al-Rfou teach the method of claim 16.
Al-Rfou further teaches
generating a first input vector based on applying the first group of the input features to the first input layers [Fig. 4 shows applying input features into hidden layers h1]; 
generating a second input vector based on applying the second group of the input features to the second input layers [Fig. 4 shows applying context features (second group of input features) into hidden layers h2; Al-Rfou teaches applying the second group of input features to a second input layer, while Prakash teaches generating an input vector based on applying the input features to the input upstream layer (Fig. 3, section 3.3); therefore, the combination of Prakash and Al-Rfou reads on the above limitation]; 
applying, to downstream input layers of the input neural network model, a combined input vector that is based on the first input vector and the second input vector [Fig. 4 and section 4.3 disclose applying the input features into hidden layers h1, which are parallel to hidden layers h2, applying context features (second group of input features) into hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second input to input downstream layers); because Al-Rfou teaches applying the first and second input to input downstream layers h4, while Prakash teaches generating an input vector based on applying the input features to the input layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation]; and 
generating the input vector over the downstream input layers based on the combined input vector [Fig. 4 and section 4.3 disclose applying the input features into hidden layers h1, which are parallel to hidden layers h2, applying context features (second group of input features) into hidden layers h2, and then “concatenates the hidden layers from the previous networks, [h1; h2; h3], to produce a final hidden layer h4” (application of the first and second input to input downstream layers); because Al-Rfou teaches applying the first and second input to input downstream layers h4, while Prakash teaches generating an input vector based on applying the input features to the input layer (Fig. 3, section 3.3), the combination of Prakash and Al-Rfou reads on the above limitation].  
Claim 17 is rejected using the same rationale as claim 1.

As per claim 18, Prakash, Yamaguchi, Zhang and Al-Rfou teach the method of claim 16.
Al-Rfou further teaches
none of the input features of the second group are applied to the first input layers and wherein none of the input features of the first group are applied to the second input layers [Fig. 4 teaches completely separate inputs for the two groups: the input (i.e., first group) and the context (i.e., second group)].

Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Wu (US Pub. 2018/0174020) describes methods for emotionally intelligent automated chatting by determining a context and an emotion of a conversation with a user.
Allen et al. (US Pub. 2014/0214960) describes a system for targeting a message to members of a social network.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TRI T NGUYEN whose telephone number is 571-272-0103.  The examiner can normally be reached on M-F, 8 AM-5 PM, (CT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/T. N./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123