DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The present application is being examined under the claims filed on 06/14/2018.
Claims 1-20 are rejected.
Claims 1-20 are pending.

Drawings
The Drawings filed on 06/14/2018 are acceptable for examination purposes.

Specification
The Specification filed on 06/14/2018 is acceptable for examination purposes.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 6-12, 15, 16, 19, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Campbell et al. (hereinafter Campbell) US 10387463 B2.
In reference to claim 1. Campbell teaches a method of training a dialogue management system to converse with an end user, the method comprising:
“retrieving training dialogue data” (Campbell in at least Col. 1 line 27 to Col. 2 line 3 “receiving as inputs, an upper limit of dialog turns and training data, in which the training data includes a series of user utterances for a dialog and a series of responsive system utterances for the dialog”);
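“generating a first observation from the training dialogue data” (Campbell in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”. Examiner notes that the dialog in each turn represents an observation, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations));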
“feeding the first observation into a neural network to generate a first set of predicted outputs, the first set of predicted outputs including a predicted value” (Campbell in at least Fig. 4, Fig. 8, and Col. 14 line 60 to Col. 16 line 7 “belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances” and “The belief tracker component 408 receives the feature vector x_t and the dialog history of the current dialog (e.g., encoded by h_{t-1}, which is a hidden vector) and outputs for each slot a probability distribution vector over the columns of the tables of the database”. Examiner notes that a probability distribution vector is a prediction);
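“selecting a first recommended action for the dialogue management system according to the first set of predicted outputs” (Campbell in at least Fig. 8, Col. 15 lines 12-21, and Col. 16 lines 14-28 “The dialog manager component 412 is configured to receive the output probability distribution vector, select an action from an action space based at least in part on a dialog management policy, and provide the selected action to a dialog generator. The dialog management policy maps belief states of a state space to actions of an action space. In some embodiments of the invention, the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)” and “the action space of the dialog management policy of the dialog manager component 412 includes an inform action, a feature recommendation action, a query action, a welcome action, and a disambiguate action. The inform action returns an answer from the database, the feature recommendation provides details regarding attributes from the tables, the query action queries the database, the disambiguate action asks a question to the user to disambiguate the user query, and the welcome action generates a welcome message to be sent to the user”);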
“using the first recommended action to generate a second observation from the training dialogue data” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) “the user sought to find information via dialog 500 regarding players who played in a golf tournament. In this dialog 500, the dialog handler component 402 generated responsive system utterances 504a, 504b that proactively prompt the user towards efficient queries by posing questions to the user. After narrowing down the user's query, the dialog handler component 402 generated a responsive system utterance 504c by returning information from the database 410 that answers the user's query. In this example, dialog 500 includes three dialog turns, in which 502a and 504a includes a first dialog turn, 502b and 504b includes a second dialog turn, and 502c and 504c includes a third dialog turn”);
“generating a reward using the second observation” (Campbell, as mentioned above, in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”);
“calculating a target value, wherein the target value depends on the reward” (Campbell in at least Col. 11 line 59 to Col. 12 line 67 “the reward function can be represented in the following manner […] where r is the rank of correct answer. In other words, in some embodiments of the present invention, the reward function returns a value of -0.1 for each turn that does not return an answer. For example, consider a dialog comprising two turns in which the dialog provides an incorrect answer at the second turn. At turn one of the example two-turn dialog, the reward function would return a reward value of -0.1 as the dialog has not yet failed. At turn two of the example two-turn dialog, the reward function would return a value of (-0.1)+(-1) as an incorrect answer was provided. In another embodiment, a discount factor, ɣ between 0 and 1, is multiplied to rewards of turns>1. Hence, the accumulative reward is (-0.1)+(-1ɣ)=-0.1-ɣ. In some embodiments of the present invention, the dialog fails if a maximum number of dialog turns have been exceeded or if the user has desired an intent to terminate the dialog. If the system utterance provided by the dialog handler 402 returns an answer to the user's query (e.g., "user target"), and if the answer is within the top R number of results that have been previously provided by an human agent (e.g., top 5, top 10, top 15, top 20, etc.), then the output value of the reward function would be inversely proportional to the rank r of the answer (e.g., 1-((r-1)/R))”; see omitted formula which can be found in Col. 12 lines 20-25);
“using the target value and the predicted value to update parameters of the neural network” (Campbell in at least Col. 11 line 59 to Col. 12 line 67, and Col. 15 lines 12-21 “the dialog management policy is an updating version of the predetermined dialog management policy that is learned through machine learning” and “the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)”); and
“feeding the second observation into the neural network to generate a second recommended action” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) with at least a second recommended action).
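For illustration only, the reward arithmetic quoted from Campbell (Col. 11 line 59 to Col. 12 line 67) in the mapping of the “calculating a target value” limitation can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Campbell's implementation: the names `turn_reward` and `discounted_return`, the encoding of an incorrect answer as `rank=None`, and the treatment of an answer ranked outside the top R as a failure are hypothetical choices made for this example. The discount factor is applied as a single multiplier on the rewards of turns after the first, as the quoted passage literally states. The omitted formula at Col. 12 lines 20-25 is not reproduced.

```python
from typing import Optional, Sequence

STEP_PENALTY = -0.1     # quoted: value for each turn that does not return an answer
FAILURE_PENALTY = -1.0  # quoted: value contributed when an incorrect answer is provided


def turn_reward(answered: bool, rank: Optional[int] = None, top_r: int = 5) -> float:
    """Reward for a single dialog turn, following the quoted passage.

    rank is the 1-based rank r of the returned answer among the top R results.
    Assumption: rank=None (or a rank outside the top R) denotes an incorrect answer.
    """
    if not answered:
        return STEP_PENALTY
    if rank is None or rank > top_r:
        return FAILURE_PENALTY
    # Quoted example form: 1 - ((r - 1) / R)
    return 1.0 - (rank - 1) / top_r


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Accumulate per-turn rewards, multiplying rewards of turns > 1 by gamma."""
    if not rewards:
        return 0.0
    return rewards[0] + gamma * sum(rewards[1:])
```

Under these assumptions, the two-turn dialog in the quoted example (no answer at turn one, incorrect answer at turn two) accumulates (-0.1)+(-1ɣ), which matches the expression -0.1-ɣ quoted from Campbell.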

In reference to claim 2. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method includes:
Campbell further discloses:
“identifying a current task, calculating a completion percentage of the current task, and wherein the reward is calculated as a function of the completion percentage of the current task” (Campbell in at least Col. 11 line 59 to Col. 12 line 67 discloses how the reward function works. Examiner notes that the reward function is calculated as a function of the completion percentage. The completion percentage is either 0% or 100%, 0% being not answered in the current step or incorrect answer, and 100% being correct answer).

In reference to claim 6. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method includes:
Campbell further discloses:
“using an action screener to constrain the number of predicted outputs that need to be calculated by the neural network” (Campbell in at least Fig. 8 and Col. 16 lines 13-40 which discloses the disambiguate action to disambiguate the user query. Examiner notes that disambiguating the user query constrains the number of predicted outputs that need to be calculated by the neural network).

In reference to claim 7. Campbell teaches the method according to claim 6 (as mentioned above), wherein the step of using the action screener further includes:
Campbell further discloses:
“feeding information related to the first observation into a classification module and outputting a classification score for each action in a set of actions” (Campbell in at least Fig. 4, Fig. 8, and Col. 14 line 60 to Col. 16 line 54 “The dialog manager component 412 is configured to receive the output probability distribution vector, select an action from an action space based at least in part on a dialog management policy, and provide the selected action to a dialog generator. The dialog management policy maps belief states of a state space to actions of an action space. In some embodiments of the invention, the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)” and “the action space of the dialog management policy of the dialog manager component 412 includes an inform action, a feature recommendation action, a query action, a welcome action, and a disambiguate action. The inform action returns an answer from the database, the feature recommendation provides details regarding attributes from the tables, the query action queries the database, the disambiguate action asks a question to the user to disambiguate the user query, and the welcome action generates a welcome message to be sent to the user”. Examiner notes that the probability distribution vector is a classification over the columns of the multiple tables of the database. The action from the action space is based on a dialog management policy which maps belief 
“retrieving a classification threshold” (Campbell in at least Col. 17 lines 9-35 “the policy is defined over the new state space such that if the query complexity dT is greater than some threshold dialog complexity value, the dialog handler component 402 would be configured to not query the database but rather generate a system utterance that asks a clarification question to the user, whereas if the returned value of the dialog complexity estimator is less than or equal to the threshold, the dialog handler component 402 would be configured to query the database and generate a responsive system utterance based at least in part on a result of the query”);
“generating a subset of actions from the set of actions, wherein the subset of actions includes actions for which the classification score is greater than the classification threshold” (Campbell in at least Col. 17 lines 9-35 “the policy is defined over the new state space such that if the query complexity dT is greater than some threshold dialog complexity value, the dialog handler component 402 would be configured to not query the database but rather generate a system utterance that asks a clarification question to the user, whereas if the returned value of the dialog complexity estimator is less than or equal to the threshold, the dialog handler component 402 would be configured to query the database and generate a responsive system utterance based at least in part on a result of the query”); and
“constraining the number of predicted outputs that need to be calculated by the neural network according to the subset of actions” (Campbell in at least Fig. 8 and Col. 16 lines 13-40 which discloses the disambiguate action to disambiguate the user query. Examiner notes that disambiguating the user query constrains the number of predicted outputs that need to be calculated by the neural network).
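For illustration only, the screening limitations recited in claim 7 (a classification score for each action, a retrieved classification threshold, and a subset of actions whose score is greater than the threshold) can be sketched as follows. This is a hedged Python sketch of the claim language, not code from the application or from Campbell: the function name `screen_actions`, the example scores, and the threshold value are hypothetical. The action names follow the action space quoted from Campbell.

```python
from typing import Dict, List


def screen_actions(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the actions whose classification score is strictly greater than the threshold."""
    return [action for action, score in scores.items() if score > threshold]


# Hypothetical classification scores for the actions named in Campbell's action space.
example_scores = {
    "inform": 0.70,
    "feature_recommendation": 0.05,
    "query": 0.55,
    "welcome": 0.01,
    "disambiguate": 0.30,
}

# Only this screened subset would then need predicted outputs from the neural network.
subset = screen_actions(example_scores, threshold=0.25)
```

The comparison is strict because the claim recites a score “greater than” the threshold; an action whose score equals the threshold is excluded in this sketch.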

In reference to claim 8. Campbell teaches the method according to claim 7 (as mentioned above), wherein:
Campbell further discloses:
“the classification scores output by the classification module are used to calculate the reward” (Campbell in at least Col. 16 line 55 to Col. 18 line 38 “the hybrid approach includes augmenting a state space of the dialog management policy to include an additional dimension that corresponds to the returned value, and then modifying the reward function by specifying the reward function over the augmented state space and subtracting the output of the reward function by a function of the return value of the dialog complexity estimator”. Examiner notes that the cited section provides further examples on how to calculate the reward based on the classification scores as mentioned above).

In reference to claim 9. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method further includes steps of:
Campbell further discloses:
“feeding the second observation into a second neural network to generate a second set of predicted outputs and wherein the predicted value from the second set of predicted outputs is used with the reward to calculate the target value” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) “the user sought to find information via dialog 500 regarding players who played in a golf tournament. In this dialog 500, the dialog handler the dialog handler 402 returns an answer to the user's query (e.g., "user target"), and if the answer is within the top R number of results that have been previously provided by an human agent (e.g., top 5, top 10, top 15, top 20, etc.), then the output value of the reward function would be inversely proportional to the rank r of the answer (e.g., 1-((r-1)/R))”; see omitted formula which can be found in Col. 12 lines 20-25. Examiner notes that the system can be implemented with one or more neural networks, see at least Col. 13 lines 1-31 and 

In reference to claim 10. Campbell teaches the method according to claim 9 (as mentioned above), wherein:
Campbell further discloses:
“the second neural network is updated less frequently than the first neural network” (Campbell in at least Fig. 4, Fig. 8, and Col. 14 line 60 to Col. 16 line 7 “belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances” and “The belief tracker component 408 receives the feature vector x_t and the dialog history of the current dialog (e.g., encoded by h_{t-1}, which is a hidden vector) and outputs for each slot a probability distribution vector over the columns of the tables of the database”. Examiner notes that the second neural network is updated less frequently than the first neural network because the belief tracker component is updated at every iteration. Each iteration of the belief tracker component is a different neural network (updated neural network) which is updated less frequently because the action space is disambiguated at every step).
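For illustration only, the arrangement recited in claims 9 and 10 (a second neural network whose predicted value is used with the reward to calculate the target value, and which is updated less frequently than the first neural network) can be sketched as follows. This Python sketch rests on several assumptions and is not the applicant's or Campbell's implementation: the target formula `reward + gamma * max(second_outputs)`, the synchronization interval `sync_every`, the class name `TwoNetworkTrainer`, and the use of plain lists of floats in place of network parameters are all hypothetical.

```python
from typing import List, Sequence


def target_value(reward: float, second_outputs: Sequence[float], gamma: float = 0.9) -> float:
    """Assumed target: reward plus the discounted largest predicted value of the second network."""
    return reward + gamma * max(second_outputs)


class TwoNetworkTrainer:
    """Toy stand-in: 'parameters' are plain lists of floats, not real network weights."""

    def __init__(self, params: List[float], sync_every: int = 4) -> None:
        self.first_params = list(params)   # updated on every training step
        self.second_params = list(params)  # copied from first_params every sync_every steps
        self.sync_every = sync_every
        self.steps = 0
        self.second_updates = 0

    def step(self, delta: float) -> None:
        # Hypothetical update rule: shift every first-network parameter by delta.
        self.first_params = [p + delta for p in self.first_params]
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # The second network changes only here, i.e. less often than the first.
            self.second_params = list(self.first_params)
            self.second_updates += 1
```

In this sketch the first parameter set changes on every call to `step`, while the second changes only once per `sync_every` calls, which is one way the "updated less frequently" relationship of claim 10 could be realized.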

In reference to claim 11.
“retrieve training dialogue data” (Campbell in at least Col. 1 line 27 to Col. 2 line 3 “receiving as inputs, an upper limit of dialog turns and training data, in which the training data includes a series of user utterances for a dialog and a series of responsive system utterances for the dialog”);
“generate a first observation from the training dialogue data” (Campbell in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”. Examiner notes that the dialog in each turn represents an observation, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations));
“feed the first observation into a neural network to generate a first set of predicted outputs, the first set of predicted outputs including a predicted value” (Campbell in at least Fig. 4, Fig. 8, and Col. 14 line 60 to Col. 16 line 7 “belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances” and “The belief tracker component 408 receives the feature vector x_t and the dialog history of the current dialog (e.g., encoded by h_{t-1}, which is a hidden vector) and outputs for each slot a probability distribution vector over the columns of the tables of the database”. Examiner notes that a probability distribution vector is a prediction);
“select a first recommended action for a dialogue management system according to the first set of predicted outputs” (Campbell in at least Fig. 8, Col. 15 lines 12-21, and Col. 16 lines 14-28 “The dialog manager component 412 is configured to receive the output probability distribution vector, select an action from an action space based at least in part on a dialog management policy, and provide the selected action to a dialog generator. The dialog management policy maps belief states of a state space to actions of an action space. In some embodiments of the invention, the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)” and “the action space of the dialog management policy of the dialog manager component 412 includes an inform action, a feature recommendation action, a query action, a welcome action, and a disambiguate action. The inform action returns an answer from the database, the feature recommendation provides details regarding attributes from the tables, the query action queries the database, the disambiguate action asks a question to the user to disambiguate the user query, and the welcome action generates a welcome message to be sent to the user”);
“use the first recommended action to generate a second observation from the training dialogue data” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) “the user sought to find information via dialog 500 regarding players who played in a golf tournament. In this dialog 500, the dialog handler component 402 generated responsive system utterances 504a, 504b that proactively prompt the user towards efficient queries by posing questions to the user. After narrowing down the user's query, the dialog handler component 402 generated a responsive system utterance 504c by returning information from the database 410 that answers the user's query. In this example, dialog 500 includes three dialog turns, in which 502a and 504a includes a first dialog turn, 502b and 504b includes a second dialog turn, and 502c and 504c includes a third dialog turn”);
“generate a reward using the second observation” (Campbell, as mentioned above, in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”);
“calculate a target value, wherein the target value depends on the reward” (Campbell in at least Col. 11 line 59 to Col. 12 line 67 “the reward function can be represented in the following manner […] where r is the rank of correct answer. In other words, in some embodiments of the present invention, the reward function returns a value of -0.1 for each turn that does not return an answer. For example, consider a dialog comprising two turns in which the dialog provides an incorrect answer at the second turn. At turn one of the example two-turn dialog, the reward function would return a reward value of -0.1 as the dialog has not yet failed. At turn two of the example two-turn dialog, the reward function would return a value of (-0.1)+(-1) as an incorrect answer was provided. In another embodiment, a discount factor, ɣ between 0 and 1, is multiplied to rewards of turns>1. Hence, the accumulative reward is (-0.1)+(-1ɣ)=-0.1-ɣ. In some embodiments of the present invention, the dialog fails if a maximum number of dialog turns have been exceeded or if the user has desired an intent to terminate the dialog. If the system utterance provided by the dialog handler 402 returns an answer to the user's query (e.g., "user target"), and if the answer is within the top R number of results that have been previously provided by an human agent (e.g., top 5, top 10, top 15, top 20, etc.), then the output value of the reward function would be inversely proportional to the rank r of the answer (e.g., 1-((r-1)/R))”; see omitted formula which can be found in Col. 12 lines 20-25);
“use the target value and the predicted value to update parameters of the neural network” (Campbell in at least Col. 11 line 59 to Col. 12 line 67, and Col. 15 lines 12-21 “the dialog management policy is an updating version of the predetermined dialog management policy that is learned through machine learning” and “the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)”); and
“feed the second observation into the neural network to generate a second recommended action” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) with at least a second recommended action).

In reference to claim 12. Campbell teaches the non-transitory computer-readable medium storing software of claim 11 (as mentioned above),
Campbell further discloses:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to identify a current task, calculate a completion percentage of the current task and calculate the reward using the completion percentage of the current task” (Campbell in at least Col. 11 line 59 to Col. 12 line 67 discloses how the reward function works. Examiner notes that the reward function is calculated as a function of the completion percentage. The completion percentage is either 0% or 100%, 0% being not answered in the current step or incorrect answer, and 100% being correct answer).

In reference to claim 15. Campbell teaches the non-transitory computer-readable medium storing software of claim 11 (as mentioned above),
Campbell further discloses:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to use an action screener to constrain the number of predicted outputs that need to be calculated by the neural network” (Campbell in at least Fig. 8 and Col. 16 lines 13-40 which discloses the disambiguate action to disambiguate the user query. Examiner notes that disambiguating the user query constrains the number of predicted outputs that need to be calculated by the neural network).

In reference to claim 16. Campbell teaches a dialogue management system and a reinforcement learning system for training the dialogue management system to converse with an end user, comprising:
“one or more computers and one or more storage devices storing instructions that are operable” (Campbell see at least Fig. 3), when executed by the one or more computers, to cause the one or more computers to:
“retrieve training dialogue data” (Campbell in at least Col. 1 line 27 to Col. 2 line 3 “receiving as inputs, an upper limit of dialog turns and training data, in which the training data includes a series of user utterances for a dialog and a series of responsive system utterances for the dialog”);
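“generate a first observation from the training dialogue data” (Campbell in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”. Examiner notes that the dialog in each turn represents an observation, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations));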
“feed the first observation into a neural network to generate a first set of predicted outputs, the first set of predicted outputs including a predicted value” (Campbell in at least Fig. 4, Fig. 8, and Col. 14 line 60 to Col. 16 line 7 “belief tracker component 408 implements a neural network (e.g., a recurrent neural network) that is configured to map dialog history to belief states. A belief state is a distribution over user goals and dialog states (e.g., context). The output of the belief tracker is an encoding of both the current user utterance and the history of utterances of user-system utterances” and “The belief tracker component 408 receives the feature vector x_t and the dialog history of the current dialog (e.g., encoded by h_{t-1}, which is a hidden vector) and outputs for each slot a probability distribution vector over the columns of the tables of the database”. Examiner notes that a probability distribution vector is a prediction);
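“select a first recommended action for the dialogue management system according to the first set of predicted outputs” (Campbell in at least Fig. 8, Col. 15 lines 12-21, and Col. 16 lines 14-28 “The dialog manager component 412 is configured to receive the output probability distribution vector, select an action from an action space based at least in part on a dialog management policy, and provide the selected action to a dialog generator. The dialog management policy maps belief states of a state space to actions of an action space. In some embodiments of the invention, the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)” and “the action space of the dialog management policy of the dialog manager component 412 includes an inform action, a feature recommendation action, a query action, a welcome action, and a disambiguate action. The inform action returns an answer from the database, the feature recommendation provides details regarding attributes from the tables, the query action queries the database, the disambiguate action asks a question to the user to disambiguate the user query, and the welcome action generates a welcome message to be sent to the user”);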
“use the first recommended action to generate a second observation from the training dialogue data” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) “the user sought to find information via dialog 500 regarding players who played in a golf tournament. In this dialog 500, the dialog handler component 402 generated responsive system utterances 504a, 504b that proactively prompt the user towards efficient queries by posing questions to the user. After narrowing down the user's query, the dialog handler component 402 generated a responsive system utterance 504c by returning information from the database 410 that answers the user's query. In this example, dialog 500 includes three dialog turns, in which 502a and 504a includes a first dialog turn, 502b and 504b includes a second dialog turn, and 502c and 504c includes a third dialog turn”);
“generate a reward using the second observation” (Campbell, as mentioned above, in at least Col. 2 lines 4-25 “The trained dialog management policy was trained based at least in part on executing a reward function at each turn of a prior dialog, in which for each turn of the prior dialog the reward function is configured to output a reward value that is based at least in part on an accuracy of a responsive system utterance of the turn and on number of dialog turns elapsed”);
“calculate a target value, wherein the target value depends on the reward” (Campbell in at least Col. 11 line 59 to Col. 12 line 67 “the reward function can be represented in the following manner […] where r is the rank of correct answer. In other words, in some embodiments of the present invention, the reward function returns a value of -0.1 for each turn that does not return an answer. For example, consider a dialog comprising two turns in which the dialog provides an incorrect answer at the second turn. At turn one of the example two-turn dialog, the reward function would return a reward value of -0.1 as the dialog has not yet failed. At turn two of the example two-turn dialog, the reward function would return a value of (-0.1)+(-1) as an incorrect answer was provided. In another embodiment, a discount factor, ɣ between 0 and 1, is multiplied to rewards of turns>1. Hence, the accumulative reward is (-0.1)+(-1ɣ)=-0.1-ɣ. In some embodiments of the present invention, the dialog fails if a maximum number of dialog turns have been exceeded or if the user has desired an intent to terminate the dialog. If the system utterance provided by the dialog handler 402 returns an answer to the user's query (e.g., "user target"), and if the answer is within the top R number of results that have been previously provided by an human agent (e.g., top 5, top 10, top 15, top 20, etc.), then the output value of the reward function would be inversely proportional to the rank r of the answer (e.g., 1-((r-1)/R))”; see omitted formula which can be found in Col. 12 lines 20-25);
“use the target value and the predicted value to update parameters of the neural network” (Campbell in at least Col. 11 line 59 to Col. 12 line 67, and Col. 15 lines 12-21 “the dialog management policy is an updating version of the predetermined dialog management policy that is learned through machine learning” and “the dialog management policy is a predetermined policy which is updated over time via machine learning (e.g., supervised learning component 602 and/or reinforcement learning component 702)”); and
“feed the second observation into the neural network to generate a second recommended action” (Campbell, as mentioned above, see at least Fig. 5 and Col. 11 lines 5-58 which disclose an example of at least 3 dialog turns (i.e. at least 3 observations) with at least a second recommended action).

In reference to claim 19. Campbell teaches the dialogue management system and reinforcement learning system according to claim 16 (as mentioned above), wherein the instructions are operable, when executed by the one or more computers, to cause the one or more computers to:
Campbell further discloses:
“use an action screener to output a classification score for each action in a set of actions” (Campbell in at least Fig. 8 and Col. 16 lines 13-40 which discloses the disambiguate action to disambiguate the user query. Examiner notes that disambiguating the user query constrains the number of predicted outputs that need to be calculated by the neural network).

In reference to claim 20. Campbell teaches the dialogue management system and reinforcement learning system according to claim 16 (as mentioned above), wherein the instructions are operable, when executed by the one or more computers, to cause the one or more computers to:
Campbell further discloses:
“use a classification score when generating the reward” (Campbell in at least Col. 16 line 55 to Col. 18 line 38 “the hybrid approach includes augmenting a state space of the dialog management policy to include an additional dimension that corresponds to the returned value, and then modifying the reward function by specifying the reward function over the augmented state space and subtracting the output of the reward function by a function of .

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5, 13, 14, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Campbell et al. (hereinafter Campbell) US 10387463 B2 in view of Rao et al. (hereinafter Rao) US 10762161 B2.
In reference to claim 3. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method includes:
	Campbell does not explicitly disclose:
“identifying one or more user feedback tokens in the first observation and wherein the reward is calculated as a function of the number of user feedback tokens”.
	However, Rao discloses:
“identifying one or more user feedback tokens in the first observation and wherein the reward is calculated as a function of the number of user feedback tokens” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
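It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).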

In reference to claim 4. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method includes:
Campbell does not explicitly disclose:
“performing a sentiment analysis on information derived from the first observation and wherein the reward is calculated as a function of the output of the sentiment analysis”.
	However, Rao discloses:
“performing a sentiment analysis on information derived from the first observation and wherein the reward is calculated as a function of the output of the sentiment analysis” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).

In reference to claim 5. Campbell teaches the method according to claim 1 (as mentioned above), wherein the method includes:
Campbell does not explicitly disclose:
“receiving image information corresponding to a real or simulated user, analyzing the image information to determine an emotional state of the real or simulate user, and wherein the reward is calculated as a function of the emotional state”.
	However, Rao discloses:
“receiving image information corresponding to a real or simulated user, analyzing the image information to determine an emotional state of the real or simulate user, and wherein the reward is calculated as a function of the emotional state” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).

In reference to claim 13. Campbell teaches the non-transitory computer-readable medium storing software of claim 11 (as mentioned above),
Campbell does not explicitly disclose:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to identify one or more user feedback tokens in the first observation and calculate the reward using the number of user feedback tokens”.
	However, Rao discloses:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to identify one or more user feedback tokens in the first observation and calculate the reward using the number of user feedback tokens” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).

In reference to claim 14. Campbell teaches the non-transitory computer-readable medium storing software of claim 11 (as mentioned above),
Campbell does not explicitly disclose:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to perform a sentiment analysis on information derived from the first observation and calculate the reward using the output of the sentiment analysis”.
	However, Rao discloses:
“wherein the instructions executable by one or more computers, upon such execution, cause the one or more computers to perform a sentiment analysis on information derived from the first observation and calculate the reward using the output of the sentiment analysis” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).

In reference to claim 17. Campbell teaches the dialogue management system and reinforcement learning system according to claim 16 (as mentioned above), wherein the instructions are operable, when executed by the one or more computers, to cause the one or more computers to:
Campbell does not explicitly disclose:
“generate the reward using a sentiment analysis performed on the first observation”.
	However, Rao discloses:
“generate the reward using a sentiment analysis performed on the first observation” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
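It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).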

In reference to claim 18. Campbell teaches the dialogue management system and reinforcement learning system according to claim 16 (as mentioned above), wherein the instructions are operable, when executed by the one or more computers, to cause the one or more computers to:
Campbell does not explicitly disclose:
“generate the reward using the number of user feedback tokens identified in the first observation”.
	However, Rao discloses:
“generate the reward using the number of user feedback tokens identified in the first observation” (Rao in at least Col. 5 lines 23-47 “User feedback items 118 can include structured and/or unstructured user feedback data, where each item of user feedback data is provided by a user from a set of multiple users on user devices 108 in response to content items 110. Unstructured user feedback data includes freeform responses by users in response to content item recommendations and/or presentation, a user input responsive to a query from a dialog process, and/or a combination thereof. In some implementations, unstructured feedback can include an "emoji" that carries the sentiment expressed by the user. An example of unstructured user feedback includes a user response "The Godfather is not a funny movie" in response to a recommended content item 110 as the movie "The Godfather." Structured user feedback includes user selections and/or rejections responsive recommended content items, user partial or complete viewings of recommended content items, or other forms of direct feedback to the model regarding user preferences and/or selections of content items 110. Structured user feedback can include a "star rating system" in which users can rate a recommendation using a quantified value (e.g., 1-5 stars or the like). An example of a structured user feedback includes a user selection of the movie "Midnight in Paris" when presented with a selection of comedic films in response to requesting presentation of a romantic comedy” and Col. 8 lines 29-60 “a rule-based approach includes analyzing user-provided statement including one or more sentiments and/or emotions”. The reward is calculated based on the one or more sentiments and/or emotions).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Campbell and Rao. Campbell teaches a system for implementing multi-turn dialogs. Rao teaches interactive humanoid conversational entities (e.g., "virtual agents," "chatbots" or dialog processes) to provide recommended content to a user requesting a content item. One of ordinary skill would have been motivated to combine Campbell and Rao because the content recommendation model is designed to reverse or enhance the polarity of the user-provided statement, depending in part on a degree of negativity or a degree of positivity of the statement, respectively (Rao Col. 8 lines 36-40).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Viker A. Lamardo whose telephone number is (571)270-5871.  The examiner can normally be reached on Mon. - Fri. 9 AM - 5 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo can be reached on (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/VIKER A LAMARDO/Primary Examiner, Art Unit 2126