DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed July 11, 2022 has been entered.  Claims 1 – 20 remain pending in the application.  Applicant’s amendments to the Drawings and Specification have overcome each and every objection previously set forth in the Non-Final Office Action mailed April 11, 2022.
Response to Arguments
Applicant’s arguments, see page 10, line 11 – page 11, line 21, filed July 11, 2022, with respect to the 35 U.S.C. 101 rejections of claims 1 – 20 have been fully considered and are persuasive.  The 35 U.S.C. 101 rejections of April 11, 2022 have been withdrawn.
Applicant's arguments, filed July 11, 2022, with respect to the 35 U.S.C. 103 rejection of claim 1 have been fully considered but they are not persuasive.
In response to applicant's argument that Williams et al. ("Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning"), hereinafter Williams, does not teach "a template list specifying a list of verb phrases with placeholders for objects", as recited in amended claim 1, it is noted that Williams discloses the use of action templates in a natural language dialog system (Abstract, lines 6-10, “We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates.”).  Williams gives an example of an action template (Section 4, lines 30-36, “Third, system actions were templatized: for example, system actions of the form “prezzo is a nice restaurant in the west of town in the moderate price range” all map to the template “<name> is a nice restaurant in the <location> of town in the <price> price range”.”) that discloses a verb phrase with placeholders for objects: the template phrase contains the verb “is” and the placeholders “<name>”, “<location>”, and “<price>”.
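For illustration only, the following minimal Python sketch shows how a template of the kind quoted from Williams, Section 4, can be filled with entity values. The helper name `fill_template` and the slot-matching pattern are assumptions made for this sketch; they are not taken from Williams or from the application.

```python
import re

# Template text quoted from Williams, Section 4.
TEMPLATE = "<name> is a nice restaurant in the <location> of town in the <price> price range"


def fill_template(template: str, entities: dict) -> str:
    """Replace each <slot> with its entity value (hypothetical helper).

    Slots with no matching entity are left unchanged.
    """
    return re.sub(
        r"<(\w+)>",
        lambda m: entities.get(m.group(1), m.group(0)),
        template,
    )


filled = fill_template(
    TEMPLATE, {"name": "prezzo", "location": "west", "price": "moderate"}
)
print(filled)
```

With the three entity values from the Williams example, the filled template reproduces the quoted system action, which is the sense in which the template is read as a verb phrase with placeholders for objects.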
Contrary to the statement in applicant’s arguments (“independent claims 1, 8 and 15 are being amended”), claims 8 and 15 have not been amended, and the original rejections of claims 8 and 15 are maintained.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejection of claim 3 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 8, 11, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Williams et al. ("Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning"), hereinafter Williams, in view of Ham et al. (US Patent Application Publication No. 2021/0065705), hereinafter Ham.
Regarding claim 1, Williams discloses a computer-implemented method for natural language generation (Abstract, lines 6-9, “Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software”), comprising:
receiving a current observation expressed in natural language (Section 2, lines 10-11, "The cycle begins when the user provides an utterance, as text");
extracting entities in the current observation (Section 2, lines 15-16, "an entity extraction module identifies entity mentions");
obtaining a template list, the template list specifying a list of verb phrases with placeholders for objects (Abstract, lines 6-10, “We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates.”; Section 3, lines 52-54, "However, HCNs differ in that they use developer-provided action templates, which can contain entity references"; Section 4, lines 30-36, “Third, system actions were templatized: for example, system actions of the form “prezzo is a nice restaurant in the west of town in the moderate price range” all map to the template “<name> is a nice restaurant in the <location> of town in the <price> price range”.”);
inputting the observations and the template list to a neural network, the neural network outputting the template list of the verb phrases filled-in with at least some of the entities (Section 2, lines 32-34, "The feature components from steps 1-5 are concatenated to form a feature vector (step 6). This vector is passed to an RNN,"; Section 2, lines 37-42, "The RNN computes a hidden state (vector), which is retained for the next timestep (step 8), and passed to a dense layer with a softmax activation, with output dimension equal to the number of distinct system action templates (step 9). Thus the output of step 9 is a distribution over action templates. Next, the action mask is applied as an element-wise multiplication, and the result is normalized back to a probability distribution (step 10) – this forces non-permitted actions to take on probability zero. From the resulting distribution (step 11), an action is selected (step 12)."; Section 2, lines 54-56, "The selected action is next passed to “Entity output” developer code that can substitute in entities (step 13) and produce a fully-formed action"; Substituting in entities and producing a fully-formed action reads on outputting the template filled-in with entities.);
and receiving a reward associated with the neural network's output, wherein based on the reward, the neural network automatically retraining itself (Section 6, lines 3-12, "Once a system operates at scale, interacting with a large number of users, it is desirable for the system to continue to learn autonomously using reinforcement learning (RL). With RL, each turn receives a measurement of goodness called a reward; the agent explores different sequences of actions in different situations, and makes adjustments so as to maximize the expected discounted sum of rewards, which is called the return, denoted G.").
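For illustration only, steps 9 through 12 as quoted from Williams, Section 2 (softmax over action templates, element-wise action mask, renormalization, action selection) can be sketched in Python as follows. The function name `select_action`, the example logits, and the greedy (highest-probability) selection rule are assumptions of this sketch; the quoted passage states only that an action is selected from the resulting distribution.

```python
import math


def select_action(logits, action_mask):
    """Return (index of selected template, masked distribution).

    logits: one score per action template (stand-in for the dense layer output).
    action_mask: 1 for permitted templates, 0 for non-permitted ones.
    """
    # Step 9: softmax over the action templates.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Step 10: element-wise mask, then renormalize to a probability distribution.
    masked = [p * m for p, m in zip(probs, action_mask)]
    norm = sum(masked)
    if norm == 0:
        raise ValueError("action mask forbids every template")
    dist = [p / norm for p in masked]

    # Step 12: greedy selection (an assumption of this sketch).
    best = max(range(len(dist)), key=dist.__getitem__)
    return best, dist


best, dist = select_action([2.0, 1.0, 0.5], [0, 1, 1])
print(best, dist)
```

In this example the first template has the highest raw score but is masked out, so it receives probability zero and the second template is selected.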
Williams does not disclose:
selecting a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation;
combining the current observation and the relevant historical observation as observations.
Ham teaches:
selecting a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation (Paragraph 0087, lines 1-6, "The electronic device 200 may determine an entity that is included in the text and needs to be specified. The electronic device 200 may acquire specification information for specifying the determined entity by retrieving the information about the user's conversation history acquired from the selected database."; The text reads on the current observation and acquiring specification information from the user’s conversation history reads on selecting a relevant historical observation from historical observations.);
combining the current observation and the relevant historical observation as observations (Paragraph 0087, lines 6-9, "The electronic device 200 may interpret the text and the specification information using, for example, and without limitation, a natural language understanding (NLU) model, or the like."; The text reads on the current observation and the specification information reads on the relevant historical observation.).
Ham teaches combining the current text with information from the conversation history in order to accurately analyze the current text for generating a response (Paragraph 0006, lines 5-8, "Public electronic devices installed in public places need to store and use a conversation history for each user in order to accurately analyze a user's utterance input and provide a personalized answer thereto.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to combine the current text with information from the conversation history.  Doing so would allow for accurately analyzing the current text for generating a response.
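For illustration only, the two limitations mapped to Ham (selecting a historical observation that shares at least one entity with the current observation, and combining the two as observations) can be sketched in Python as follows. The function names, the representation of history as (text, entity set) pairs, and the choice of the most recent matching entry are assumptions of this sketch and are not taken from Ham or from the application.

```python
def select_relevant(current_entities, history):
    """Return the most recent historical text sharing an entity, else None.

    history: list of (text, set_of_entities) pairs, oldest first (hypothetical layout).
    """
    for text, entities in reversed(history):
        if current_entities & entities:  # at least one entity in common
            return text
    return None


def combine(current_text, relevant_text):
    """Combine the relevant historical observation with the current one."""
    if relevant_text is None:
        return [current_text]
    return [relevant_text, current_text]


history = [
    ("I would like Italian food", {"italian"}),
    ("prezzo is in the west of town", {"prezzo", "west"}),
]
current_text = "what is the phone number of prezzo"
relevant = select_relevant({"prezzo", "phone"}, history)
observations = combine(current_text, relevant)
print(observations)
```

The combined list stands in for the "observations" that the claim then inputs to the neural network together with the template list.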
Regarding claim 4, Williams in view of Ham discloses the method of claim 1, wherein the entities include noun words identifiable in the observations (Ham; Paragraph 0100, lines 1-5, "According to an embodiment of the disclosure, the electronic device 200 may determine a noun that needs to be specified in the text as an entity that needs to be specified and specify an object indicated by the noun based on conversation history information.").
Ham teaches extracting entities that are nouns in order to determine the domain of the entities (Paragraph 0007, lines 1-9, "The entity may include at least one of a word, a phrase, or a morpheme having a specific meaning, which is included in the text. The electronic device 200 may identify at least one entity in the text and determine which domain includes each entity according to the meaning of the at least one entity. For example, the electronic device 200 may determine whether the entity identified in the text is an entity representing, for example, and without limitation, a person, an object, a geographical area, a time, a date, or the like.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to extract entities that are nouns.  Doing so would allow for determining the domain of the entities.
Regarding claim 8, Williams discloses a system for natural language generation, comprising: a processor; and a memory coupled to the processor (Abstract, lines 6-9, “Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software”), the processor configured to:
receive a current observation expressed in natural language (Section 2, lines 10-11, "The cycle begins when the user provides an utterance, as text");
extract entities in the current observation (Section 2, lines 15-16, "an entity extraction module identifies entity mentions");
obtain a template list, the template list specifying a list of verb phrases to be filled-in with at least some of the entities (Section 3, lines 52-54, "However, HCNs differ in that they use developer-provided action templates, which can contain entity references");
input the observations and the template list to a neural network, the neural network outputting the template list of the verb phrases filled-in with said at least some of the entities (Section 2, lines 32-34, "The feature components from steps 1-5 are concatenated to form a feature vector (step 6). This vector is passed to an RNN,"; Section 2, lines 37-42, "The RNN computes a hidden state (vector), which is retained for the next timestep (step 8), and passed to a dense layer with a softmax activation, with output dimension equal to the number of distinct system action templates (step 9). Thus the output of step 9 is a distribution over action templates. Next, the action mask is applied as an element-wise multiplication, and the result is normalized back to a probability distribution (step 10) – this forces non-permitted actions to take on probability zero. From the resulting distribution (step 11), an action is selected (step 12)."; Section 2, lines 54-56, "The selected action is next passed to “Entity output” developer code that can substitute in entities (step 13) and produce a fully-formed action"; Substituting in entities and producing a fully-formed action reads on outputting the template filled-in with entities.);
and receive a reward associated with the neural network's output, wherein based on the reward, the neural network automatically retraining itself (Section 6, lines 3-12, "Once a system operates at scale, interacting with a large number of users, it is desirable for the system to continue to learn autonomously using reinforcement learning (RL). With RL, each turn receives a measurement of goodness called a reward; the agent explores different sequences of actions in different situations, and makes adjustments so as to maximize the expected discounted sum of rewards, which is called the return, denoted G.").
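For illustration only, the return G described in the passage quoted from Williams, Section 6 (the discounted sum of the per-turn rewards) can be computed as in the following Python sketch. The function name `discounted_return`, the discount factor value, and the example rewards are assumptions of this sketch; the quoted passage does not specify them.

```python
def discounted_return(rewards, gamma=0.95):
    """Discounted sum of per-turn rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ...

    gamma is a hypothetical discount factor chosen for this sketch.
    """
    g = 0.0
    # Accumulate from the last turn backwards so each step applies one discount.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# A three-turn dialog rewarded only on the final turn.
g = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
print(g)
```

A reward received on the third turn is discounted twice, so with a factor of 0.5 the return at the first turn is 0.25.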
Williams does not disclose:
select a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation;
combine the current observation and the relevant historical observation as observations.
Ham teaches:
select a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation (Paragraph 0087, lines 1-6, "The electronic device 200 may determine an entity that is included in the text and needs to be specified. The electronic device 200 may acquire specification information for specifying the determined entity by retrieving the information about the user's conversation history acquired from the selected database."; The text reads on the current observation and acquiring specification information from the user’s conversation history reads on selecting a relevant historical observation from historical observations.);
combine the current observation and the relevant historical observation as observations (Paragraph 0087, lines 6-9, "The electronic device 200 may interpret the text and the specification information using, for example, and without limitation, a natural language understanding (NLU) model, or the like."; The text reads on the current observation and the specification information reads on the relevant historical observation.).
Ham teaches combining the current text with information from the conversation history in order to accurately analyze the current text for generating a response (Paragraph 0006, lines 5-8, "Public electronic devices installed in public places need to store and use a conversation history for each user in order to accurately analyze a user's utterance input and provide a personalized answer thereto.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to combine the current text with information from the conversation history.  Doing so would allow for accurately analyzing the current text for generating a response.
Regarding claim 11, Williams in view of Ham discloses the system of claim 8, wherein the entities include noun words identifiable in the observations (Ham; Paragraph 0100, lines 1-5, "According to an embodiment of the disclosure, the electronic device 200 may determine a noun that needs to be specified in the text as an entity that needs to be specified and specify an object indicated by the noun based on conversation history information.").
Ham teaches extracting entities that are nouns in order to determine the domain of the entities (Paragraph 0007, lines 1-9, "The entity may include at least one of a word, a phrase, or a morpheme having a specific meaning, which is included in the text. The electronic device 200 may identify at least one entity in the text and determine which domain includes each entity according to the meaning of the at least one entity. For example, the electronic device 200 may determine whether the entity identified in the text is an entity representing, for example, and without limitation, a person, an object, a geographical area, a time, a date, or the like.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to extract entities that are nouns.  Doing so would allow for determining the domain of the entities.
Regarding claim 15, Williams discloses a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device (Abstract, lines 6-9, “Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software”) to cause the device to:
receive a current observation expressed in natural language (Section 2, lines 10-11, "The cycle begins when the user provides an utterance, as text");
extract entities in the current observation (Section 2, lines 15-16, "an entity extraction module identifies entity mentions");
obtain a template list, the template list specifying a list of verb phrases to be filled-in with at least some of the entities (Section 3, lines 52-54, "However, HCNs differ in that they use developer-provided action templates, which can contain entity references");
input the observations and the template list to a neural network, the neural network outputting the template list of the verb phrases filled-in with said at least some of the entities (Section 2, lines 32-34, "The feature components from steps 1-5 are concatenated to form a feature vector (step 6). This vector is passed to an RNN,"; Section 2, lines 37-42, "The RNN computes a hidden state (vector), which is retained for the next timestep (step 8), and passed to a dense layer with a softmax activation, with output dimension equal to the number of distinct system action templates (step 9). Thus the output of step 9 is a distribution over action templates. Next, the action mask is applied as an element-wise multiplication, and the result is normalized back to a probability distribution (step 10) – this forces non-permitted actions to take on probability zero. From the resulting distribution (step 11), an action is selected (step 12)."; Section 2, lines 54-56, "The selected action is next passed to “Entity output” developer code that can substitute in entities (step 13) and produce a fully-formed action"; Substituting in entities and producing a fully-formed action reads on outputting the template filled-in with entities.);
and receive a reward associated with the neural network's output, wherein based on the reward, the neural network automatically retraining itself (Section 6, lines 3-12, "Once a system operates at scale, interacting with a large number of users, it is desirable for the system to continue to learn autonomously using reinforcement learning (RL). With RL, each turn receives a measurement of goodness called a reward; the agent explores different sequences of actions in different situations, and makes adjustments so as to maximize the expected discounted sum of rewards, which is called the return, denoted G.").
Williams does not disclose:
select a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation;
combine the current observation and the relevant historical observation as observations.
Ham teaches:
select a relevant historical observation from historical observations, the relevant historical observation selected based on the relevant historical observation having at least one of the entities in common with the current observation (Paragraph 0087, lines 1-6, "The electronic device 200 may determine an entity that is included in the text and needs to be specified. The electronic device 200 may acquire specification information for specifying the determined entity by retrieving the information about the user's conversation history acquired from the selected database."; The text reads on the current observation and acquiring specification information from the user’s conversation history reads on selecting a relevant historical observation from historical observations.);
combine the current observation and the relevant historical observation as observations (Paragraph 0087, lines 6-9, "The electronic device 200 may interpret the text and the specification information using, for example, and without limitation, a natural language understanding (NLU) model, or the like."; The text reads on the current observation and the specification information reads on the relevant historical observation.).
Ham teaches combining the current text with information from the conversation history in order to accurately analyze the current text for generating a response (Paragraph 0006, lines 5-8, "Public electronic devices installed in public places need to store and use a conversation history for each user in order to accurately analyze a user's utterance input and provide a personalized answer thereto.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to combine the current text with information from the conversation history.  Doing so would allow for accurately analyzing the current text for generating a response.
Regarding claim 18, Williams in view of Ham discloses the computer program product as claimed in claim 15, wherein the entities include noun words identifiable in the observations (Ham; Paragraph 0100, lines 1-5, "According to an embodiment of the disclosure, the electronic device 200 may determine a noun that needs to be specified in the text as an entity that needs to be specified and specify an object indicated by the noun based on conversation history information.").
Ham teaches extracting entities that are nouns in order to determine the domain of the entities (Paragraph 0007, lines 1-9, "The entity may include at least one of a word, a phrase, or a morpheme having a specific meaning, which is included in the text. The electronic device 200 may identify at least one entity in the text and determine which domain includes each entity according to the meaning of the at least one entity. For example, the electronic device 200 may determine whether the entity identified in the text is an entity representing, for example, and without limitation, a person, an object, a geographical area, a time, a date, or the like.").
Williams and Ham are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams to incorporate the teachings of Ham to extract entities that are nouns.  Doing so would allow for determining the domain of the entities.
Claims 2, 5, 9, 12, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Williams in view of Ham and further in view of Wu et al. ("End-to-End Dynamic Query Memory Network for Entity-Value Independent Task-Oriented Dialog"), hereinafter Wu.
Regarding claim 2, Williams in view of Ham discloses the method as claimed in claim 1, but does not specifically disclose: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series.
Wu teaches: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series (Section 2.2.2, lines 1-8, "To capture the sequential dependencies of dialog utterances, we adopt the idea from [10, 11], whose model can be seen as a bank of gated RNNs, whose hidden states correspond to latent concepts and attributes. Therefore, to obtain a similar behaviour, DQMemNN adds a recurrent architecture between hops. We use the output memory cells {ci} as the inputs of a Long Short Term Memory (LSTM) [12] based on the utterances order appearing in the dialog history.”; The sequential dependencies of dialog utterances reads on the relevant historical observations in a time series.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
Regarding claim 5, Williams in view of Ham discloses the method as claimed in claim 1, but does not specifically disclose: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system.
Wu teaches: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system (Section 2.2, lines 1-6, "Our model takes a discrete set of RDL dialog utterances si, i = 1,...,N-1 as input, and the output answer sN is the one of the action templates in cand, where N is the total utterances number in the dialog. Specifically, input s1,...,sN-2 are the utterances (stories) stored in memory, and sN-1 is the initial query (question)."; The sN-1 utterance reads on the current dialog and the s1,...,sN-2 utterances read on the past dialogs.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
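For illustration only, the input layout quoted from Wu, Section 2.2 (utterances s1,...,sN-2 stored in memory and sN-1 used as the query) can be sketched in Python as follows. The function name `split_dialog` and the example utterances are assumptions of this sketch and are not taken from Wu.

```python
def split_dialog(utterances):
    """Split input utterances s_1..s_{N-1} into (memory, query).

    memory holds s_1..s_{N-2} in dialog order (the past dialogs);
    query is s_{N-1} (the current dialog).
    """
    if not utterances:
        raise ValueError("at least one utterance is required")
    return utterances[:-1], utterances[-1]


dialog = [
    "hello",
    "hello, what can I help you with today",
    "I would like to book a table",
]
memory, query = split_dialog(dialog)
print(memory, query)
```

Keeping the memory in dialog order preserves the utterance sequence on which the mapping to past dialogs and a current dialog relies.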
Regarding claim 9, Williams in view of Ham discloses the system as claimed in claim 8, but does not specifically disclose: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series.
Wu teaches: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series (Section 2.2.2, lines 1-8, "To capture the sequential dependencies of dialog utterances, we adopt the idea from [10, 11], whose model can be seen as a bank of gated RNNs, whose hidden states correspond to latent concepts and attributes. Therefore, to obtain a similar behaviour, DQMemNN adds a recurrent architecture between hops. We use the output memory cells {ci} as the inputs of a Long Short Term Memory (LSTM) [12] based on the utterances order appearing in the dialog history.”; The sequential dependencies of dialog utterances reads on the relevant historical observations in a time series.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
Regarding claim 12, Williams in view of Ham discloses the system as claimed in claim 8, but does not specifically disclose: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system.
Wu teaches: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system (Section 2.2, lines 1-6, "Our model takes a discrete set of RDL dialog utterances si, i = 1,...,N-1 as input, and the output answer sN is the one of the action templates in cand, where N is the total utterances number in the dialog. Specifically, input s1,...,sN-2 are the utterances (stories) stored in memory, and sN-1 is the initial query (question)."; The sN-1 utterance reads on the current dialog and the s1,...,sN-2 utterances read on the past dialogs.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
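For illustration only, the input arrangement quoted from Wu above (utterances s1,...,sN-2 stored in memory and sN-1 used as the query) may be sketched as follows. The function name split_dialog and the sample utterances are hypothetical and do not appear in Wu; this is a minimal sketch of the quoted arrangement, not Wu's implementation.

```python
from typing import List, Tuple


def split_dialog(utterances: List[str]) -> Tuple[List[str], str]:
    """Split s_1..s_{N-1} into memory (s_1..s_{N-2}) and query (s_{N-1}).

    Hypothetical helper: the earlier utterances play the role of the
    past dialogs and the last utterance plays the role of the current dialog.
    """
    if not utterances:
        raise ValueError("at least one utterance is required")
    return utterances[:-1], utterances[-1]


# Hypothetical sample dialog, in the order the utterances were spoken.
dialog = ["hello", "i want italian food", "in the west of town"]
memory, query = split_dialog(dialog)
print(memory)
print(query)
```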
Regarding claim 16, Williams in view of Ham discloses the computer program product as claimed in claim 15, but does not specifically disclose: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series.
Wu teaches: wherein the relevant historical observation includes a plurality of relevant historical observations in a time series (Section 2.2.2, lines 1-8, "To capture the sequential dependencies of dialog utterances, we adopt the idea from [10, 11], whose model can be seen as a bank of gated RNNs, whose hidden states correspond to latent concepts and attributes. Therefore, to obtain a similar behaviour, DQMemNN adds a recurrent architecture between hops. We use the output memory cells {ci} as the inputs of a Long Short Term Memory (LSTM) [12] based on the utterances order appearing in the dialog history."; The sequential dependencies of dialog utterances read on the relevant historical observations in a time series.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
Regarding claim 19, Williams in view of Ham discloses the computer program product as claimed in claim 15, but does not specifically disclose: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system.
Wu teaches: wherein the current observation includes a current dialog and the historical observations include past dialogs in a conversation carried on in an interactive system (Section 2.2, lines 1-6, "Our model takes a discrete set of RDL dialog utterances si, i = 1,...,N-1 as input, and the output answer sN is the one of the action templates in cand, where N is the total utterances number in the dialog. Specifically, input s1,...,sN-2 are the utterances (stories) stored in memory, and sN-1 is the initial query (question)."; The sN-1 utterance reads on the current dialog and the s1,...,sN-2 utterances read on the past dialogs.).  Wu teaches using a time sequence of dialog utterances in order to improve performance with longer dialogs (Section 5, lines 1-9, "This paper has introduced an end-to-end framework for task oriented dialog systems based on Dynamic Query Memory Network and a recorded delexicalization mechanism. DQMemNN is designed to overcome the major drawback of MemNN, namely no temporal dependencies during memory access. In addition, RDL is able to reduce the learning complexity and also alleviate OOV entity problems. The results show that DQMemNN outperforms other memory network models, especially in the task with longer dialog turns.").
Williams, Ham, and Wu are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Wu to use a time sequence of dialog utterances.  Doing so would improve performance with longer dialogs.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Williams in view of Ham and further in view of Seo et al. (“Bi-Directional Attention Flow for Machine Comprehension”), hereinafter Seo.
Regarding claim 3, Williams in view of Ham discloses the method as claimed in claim 1, but does not specifically disclose: wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning, wherein an attention between a verb in the verb phrases and an observation word in said at least some of the entities are determined as a sum of an embedding associated with the verb, an embedding associated with the observation word, and an element-wise product of the embedding associated with the verb and the embedding associated with the observation word, wherein the embedding associated with the verb, the embedding associated with the observation word and the element-wise product are parameterized with learnable vectors.
Seo teaches:
wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning (Abstract, lines 6-10, “In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi stage hierarchical process that represents the context at different levels of granularity and uses bidirectional attention flow mechanism to obtain a query-aware context representation without early summarization.”),
wherein an attention between a verb in the verb phrases and an observation word in said at least some of the entities are determined as a sum of an embedding associated with the verb, an embedding associated with the observation word, and an element-wise product of the embedding associated with the verb and the embedding associated with the observation word, wherein the embedding associated with the verb, the embedding associated with the observation word and the element-wise product are parameterized with learnable vectors (Page 3, lines 34-44, “In this layer, we compute attentions in two directions: from context to query as well as from query to context. Both of these attentions, which will be discussed below, are derived from a shared similarity matrix, S ∈ R^{T×J}, between the contextual embeddings of the context (H) and the query (U), where S_{tj} indicates the similarity between t-th context word and j-th query word. The similarity matrix is computed by S_{tj} = α(H_{:t}, U_{:j}) ∈ R where α is a trainable scalar function that encodes the similarity between its two input vectors, H_{:t} is t-th column vector of H, and U_{:j} is j-th column vector of U.  We choose α(h; u) = w_{(S)}[h; u; h ∘ u], where w_{(S)} ∈ R^{6d}, is a trainable weight vector, ∘ is elementwise multiplication, [;] is vector concatenation across row, and implicit multiplication is matrix multiplication. Now we use S to obtain the attentions and the attended vectors in both directions.”; The query word reads on the verb, the context word reads on the observation word, and the trainable weight vector reads on the learnable vectors.).
Seo teaches using a bidirectional attention flow mechanism to obtain query-aware context representations from query embeddings, context embeddings, and the element-wise product of query embeddings and context embeddings, with the embeddings weighted with trainable weight vectors, in order to implement a question answering system capable of answering complex questions (Section 6, lines 1-8, “In this paper, we introduce BIDAF, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query aware context representation without early summarization. The experimental evaluations show that our model achieves the state-of-the-art results in Stanford Question Answering Dataset (SQuAD) and CNN/DailyMail cloze test. The ablation analyses demonstrate the importance of each component in our model. The visualizations and discussions show that our model is learning a suitable representation for MC and is capable of answering complex questions by attending to correct locations in the given paragraph.”).
Williams, Ham, and Seo are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Seo to use a bidirectional attention flow mechanism to obtain query-aware context representations from query embeddings, context embeddings, and the element-wise product of query embeddings and context embeddings, with the embeddings weighted with trainable weight vectors.  Doing so would allow for implementing a question answering system capable of answering complex questions.
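For illustration only, the similarity function quoted from Seo above, in which a trainable weight vector is applied to the concatenation of h, u, and their element-wise product, may be sketched as follows. The function names, the row-wise layout of H and U (one word vector per list entry), and the numeric values are assumptions made for this sketch; it is not Seo's implementation. In Seo each of h and u has dimension 2d, giving a weight vector of length 6d; here the weight vector is simply three times the length of h.

```python
from typing import List

Vector = List[float]


def similarity(h: Vector, u: Vector, w: Vector) -> float:
    """Scalar similarity: dot product of w with the concatenation [h; u; h*u]."""
    if len(h) != len(u):
        raise ValueError("h and u must have the same dimension")
    # Concatenate h, u, and their element-wise product.
    features = h + u + [a * b for a, b in zip(h, u)]
    if len(w) != len(features):
        raise ValueError("w must be three times the dimension of h")
    return sum(wi * fi for wi, fi in zip(w, features))


def similarity_matrix(H: List[Vector], U: List[Vector], w: Vector) -> List[List[float]]:
    """Entry [t][j] is the similarity of context word t and query word j."""
    return [[similarity(h, u, w) for u in U] for h in H]


# Hypothetical values: one context word and two query words of dimension 2.
H = [[1.0, 2.0]]
U = [[3.0, 4.0], [0.0, 0.0]]
w = [1.0] * 6
print(similarity_matrix(H, U, w))
```

Because the weight vector is applied to a concatenation, each score equals the sum of three dot products, taken over h, over u, and over the element-wise product, with the weight vector split into three segments of equal length.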
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Williams in view of Ham and further in view of Hausknecht et al. ("Interactive Fiction Games: A Colossal Adventure"), hereinafter Hausknecht.
Regarding claim 6, Williams in view of Ham discloses the method as claimed in claim 1, but does not specifically disclose: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game.
Hausknecht teaches: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun."; Figure 3, "Template-DQN estimates Q-Values Q(o, u) for all templates u ∈ T and Q(o, p) for all vocabulary p ∈ V. Similar to DRRN, separate GRUs are used to encode each component of the observation, including the text of the previous action a_{t-1}."; The observation reads on the game step and the previous action reads on the previous game steps.).  Hausknecht teaches using the game step and previous game steps when filling out an action template in order to properly pair verbs and nouns in the action template (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun.").
Williams, Ham, and Hausknecht are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Hausknecht to use the game step and previous game steps when filling out an action template.  Doing so would allow for properly pairing verbs and nouns in the action template.
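For illustration only, the action selection quoted from Hausknecht above, in which the maximally valued verb is paired with the maximally valued noun, may be sketched as follows. The function name pair_action and the Q-values are hypothetical; in the quoted passage the values are estimated by two output layers of a network, which this sketch does not model.

```python
from typing import Dict


def pair_action(verb_q: Dict[str, float], object_q: Dict[str, float]) -> str:
    """Pair the highest-valued verb with the highest-valued object."""
    if not verb_q or not object_q:
        raise ValueError("both Q-value tables must be non-empty")
    verb = max(verb_q, key=verb_q.get)
    obj = max(object_q, key=object_q.get)
    return f"{verb} {obj}"


# Hypothetical Q-values standing in for the two output layers.
verb_values = {"take": 0.9, "open": 0.2}
object_values = {"lamp": 0.1, "door": 0.7}
print(pair_action(verb_values, object_values))
```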
Regarding claim 13, Williams in view of Ham discloses the system as claimed in claim 8, but does not specifically disclose: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game.
Hausknecht teaches: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun."; Figure 3, "Template-DQN estimates Q-Values Q(o, u) for all templates u ∈ T and Q(o, p) for all vocabulary p ∈ V. Similar to DRRN, separate GRUs are used to encode each component of the observation, including the text of the previous action a_{t-1}."; The observation reads on the game step and the previous action reads on the previous game steps.).  Hausknecht teaches using the game step and previous game steps when filling out an action template in order to properly pair verbs and nouns in the action template (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun.").
Williams, Ham, and Hausknecht are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Hausknecht to use the game step and previous game steps when filling out an action template.  Doing so would allow for properly pairing verbs and nouns in the action template.
Regarding claim 20, Williams in view of Ham discloses the computer program product as claimed in claim 15, but does not specifically disclose: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game.
Hausknecht teaches: wherein the current observation includes a game step in an interactive fiction game and the historical observations include previous games steps in the interactive fiction game (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun."; Figure 3, "Template-DQN estimates Q-Values Q(o, u) for all templates u ∈ T and Q(o, p) for all vocabulary p ∈ V. Similar to DRRN, separate GRUs are used to encode each component of the observation, including the text of the previous action a_{t-1}."; The observation reads on the game step and the previous action reads on the previous game steps.).  Hausknecht teaches using the game step and previous game steps when filling out an action template in order to properly pair verbs and nouns in the action template (Section 5.3, lines 1-7, "LSTM-DQN is an agent for parser-based games that handles the combinatorial action space by generating verb-object actions using a set possible verbs and possible objects. Specifically, LSTM-DQN uses two output layers to estimate Q-Values over possible verbs and objects. Actions are selected by pairing the maximally valued verb with the maximally valued noun.").
Williams, Ham, and Hausknecht are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Hausknecht to use the game step and previous game steps when filling out an action template.  Doing so would allow for properly pairing verbs and nouns in the action template.
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Williams in view of Ham and further in view of Vlasov et al. ("Few-Shot Generalization Across Dialogue Tasks"), hereinafter Vlasov.
Regarding claim 7, Williams in view of Ham discloses the method as claimed in claim 1, but does not specifically disclose: wherein the observations and the template list input to a neural network are tokenized using natural language processing, and embedded as vectors using word embeddings.
Vlasov teaches: wherein the observations and the template list input to a neural network are tokenized using natural language processing, and embedded as vectors using word embeddings (Section 3, lines 22-25, "The first step of the policy is to featurize user input, system actions and slots. The labels for the user input are the intents and entities extracted by the natural language understanding system. The labels for the system actions are action names. For our purposes, we use the tokens in the labels as features"; Section 3, lines 31-35, "In the second step, embedding layers are applied to feature vectors to create embeddings for user input (intent and entities), system actions and slots. The embedding layers are dense layers with separate weights for user input, slots, system actions and RNN output. User input and the previous RNN output (the previous predicted system action embedding) are used to calculate an attention over a memory.").  Vlasov teaches using tokens from the extracted entities and generating embeddings for the entities to allow the neural network to learn which previous inputs are important for providing a response (Section 3, lines 6-7, "Instead, we propose an architecture which attends over the history of the dialogue, learning which previous user utterances and system actions are important for deciding which action to take next.").
Williams, Ham, and Vlasov are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Vlasov to use tokens from the extracted entities and generate embeddings for the entities.  Doing so would allow the neural network to learn which previous inputs are important for providing a response.
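For illustration only, the two steps quoted from Vlasov above (using the tokens in a label as features, then applying a dense embedding layer to the feature vector) may be sketched as follows. The label "utter_ask_location", the vocabulary, the weight values, and the function names are hypothetical; the splitting rule and the bias-free linear layer are assumptions made for this sketch, which is not Vlasov's implementation.

```python
import re
from typing import List


def tokenize(label: str) -> List[str]:
    """Split a label into lowercase tokens on any non-alphanumeric character."""
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def featurize(label: str, vocab: List[str]) -> List[float]:
    """Bag-of-tokens feature vector: count of each vocabulary token in the label."""
    tokens = tokenize(label)
    return [float(tokens.count(word)) for word in vocab]


def embed(features: List[float], weights: List[List[float]]) -> List[float]:
    """Dense layer without bias: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Hypothetical vocabulary and weights; in practice the weights are learned.
vocab = ["ask", "location", "price"]
weights = [[1.0, 0.0, 0.0], [0.0, 2.0, 3.0]]
features = featurize("utter_ask_location", vocab)
print(embed(features, weights))
```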
Regarding claim 14, Williams in view of Ham discloses the system as claimed in claim 8, but does not specifically disclose: wherein the observations and the template list input to a neural network are tokenized using natural language processing, and embedded as vectors using word embeddings.
Vlasov teaches: wherein the observations and the template list input to a neural network are tokenized using natural language processing, and embedded as vectors using word embeddings (Section 3, lines 22-25, "The first step of the policy is to featurize user input, system actions and slots. The labels for the user input are the intents and entities extracted by the natural language understanding system. The labels for the system actions are action names. For our purposes, we use the tokens in the labels as features"; Section 3, lines 31-35, "In the second step, embedding layers are applied to feature vectors to create embeddings for user input (intent and entities), system actions and slots. The embedding layers are dense layers with separate weights for user input, slots, system actions and RNN output. User input and the previous RNN output (the previous predicted system action embedding) are used to calculate an attention over a memory.").  Vlasov teaches using tokens from the extracted entities and generating embeddings for the entities to allow the neural network to learn which previous inputs are important for providing a response (Section 3, lines 6-7, "Instead, we propose an architecture which attends over the history of the dialogue, learning which previous user utterances and system actions are important for deciding which action to take next.").
Williams, Ham, and Vlasov are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Vlasov to use tokens from the extracted entities and generate embeddings for the entities.  Doing so would allow the neural network to learn which previous inputs are important for providing a response.
Claims 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Williams in view of Ham and further in view of Ni et al. (US Patent No. 11,068,474), hereinafter Ni.
Regarding claim 10, Williams in view of Ham discloses the system as claimed in claim 8, but does not specifically disclose: wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning.
Ni teaches: wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning (Column 11, lines 35-38, "The conversational query understanding engine may use various forms of encoders. In an example, a general recurrent neural network (RNN) is used to present the encoding process"; Column 11, line 66 - column 12, line 2, "In a further example, the sequence to sequence framework is combined with an attention mechanism (e.g., word importance evaluation, etc.) that significantly improves sequence to sequence model performance."; Column 14, lines 40-42, "In some embodiments, the context and the query may be encoded using the same RNN. Before matching query against context, they are encoded to new representations").  Ni teaches using an attention mechanism to encode a query and context in order to provide a relevant response to the query (Column 1, line 61 - column 2, line 5, "Embodiments described herein generally relate to understanding conversational queries received by an information retrieval system (e.g., a personal assistant device, search engine, etc.), and in particular, to techniques and configurations that use context retrieved from previous search queries and results to reformulate a current query that is missing context. Example embodiments discussed herein further relate to using deep machine learning techniques to identify relationships between words or phrases of the context data and the current query and further determine that the current query needs to be reformulated to retrieve relevant results.").
Williams, Ham, and Ni are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Ni to use an attention mechanism to encode a query and context.  Doing so would allow for providing a relevant response to the query.
Regarding claim 17, Williams in view of Ham discloses the computer program product as claimed in claim 15, but does not specifically disclose: wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning.
Ni teaches: wherein the neural network takes the verb phrases as queries and computes query-context representation using an attention mechanism in deep learning (Column 11, lines 35-38, "The conversational query understanding engine may use various forms of encoders. In an example, a general recurrent neural network (RNN) is used to present the encoding process"; Column 11, line 66 - column 12, line 2, "In a further example, the sequence to sequence framework is combined with an attention mechanism (e.g., word importance evaluation, etc.) that significantly improves sequence to sequence model performance."; Column 14, lines 40-42, "In some embodiments, the context and the query may be encoded using the same RNN. Before matching query against context, they are encoded to new representations").  Ni teaches using an attention mechanism to encode a query and context in order to provide a relevant response to the query (Column 1, line 61 - column 2, line 5, "Embodiments described herein generally relate to understanding conversational queries received by an information retrieval system (e.g., a personal assistant device, search engine, etc.), and in particular, to techniques and configurations that use context retrieved from previous search queries and results to reformulate a current query that is missing context. Example embodiments discussed herein further relate to using deep machine learning techniques to identify relationships between words or phrases of the context data and the current query and further determine that the current query needs to be reformulated to retrieve relevant results.").
Williams, Ham, and Ni are considered to be analogous to the claimed invention because they are in the same field of natural language interactive systems.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Williams in view of Ham to incorporate the teachings of Ni to use an attention mechanism to encode a query and context.  Doing so would allow for providing a relevant response to the query.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/JAMES BOGGS/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657