DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office Action mailed 1/25/2021, applicant has submitted an amendment filed 4/13/2021.
Claims 1, 6-8, 10-13, and 15 have been amended.  Claims 4, 9, and 14 have been cancelled.
EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Carl Pellegrini on 4/20/2021.

The application has been amended as follows: 

Amend “estimate a node” in the last line of claim 1 to recite --estimate the node--
Amend “estimate a node” in the last line of claim 6 to recite --estimate the node--
Amend “estimate a node” in the last line of claim 11 to recite --estimate the node--

Claim Interpretation
In claims 6, 7, 11, 12, “the collating” in “in the collating” in the 4th to last line is interpreted as referring to what used to be referred to as the (b) step in claims 9 and 14 (“for each of a plurality of nodes in a conversation tree in which at least one of a label and a topic is provided to each of the plurality of nodes, collating the at least one of the label and the topic provided to the node and the received utterance, and estimating a node that is most related to the received utterance”).

Allowable Subject Matter
Claims 1-3, 5, 6-8, 10, 11-13, and 15 are allowed.
The following is an examiner’s statement of reasons for allowance:
Applicant incorporated claims 4, 9, and 14 (previously indicated as being directed to allowable subject matter) into their respective independent claims.
To restate the reasons for indicating allowable subject matter:
As per Claim 4 (and similarly claims 9 and 14), the prior art of record does not teach or suggest the combination of all limitations in claims 1 and 4 together, including (i.e. in combination with the remaining limitations in claims 1 and 4) wherein the utterance position estimation unit specifies a state indicated by the utterance and a feature word included in the utterance based on the received utterance, and collates the specified state of the utterance and the specified feature word with the label and the topic of each of the plurality of nodes to estimate a node that is most related to the received utterance.
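For illustration only, the collating limitation restated above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from any reference of record: the names Node, specify_state, specify_feature_words, and estimate_node are hypothetical, the rule for specifying a state (a trailing question mark) is a toy stand-in, and the score (one point for a label match plus one point for a topic match) is an assumed measure of relatedness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical conversation-tree node carrying a label and a topic."""
    name: str
    label: str   # state-like label provided to the node (assumed form)
    topic: str   # topic provided to the node (assumed form)


def specify_state(utterance):
    """Toy rule (assumption): a trailing '?' marks a question."""
    return "question" if utterance.strip().endswith("?") else "statement"


def specify_feature_words(utterance, vocabulary):
    """Return the words of the utterance that appear in a known vocabulary."""
    words = {w.strip(".,?!").lower() for w in utterance.split()}
    return words & vocabulary


def estimate_node(utterance, nodes):
    """Collate the specified state and feature words with each node's label
    and topic, and return the node with the highest assumed score."""
    vocabulary = {n.topic for n in nodes}
    state = specify_state(utterance)
    features = specify_feature_words(utterance, vocabulary)

    def score(node):
        # One point for a label match, one point for a topic match.
        return int(node.label == state) + int(node.topic in features)

    return max(nodes, key=score)


nodes = [
    Node("n1", "statement", "weather"),
    Node("n2", "question", "weather"),
    Node("n3", "question", "hotel"),
]
best = estimate_node("Is the weather nice today?", nodes)
```

In this sketch both the state and the feature word are derived from the received utterance itself, which is the point on which the discussion of Aleksic below turns.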
In addition to what was discussed in the rejection of claim 1, Aleksic describes comparing (matching or determining a strong correlation) “context data” of a request and context data of a dialog state to determine that the request pertains to the dialog state that corresponds to the matching set of context data, and where context data can be analogized to a fingerprint that uniquely identifies the dialog state (paragraph 76) and analyzing both the voice input and the context data to determine a dialog state for the voice input (paragraph 80).  In Aleksic, context associated with a voice input can include data that characterizes a display of a user interface at a computing device at which the voice input was received at a time that the voice input was received (paragraph 12) and “Some types of context data may indicate a condition or state of the user device 108 at or near a time that the voice input 108 was detected by the device 108. As described further below, examples of context data include user account information, anonymized user profile information (e.g., gender, age, browsing history data, data indicating previous queries submitted on the device 108), location information, and a screen signature (i.e., data that indicates content displayed by the device 108 at or near a time when the voice input 110 was detected by the device 108). In some implementations, the application identifier, dialog identifier, and dialog state identifier may be considered as special types of context data” (paragraph 38) and “dialog state history data may be used alone (i.e., without other context data) to determine the dialog state associated with a transcription request”, (paragraph 51) and where a request can include a dialog state identifier and “other context data” (paragraph 52).  Paragraph 59 describes comparing context data included in the request with respective context data associated with each of the dialog 
Aleksic thus suggests an embodiment of the comparison where both the words of the transcription and a dialog state identifier in the request are compared to words corresponding to a dialog state and a dialog state identifier corresponding to the dialog state, respectively, to determine a dialog state corresponding to the voice input.
Aleksic does not appear to describe where the utterance position estimation unit specifies a state indicated by the received utterance based on the received utterance (the dialog state identifier in paragraph 52, in particular, appears to be sent with the request as a separate part of the request and thus appears to be specified before the system receives and processes the request [see e.g. paragraph 63] and is not based on the utterance [i.e. the audio of the request]) and where the utterance position estimation unit collates the specified state of the utterance (i.e. the state that is indicated by the received utterance and which is specified based on the received utterance) with the label and the topic of each of the plurality of nodes.
Aleksic also teaches, in paragraph 52, where a speech recognizer may provide an indication of a dialog state identifier that corresponds to a given request to the user device along with the transcription result (at least suggesting that a speech recognizer can determine a dialog state identifier).  In this example, the speech recognizer provides a transcription result and a dialog state identifier associated with a first request, and then a subsequent/second request includes the dialog state identifier associated with the first request, which can be used by the speech recognizer to determine a dialog state for the second/subsequent request.  Therefore, Aleksic suggests where a speech recognizer not indicated by the received utterance and specified based on the received utterance which is determined to be most related to the estimated dialog state (the current dialog state in the rejection of claim 1).
	In the following reference Bui et al., paragraph 55 describes a dialog state tracker that analyzes utterances that are represented by computer readable text to determine dialog states.  Paragraph 57 describes predicting a dialog state of a given utterance which can be a current utterance [the most recent utterance].  Paragraph 58 describes where in response to predicting dialog states of utterances, the dialog system can take one or more machine actions such as mapping the dialog states to actions that can result in computer output.  Paragraph 59 describes where an output is provided visually/audibly and where the user can respond to the output.
	In this reference, it appears that the determined dialog state is analogous to the determined dialog state corresponding to a voice input (i.e. determined based on the comparison of n-grams and/or context data), and not to the dialog state identifier that is used in the comparison.  It is, therefore, not clear that one of ordinary skill in the art would use the dialog state determination in Bui to determine, using the utterance position estimation unit, the dialog state identifier that is collated/compared to dialog states’ dialog state identifiers.
2017/0228366 “In managing dialog session 130, session manager 122 provides inputs of dialog session 130 to dialog state tracker 118 and receives outputs comprising dialog states of dialog session 130 from dialog state tracker 118. The inputs include the utterances that are represented by computer readable text (e.g., utterances 132), which 
2017/0364323 teaches where a dialog processing server sends a response message including a dialog state identifier and text of a user’s utterance (paragraph 68) where the dialog state name is “Search_spot (Kyoto)” (paragraph 68) which appears to be based on the user’s utterance “Tell me sightseeing spots in Kyoto” (paragraph 66).  Paragraphs 30-32 describe an embodiment where the dialog receiver may be included in the dialog processing server, and another embodiment where Figure 2 is in a terminal (which appears to be a user device as depicted in Figure 1).  Paragraphs 71-74 describe where an intention of a user’s utterance “Narrow down to the Arashiyama area” is analyzed and the intention is determined to be a narrowed-down search request for the “target dialogue state” and where the dialog sequence identifier for the “Narrow down to the Arashiyama area” utterance is determined to be the same as that of the “Tell me sightseeing spots in Kyoto” utterance (since the utterances are in the same utterance group).  Figure 12 and paragraphs 97-101 further describe other dialog states determined for utterances like “Show me hotels”.  In this reference, similar to Bui, it appears that the dialog state name is more analogous to the dialog state that is determined for an utterance in Aleksic, and not the dialog state identifier in the request which is compared/collated.
The following reference is similar to the previous reference.
2017/0337036 “The dialogue receiver 202 receives the user's utterance U2 and performs speech recognition, and converts the user's utterance into text. The target determiner 204 refers to the dialogue information stored in the dialogue information 
The following reference describes, for tuning a speech recognition process,  maintaining a database of utterances, where information associated with the utterances is collected utilizing a speech recognition process, and when a speech recognition process application is deployed, audio data and recognition logs may be created, and creating a database record for each utterance including dialog state (an identifier indicating context in the dialog flow in which recognition happened).
7069513 col. 9, lines 16-45;
The following reference teaches comparing information in an out-of-band signal including a serialized form of a recognition result and an indication of a present dialog state to an expected recognition result and an expected dialog state, and logging a mismatch if any information does not match their counterpart.
2006/0224392 “Upon answering the call, speech application 308 plays a prompt, "Welcome, where would you like to fly?" In addition to playing the prompt, an out-of-band signal indicative of the prompt and the present dialog state is sent to the testing application 312. Testing application 312 interprets the out-of-band signal and compares the present dialog state with an expected dialog state. Additionally, call answer latency as well as QoS measures can be logged based on the information sent from speech application 308”, paragraph 63; “In addition to the prompt, an out-of-band signal is sent with a serialized form of the recognition result, an indication of the prompt that is played 
The following reference teaches “estimating a dialog state from an utterance” but was published in 2018 and therefore does not qualify as prior art.
Kim, A., Song, H., & Park, S. (2018). “A two-step neural dialog state tracker for task-oriented dialog processing”. Computational Intelligence and Neuroscience, 2018, NA. Retrieved from https://dialog.proquest.com/professional/docview/2225569324?accountid=131444
The prior art teaches “A spoken dialog system stores a history of dialog states in a memory, outputs a system response in a current dialog state, inputs a user utterance, performs speech recognition of the user utterance, to obtain one or a plurality of recognition candidates of the user utterance and likelihoods thereof with respect to the user utterance, calculates a degree of state conformance of each of the current and the preceding dialog states stored in the memory with respect to the user utterance, selects one of the current and the preceding dialog states and one of the recognition candidates based on a combination of the degree of state conformance of each dialog state and the 
2008/0201135 “These problems are ascribed to the estimation of a dialog state from only an input time in reference 1 and to the estimation of a dialog state from only input contents in reference 2. In order to accept a correction input by the user, it is necessary to perform input interpretation by comprehensively handling both the estimation of input contents and the estimation of a dialog state on which the input acts”, paragraph 12; 
The following reference teaches analyzing a newly input utterance from the user and updating a dialog state of the user based on the newly input utterance.
2018/0075847 “The task state database 150 may include task states, e.g. in form of task lineages, of different users of the web-based conversational agent 140. The web-based conversational agent 140 may keep tracking the dialog state or task state of a user, by analyzing a newly input utterance from the user and updating the dialog state of the user based on the newly input utterance, e.g. by extending a dialog lineage of the 
The following reference teaches comparing slot fills and queued events with a dialog state column to determine a best match among eligible states.
2006/0212515 “filtering process for an eligible state set 750 generated in accordance with FIG. 7A is schematically illustrated in FIG. 7B. The eligible state set 750 is input to a Form State Filter 760 in which the eligible states consist of all the states defined for a form, taking into account that the states may have been contributed by more than one service. The Form State Filter 760, in response to the receipt of data 762 from the command analysis engine 126 filling one or more slots within the form, and in response to any currently queued events 764, generates a set of candidate user interface states 770, these being a subset of the eligible state set 750. The filtering is accomplished by comparing the slot fills and queued events with the " dialog state" column 222 of FIG. 5 to determine the best match among all the eligible states.”, paragraph 74;
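For illustration only, the filtering described in the passage quoted above can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the function name filter_states, the dictionary layout, and the representation of each state's "dialog state" column as a set of required slot and event names are hypothetical, and "best match" is assumed to mean the candidate whose requirements cover the most observed items.

```python
def filter_states(eligible_states, slot_fills, queued_events):
    """Return (candidates, best) for a set of eligible states.

    Each state is a dict with a 'name' and a 'dialog_state' set listing the
    slot or event names it requires (assumed representation).
    """
    observed = set(slot_fills) | set(queued_events)
    # Candidate states: every requirement is satisfied by the observed
    # slot fills and queued events.
    candidates = [s for s in eligible_states if s["dialog_state"] <= observed]
    if not candidates:
        return [], None
    # Best match (assumption): the candidate with the most requirements met.
    best = max(candidates, key=lambda s: len(s["dialog_state"]))
    return candidates, best


eligible = [
    {"name": "ask_city", "dialog_state": set()},
    {"name": "ask_date", "dialog_state": {"city"}},
    {"name": "confirm", "dialog_state": {"city", "date"}},
    {"name": "cancel", "dialog_state": {"cancel_event"}},
]
candidates, best = filter_states(eligible, {"city": "Kyoto", "date": "Friday"}, [])
```

As in the reference, the candidate set here is a subset of the eligible state set, and the comparison is against slot fills and queued events rather than against a state specified from the utterance.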
The prior art teaches determining whether a user’s utterance correlates to a language model corresponding to a particular dialog state.
7162421 “If the user's utterance correlates to a language model corresponding to a particular dialog state, the natural language understanding 335 translates the user's utterance in order to determine whether the sequence of words correlates to a possible meaning for the current state of the speech recognition system”
	The prior art teaches identifying tags in a spoken language input and where a dialog state belief tracking system identifies entities, attributes, and relationships that 
2016/0163311 “The spoken language system is operable to receive a spoken language input, identify tags within the spoken language input, and communicate with the dialogue state belief tracking system. The dialogue state belief tracking system is operable to communicate with the spoken language system and to search a knowledge base framework based on the tags identified by the spoken language system. The dialogue state belief tracking system is further operable to identify entities, attributes, and relationships within the knowledge base framework that match at least some of the tags and to create a state graph based on a portion of the knowledge base framework that includes any matched entities, matched attributes, and identified relationships. The state graph is formed by transforming the portion into a probabilistic model graph and by adding evidence nodes to the probabilistic model graph based on the tags”, paragraph 5;
	The prior art teaches utterance states detected by an utterance state detector, and determining a conversation state among a plurality of users based on the utterance states detected by the utterance state detector.
2007/0150274 “a determination unit that determines a conversation state among a plurality of users of the transmission devices, on a basis of the utterance states detected by the utterance state detector of the at least one of the reception devices”, claim 2;
2008/0201133 teaches matching between an input user text utterance and IVR dialog state category set descriptions.  This reference does not appear to be matching state (it describes categorizing the utterance as corresponding to a particular task which can be part of a state, but does not specifically state that the utterance is determined to correspond to the state).  Paragraph 18 in particular describes where the 3 tasks are all part of the same state, and so the categorization does not appear to estimate that the user utterance is related to a particular state (as opposed to a particular task that is available given a plurality of states).

Upon further search:
2018/0121415 teaches “The ranking model 50 may be a probabilistic model that outputs a set of candidate matches for each detected mention, which may be ranked based on a score. Each match is or is derived from one of the tuples. For example, the candidate matches can each be a (SLOT, VALUE) pair, given that the TOPIC also matches. The ranking model 50 may be a classifier that is trained on a set of features extracted from mentions in TOPIC-relevant text and their (SLOT, VALUE) pair labels. Example features include lemma form, maximum edit distance, word embeddings, e.g., multidimensional vectors generated with a neural network, such as word2vec” (paragraph 65) which suggests matching a TOPIC and also a “mention”.
2018/0130463 (foreign priority date precedes this application’s effective date, filing date does not precede this application’s effective date) teaches “each of the plurality of pieces of utterance data includes at least a pair of a virtual conversational state and an actual conversational state that are matched to each of the plurality of pieces of utterance data” (paragraph 72) and “The intention analysis apparatus 300 determines a virtual conversational state based on overlapping of user intentions between the plurality of pieces of utterance data. For example, the intention analysis apparatus 300 matches the plurality of pieces of utterance data to one of a plurality of predefined virtual conversational states based on independent user intentions that are distinguishable from each other” (paragraph 61).  This reference appears to describe determining a conversational state based on an utterance.  The matching appears to be for machine learning and does not appear to be for estimating a most related conversational state for a received utterance.
2008/0319748 teaches “the utterance generation unit 113 of the first utterance processing unit 110 reads the conversation state (specifying the stored state in each slot and the like) from the conversation state storage unit 112, reads the knowledge for utterance generation from the DB 115 for utterance generation, and compares the conversation state with the knowledge for utterance generation” (paragraph 60).  In this reference, “utterance generation” appears to refer to an output utterance and not an input utterance (see paragraphs 1 and 8).
2008/0201133 teaches “a categorization algorithm that accepts a user text utterance from an input source along with a category set and their corresponding text descriptions pertaining to the IVR dialog state.  The categorization algorithm, in one .

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249.  The examiner can normally be reached on M-F 9:00 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602.  The fax phone 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






EY 4/20/2021
/ERIC YEN/           Primary Examiner, Art Unit 2658