Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Response to Amendments 
Applicant’s amendment filed on May 2, 2022 has been entered. 
In view of the amendment to the claim(s), the amendment of claim(s) 1-3, 5, and 13-14 and the cancellation of claim(s) 7 and 15 have been acknowledged and entered.  
In view of the amendment to claim(s) 14, the objection to claim(s) 14 is withdrawn.
In view of the amendment to claim(s) 2 and 3, the rejection of claim(s) 2 and 3 under 35 U.S.C. §112 is withdrawn.
In view of the amendment to claim(s) 1-3, 5, and 13-14 and the cancellation of claim(s) 7 and 15, the rejection of claims 1-20 under 35 U.S.C. §103 is withdrawn.

Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 13-16 of the Response to Non-Final Office Action dated February 9, 2022, which was received on May 2, 2022 (hereinafter Response and Office Action, respectively), have been fully considered.
As Applicant has amended independent claim(s) 1, 5, and 13 to incorporate the limitations of claim(s) 7 and 15, the discussion of the rejections of claim(s) 1, 5, and 13 has been amended to incorporate the rejection of the respective limitations of claim(s) 7 and 15, as appropriate.
With respect to the rejection(s) of claim(s) 1, 5, and 13 under 35 U.S.C. §103 in light of Divakaran (U.S. Pat. App. Pub. No. 2017/0160813, hereinafter Divakaran) in view of Sinha (U.S. Pat. App. Pub. No. 2014/0365226, hereinafter Sinha), Khan (U.S. Pat. App. Pub. No. 2016/0019915, hereinafter Khan), and Aleksic (U.S. Pat. App. Pub. No. 2017/0270929, hereinafter Aleksic), Applicant argues that the cited references fail to disclose all limitations of the claims as amended. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 1, 5, and 13 under 35 U.S.C. §103 are withdrawn.
Applicant further argues that dependent claims 2-4, 6, 8-12, 14, and 16-20 are allowable for at least the same reasons as independent claims 1, 5, and 13. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2-4, 6, 8-12, 14, and 16-20 under 35 U.S.C. §103 are withdrawn.
The Applicant has not provided any further statement and, therefore, the Examiner directs the Applicant to the rationale below.

Reasons for Allowance
Claims 1-6, 8-14, and 16-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
Regarding claim 1, the closest prior art of record, Divakaran, teaches A computer-implemented method comprising (The systems and methods for speech recognition described with reference to the virtual personal assistant.; Divakaran, ¶¶ [0062]): receiving first audio data representing a first utterance (In embodiments describing a request for a prescription, the system discloses “the person tells the system, ‘I’d like to refill a prescription.’ {a first utterance}” where “The system detects {receiving…} that the person is speaking slowly and hesitantly. {first audio data representing the first utterance}”; Divakaran, ¶¶ [0063]); associating the first audio data with a first dialogue session identifier (Though not expressly indicated as having an identifier, the example in FIG. 3 is a dialogue session and all dialogue represented in FIG. 3 is understood by the system to be part of the same dialogue session (as indicated by the description of each of the interactions as “dialog sessions” in changing the “dialog approach”). Thus, the first audio, represented at element 310, is associated with the first dialogue session identifier.; Divakaran, ¶¶ [0063], [0297], FIG. 3); determining, using automatic speech recognition (ASR) processing, first ASR output data corresponding to the first audio data (“The automatic speech recognition 412 component can identify natural language in audio input” such as in the first utterance described above “and provide the identified words as text {first ASR output} to the rest of the system 400.”; Divakaran, ¶¶ [0063], [0056], [0075]); determining, using natural language understanding (NLU) processing, a first NLU hypothesis corresponding to the first ASR output data (In further embodiments, the system can use “a natural language recognition system {determine, using natural language understanding (NLU) processing}... to understand what the person wants {a first NLU hypothesis corresponding to the first ASR output}” which corresponds to the “natural language in the audio input {the first ASR output data}”; Divakaran, ¶¶ [0063], [0056], [0075]), the first NLU hypothesis associated with a first confidence score (“The interpreter’s 1016 can produce an output what the interpreter 1016 determined, with a statistically high degree of confidence {a first confidence score}, most closely matched the person’s actual intent or the goal of the person’s communication {the first NLU hypothesis}”; Divakaran, ¶¶ [0134])… performing a first action corresponding to the first NLU hypothesis (“Based on the conclusions that the system has made about the speaker’s emotional or cognitive state {corresponding to the first NLU hypothesis}, at step 312, the system determines to change its dialog approach by asking direct yes/no questions {performing a first action}, and responds, ‘Sure, happy to help you with that. I’ll need to ask you some questions first.’”; Divakaran, ¶¶ [0063]); receiving second audio data representing a second utterance (“At step 330, the person responds, ‘I think so, I found something here.. but.. <sigh>.’”; Divakaran, ¶¶ [0068]); associating the second audio data with the first dialogue session identifier (The system changes approach in the dialog session (e.g., “...perhaps a different approach is needed”) where change in approach is responsive to changes in the dialog. Therefore, the system associates the second audio data (which the system understands as indicating the need for a change in approach) with the first dialog session. Further evidence can be found in FIG. 3, which displays a continuing dialog between the system and the user.; Divakaran, ¶¶ [0068], FIG. 3); determining second ASR output data corresponding to the second audio data (“The automatic speech recognition 412 component can identify natural language in audio input” such as in the second utterance described above “and provide the identified words as text {second ASR output} to the rest of the system 400.”; Divakaran, ¶¶ [0068], [0056], [0075]); determining a third NLU hypothesis corresponding to the second ASR output data (In further embodiments, the system can use “a natural language recognition system {determine, using natural language understanding (NLU) processing}... to understand what the person wants {a third NLU hypothesis corresponding to the second ASR output}” which corresponds to the “natural language in the audio input {the second ASR output data}”; Divakaran, ¶¶ [0068], [0056], [0075])…receiving sentiment data indicating a sentiment based on acoustic characteristics of the second audio data (“From this reply {based on… the second audio data}, the system may detect audible {thus, acoustic characteristics} frustration” where frustration indicates a sentiment which is derived from the reply {sentiment data indicating a sentiment}; Divakaran, ¶¶ [0068]); determining that the sentiment data indicates frustration (“the system may detect audible frustration,” thus the sentiment data indicates frustration.; Divakaran, ¶¶ [0068]).
Sinha further teaches determining, using natural language understanding (NLU) processing, a second NLU hypothesis corresponding to the first ASR output data (“The natural language processing module 332 (“natural language processor”) of the digital assistant takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant,” where one or more actionable intents includes a first actionable intent {a first NLU hypothesis} and a second actionable intent {a second NLU hypothesis}; Sinha, ¶¶ [0073]), the second NLU hypothesis associated with a second confidence score (“the natural language processing module 332 will select one of the actionable intents as the task that the user intended the digital assistant to perform... In some implementations, the domain having the highest confidence value (e.g., based on the relative importance of its various triggered nodes) is selected.” As the actionable intent is selected based on the “highest confidence value,” each of the actionable intents {e.g., the second NLU hypothesis} has a confidence value {a second confidence score}.; Sinha, ¶¶ [0083]); associating at least the first ASR output data, the first NLU hypothesis and the second NLU hypothesis with the first dialogue session identifier (The first, second, and third inputs are indicated as being “received within the same dialog session {first dialog session identifier} with the digital assistant.” Thus, the first input {first audio data}, as well as the “actionable intents” {the first NLU hypothesis and the second NLU hypothesis} and the “speech-to-text processing” {first ASR output data} of the first input, are associated with the dialog session {first dialog session identifier}; Sinha, ¶¶ [0127])… determining a fourth NLU hypothesis corresponding to the second ASR output data (“The natural language processing module 332 (“natural language processor”) of the digital assistant takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant,” where one or more actionable intents for the second ASR output includes a third actionable intent {a third NLU hypothesis} and a fourth actionable intent {a fourth NLU hypothesis}; Sinha, ¶¶ [0073]); associating at least the second ASR output data, the third NLU hypothesis and the fourth NLU hypothesis with the first dialogue session identifier (The first, second, and third inputs are indicated as being “received within the same dialog session {first dialog session identifier} with the digital assistant.” As well, the NLU hypotheses are acted upon within the first dialogue session. Thus, the second input {second audio data}, as well as the “actionable intents” {the third NLU hypothesis and the fourth NLU hypothesis} and the “speech-to-text processing” {second ASR output data} of the second input, are associated with the dialog session of the first input {first dialog session identifier}; Sinha, ¶¶ [0127]); determining, using the first dialogue session identifier, that the second utterance is a repeat of the first utterance based at least in part on a comparison of the first NLU hypothesis and the third NLU hypothesis (“Users may also indicate dissatisfaction by repeating the same speech input multiple times {determine... that the second utterance is a repeat of the first utterance} in an effort to make the digital assistant understand his or her words or intent. Accordingly, detecting the same input from a user multiple times within a short period of time and/or within the same dialog with the digital assistant {determining, using the first dialogue session identifier} can indicate that the user is not being properly understood, or that the digital assistant is not properly identifying the user’s intent from the speech input” even when “the words in the first and second speech input may be somewhat different from one another {comparison of the first NLU hypothesis and the third NLU hypothesis}”; Sinha, ¶¶ [0126])…determining, using the first dialogue session identifier, that the second NLU hypothesis corresponds to the fourth NLU hypothesis (Using the fact that the first, second, and third inputs are part of the same dialog session {determining, using the first dialogue session identifier}, the system determines at least two intents for the second input, being the natural language understanding of the second input and the second input being used to indicate that the selected actionable intent of the first input was incorrect (i.e., “determining that the second speech input indicates dissatisfaction with the at least one action” from the first speech input) {where either may be the third and fourth NLU hypotheses}. Correspondingly, the first input has at least two intents, being the selected actionable intent {the first NLU hypothesis} and an actionable intent indicating that further information is required {the second NLU hypothesis}. In the case of dissatisfaction, the indication of dissatisfaction {the fourth NLU hypothesis} causes the system to provide a prompt requesting confirmation of the error and further explanation.; Sinha, ¶¶ [0120], [0135]); in response to determining that the sentiment data indicates frustration... determining output text data including a representation of a second action corresponding to the second NLU hypothesis… (“The digital assistant performs at least one action in furtherance of satisfying the request (404).” where the one action can include “a speech output that summarizes or describes the intent inferred by the digital assistant from the speech input”; Sinha, ¶¶ [0116]-[0117])... determining output audio data corresponding to the output text data using text-to-speech (TTS) processing (“speech synthesis module 265 synthesizes speech outputs based on text provided by the digital assistant. For example, the digital assistant generates text to provide as an output to a user, and the speech synthesis module 265 converts the text to an audible speech output.”; Sinha, ¶¶ [0049]); and sending the output audio data to a device (“In some implementations, instead of (or in addition to) using the local speech synthesis module 265, speech synthesis is performed on a remote device (e.g., the server system 108), and the synthesized speech is sent to the user device 104 for output to the user.”; Sinha, ¶¶ [0049]).
Khan further teaches determining that the second confidence score satisfies a threshold value (“Confidence scores for one or more defined emotions are computed, as indicated at block 414, based upon the computed audio fingerprint.” where defined emotions can include “emotion of ‘anger’ “; Khan, ¶¶ [0054]); in response to determining that the sentiment data indicates frustration and that the second confidence score satisfies the threshold value, determining output text data including a representation of a second action corresponding to the second NLU hypothesis (“The action initiating component 232 is configured to initiate any of a number of different actions in response to associating one or more emotions with an audio signal.” where “The matching component 230 is configured to associate one or more emotions with the audio signal based upon the computed confidence scores and whether or not one or more confidence score thresholds has been met or exceeded.”; Khan, ¶¶ [0043], [0044]). 
Aleksic further teaches wherein a first dialogue session includes a first dialogue session identifier (The system can include a “dialog session identifier [which] is data that indicates a particular dialog session associated with the request 212. The dialog session identifier may be used by the speech recognizer 202 to correlate a series of transcription requests {first audio data} that relate to a same dialog session. {associating... with a first dialogue session identifier}”; Aleksic, ¶¶ [0053]). 
However, none of the prior art references of record, either alone or in combination, teaches, suggests, or makes obvious the combination of limitations as recited in the independent claims.
More specifically, the limitation of “performing a first action in response to the first utterance, the first action corresponding to an intent included in the first NLU hypothesis [and] in response to (i) determining that the sentiment data indicates frustration, (ii) determining that the second utterance repeats the first utterance, and (iii) that the second confidence score satisfies the threshold value, determining output text data including a representation of a second action corresponding to the second NLU hypothesis” is not taught by the prior art of record.
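For illustration only, the three-part condition recited in this limitation can be sketched as follows. This is a minimal sketch and not Applicant's implementation or any cited reference's method: the names `NluHypothesis` and `select_second_action_text`, the wording of the output text, the default threshold of 0.5, and the reading of "satisfies the threshold value" as "greater than or equal to" are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NluHypothesis:
    """Hypothetical container for one NLU hypothesis."""
    intent: str        # assumed intent label, e.g. "play_music"
    confidence: float  # assumed confidence score in [0, 1]


def select_second_action_text(
    second_hypothesis: NluHypothesis,
    is_frustrated: bool,     # (i) sentiment data indicates frustration
    is_repeat: bool,         # (ii) second utterance repeats the first
    threshold: float = 0.5,  # assumed value; the record states no number
) -> Optional[str]:
    """Return output text for the second action only if all three conditions hold."""
    # (iii) the second confidence score satisfies the threshold (assumed: >=)
    meets_threshold = second_hypothesis.confidence >= threshold
    if is_frustrated and is_repeat and meets_threshold:
        # Assumed phrasing for the representation of the second action.
        return f"Did you mean: {second_hypothesis.intent}?"
    return None


# Usage: the alternative hypothesis from the first utterance is offered
# only when frustration, repetition, and sufficient confidence coincide.
alternative = NluHypothesis(intent="play_music", confidence=0.72)
offered = select_second_action_text(alternative, is_frustrated=True, is_repeat=True)
withheld = select_second_action_text(alternative, is_frustrated=False, is_repeat=True)
```

In this sketch, removing any one of the three conditions yields no output text, which mirrors the conjunctive form of the quoted limitation.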
Regarding claims 5 and 13, the closest prior art of record, Divakaran, teaches A computer-implemented method comprising (The systems and methods for speech recognition described with reference to the virtual personal assistant.; Divakaran, ¶¶ [0062]): receiving first audio data representing a first utterance (In embodiments describing a request for a prescription, the system discloses “the person tells the system, ‘I’d like to refill a prescription.’ {a first utterance}” where “The system detects {receiving…} that the person is speaking slowly and hesitantly. {first audio data representing the first utterance}”; Divakaran, ¶¶ [0063]); determining, using natural language understanding (NLU) processing, first NLU data corresponding to the first audio data (In further embodiments, the system can use “a natural language recognition system {determine, using natural language understanding (NLU) processing}... to understand what the person wants {a first NLU hypothesis corresponding to the first ASR output}” which corresponds to the “natural language in the audio input {the first ASR output data}”; Divakaran, ¶¶ [0063], [0056], [0075]); causing a first action to be performed corresponding to the first NLU data (“Based on the conclusions that the system has made about the speaker’s emotional or cognitive state {corresponding to the first NLU data}, at step 312, the system determines to change its dialog approach by asking direct yes/no questions {causing a first action to be performed}, and responds, ‘Sure, happy to help you with that. I’ll need to ask you some questions first.’”; Divakaran, ¶¶ [0063]); receiving second audio data representing a second utterance (“At step 330, the person responds, ‘I think so, I found something here.. but.. <sigh>.’”; Divakaran, ¶¶ [0068]); receiving sentiment data corresponding to the second audio data (“From this reply {corresponding to… the second audio data}, the system may detect audible frustration” where frustration indicates a sentiment which is derived from the reply {sentiment data}.; Divakaran, ¶¶ [0068]); determining that the sentiment data indicates frustration (“the system may detect audible frustration,” thus the sentiment data indicates frustration.; Divakaran, ¶¶ [0068]).
Sinha further teaches determining a repeat indicator based on the second utterance being semantically similar to the first utterance (“Users may also indicate dissatisfaction by repeating the same speech input multiple times {determine... that the second utterance is a repeat of the first utterance} in an effort to make the digital assistant understand his or her words or intent. Accordingly, detecting the same input from a user multiple times within a short period of time and/or within the same dialog with the digital assistant {determining, using the first dialogue session identifier} can indicate that the user is not being properly understood, or that the digital assistant is not properly identifying the user’s intent from the speech input”; Sinha, ¶¶ [0126])…and in response to the repeat indicator and the sentiment data indicating frustration, determining output data other than performing the first action, (“In some implementations, upon determining that the user interaction is indicative of a problem (in step (407)) {in response to the repeat indicator and the sentiment data indicating frustration}, the digital assistant provides a first prompt requesting {determining...} the user to confirm whether there was a problem in the performing of the at least one action (430) {output other than performing the first action}.”; Sinha, ¶¶ [0135]) wherein the output data corresponds to a system-generated dialogue (“speech synthesis module 265 synthesizes speech outputs based on text provided by the digital assistant. For example, the digital assistant generates text to provide as an output to a user, and the speech synthesis module 265 converts the text to an audible speech output.”; Sinha, ¶¶ [0049]); further comprising: determining, using NLU processing, second NLU data corresponding to the first audio data (“The natural language processing module 332 (“natural language processor”) of the digital assistant takes the sequence of words or tokens (“token sequence”) generated by the speech-to-text processing module 330, and attempts to associate the token sequence with one or more “actionable intents” recognized by the digital assistant,” where one or more actionable intents includes a first actionable intent {a first NLU hypothesis} and a second actionable intent {a second NLU hypothesis}; Sinha, ¶¶ [0073]), the second NLU data different than the first NLU data (“scope of a digital assistant’s capabilities is dependent... on the number and variety of ‘actionable intents’,” thus indicating that the actionable intents are different {the second NLU data different than the first NLU data}; Sinha, ¶¶ [0073]).
Aleksic further teaches wherein a first dialogue session includes a first dialogue session identifier (The system can include a “dialog session identifier [which] is data that indicates a particular dialog session associated with the request 212. The dialog session identifier may be used by the speech recognizer 202 to correlate a series of transcription requests {first audio data} that relate to a same dialog session. {associating... with a first dialogue session identifier}”; Aleksic, ¶¶ [0053]). 
Khan further teaches determining that the second NLU data satisfies a condition (“The matching component 230 is configured to associate one or more emotions with the audio signal based upon the computed confidence scores and whether or not one or more confidence score thresholds has been met or exceeded,” thus the confidence score confirms the presence of the emotion, and where the confidence score can either satisfy a threshold or fail to satisfy said threshold (thus, confirming or failing to confirm the hypothesis of the emotion {does not satisfy a second condition}).; Khan, ¶¶ [0043]), and wherein determining the output data comprises determining the output data including a representation of a second action corresponding to the second NLU data (“The action initiating component 232 is configured to initiate any of a number of different actions in response to associating one or more emotions with an audio signal.” where “The matching component 230 is configured to associate one or more emotions with the audio signal based upon the computed confidence scores and whether or not one or more confidence score thresholds has been met or exceeded.”; Khan, ¶¶ [0043], [0044]). 
However, none of the prior art references of record, either alone or in combination, teaches, suggests, or makes obvious the combination of limitations as recited in the independent claims.
More specifically, the limitation of “determining, using NLU processing, second NLU data corresponding to the first audio data, the second NLU data different than the first NLU data [and] in response to the repeat indicator and the sentiment data indicating frustration, determining output data including a representation of a second action corresponding to the second NLU data, wherein the second action is different from the first action” is not taught by the prior art of record.
Claims 2-4, 6, 8-12, 14, and 16-20 depend from allowable independent claims 1, 5, and 13 and are allowable for at least the same reasons as described above with reference to said independent claims.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Deshpande et al. (U.S. Pat. No. 11348601) teaches systems and methods for natural language understanding including the use of alternative hypotheses. 
	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/ Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657