DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Response to Amendments and Arguments
Regarding a rejection under 35 U.S.C. §112(b), applicant corrected the antecedent basis issue. The rejection has been withdrawn. 

Regarding the rejection under 35 U.S.C. §103, applicant amended the independent claims by adding new limitations. Applicant argued (Remarks, pages 7-8) that the previously cited references fail to teach the newly added limitations. The examiner performed an updated search and discovered a new reference, Strope et al. (US Pat. 8,185,392).

	Strope discloses a voice search system (Fig. 1, Col. 3, lines 26-35, Fig. 2B, #252). After a user speaks a voice search query, the system recognizes the voice search query and obtains the user’s feedback on the recognized search query (Fig. 2B, #254-#255). The system determines that the user is not satisfied with the recognized search query when the user types a query on a mobile device within a predetermined time period (Col. 4, lines 9-21; typing a correction on the mobile device after submitting the voice query, which is a negative indication of the ASR result; Col. 5, lines 18-33; Col. 9, lines 47-54; Col. 10, lines 54-63; Col. 11, lines 49-67). Strope further discloses updating / adapting a speech recognition model based on a user’s feedback (typing a search query) (Fig. 2B, #257; Col. 2, lines 45-55). In fact, Strope meets all limitations recited in the independent claims as well as some dependent claims. Therefore, Strope meets the newly added limitations:

	receiving non-verbal feedback from the user, wherein the non-verbal feedback includes a textual search query received, from a smart device associated with the user, within a threshold time after the action is performed (Col. 4, lines 9-21; Col. 5, lines 18-33; Col. 9, lines 47-54; Col. 10, lines 54-63; Col. 11, lines 49-67; user types search queries after reviewing the incorrect results on a mobile device within a predetermined period; Fig. 1, #104 shows a mobile device, e.g., Col. 4, line 32, a PDA or a smart phone);
	wherein the smart device is a separate device from the voice response system that is in communication with the voice response system (Fig. 1, Speech recognition engine #105 + Search Engine #106, corresponds to the claimed “the voice response system”; a mobile device (e.g. a PDA / a smart phone) #104 corresponds to the claimed “a smart device”);

	Strope describes adapting a speech recognition model based on a user’s feedback such as typing a search query. The concept is quite similar to that of the instant application. In the following rejection, the examiner combines Strope with the previously cited references to reject the amended claims. Applicant’s arguments have been considered but are moot because the arguments do not apply to a combined teaching being used in the current rejection.
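
For illustration only, the threshold-time feedback discussed above can be sketched as follows. This is a minimal sketch and not Strope's implementation: the names `VoiceQuery` and `is_negative_feedback`, the 30-second default threshold, and the rule that the typed query must differ from the recognized query are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class VoiceQuery:
    recognized_text: str   # ASR output for the spoken query
    performed_at: float    # time (seconds) at which the search action was performed


def is_negative_feedback(voice: VoiceQuery, typed_text: str, typed_at: float,
                         threshold_s: float = 30.0) -> bool:
    """Treat a typed query as negative feedback on the ASR result.

    Hypothetical rule: the typed query arrives after the action, within
    threshold_s seconds of it, and differs from the recognized text.
    """
    elapsed = typed_at - voice.performed_at
    within_window = 0.0 <= elapsed <= threshold_s
    differs = typed_text.strip().lower() != voice.recognized_text.strip().lower()
    return within_window and differs
```

Under these assumptions, a typed query entered 10 seconds after the action counts as negative feedback when it differs from the recognized text, while the same query typed 100 seconds later does not.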

Claim Objections
Claim 9 is objected to because of the following informalities:  

All limitations of claim 9, which depends from claim 8, were incorporated into independent claim 8. Applicant stated (Remarks, page 6) that claim 9 was cancelled. However, in the listing of claims filed on 11/17/2022, claim 9 was not cancelled and still appears in the claim listing.

Appropriate correction is required.

Claim Rejections - 35 USC § 103
Claims 1, 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (US-2022/0093094; hereafter Krishnan) in view of Hazra et al. (US-2021/0158138; hereafter Hazra), and further in view of Strope et al. (US Pat. 8,185,392; hereafter Strope).

Regarding claims 1, 8 and 15, Krishnan discloses a method, a system and a computer program product (Fig. 1A, a computer implemented dialog management system) for training an artificial intelligence (AI) of a voice response system (see Krishnan ¶ 67/67, 77/77, 121/114, 130/123, 489/449: [67/67] language processing component (sometimes also referred to as a spoken language understanding (SLU) component) includes an automatic speech recognition (ASR) component and a natural language understanding (NLU) component [voice response system]; --[77/77] system includes a language output component [that] includes a natural language generation (NLG) component and a text-to-speech (TTS) component [that] can generate text for purposes of TTS output to a user. NLG component may generate text corresponding to instructions corresponding to a particular action for the user to perform. The NLG component may include one or more trained models configured to output text [artificial intelligence (AI) of a voice response system] appropriate for a particular input; --[121/114] to create the feature vector operable by the system, a feature extractor may be used. The feature extractor may input ASR results which include results from the processing of the audio data by the speech recognition component; --[130/123] For purposes of the feature extractor processing and representing a word embedding in a feature vector, a word embedding of unknown length may be processed by a neural network with memory, such as an LSTM (long short term memory) network. Each vector of a word embedding may be processed by the LSTM [artificial intelligence (AI) of a voice response system]. --[489/449] Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network [method for training an artificial intelligence (AI)]),
receiving, by the voice response system from a user, a voice command to perform a requested action (see Krishnan ¶ 44/44: a system through device/remote system may receive a command to enter a multi-user dialog mode. This may include a spoken utterance with a standard wakeword such as “Alexa, enter multi-user dialog mode”),
interpreting, by an AI model, the voice command (see Krishnan ¶ 121/114, 130/123, 361/337: [121/114] to create the feature vector operable by the system, a feature extractor may be used. The feature extractor may input ASR results which include results from the processing of the audio data by the speech recognition component; --[130/123] For purposes of the feature extractor processing and representing a word embedding in a feature vector, a word embedding of unknown length may be processed by a neural network with memory, such as an LSTM (long short term memory) network. Each vector of a word embedding may be processed by the LSTM [interpreting, by an AI model]; --[361/337] system may perform entity resolution for the anaphoric references to select that entity for further processing according to the user's spoken command. [the voice command] The entity may then, for example, be added to dialog data for further operations),
performing an action based on the interpretation of the voice command (see Krishnan ¶ 58 (provisional ONLY): a user may have requested the system provide some information regarding potential recipes for dinner [the voice command]. The system may determine a list of such entries, perform TTS to create output audio data corresponding to the list, and begin playback of the output audio corresponding to the list [performing an action based on the interpretation of the voice command] [¶ 58 of PGPUB uses a different yet similar list example]);
receiving non-verbal feedback from the user (see Krishnan ¶ 306/282: users may provide feedback [receiving feedback from the user] to the system indicating the user's satisfaction in the service skill responding to the user request/performing an action in response to the user request. The feedback may be solicited by the system(s). In some cases, the feedback may be explicit feedback and may be received as a text input, gestures, [non-verbal feedback] or other types of input); 
Krishnan does not teach:
and updating the AI model based on a determination that the non-verbal feedback indicates that the user is not satisfied with the action performed.
Hazra discloses:
and updating the AI model based on a determination that the non-verbal feedback indicates that the user is not satisfied with the action performed (see Hazra ¶ 41, 42: intelligent device prompts the user for a feedback on the output of the voice recognition system [the action performed]. The user's gesture feedback (e.g., hand gestures) [non-verbal feedback] is captured and recognized by the intelligent device. Based on the user's gesture feedback, a user rating (e.g., a positive user rating, a negative user rating, or a neutral user rating) is assigned, and the intelligent device labels the input vector accordingly [a determination that the non-verbal feedback indicates]. When the user rating is positive (e.g., the voice recognition result is accurate or satisfying for the user), there is no need to update the local neural network model, thus the input vector is not saved. On the other hand, when the user rating is negative or neutral (e.g., the voice recognition results is not accurate or not satisfying for the user) [that the user is not satisfied with the action performed], the intelligent device prompts the user to enter the correct output. The user may choose one of the options as the expected output, or may directly enter the expected output, e.g., using touch screen of the display device; --[42] Next, the intelligent device labels the input vector with the correct output the user chooses [based on], and saves the labeled input vector for training the local neural network model [updating the AI model]).
Krishnan and Hazra are considered to be analogous because they are from the field of user feedback identification.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan to incorporate the disclosure of Hazra in order to allow users of a voice response system to interact naturally with the system without any mechanical devices, thereby improving interfacing with the system (see Hazra ¶ 2: Gesture recognition has received much attention recently as a means for improved human-to-machine interface. Gesture recognition may allow humans to interact naturally with computers without any mechanical devices).
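
For illustration only, the feedback-labeling flow described in Hazra ¶ 41, 42 can be sketched as follows. This is a minimal sketch under stated assumptions, not Hazra's implementation: the function name `collect_training_sample`, the string encoding of the ratings, and the in-memory `buffer` of saved samples are hypothetical.

```python
from typing import List, Optional, Tuple

TrainingSample = Tuple[List[float], str]


def collect_training_sample(input_vector: List[float], rating: str,
                            corrected_output: Optional[str],
                            buffer: List[TrainingSample]) -> bool:
    """Save a labeled input vector only when the rating is not positive.

    rating is one of "positive", "negative", "neutral" (hypothetical
    encoding). Returns True when a sample was saved for later training.
    """
    if rating == "positive":
        return False  # result was satisfying; the input vector is not saved
    if rating not in ("negative", "neutral"):
        raise ValueError(f"unknown rating: {rating!r}")
    if corrected_output is None:
        return False  # user did not supply the expected output
    # Label the input vector with the correct output and keep it for training.
    buffer.append((list(input_vector), corrected_output))
    return True
```

The samples accumulated in `buffer` would then be used to update the model; the training step itself is outside the scope of this sketch.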

	Krishnan does not explicitly disclose the newly added limitations in the amendment filed on 11/17/2022, but Strope discloses:
	receiving non-verbal feedback from the user, wherein the non-verbal feedback includes a textual search query received, from a smart device associated with the user, within a threshold time after the action is performed (Col. 4, lines 9-21; Col. 5, lines 18-33; Col. 9, lines 47-54; Col. 10, lines 54-63; Col. 11, lines 49-67; user types search queries after reviewing the incorrect results on a mobile device within a predetermined period; Fig. 1, #104 shows a mobile device, e.g., Col. 4, line 32, a PDA or a smart phone);
	wherein the smart device is a separate device from the voice response system that is in communication with the voice response system (Fig. 1, Speech recognition engine #105 + Search Engine #106, corresponds to the claimed “the voice response system”; a mobile device (e.g. a PDA / a smart phone) #104 corresponds to the claimed “a smart device”);

Krishnan, Hazra and Strope are considered to be analogous because they are from the field of user feedback identification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan in view of Hazra to incorporate the disclosure of Strope in order to adapt the speech recognition model based on a user's typed search query feedback (see Strope Fig. 2B, #257; Col. 2, lines 45-55).

Claims 4, 6-7, 11, 13-14, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Hazra and Strope, and further in view of Ding et al. (US 2020/0294489; hereafter Ding).

Regarding claims 4, 11, and 18, Krishnan in view of Hazra and Strope teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan, Hazra, and Strope does not teach:
monitoring a behavior of the user within a threshold time after the action is performed.
Ding discloses:
monitoring a behavior of the user within a threshold time after the action is performed (see Ding ¶ 39, 40, 78: [39] the user behavior log [monitoring a behavior of the user] is determined as the second behavior log if this log is generated after the time of the first behavior log and has a time interval to the time of the first behavior log, the time interval is less than a preset time threshold [within a threshold time after the action is performed], and this log belongs to the same user; --[40] user speech and the corresponding speech recognition result in each piece of data are determined as a positive feedback sample or a negative feedback sample, based on a relationship between the first behavior log and the second behavior log in the corresponding piece of data; --[78] [¶ 78 gives an example embodiment where a user inputs a location/route search request [after the action is performed]]).
Where the motivation to combine is the same as previously presented.
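
For illustration only, the pairing of a first and a second behavior log described in Ding ¶ 39 can be sketched as follows. This is a minimal sketch, not Ding's implementation: the `BehaviorLog` fields and the function `pair_logs` are hypothetical, and the sketch simplifies by pairing only consecutive logs of the same user.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BehaviorLog:
    user_id: str
    timestamp: float  # seconds
    text: str


def pair_logs(logs: List[BehaviorLog],
              threshold_s: float) -> List[Tuple[BehaviorLog, BehaviorLog]]:
    """Pair each log with the next log of the same user when the second log
    is generated after the first and the interval is below threshold_s."""
    pairs = []
    ordered = sorted(logs, key=lambda log: (log.user_id, log.timestamp))
    for first, second in zip(ordered, ordered[1:]):
        same_user = first.user_id == second.user_id
        gap = second.timestamp - first.timestamp
        if same_user and 0.0 < gap < threshold_s:
            pairs.append((first, second))
    return pairs
```

Each returned pair corresponds to a (first behavior log, second behavior log) candidate that could then be classified as a positive or negative feedback sample.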
		
Regarding claims 6, 13, and 20, Krishnan in view of Hazra and Strope teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan, Hazra, and Strope does not teach:
creating a correlation among a determined change in behavior of the user within a threshold time after the action is performed and the interpreted voice command to identify a portion of the voice command that is being wrongly interpreted.
Ding discloses:
creating a correlation among a determined change in behavior of the user within a threshold time after the action is performed and the interpreted voice command to identify a portion of the voice command that is being wrongly interpreted (see Ding ¶ 72, 76: [72] the user speech and the corresponding speech recognition result in the corresponding piece of data are determined as the negative feedback sample [creating a correlation], in response to determining that a user behavior recorded in the second behavior log is a modification behavior on a user behavior recorded in the first behavior log [among a determined change in behavior of the user] during a predetermined period of time [within a threshold time]; --[76] the user inputs a speech A of “Juyuan” [voice command/behavior], and a speech recognition result, i.e., an erroneous text W “unexpectedly” is obtained through speech recognition [and the interpreted voice command]. The route search request is launched [the action is performed].  At this time, the user re-inputs a modification speech B of “Juyuan” [change in behavior] using the speech function, and a modification recognition result corresponding to the modification speech, i.e., a new text R “Juyuan” is obtained through speech recognition. A modification recognition result corresponding to the modification speech is obtained, that is, a new text R. When the text W and the text R are different, and the text W and the text R satisfy the preset semantic similarity condition, that is, the text W and the text R face are different in words but the semantic level of both are very similar [to identify a portion of the voice command that is being wrongly interpreted], the speech recognition result of the user speech A is considered to be wrong [creating a correlation after the action is performed]. Then, user speech A, text W, and text R are used as the negative feedback sample).
Where the motivation to combine is the same as previously presented.
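
For illustration only, the negative-feedback determination described in Ding ¶ 72, 76 can be sketched as follows. This is a minimal sketch, not Ding's implementation: the function `negative_sample` and the 0.5 cutoff are hypothetical, and character-level similarity from Python's `difflib` is swapped in as a stand-in for the preset semantic similarity condition, which the cited paragraphs do not specify.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def negative_sample(speech_id: str, text_w: str, text_r: str,
                    min_similarity: float = 0.5) -> Optional[Tuple[str, str, str]]:
    """Return (speech, W, R) when R looks like a correction of W.

    Character-level similarity stands in for the preset semantic
    similarity condition; it is an assumption, not Ding's measure.
    """
    if text_w == text_r:
        return None  # recognition result was not modified
    similarity = SequenceMatcher(None, text_w, text_r).ratio()
    if similarity < min_similarity:
        return None  # likely a new, unrelated request rather than a correction
    return (speech_id, text_w, text_r)
```

A returned tuple plays the role of the (user speech A, text W, text R) negative feedback sample; a result of `None` means the pair is not treated as a correction under the assumed similarity rule.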
Regarding claims 7 and 14, Krishnan in view of Hazra and Strope and further in view of Ding teaches all the limitations of claims 6 and 13 above.
Ding further discloses:
comparing the portion of the voice command that is being wrongly interpreted to data from other users to identify a best AI model for the user (see Ding ¶ 34, 76: [34] The mining out the negative feedback samples may include: mining out the audios and their texts erroneously recognized [the portion of the voice command that is being wrongly interpreted] through the existing speech recognition model based on user behaviors, to generate the training corpus. Therefore, the training corpus for speech recognition may be automatically and purposefully mined out based on the historical behaviors of the users [comparing to data from other users], and provided to the subsequent training on the speech recognition model, thereby further improving the speech recognition effect [to identify a best AI model for the user]; --[76] When the text W and the text R are different, and the text W and the text R satisfy the preset semantic similarity condition, that is, the text W and the text R face are different in words but the semantic level of both are very similar, the speech recognition result of the user speech A is considered to be wrong. Then, user speech A, text W, and text R are used as the negative feedback sample).
Where the motivation to combine is the same as previously presented.

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Hazra and Strope and further in view of Jones et al. (US 2008/0021884; hereafter Jones).
Regarding claims 3, 10, and 17, Krishnan in view of Hazra and Strope teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan, Hazra, and Strope does not teach:
the non-verbal feedback is provided by the user in response to a request from the voice response system for the user to type the voice command.
Jones discloses:
the non-verbal feedback is provided by the user in response to a request from the voice response system for the user to type the voice command (see Jones ¶ 35, 43, 54, 123, FIG 9B: [35] oral speech queries [voice command] by telephone are stored in the system database and converted into digital text queries by a speech translation server; --[43] An appropriate interface, such as a graphical user interface (GUI) for the computer system extracts a query from a user and transmits the query to the server; --[54] the user reviews the search results [feedback is provided by the user] and then does or does not "accept" the results, user respond[s] to a request for a user response, such as a pop-up or voice prompt [in response to a request from the voice response system], the user entering a revised, different or follow-up query [for the user to type the voice command], user can register dissatisfaction with the query results by a rejection button on the user GUI [non-verbal feedback]; --[123] items that are typed by the user [type] in entry box 477 [FIG 9B] may show up in the chat frame [FIG 9B shows an example GUI, above entry box 477, it reads “RESPOND HERE:”]).
Krishnan, Hazra, Strope, and Jones are considered to be analogous because they are from the field of user feedback identification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan, Hazra, and Strope to incorporate the disclosure of Jones in order to provide relevant answers quickly to user voice queries (see Jones ¶ 118: searchers who sign-up for particular keywords are preferably motivated to collect excellent resources in order to provide relevant answers quickly to users).

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan in view of Hazra, Strope, and Ding, and further in view of Jacobs et al. (US 10,460,342; hereafter Jacobs).
Regarding claims 5, 12, and 19, Krishnan in view of Hazra, Strope and Ding teaches all the limitations of claims 4, 11, and 18 above.
The combination of Krishnan, Hazra, Strope, and Ding does not teach:
the non-verbal feedback includes a determination that a user is one of laughing or frustrated during the threshold time after the action is performed.
Jacobs discloses:
the non-verbal feedback includes a determination that a user is one of laughing or frustrated during the threshold time after the action is performed (see Jacobs col 5:56 – col 6:3, col 6:23-29, col 12:10-19, col 17:44-47: [col 5:56 – col 6:3] Product program usage database may comprise actions taken by individual users in the course of using one or more product programs. A product program may communicate actions derived from usage of a product program, to product program usage database. Such actions may comprise product program commands [the action]; --[col 6:23-29] Action-trigger database may comprise a list of actions that users may take in the course of using one or more product programs and corresponding triggers that targeted-advertising module may utilize to determine appropriate targeted advertising for each user. For example, a particularly hard tap on a touchpad [non-verbal feedback] may correspond to a trigger associated with user frustration [the non-verbal feedback includes a determination that a user is one of frustrated]; --[col 12:10-19] if a trigger associated with user frustration does not exceed an immediate advertisement time threshold [during the threshold time after the action is performed], targeted-advertising module may select an advertisement related to user frustration; --[col 17:40-47] User may enter commands and/or other information into computer system via [an] input device. Examples of an input device include a keyboard, a pointing device, an audio input device (a voice response system)).
Krishnan, Hazra, Strope, Ding, and Jacobs are considered to be analogous because they are from the field of user feedback identification. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan, Hazra, Strope, and Ding to incorporate the disclosure of Jacobs in order to recognize a broader range of nonverbal user input data as potential feedback indicators (see Jacobs col 1:58: identifying an advertisement trigger from the end-user usage data).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359.  The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIALONG HE/
Primary Examiner, Art Unit 2659