DETAILED ACTION
This communication is in response to the application filed on 01 December 2020.  Claims 1-20 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01 December 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the IDS is being considered by the examiner.

Claim Objections
Claim 8 is objected to because of the following informalities: 
Line 1 of claim 8 should be changed from “system for ning an artificial intelligence” to “system for training an artificial intelligence” because “ning” is not a recognizable term of art.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claim 17 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Line 2 of claim 17 recites the limitation “from the voice response system.”  There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (US-2022/0093094; hereafter Krishnan) in view of Hazra et al. (US-2021/0158138; hereafter Hazra).
Regarding claim 1, all paragraph citations for Krishnan are formatted with two paragraph numbers separated by a slash (XXX/YYY).  The first number (XXX) represents the paragraph number in the PG Publication number listed above and the second number (YYY) represents the paragraph number in the Provisional Application of Krishnan (63/081,012, filed on 09/21/2020) such that the format is PGPUB/Provisional.
Furthermore, regarding claim 1,
Krishnan teaches:
A method for training an artificial intelligence (AI) of a voice response system (see Krishnan ¶ 67/67, 77/77, 121/114, 130/123, 489/449: [67/67] language processing component (sometimes also referred to as a spoken language understanding (SLU) component) includes an automatic speech recognition (ASR) component and a natural language understanding (NLU) component [voice response system]; --[77/77] system includes a language output component [that] includes a natural language generation (NLG) component and a text-to-speech (TTS) component [that] can generate text for purposes of TTS output to a user. NLG component may generate text corresponding to instructions corresponding to a particular action for the user to perform. The NLG component may include one or more trained models configured to output text [artificial intelligence (AI) of a voice response system] appropriate for a particular input; --[121/114] to create the feature vector operable by the system, a feature extractor may be used. The feature extractor may input ASR results which include results from the processing of the audio data by the speech recognition component; --[130/123] For purposes of the feature extractor processing and representing a word embedding in a feature vector, a word embedding of unknown length may be processed by a neural network with memory, such as an LSTM (long short term memory) network. Each vector of a word embedding may be processed by the LSTM [artificial intelligence (AI) of a voice response system]; --[489/449] Connection weights may be initially learned by the neural network during training, where given inputs are associated with known outputs. In a set of training data, a variety of training examples are fed into the network [method for training an artificial intelligence (AI)]),
receiving, by the voice response system from a user, a voice command to perform a requested action (see Krishnan ¶ 44/44: a system through device/remote system may receive a command to enter a multi-user dialog mode. This may include a spoken utterance with a standard wakeword such as “Alexa, enter multi-user dialog mode”),
interpreting, by an AI model, the voice command (see Krishnan ¶ 121/114, 130/123, 361/337: [121/114] to create the feature vector operable by the system, a feature extractor may be used. The feature extractor may input ASR results which include results from the processing of the audio data by the speech recognition component; --[130/123] For purposes of the feature extractor processing and representing a word embedding in a feature vector, a word embedding of unknown length may be processed by a neural network with memory, such as an LSTM (long short term memory) network. Each vector of a word embedding may be processed by the LSTM [interpreting, by an AI model]; --[361/337] system may perform entity resolution for the anaphoric references to select that entity for further processing according to the user's spoken command. [the voice command] The entity may then, for example, be added to dialog data for further operations),
performing an action based on the interpretation of the voice command (see Krishnan ¶ 58 (provisional ONLY): a user may have requested the system provide some information regarding potential recipes for dinner [the voice command]. The system may determine a list of such entries, perform TTS to create output audio data corresponding to the list, and begin playback of the output audio corresponding to the list [performing an action based on the interpretation of the voice command] [¶ 58 of PGPUB uses a different yet similar list example]);
receiving non-verbal feedback from the user (see Krishnan ¶ 306/282: users may provide feedback [receiving feedback from the user] to the system indicating the user's satisfaction in the service skill responding to the user request/performing an action in response to the user request. The feedback may be solicited by the system(s). In some cases, the feedback may be explicit feedback and may be received as a text input, gestures, [non-verbal feedback] or other types of input); 
Krishnan does not teach:
and updating the AI model based on a determination that the non-verbal feedback indicates that the user is not satisfied with the action performed.
Hazra discloses:
and updating the AI model based on a determination that the non-verbal feedback indicates that the user is not satisfied with the action performed (see Hazra ¶ 41, 42: intelligent device prompts the user for a feedback on the output of the voice recognition system [the action performed]. The user's gesture feedback (e.g., hand gestures) [non-verbal feedback] is captured and recognized by the intelligent device. Based on the user's gesture feedback, a user rating (e.g., a positive user rating, a negative user rating, or a neutral user rating) is assigned, and the intelligent device labels the input vector accordingly [a determination that the non-verbal feedback indicates]. When the user rating is positive (e.g., the voice recognition result is accurate or satisfying for the user), there is no need to update the local neural network model, thus the input vector is not saved. On the other hand, when the user rating is negative or neutral (e.g., the voice recognition results is not accurate or not satisfying for the user) [that the user is not satisfied with the action performed], the intelligent device prompts the user to enter the correct output. The user may choose one of the options as the expected output, or may directly enter the expected output, e.g., using touch screen of the display device; --[42] Next, the intelligent device labels the input vector with the correct output the user chooses [based on], and saves the labeled input vector for training the local neural network model [updating the AI model]).
Krishnan and Hazra are considered to be analogous because they are from the field of user feedback identification.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan to incorporate the disclosure of Hazra in order to allow users of a voice response system to interact naturally with the system without any mechanical devices, thereby improving interfacing with the system (see Hazra ¶ 2: Gesture recognition has received much attention recently as a means for improved human-to-machine interface. Gesture recognition may allow humans to interact naturally with computers without any mechanical devices).
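For illustration only, the feedback flow quoted from Hazra ¶ 41-42 above can be sketched as follows. This is a minimal sketch, not Hazra's implementation: the gesture names, the `FeedbackCollector` class, the `ask_correct_output` callback, and the choice to treat an unrecognized gesture as neutral are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical gesture-to-rating mapping; Hazra does not prescribe these names.
GESTURE_TO_RATING = {
    "thumbs_up": "positive",
    "thumbs_down": "negative",
    "flat_hand": "neutral",
}


@dataclass
class FeedbackCollector:
    # Labeled (input vector, correct output) pairs saved for later training.
    training_buffer: List[Tuple[Sequence[float], str]] = field(default_factory=list)

    def handle(
        self,
        input_vector: Sequence[float],
        gesture: str,
        ask_correct_output: Callable[[], str],
    ) -> bool:
        """Return True if the input vector was labeled and saved for training."""
        # Assumption: an unrecognized gesture is treated as a neutral rating.
        rating = GESTURE_TO_RATING.get(gesture, "neutral")
        if rating == "positive":
            # Positive rating: no model update is needed, so nothing is saved.
            return False
        # Negative or neutral rating: prompt the user for the correct output,
        # label the input vector with it, and save it for training.
        correct_output = ask_correct_output()
        self.training_buffer.append((input_vector, correct_output))
        return True
```

In this sketch only negative or neutral ratings add an entry to the training buffer, which mirrors the cited passage in which a positive rating leaves the local model unchanged.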

Regarding claims 8 and 15, 
Method claim 1, system claim 8, and computer program product (CPP) claim 15 are related as a system, CPP, and method of using the same, with each claimed element's function corresponding to the claimed method step. Accordingly, claims 8 and 15 are rejected under the same rationale as applied above with respect to the method claim.

Furthermore, regarding Claim 8:
Krishnan teaches:
a memory having computer-readable instructions; and a processor for executing the computer-readable instructions, the computer-readable instructions including instructions (see Krishnan ¶ 522/482, 527/487: [522/482] components of the device(s) may include processors, memory, and/or storage; --[527/487] aspects of the disclosed system may be implemented as a computer method or as an article of manufacture such as a memory device. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure).

Furthermore, regarding Claim 15:
Krishnan teaches:
computer-readable storage medium having program instructions embodied therewith the program instructions executable by a computer processor to cause the computer processor to perform a method (see Krishnan ¶ 522/482, 527/487: [522/482] components of the device(s) may include processors, memory, and/or storage; --[527/487] aspects of the disclosed system may be implemented as an article of manufacture such as a non-transitory computer readable storage medium. The computer readable storage medium may be readable by a computer and may comprise instructions for causing a computer or other device to perform processes described in the present disclosure).

Claims 2, 4, 6-7, 9, 11, 13-14, 16, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (US-2022/0093094; hereafter Krishnan) in view of Hazra et al. (US-2021/0158138; hereafter Hazra) and further in view of Ding et al. (US 2020/0294489; hereafter Ding).
Regarding claims 2, 9, and 16, Krishnan in view of Hazra teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan and Hazra does not teach:
the non-verbal feedback is a textual search query received from the user within a threshold time after the action is performed.
Ding discloses:
the non-verbal feedback is a textual search query received from the user within a threshold time after the action is performed (see Ding ¶ 39, 40, 41, 46, 78: [39] the user behavior log is determined as the second behavior log if this log is generated after the time of the first behavior log and has a time interval to the time of the first behavior log, the time interval is less than a preset time threshold [within a threshold time after the action is performed], and this log belongs to the same user; --[40] user speech and the corresponding speech recognition result in each piece of data are determined as a positive feedback sample or a negative feedback sample, based on a relationship between the first behavior log and the second behavior log in the corresponding piece of data; --[41] negative feedback samples may be the data erroneously recognized; --[46] After the user uses the speech function [and] he/she has a modification behavior in a short period of time. There are two types of modification behaviors: re-inputting the modification using the speech function and inputting the modification manually; --[78] [¶ 78 gives an example embodiment where a user inputs a location/route search request [after the action is performed]] At this time, the user manually inputs a correct text “Juyuan” [non-verbal feedback is a textual search query received from the user]).
Krishnan, Hazra, and Ding are considered to be analogous because they are from the field of user feedback identification.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan and Hazra to incorporate the disclosure of Ding in order to reduce the required number of manually labeled training examples in a training dataset, thereby reducing the iteration period and resources required by manual labeling (see Ding ¶ 4: the training corpus for speech recognition mainly comes from the manually-labeled random audios, which leads to two main problems. One is that, the iteration period of the speech recognition model is too long due to manual labeling, and the resource consumption is severe).
	
Regarding claims 4, 11, and 18, Krishnan in view of Hazra teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan and Hazra does not teach:
monitoring a behavior of the user within a threshold time after the action is performed.
Ding discloses:
monitoring a behavior of the user within a threshold time after the action is performed (see Ding ¶ 39, 40, 78: [39] the user behavior log [monitoring a behavior of the user] is determined as the second behavior log if this log is generated after the time of the first behavior log and has a time interval to the time of the first behavior log, the time interval is less than a preset time threshold [within a threshold time after the action is performed], and this log belongs to the same user; --[40] user speech and the corresponding speech recognition result in each piece of data are determined as a positive feedback sample or a negative feedback sample, based on a relationship between the first behavior log and the second behavior log in the corresponding piece of data; --[78] [¶ 78 gives an example embodiment where a user inputs a location/route search request [after the action is performed]]).
Where the motivation to combine is the same as previously presented.
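For illustration only, the pairing condition quoted from Ding ¶ 39 above (same user, generated after the first behavior log, time interval less than a preset threshold) can be sketched as follows. The `BehaviorLog` type, the `find_second_log` function, the 30-second default, and the choice of the earliest qualifying log are hypothetical assumptions, not taken from Ding.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BehaviorLog:
    user_id: str
    timestamp: float  # seconds since an arbitrary epoch (assumed unit)
    text: str


def find_second_log(
    first: BehaviorLog,
    logs: Iterable[BehaviorLog],
    threshold_s: float = 30.0,  # hypothetical preset time threshold
) -> Optional[BehaviorLog]:
    """Return a log by the same user that follows `first` within the threshold."""
    candidates = [
        log
        for log in logs
        if log.user_id == first.user_id
        and 0 < log.timestamp - first.timestamp < threshold_s
    ]
    # Assumption: when several logs qualify, the earliest one is selected.
    return min(candidates, key=lambda log: log.timestamp, default=None)
```

A log from a different user, or one generated at or beyond the threshold, is not paired with the first behavior log in this sketch.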
		
Regarding claims 6, 13, and 20, Krishnan in view of Hazra teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan and Hazra does not teach:
creating a correlation among a determined change in behavior of the user within a threshold time after the action is performed and the interpreted voice command to identify a portion of the voice command that is being wrongly interpreted.
Ding discloses:
creating a correlation among a determined change in behavior of the user within a threshold time after the action is performed and the interpreted voice command to identify a portion of the voice command that is being wrongly interpreted (see Ding ¶ 72, 76: [72] the user speech and the corresponding speech recognition result in the corresponding piece of data are determined as the negative feedback sample [creating a correlation], in response to determining that a user behavior recorded in the second behavior log is a modification behavior on a user behavior recorded in the first behavior log [among a determined change in behavior of the user] during a predetermined period of time [within a threshold time]; --[76] the user inputs a speech A of “Juyuan” [voice command/behavior], and a speech recognition result, i.e., an erroneous text W “unexpectedly” is obtained through speech recognition [and the interpreted voice command]. The route search request is launched [the action is performed].  At this time, the user re-inputs a modification speech B of “Juyuan” [change in behavior] using the speech function, and a modification recognition result corresponding to the modification speech, i.e., a new text R “Juyuan” is obtained through speech recognition. A modification recognition result corresponding to the modification speech is obtained, that is, a new text R. When the text W and the text R are different, and the text W and the text R satisfy the preset semantic similarity condition, that is, the text W and the text R face are different in words but the semantic level of both are very similar [to identify a portion of the voice command that is being wrongly interpreted], the speech recognition result of the user speech A is considered to be wrong [creating a correlation after the action is performed]. Then, user speech A, text W, and text R are used as the negative feedback sample).
Where the motivation to combine is the same as previously presented.
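For illustration only, the negative-feedback test quoted from Ding ¶ 76 above (text W and text R differ but satisfy a preset similarity condition) can be sketched as follows. Ding's semantic similarity measure is not reproduced in the cited passage, so this sketch substitutes a character-level ratio from Python's `difflib.SequenceMatcher` as a stand-in; that substitution, the function name, and the 0.6 threshold are assumptions introduced here.

```python
from difflib import SequenceMatcher


def is_negative_sample(text_w: str, text_r: str, min_similarity: float = 0.6) -> bool:
    """Return True if W and R differ yet are similar enough to suggest W was misrecognized.

    Stand-in only: character-level similarity replaces the "preset semantic
    similarity condition" of the cited passage, which is not specified there.
    """
    if text_w == text_r:
        # Identical texts: the re-input was not a modification.
        return False
    similarity = SequenceMatcher(None, text_w, text_r).ratio()
    return similarity >= min_similarity
```

Under this stand-in, two nearly identical romanized strings would be flagged as a negative feedback sample, while two unrelated strings would be treated as a new, unrelated request.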
Regarding claims 7 and 14, Krishnan in view of Hazra and further in view of Ding teaches all the limitations of claims 6 and 13 above.
Ding further discloses:
comparing the portion of the voice command that is being wrongly interpreted to data from other users to identify a best AI model for the user (see Ding ¶ 34, 76: [34] The mining out the negative feedback samples may include: mining out the audios and their texts erroneously recognized [the portion of the voice command that is being wrongly interpreted] through the existing speech recognition model based on user behaviors, to generate the training corpus. Therefore, the training corpus for speech recognition may be automatically and purposefully mined out based on the historical behaviors of the users [comparing to data from other users], and provided to the subsequent training on the speech recognition model, thereby further improving the speech recognition effect [to identify a best AI model for the user]; --[76] When the text W and the text R are different, and the text W and the text R satisfy the preset semantic similarity condition, that is, the text W and the text R face are different in words but the semantic level of both are very similar, the speech recognition result of the user speech A is considered to be wrong. Then, user speech A, text W, and text R are used as the negative feedback sample).
Where the motivation to combine is the same as previously presented.

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (US-2022/0093094; hereafter Krishnan) in view of Hazra et al. (US-2021/0158138; hereafter Hazra) and further in view of Jones et al. (US 2008/0021884; hereafter Jones).
Regarding claims 3, 10, and 17, Krishnan in view of Hazra teaches all the limitations of claims 1, 8, and 15 above.
The combination of Krishnan and Hazra does not teach:
the non-verbal feedback is provided by the user in response to a request from the voice response system for the user to type the voice command.
Jones discloses:
the non-verbal feedback is provided by the user in response to a request from the voice response system for the user to type the voice command (see Jones ¶ 35, 43, 54, 123, FIG 9B: [35] oral speech queries [voice command] by telephone are stored in the system database and converted into digital text queries by a speech translation server; --[43] An appropriate interface, such as a graphical user interface (GUI) for the computer system extracts a query from a user and transmits the query to the server; --[54] the user reviews the search results [feedback is provided by the user] and then does or does not "accept" the results, user respond[s] to a request for a user response, such as a pop-up or voice prompt [in response to a request from the voice response system], the user entering a revised, different or follow-up query [for the user to type the voice command], user can register dissatisfaction with the query results by a rejection button on the user GUI [non-verbal feedback]; --[123] items that are typed by the user [type] in entry box 477 [FIG 9B] may show up in the chat frame [FIG 9B shows an example GUI, above entry box 477, it reads “RESPOND HERE:”]).
Krishnan, Hazra, and Jones are considered to be analogous because they are from the field of user feedback identification.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan and Hazra to incorporate the disclosure of Jones in order to provide relevant answers quickly to user voice queries (see Jones ¶ 118: searchers who sign-up for particular keywords are preferably motivated to collect excellent resources in order to provide relevant answers quickly to users).

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (US-2022/0093094; hereafter Krishnan) in view of Hazra et al. (US-2021/0158138; hereafter Hazra) in view of Ding et al. (US 2020/0294489; hereafter Ding) and further in view of Jacobs et al. (US 10,460,342; hereafter Jacobs).
Regarding claims 5, 12, and 19, Krishnan in view of Hazra in view of Ding teaches all the limitations of claims 4, 11, and 18 above.
The combination of Krishnan, Hazra, and Ding does not teach:
the non-verbal feedback includes a determination that a user is one of laughing or frustrated during the threshold time after the action is performed.
Jacobs discloses:
the non-verbal feedback includes a determination that a user is one of laughing or frustrated during the threshold time after the action is performed (see Jacobs col 5:56 – col 6:3, col 6:23-29, col 12:10-19, col 17:44-47: [col 5:56 – col 6:3] Product program usage database may comprise actions taken by individual users in the course of using one or more product programs. A product program may communicate actions derived from usage of a product program, to product program usage database. Such actions may comprise product program commands [the action]; --[col 6:23-29] Action-trigger database may comprise a list of actions that users may take in the course of using one or more product programs and corresponding triggers that targeted-advertising module may utilize to determine appropriate targeted advertising for each user. For example, a particularly hard tap on a touchpad [non-verbal feedback] may correspond to a trigger associated with user frustration [the non-verbal feedback includes a determination that a user is one of frustrated]; --[col 12:10-19] if a trigger associated with user frustration does not exceed an immediate advertisement time threshold [during the threshold time after the action is performed], targeted-advertising module may select an advertisement related to user frustration; --[col 17:40-47] User may enter commands and/or other information into computer system via [an] input device. Examples of an input device include a keyboard, a pointing device, an audio input device (a voice response system)).
Krishnan, Hazra, Ding, and Jacobs are considered to be analogous because they are from the field of user feedback identification.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention(s) to have modified Krishnan, Hazra, and Ding to incorporate the disclosure of Jacobs in order to recognize a broader range of nonverbal user input data as potential feedback indicators (see Jacobs col 1:58: identifying an advertisement trigger from the end-user usage data).

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure.
Yang et al. (CN-111353033) is cited to disclose a feedback identification system.
Lin et al. (WO2020119030) is cited to disclose a feedback identification system.
Waibel et al. (US-5,855,000) is cited to disclose correcting speech recognition errors and updating a model based on users’ gesture feedback.
Waibel et al. (US-2011/0307241) is cited to disclose a feedback identification system.
Miller et al. (US-2019/0005021) is cited to disclose a non-verbal feedback identification system.
Kim et al. (US-2019/0355351) is cited to disclose updating an AI model based on a user’s gesture.
Liden et al. (US-2019/0340527) is cited to disclose updating an AI model based on a developer’s click.
Any inquiry concerning this communication or earlier communications from Examiner should be directed to AARON G. ZELLER whose telephone number is (571) 272-5765.  Examiner can normally be reached Monday - Thursday 10 AM - 7:30 PM and every other Friday 10:00 AM - 6:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach Examiner by telephone are unsuccessful, Examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON G ZELLER/
Examiner, Art Unit 2659
11 August 2022


/JIALONG HE/
Primary Examiner, Art Unit 2659