Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 09/15/2022 has been entered. Claims 1-4, 6, 8-11, 13, 15-19, and 21 remain pending in the application. Applicant’s amendments to the Specification and Claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed 06/15/2022.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant’s amendments to the claims alter the scope of the invention. Independent claims 1, 9, and 16 now contain the limitation “determining an intended hotword corresponding to the failed hotword attempt, wherein the intended hotword is not included in the first spoken utterance and the second spoken utterance,” which had not been previously considered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 9-11, 13, and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because claim 9 is directed to "a computer program product comprising one or more computer-readable storage media," as recited in lines 1-2 of the claim. The claim is directed to software per se (see MPEP § 2106, subsection I). The examiner recommends amending the claim to recite "a computer program product comprising one or more non-transitory computer-readable storage media" to resolve the rejection.
Claims 10-11, 13 and 15 are rejected due to dependency on claim 9.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 9-11, 13, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9 recites the limitation "the one or more non-transitory computer-readable storage media" in line 3 of the claim.  There is insufficient antecedent basis for this limitation in the claim.
Claims 10-11, 13, and 15 are rejected due to dependence upon claim 9.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 6, 8-11, 13, 15-19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Iso-Sipilä et al. (Patent No. US 6697782 B1), hereinafter Iso-Sipilä, in view of Prager (Patent No. US 5255386 A).
Regarding claim 1, Iso-Sipilä teaches a method (Abstract, line 1) implemented by one or more processors (Spec. Col. 8, lines 59-67), the method comprising: 
receiving, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
processing the first audio data using one or more machine learning models to generate a first predicted output that indicates a probability of one or more hotwords being present in the first audio data (Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model as the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands in Col. 9 lines 19-21. Col. 7, lines 30-37; the system processes the speech command, i.e. the first audio data, to generate a confidence value, i.e. a first predicted output that indicates a probability, that the speech command matches a command word, i.e. a hotword. The command words are considered to be hotwords as they are reference terms which generate a response from the device, such as the command “yes,” which triggers a call acceptance in Col. 3, lines 60-64);
determining that the first predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Spec. Col. 7 lines 46-49; the confidence value, i.e. the first predicted output, satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receiving, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
processing the second audio data using the one or more machine learning models to generate a second predicted output that indicates a probability of the one or more hotwords being present in the second audio data (Spec. Col. 7 lines 56-60; the process detailed above for generating a confidence value with respect to the first audio data is repeated for the second audio data); 
determining that the second predicted output satisfies the secondary threshold but does not satisfy the primary threshold (Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the first predicted output and the second predicted output satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identifying a failed hotword attempt (Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt; 
providing a hint that is responsive to the failed hotword attempts comprising displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword (Spec. Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes and informs the user of the failed recognition and requests that the user utter the command again. Spec. Col. 4, lines 17-19; upon two uncertain recognition results for a first and second audio data, the device displays a hint “Did you say yes?” including the intended hotword “yes.” Because the device poses the hint to the user as a question, the device is considered to give an audio response that includes the hotword).
Iso-Sipilä, however, fails to explicitly teach determining an intended hotword corresponding to the failed hotword attempt, wherein the intended hotword is not included in the first spoken utterance and the second spoken utterance.
Prager teaches a data processing system which suggests a valid command to a user when the user enters a question or an erroneous command by semantically comparing the intent of the erroneous command to other intents to determine the valid command (Abstract). 
Adapting the method as taught by Iso-Sipilä to incorporate the features of Prager provides a method (Iso-Sipilä: Abstract, line 1) implemented by one or more processors (Iso-Sipilä: Spec. Col. 8, lines 59-67), the method comprising: 
receiving, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Iso-Sipilä: Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
processing the first audio data using one or more machine learning models to generate a first predicted output that indicates a probability of one or more hotwords being present in the first audio data (Iso-Sipilä: Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model as the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands in Col. 9 lines 19-21. Col. 7, lines 30-37; the system processes the speech command, i.e. the first audio data, to generate a confidence value, i.e. a first predicted output that indicates a probability, that the speech command matches a command word, i.e. a hotword. The command words are considered to be hotwords as they are reference terms which generate a response from the device, such as the command “yes,” which triggers a call acceptance in Col. 3, lines 60-64);
determining that the first predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Iso-Sipilä: Spec. Col. 7 lines 46-49; the confidence value, i.e. the first predicted output, satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receiving, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Iso-Sipilä: Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
processing the second audio data using the one or more machine learning models to generate a second predicted output that indicates a probability of the one or more hotwords being present in the second audio data (Iso-Sipilä: Spec. Col. 7 lines 56-60; the process detailed above for generating a confidence value with respect to the first audio data is repeated for the second audio data); 
determining that the second predicted output satisfies the secondary threshold but does not satisfy the primary threshold (Iso-Sipilä: Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the first predicted output and the second predicted output satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Iso-Sipilä: Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identifying a failed hotword attempt (Iso-Sipilä: Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt; 
determining an intended hotword corresponding to the failed hotword attempt, wherein the intended hotword is not included in the first spoken utterance and the second spoken utterance (the method of Iso-Sipilä for determining a failed hotword attempt and an intended hotword: Spec Col. 10, lines 10-24; the control unit compares the first spoken utterance and the second spoken utterance and determines a similarity between them based on calculated distances between the words. If the words are sufficiently similar, the control unit determines the utterances were a failed attempt to speak a certain command word, i.e. determines an intended hotword corresponding to the failed hotword attempt, now adapted to use the features as taught by Prager such that the intended hotword is not included in the first spoken utterance and the second spoken utterance. Prager: Spec., Col. 5, lines 18-20: an erroneous command may be entered due to the user forgetting part or all of the syntax of the command, i.e. the intended command is not included in the user input. Col. 3 line 65 – Col. 4 line 6: when the user enters an erroneous command, the system looks up the intent of that command and semantically compares such intent with other intents. When another intent is found, based on such comparison, to be within a predetermined degree of similarity, the command defined by such other intent is created as a suggestion for the user); and
providing a hint that is responsive to the failed hotword attempts comprising displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword (Iso-Sipilä: Spec. Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes and informs the user of the failed recognition and requests that the user utter the command again. Spec. Col. 4, lines 17-19; upon two uncertain recognition results for a first and second audio data, the device displays a hint “Did you say yes?” including the intended hotword “yes.” This is adapted to use the features as taught by Prager above such that the intended hotword was not included in the first spoken utterance and the second spoken utterance. Because the device poses the hint to the user as a question, the device is considered to give an audio response that includes the hotword).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Iso-Sipilä to incorporate the teachings of Prager. Both Iso-Sipilä and Prager are analogous as they are directed towards aiding a user entering natural language commands to control a device when the user has entered the commands incorrectly (Prager, Spec. Col. 5, lines 40-53; the natural language analyzer parses the user’s natural language input. The output of the analyzer is used by the inference engine and rules base to produce user output, which is displayed to the user with suggestions if the user selects to receive help). Prager teaches a method to improve intelligent help systems (Spec. Col. 1, lines 8-14) by guiding users when they make errors, as the errors cause the system to not execute commands successfully (Spec. Col. 4 line 64 – Col. 5 line 5: the system only executes actions for valid commands and sends the user an error message otherwise). Iso-Sipilä also recognizes that issues with the input of commands can cause undesirable consequences in the functioning of the device (Spec. Col. 1, lines 47-51). Therefore, it would have been obvious to combine the features of both disclosures, as Prager’s invention provides another method to improve a user’s experience by solving a problem with erroneous input through addressing user error. 
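For illustration only, the two-threshold and temporal-window logic mapped to claim 1 above can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold values, the time window, and all names (`Utterance`, `is_uncertain`, `is_failed_hotword_attempt`) are hypothetical and are not taken from Iso-Sipilä, Prager, or the claims.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration only.
PRIMARY_THRESHOLD = 0.80    # analogous to the first threshold value Y
SECONDARY_THRESHOLD = 0.50  # analogous to the second threshold value A
MAX_GAP_SECONDS = 5.0       # assumed temporal criterion between utterances


@dataclass
class Utterance:
    confidence: float  # predicted probability that a hotword is present
    timestamp: float   # time of the utterance, in seconds


def is_uncertain(confidence: float) -> bool:
    """True when the score meets the secondary but not the primary threshold."""
    return SECONDARY_THRESHOLD <= confidence < PRIMARY_THRESHOLD


def is_failed_hotword_attempt(first: Utterance, second: Utterance) -> bool:
    """True when both utterances are uncertain and fall within the time window."""
    gap = second.timestamp - first.timestamp
    return (
        is_uncertain(first.confidence)
        and is_uncertain(second.confidence)
        and 0.0 <= gap <= MAX_GAP_SECONDS
    )
```

Under these assumed values, two utterances scoring 0.6 and 0.7 three seconds apart would be treated as a failed hotword attempt, whereas a first utterance scoring 0.9 would satisfy the primary threshold and would not.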

Regarding claim 2, the combination further teaches wherein identifying the failed hotword attempt is further in response to determining that a similarity between the first spoken utterance and the second spoken utterance exceeds a similarity threshold (Iso-Sipilä: Spec Col. 10, lines 10-24; the control unit compares the first spoken utterance and the second spoken utterance and determines a similarity between them based on calculated distances between the words. If the words are sufficiently similar, the control unit determines the utterances were a failed attempt to speak a certain command word.).
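For illustration only, a distance-based similarity check of the kind discussed for claim 2 might look like the sketch below. It assumes each utterance has already been reduced to a fixed-length feature vector, which is a simplification; the Euclidean metric, the mapping from distance to similarity, the threshold value, and the function names are hypothetical and are not the method of Iso-Sipilä.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.75  # hypothetical value for illustration


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    distance = math.dist(a, b)
    return 1.0 / (1.0 + distance)


def utterances_match(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when the similarity between two utterances exceeds the threshold."""
    return similarity(a, b) > SIMILARITY_THRESHOLD
```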

Regarding claim 3, the combination further teaches wherein identifying the failed hotword attempt is further in response to determining that the probability indicated by the first predicted output and the probability indicated by the second predicted output correspond to a same hotword of the one or more hotwords (Iso-Sipilä: Spec Col. 8, lines 4-11; the control unit determines the utterances were a failed attempt to speak the same command word based on a comparison between the first and second audio, which is only performed based on the probabilities indicated by the first and second predicted output indicating uncertain confidence that the first and second audio data matched the same command, i.e. a same hotword).

Regarding claim 4, the combination further teaches determining, using a model conditioned on acoustic features, that the first audio data and the second audio data comprise a command (Iso-Sipilä: Spec. Col. 1 lines 6-12; the disclosure is directed to recognizing commands in speech utterances. Spec. Col. 2, lines 3-42; the HMM used by the device to determine that the first audio data and the second audio data comprise a command is conditioned on acoustic features), 
wherein identifying the failed hotword attempt is further in response to the first audio data and the second audio data comprising the command (Iso-Sipilä: Spec. Col. 10, lines 19-24; identifying the failed hotword attempt is in response to the first audio data and the second audio data comprising the command).

Regarding claim 6, the combination further teaches the method according to claim 1, wherein the intended hotword is determined based on acoustic similarity between at least a portion of the first audio data, at least a portion of the second audio data, and the intended hotword (Iso-Sipilä: Spec. Col. 9, lines 28-38; the device determines acoustic similarity between the first audio data and the intended hotword by converting the input uttered command word to a feature vector representation and calculating the probability that it corresponds to a command word, i.e. an intended hotword, in a vocabulary. Col. 9, line 64-Col. 10 line 3; the control device repeats this process for the second audio data. Col. 10, lines 9-13; the control unit compares the feature vectors of the first and second audio data to each other to determine a similarity between them in order to determine the intended hotword).

Regarding claim 8, the combination further teaches the method according to claim 1, further comprising performing an action corresponding to the intended hotword (Iso-Sipilä: Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. Col. 9, lines 48-56; the device executes the command based on the control signal, i.e. it performs an action corresponding to the intended hotword).

Regarding claim 9, Iso-Sipilä teaches a computer program product comprising one or more computer-readable storage media having program instructions collectively stored on the one or more non-transitory computer-readable storage media (Spec. Col. 8, lines 59-67), the program instructions executable to: [Attorney Docket No. ZS202-20828]
receive, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
process the first audio data using each of a plurality of classes in a machine learning model to generate a corresponding probability associated with the first audio data, each of the classes being associated with a corresponding hotword of a plurality of hotwords and each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the first audio data (Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model as the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands in Col. 9 lines 19-21. Col. 2, lines 23-36; each reference word, which corresponds to commands or hotwords, in a plurality of reference words has an HMM model, or class. Input speech, i.e. first audio data, is processed such that each HMM class calculates a corresponding probability of that reference word being present in the first audio data. The command words are considered to be hotwords as they are reference terms which generate a response from the device, such as the command “yes,” which triggers a call acceptance in Col. 3, lines 60-64);
determine that the probability of one of the plurality of hotwords being present in the first audio data satisfies a secondary threshold that is less indicative of the one of the plurality of hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Spec. Col. 7, lines 29-37; probabilities are determined on the basis of the command word uttered by the user for different command words in the vocabulary of the speech recognition device. The command word with the greatest probability is selected as the preliminary result. Spec. Col. 7 lines 46-49; the confidence value, i.e. the probability, of the preliminary command word satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receive, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
process the second audio data using each of the plurality of classes in the machine learning model to generate a corresponding probability associated with the second audio data, each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the second audio data (Spec. Col. 7 lines 56-60; the process detailed above for processing the first audio data is repeated for the second audio data); 
determine that the probability of the one of the plurality of hotwords being present in the second audio data satisfies the secondary threshold but does not satisfy the primary threshold (Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the probability of the one of the plurality of hotwords being present in the first audio data satisfying the secondary threshold but not satisfying the primary threshold and the probability of the one of the plurality of hotwords being present in the second audio data satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identify a failed hotword attempt (Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt; 
providing a hint that is responsive to the failed hotword attempts comprising displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword (Spec. Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes and informs the user of the failed recognition and requests that the user utter the command again. Spec. Col. 4, lines 17-19; upon two uncertain recognition results for a first and second audio data, the device displays a hint “Did you say yes?” including the intended hotword “yes.” Because the device poses the hint to the user as a question, the device is considered to give an audio response that includes the hotword).
Iso-Sipilä, however, fails to explicitly teach determining an intended hotword corresponding to the failed hotword attempt, wherein the intended hotword is not included in the first spoken utterance and the second spoken utterance.
Prager teaches a data processing system which suggests a valid command to a user when the user enters a question or an erroneous command by semantically comparing the intent of the erroneous command to other intents to determine the valid command (Abstract). 
Adapting the computer program product as taught by Iso-Sipilä to incorporate the features of Prager provides a computer program product comprising one or more computer-readable storage media having program instructions collectively stored on the one or more non-transitory computer-readable storage media (Iso-Sipilä: Spec. Col. 8, lines 59-67), the program instructions executable to:
receive, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Iso-Sipilä: Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
process the first audio data using each of a plurality of classes in a machine learning model to generate a corresponding probability associated with the first audio data, each of the classes being associated with a corresponding hotword of a plurality of hotwords and each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the first audio data (Iso-Sipilä: Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model as the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands in Col. 9 lines 19-21. Col. 2, lines 23-36; each reference word, which corresponds to commands or hotwords, in a plurality of reference words has an HMM model, or class. Input speech, i.e. first audio data, is processed such that each HMM class calculates a corresponding probability of that reference word being present in the first audio data. The command words are considered to be hotwords as they are reference terms which generate a response from the device, such as the command “yes,” which triggers a call acceptance in Col. 3, lines 60-64);
determine that the probability of one of the plurality of hotwords being present in the first audio data satisfies a secondary threshold that is less indicative of the one of the plurality of hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Iso-Sipilä: Spec. Col. 7, lines 29-37; probabilities are determined on the basis of the command word uttered by the user for different command words in the vocabulary of the speech recognition device. The command word with the greatest probability is selected as the preliminary result. Spec. Col. 7 lines 46-49; the confidence value, i.e. the probability, of the preliminary command word satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receive, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Iso-Sipilä: Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
process the second audio data using each of the plurality of classes in the machine learning model to generate a corresponding probability associated with the second audio data, each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the second audio data (Iso-Sipilä: Spec. Col. 7 lines 56-60; the process detailed above for processing the first audio data is repeated for the second audio data); 
determine that the probability of the one of the plurality of hotwords being present in the second audio data satisfies the secondary threshold but does not satisfy the primary threshold (Iso-Sipilä: Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the probability of the one of the plurality of hotwords being present in the first audio data satisfying the secondary threshold but not satisfying the primary threshold and the probability of the one of the plurality of hotwords being present in the second audio data satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Iso-Sipilä: Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identify a failed hotword attempt (Iso-Sipilä: Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt: 
determining an intended hotword corresponding to the failed hotword attempt, wherein the intended hotword is not included in the first spoken utterance and the second spoken utterance (the method of Iso-Sipilä for determining a failed hotword attempt and an intended hotword: Spec Col. 10, lines 10-24; the control unit compares the first spoken utterance and the second spoken utterance and determines a similarity between them based on calculated distances between the words. If the words are sufficiently similar, the control unit determines the utterances were a failed attempt to speak a certain command word, i.e. determines an intended hotword corresponding to the failed hotword attempt, now adapted to use the features as taught by Prager such that the intended hotword is not included in the first spoken utterance and the second spoken utterance. Prager: Spec., Col. 5, lines 18-20: an erroneous command may be entered due to the user forgetting part or all of the syntax of the command, i.e. the intended command is not included in the user input. Col. 3 line 65 – Col. 4 line 6: when the user enters an erroneous command, the system looks up the intent of that command and semantically compares such intent with other intents. When another intent is found, based on such comparison, to be within a predetermined degree of similarity, the command defined by such other intent is created as a suggestion for the user); and
providing a hint that is responsive to the failed hotword attempts comprising displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword (Iso-Sipilä: Spec Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes and informs the user of the failed recognition and requests that the user utters the command again. Spec. Col. 4, lines 17-19; upon two uncertain recognition results for a first and second audio data, the device displays a hint “Did you say yes?” including the intended hotword “yes.” This is adapted to use the features as taught by Prager above such that the intended hotword was not included in the first spoken utterance and the second spoken utterance. As the device asks the user the hint, it is considered that the device gives an audio response that includes the hotword).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Iso-Sipilä to incorporate the teachings of Prager. Both Iso-Sipilä and Prager are analogous as they are directed towards aiding a user entering natural language commands to control a device when the user has entered the commands incorrectly (Prager, Spec. Col. 5, lines 40-53; the natural language analyzer parses the user’s natural language input. The output of the analyzer is used by the inference engine and rules base to produce user output, which is displayed to the user with suggestions if the user selects to receive help). Prager teaches a method to improve intelligent help systems (Spec. Col. 1, lines 8-14) by guiding users when they make errors, as the errors cause the system to not execute commands successfully (Spec. Col. 4 line 64 – Col. 5 line 5: the system only executes actions for valid commands and sends the user an error message otherwise). Iso-Sipilä also recognizes that issues with the input of commands can cause undesirable consequences in the functioning of the device (Spec. Col. 1, lines 47-51). Therefore, it would have been obvious to combine the features of both disclosures, as Prager’s invention provides another method to improve a user’s experience by solving a problem with erroneous input through addressing user error. 
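For illustration only, the two-threshold logic mapped above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Iso-Sipilä, Prager, or the application: the threshold values, the time window, the `Utterance` type, and the function names are all hypothetical, and "satisfies a threshold" is assumed to mean "greater than or equal to."

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical values chosen only for illustration; none appear in the references.
PRIMARY_THRESHOLD = 0.8     # confident detection (compare threshold value Y)
SECONDARY_THRESHOLD = 0.5   # "near miss" floor (compare threshold value A)
MAX_GAP_SECONDS = 5.0       # assumed temporal criterion between the two utterances


@dataclass
class Utterance:
    timestamp: float                 # seconds, when the utterance was captured
    probabilities: Dict[str, float]  # per-hotword probability from the model classes


def near_miss(utt: Utterance) -> Optional[str]:
    """Return the top-scoring hotword if it satisfies the secondary
    threshold but not the primary threshold, else None."""
    hotword, prob = max(utt.probabilities.items(), key=lambda kv: kv[1])
    if SECONDARY_THRESHOLD <= prob < PRIMARY_THRESHOLD:
        return hotword
    return None


def failed_hotword_attempt(first: Utterance, second: Utterance) -> Optional[str]:
    """Identify a failed hotword attempt: both utterances are near misses
    for the same hotword and the second follows the first closely enough."""
    a, b = near_miss(first), near_miss(second)
    if a is None or a != b:
        return None
    gap = second.timestamp - first.timestamp
    if gap < 0 or gap > MAX_GAP_SECONDS:
        return None
    return a
```

Under these assumptions, two utterances scoring 0.6 and 0.7 for "yes" within two seconds of each other would be flagged as a failed attempt at "yes," whereas a score of 0.9 on either utterance would be an ordinary detection rather than a failed attempt.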

Regarding claim 10, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 2, and is rejected on the same grounds.

Regarding claim 11, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 4, and is rejected on the same grounds.

Regarding claim 13, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 6, and is rejected on the same grounds.

Regarding claim 15, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 8, and is rejected on the same grounds.

Attorney Docket No. ZS202-20828
Regarding claim 16, the claim is directed to a system comprising: a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media, the program instructions executable to perform the claimed method of claim 1.
Iso-Sipilä teaches a system comprising: a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media, the program instructions executable to perform the claimed method of claim 1 (Spec. Col. 8, lines 59-67); therefore, claim 16 is rejected on the same grounds.

Regarding claim 17, the claim is directed to the system according to claim 16 for performing the claimed method of claim 2, and is rejected on the same grounds.

Regarding claim 18, the claim is directed to the system according to claim 16 for performing the claimed method of claim 3, and is rejected on the same grounds.

Regarding claim 19, the claim is directed to the system according to claim 16 for performing the claimed method of claim 4, and is rejected on the same grounds.

Regarding claim 21, the combination further teaches the method according to claim 1, wherein the intended hotword is determined based on semantic similarity between the intended hotword and text generated based on at least a portion of the first audio data or based on at least a portion of the second audio data (the method of Iso-Sipilä for determining a failed hotword attempt and an intended hotword based on text generated from at least a portion of the first audio data or at least a portion of the second audio data as detailed in claim 1, now adapted to use the features as taught by Prager such that the intended hotword is determined based on semantic similarity between the intended hotword and text generated based on at least a portion of the first audio data or based on at least a portion of the second audio data. Prager: Spec., Col. 5, lines 18-20: an erroneous command may be entered due to the user forgetting part or all of the syntax of the command, i.e. the intended command is not included in the user input. Col. 3 line 65 – Col. 4 line 6: when the user enters an erroneous command, the system looks up the intent of that command and semantically compares such intent with other intents. When another intent is found, based on such comparison, to be within a predetermined degree of similarity, the command defined by such other intent is created as a suggestion for the user).
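For illustration only, the intent-comparison step attributed to Prager above can be sketched as follows. This is a minimal sketch under stated assumptions: the references as cited here do not specify a similarity measure, so Jaccard overlap between sets of descriptor words is swapped in as a stand-in, and the hotword table, the threshold value, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical hotwords and intent descriptors, invented for this example.
HOTWORD_INTENTS: Dict[str, FrozenSet[str]] = {
    "volume up": frozenset({"increase", "sound", "level"}),
    "volume down": frozenset({"decrease", "sound", "level"}),
    "stop": frozenset({"halt", "playback"}),
}

# Assumed stand-in for the "predetermined degree of similarity."
SIMILARITY_THRESHOLD = 0.5


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two descriptor sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_hotword(utterance_intent: FrozenSet[str]) -> Optional[str]:
    """Return the hotword whose intent is most similar to the intent derived
    from the transcribed utterance, or None if nothing is similar enough."""
    best, best_score = None, 0.0
    for hotword, intent in HOTWORD_INTENTS.items():
        score = jaccard(utterance_intent, intent)
        if score > best_score:
            best, best_score = hotword, score
    return best if best_score >= SIMILARITY_THRESHOLD else None
```

In this sketch, an utterance whose derived intent is {"increase", "sound"} contains none of the words of the hotword "volume up" yet would still yield it as the suggestion, which parallels the limitation that the intended hotword is not included in either spoken utterance.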
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kunitake et al. (Patent No. US 10,650,802 B2) discloses a voice recognition method involving calculating a recognition result and confidence level for first speech and calculating a confidence level for a repetition as second speech based on the confidence level of the first speech (Abstract).
Atal et al. (US Patent No. 5,737,724) discloses a method and apparatus for speech recognition involving analyzing a first utterance with speech models to determine one or more similarity metrics between the utterance and the models and analyzing the most closely matched model to determine if the similarity metric satisfies a first recognition criterion. Similarly, a second utterance is analyzed to determine if it satisfies a second recognition criterion. A recognition result is determined when the second utterance satisfies the first and second recognition criterion (Abstract).

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARKER L MAYFIELD whose telephone number is (571)272-4745. The examiner can normally be reached Monday - Thursday 8:00 AM-6:00 PM, Friday 8:00 AM-12:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PARKER L MAYFIELD/
Examiner
Art Unit 2655



/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655