DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 02/23/2022 and 02/28/2022. The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Drawings
The drawings are objected to because paragraph [0041] on page 15 of the Specification refers to the on-device machine learning training engine as element "132A," whereas Fig. 1A labels this element "132." The examiner suggests amending the Specification to read "on-device machine learning training engine 132" or amending the reference numeral in Fig. 1A to read "132A" (see 37 CFR 1.84(p)(5), Standards for drawings). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 9-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because claim 9 is directed to "a computer program product comprising one or more computer-readable storage media." The claim is directed to software per se (see MPEP § 2106, subsection I). The examiner recommends amending the claim to recite "a computer program product comprising one or more non-transitory computer-readable storage media" to resolve the rejection.
Claims 10-15 are rejected due to dependency on claim 9.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Iso-Sipilä et al. (Patent No. US 6697782 B1), hereinafter Iso-Sipilä.

Regarding claim 1, Iso-Sipilä teaches a method (Abstract, line 1) implemented by one or more processors (Spec. Col. 8, lines 59-67), the method comprising: 
receiving, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
processing the first audio data using one or more machine learning models to generate a first predicted output that indicates a probability of one or more hotwords being present in the first audio data (Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model because the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands, as described in Col. 9 lines 19-21. Col. 7, lines 30-37; the system processes the speech command, i.e. the first audio data, to generate a confidence value, i.e. a first predicted output that indicates a probability, that the speech command matches a command word, i.e. a hotword. The command words are considered to be hotwords because they are reference terms that generate a response from the device, such as the command “yes” triggering call acceptance in Col. 3, lines 60-64); 
determining that the first predicted output satisfies a secondary threshold that is less indicative of the one or more hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Spec. Col. 7 lines 46-49; the confidence value, i.e. the first predicted output, satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receiving, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
processing the second audio data using the one or more machine learning models to generate a second predicted output that indicates a probability of the one or more hotwords being present in the second audio data (Spec. Col. 7 lines 56-60; the process detailed above for generating a confidence value with respect to the first audio data is repeated for the second audio data); 
determining that the second predicted output satisfies the secondary threshold but does not satisfy the primary threshold (Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the first predicted output and the second predicted output satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identifying a failed hotword attempt (Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt, providing a hint that is responsive to the failed hotword attempt (Spec Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes the failed recognition, informs the user of it, and requests that the user utter the command again).
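For illustration only, the decision logic mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not language from the claims or from Iso-Sipilä: the names `Utterance`, `is_near_miss`, and `is_failed_hotword_attempt`, the numeric thresholds, and the use of a simple maximum gap as the temporal criterion are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values; neither the claims nor the reference fixes any numbers.
PRIMARY_THRESHOLD = 0.80    # plays the role of "first threshold value Y"
SECONDARY_THRESHOLD = 0.40  # plays the role of "second threshold value A"
MAX_GAP_SECONDS = 5.0       # assumed stand-in for the temporal criterion


@dataclass
class Utterance:
    confidence: float  # predicted output, assumed to lie in [0, 1]
    timestamp: float   # time the utterance was received, in seconds


def is_near_miss(confidence: float) -> bool:
    """True when the score satisfies the secondary but not the primary threshold."""
    return SECONDARY_THRESHOLD <= confidence < PRIMARY_THRESHOLD


def is_failed_hotword_attempt(first: Utterance, second: Utterance) -> bool:
    """Both utterances are near misses and the second follows within the window."""
    gap = second.timestamp - first.timestamp
    within_window = 0.0 <= gap <= MAX_GAP_SECONDS
    return (
        is_near_miss(first.confidence)
        and is_near_miss(second.confidence)
        and within_window
    )
```

Under these assumed values, two utterances scored 0.5 and 0.6 that arrive two seconds apart would be treated as a failed hotword attempt, whereas a first score of 0.9 would satisfy the primary threshold and would not.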

Regarding claim 2, Iso-Sipilä further teaches wherein identifying the failed hotword attempt is further in response to determining that a similarity between the first spoken utterance and the second spoken utterance exceeds a similarity threshold (Spec Col. 10, lines 10-24; the control unit compares the first spoken utterance and the second spoken utterance and determines a similarity between them based on calculated distances between the words. If the words are sufficiently similar, the control unit determines the utterances were a failed attempt to speak a certain command word.).
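For illustration only, a distance-based similarity test of the kind mapped above can be sketched as follows. The sketch assumes each utterance has already been reduced to a fixed-length feature vector and uses Euclidean distance converted to a score in (0, 1]; the function names, the conversion formula, and the threshold value are hypothetical and are not taken from the claims or from Iso-Sipilä.

```python
import math
from typing import Sequence

SIMILARITY_THRESHOLD = 0.5  # hypothetical value


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Map distance to a score in (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def utterances_match(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when the similarity exceeds the (assumed) similarity threshold."""
    return similarity(a, b) > SIMILARITY_THRESHOLD
```

A smaller distance yields a higher similarity, so the threshold comparison corresponds to the words being "sufficiently similar" in the mapping above.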

Regarding claim 3, Iso-Sipilä further teaches wherein identifying the failed hotword attempt is further in response to determining that the probability indicated by the first predicted output and the probability indicated by the second predicted output correspond to a same hotword of the one or more hotwords (Spec Col. 8, lines 4-11; the control unit determines the utterances were a failed attempt to speak the same command word based on a comparison between the first and second audio, which is only performed based on the probabilities indicated by the first and second predicted output indicating uncertain confidence that the first and second audio data matched the same command, i.e. a same hotword.).

Regarding claim 4, Iso-Sipilä further teaches determining, using a model conditioned on acoustic features, that the first audio data and the second audio data comprise a command (Spec. Col. 1 lines 6-12; the disclosure is directed to recognizing commands in speech utterances. Spec. Col. 2, lines 3-42; the HMM used by the device to determine that the first audio data and the second audio data comprise a command is conditioned on acoustic features.), 
wherein identifying the failed hotword attempt is further in response to the first audio data and the second audio data comprising the command (Spec. Col. 10, lines 19-24; identifying the failed hotword attempt is in response to the first audio data and the second audio data comprising the command).

Regarding claim 5, Iso-Sipilä further teaches determining an intended hotword corresponding to the failed hotword attempt (Spec Col. 10, lines 10-24; the control unit compares the first spoken utterance and the second spoken utterance and determines a similarity between them based on calculated distances between the words. If the words are sufficiently similar, the control unit determines the utterances were a failed attempt to speak a certain command word, i.e. determines an intended hotword corresponding to the failed hotword attempt).

Regarding claim 6, Iso-Sipilä further teaches the method according to claim 5, wherein the intended hotword is determined based on acoustic similarity between at least a portion of the first audio data, at least a portion of the second audio data, and the intended hotword (Spec. Col. 9, lines 28-38; the device determines acoustic similarity between the first audio data and the intended hotword by converting the input uttered command word to a feature vector representation and calculating the probability that it corresponds to a command word, i.e. an intended hotword, in a vocabulary. Col. 9, line 64-Col. 10 line 3; the control device repeats this process for the second audio data. Col. 10, lines 9-13; the control unit compares the feature vectors of the first and second audio data to each other to determine a similarity between them in order to determine the intended hotword).

Regarding claim 7, Iso-Sipilä further teaches the method according to claim 5, wherein providing the hint comprises displaying the intended hotword on a display of the client device or providing, by the client device, an audio response that includes the intended hotword (Spec. Col. 4, lines 17-19; upon two uncertain recognition results for a first and second audio data, the device displays a hint “Did you say yes?” including the intended hotword “yes.” Because the device poses the hint to the user as a question, the device is considered to give an audio response that includes the hotword).

Regarding claim 8, Iso-Sipilä further teaches the method according to claim 5, further comprising performing an action corresponding to the intended hotword (Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. Col. 9, lines 48-56; the device executes the command based on the control signal, i.e. it performs an action corresponding to the intended hotword).

Regarding claim 9, Iso-Sipilä teaches a computer program product comprising one or more computer-readable storage media having program instructions collectively stored on the one or more computer-readable storage media (Spec. Col. 8, lines 59-67), the program instructions executable to (Attorney Docket No. ZS202-20828): 
receive, via one or more microphones of a client device, first audio data that captures a first spoken utterance of a user (Spec. Col. 1, lines 6-8; a speech command is received, i.e. first audio data that captures a first spoken utterance of a user. Spec. Col. 8 line 59-Col. 9 line 2 teaches a client device operable by speech command received via a microphone); 
process the first audio data using each of a plurality of classes in a machine learning model to generate a corresponding probability associated with the first audio data, each of the classes being associated with a corresponding hotword of a plurality of hotwords and each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the first audio data (Spec. Col. 1 line 66-Col. 2 line 2; a Hidden Markov Model [HMM] is used for processing the audio data. The HMM is considered to be a machine learning model because the Specification discloses that the speech control unit of the speech recognition device [Col. 2, lines 40-45: the speech recognition unit can use the HMM method for recognition] is taught the commands, as described in Col. 9 lines 19-21. Col. 2, lines 23-36; each reference word, which corresponds to a command or hotword, in a plurality of reference words has an HMM model, or class. Input speech, i.e. first audio data, is processed such that each HMM class calculates a corresponding probability of that reference word being present in the first audio data. The command words are considered to be hotwords because they are reference terms that generate a response from the device, such as the command “yes” triggering call acceptance in Col. 3, lines 60-64); 
determine that the probability of one of the plurality of hotwords being present in the first audio data satisfies a secondary threshold that is less indicative of the one of the plurality of hotwords being present in audio data than is a primary threshold but does not satisfy the primary threshold (Spec. Col. 7, lines 29-37; probabilities are determined on the basis of the command word uttered by the user for different command words in the vocabulary of the speech recognition device. The command word with the greatest probability is selected as the preliminary result. Spec. Col. 7 lines 46-49; the confidence value, i.e. the probability, of the preliminary command word satisfies the second threshold value A, i.e. the secondary threshold, but does not satisfy the first threshold value Y, i.e. the primary threshold, indicative of the command word being present in the speech command); 
receive, via the one or more microphones of the client device, second audio data that captures a second spoken utterance of a user (Spec. Col. 7 lines 49-51; the user is given time to utter the speech command a second time); 
process the second audio data using each of the plurality of classes in the machine learning model to generate a corresponding probability associated with the second audio data, each of the corresponding probabilities being associated with a probability of the corresponding hotword being present in the second audio data (Spec. Col. 7 lines 56-60; the process detailed above for processing the first audio data is repeated for the second audio data); 
determine that the probability of the one of the plurality of hotwords being present in the second audio data satisfies the secondary threshold but does not satisfy the primary threshold (Spec. Col. 8 lines 4-7; if the command word cannot be recognized with sufficient confidence, i.e. the confidence value of the second utterance fails to satisfy threshold value Y, then the repeated speech commands are compared to each other. Col. 11, lines 32-35; the comparison of the repeated speech commands only occurs if the confidence value of the second utterance does satisfy second threshold value A); 
in response to the probability of the one of the plurality of hotwords being present in the first audio data satisfying the secondary threshold but not satisfying the primary threshold and the probability of the one of the plurality of hotwords being present in the second audio data satisfying the secondary threshold but not satisfying the primary threshold, and in response to the first spoken utterance and the second spoken utterance satisfying one or more temporal criteria relative to one another (Spec. Col. 7 lines 55-60; the repeated utterance must be made within the specified extended time window in order for the recognition process to proceed), identify a failed hotword attempt (Spec Col. 10, lines 21-24; if the comparison of the repeated speech commands determines that the command words were probably the same, the command word is converted to the corresponding control signal. This is construed as the speech control unit identifying a failed attempt to speak a certain command, i.e. hotword, by determining which command was intended by the user in the utterances.); and 
in response to identifying the failed hotword attempt, provide a hint that is responsive to the failed hotword attempt (Spec Col. 10, lines 29-34; when the recognition result is not sufficiently reliable, the device recognizes the failed recognition, informs the user of it, and requests that the user utter the command again).
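For illustration only, the per-class processing mapped above for claim 9 can be sketched as follows. The sketch assumes each class has already produced a probability for its hotword, represented as a dictionary from hotword to probability; the function names and threshold values are hypothetical and are not taken from the claims or from Iso-Sipilä.

```python
from typing import Dict, Set, Tuple

# Hypothetical values; neither the claims nor the reference fixes any numbers.
PRIMARY_THRESHOLD = 0.80
SECONDARY_THRESHOLD = 0.40


def best_hotword(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Select the hotword with the greatest probability as the preliminary result."""
    word = max(probabilities, key=probabilities.get)
    return word, probabilities[word]


def near_miss_hotwords(probabilities: Dict[str, float]) -> Set[str]:
    """Hotwords whose probability satisfies the secondary but not the primary threshold."""
    return {
        word
        for word, p in probabilities.items()
        if SECONDARY_THRESHOLD <= p < PRIMARY_THRESHOLD
    }


def shared_near_miss(first: Dict[str, float], second: Dict[str, float]) -> Set[str]:
    """Hotwords that are near misses in both the first and the second audio data."""
    return near_miss_hotwords(first) & near_miss_hotwords(second)
```

A non-empty result from `shared_near_miss` corresponds to the same one of the plurality of hotwords falling between the two thresholds for both utterances; the temporal criterion would be checked separately.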

Regarding claim 10, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 2, and is rejected on the same grounds.

Regarding claim 11, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 4, and is rejected on the same grounds.

Regarding claim 12, the claim is directed to the computer program product according to claim 9 for performing the claimed method of claim 5, and is rejected on the same grounds.

Regarding claim 13, the claim is directed to the computer program product according to claim 12 for performing the claimed method of claim 6, and is rejected on the same grounds.

Regarding claim 14, the claim is directed to the computer program product according to claim 12 for performing the claimed method of claim 7, and is rejected on the same grounds.

Regarding claim 15, the claim is directed to the computer program product according to claim 12 for performing the claimed method of claim 8, and is rejected on the same grounds.

Regarding claim 16, the claim is directed to a system comprising:
a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media, the program instructions executable to perform the claimed method of claim 1.
Iso-Sipilä teaches a system comprising:
a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media, the program instructions executable to perform the claimed method of claim 1 (Spec. Col. 8, lines 59-67); therefore, claim 16 is rejected on the same grounds.

Regarding claim 17, the claim is directed to the system according to claim 16 for performing the claimed method of claim 2, and is rejected on the same grounds.

Regarding claim 18, the claim is directed to the system according to claim 16 for performing the claimed method of claim 3, and is rejected on the same grounds.

Regarding claim 19, the claim is directed to the system according to claim 16 for performing the claimed method of claim 4, and is rejected on the same grounds.

Regarding claim 20, the claim is directed to the system according to claim 16 for performing the claimed method of claim 5, and is rejected on the same grounds.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Kunitake et al. (Patent No. US 10,650,802 B2) discloses a voice recognition method involving calculating a recognition result and confidence level for first speech and calculating a confidence level for a repetition as second speech based on the confidence level of the first speech (Abstract).
Atal et al. (US Patent No. 5,737,724) discloses a method and apparatus for speech recognition involving analyzing a first utterance with speech models to determine one or more similarity metrics between the utterance and the models and analyzing the most closely matched model to determine if the similarity metric satisfies a first recognition criterion. Similarly, a second utterance is analyzed to determine if it satisfies a second recognition criterion. A recognition result is determined when the second utterance satisfies the first and second recognition criteria (Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARKER L MAYFIELD whose telephone number is (571)272-4745. The examiner can normally be reached Monday - Friday 7:30 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PARKER L MAYFIELD/
Examiner
Art Unit 2655



/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655