DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) were submitted on 12/07/2020 and 03/15/2022.  The submissions are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 10-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation of the “computer program product comprising one or more computer-readable storage media having program instructions” encompasses signals per se (see MPEP § 2106, subsection I). The specification provides no structure or medium for the “computer program product” and thus the claims can include transitory forms of signal transmission. The further recitation of “instructions” in claim 10 only serves to limit the content carried by the program product. As understood in light of the specification, the broadest reasonable interpretation of claim 10 encompasses signals which are not within one of the four statutory categories of invention. See MPEP 2106.03 I. It is suggested that claim 10 be amended to recite a “non-transitory” computer program product to overcome this rejection.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 10 recites the limitation "the one or more hotwords" in the second to last line of the claim.  There is insufficient antecedent basis for this limitation in the claim. For expedited prosecution, “the one or more hotwords” shall be interpreted as “one or more hotwords.”
Claims 11-18 are rejected as being indefinite due to dependence on claim 10. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 8, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia (Doc. ID US 2018/0350356 A1) in view of Gruenstein et al. (Doc. ID US 2018/0130469 A1), hereinafter Gruenstein.

Regarding claim 1, Garcia teaches a method (Spec. page 1, [0008], lines 1-4) implemented by one or more processors (Spec. page 7, [0065]), the method comprising: 
receiving, via one or more microphones of a client device (Spec. page 3, [0021], lines 7-9; the device includes a microphone), audio data that captures a spoken utterance (Spec. page 1, [0008], lines 1-8; a computing device receives audio data capturing an utterance); 
processing the audio data using one or more machine learning models (Spec. page 3, [0025], lines 14-16; the hotworder may use a neural network to process the audio) to generate a predicted output that indicates a probability of one or more hotwords being present in the audio data (Spec. page 3, [0025], lines 1-2; the computing device contains a hotworder. Lines 18-21; the hotworder generates a hotword confidence score for the audio to determine if the audio contains a hotword, i.e. a predicted output that indicates a probability of one or more hotwords being present in the audio data); 
determining that the predicted output satisfies a threshold that is indicative of the one or more hotwords being present in the audio data (Spec. page 3, [0025], lines 18-21; the hotworder determines that the audio includes a hotword if the hotword confidence score satisfies a hotword confidence score threshold); 
in response to determining that the predicted output satisfies the threshold, processing the audio data using automatic speech recognition to generate a speech transcription feature or intermediate embedding (Spec. page 2, [0010], lines 7-14; after determining that the utterance includes a predefined hotword, the computing device performs speech recognition on the audio and generates a speech transcription); and
detecting a watermark that is embedded in the audio data (Spec. page 1, [0008], lines 1-8; the computing device detects that the audio includes an audio watermark).
Garcia further teaches suppressing processing of a query included in the audio data (Spec. page 2, [0010], lines 21-23; the computing device suppresses an action corresponding to the audio. Page 6, [0058]; the action may comprise a query). However, Garcia does not teach, in response to detecting the watermark:
determining that the speech transcription feature or intermediate embedding corresponds to one of a plurality of stored speech transcription features or intermediate embeddings; and 
in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings, suppressing processing of a query included in the audio data.
In a related field of endeavor (hotword trigger suppression to avoid activation due to recorded media audio, Spec. page 1, [0004]), Gruenstein teaches the detection and extraction of an audio signature from received audio and an audio fingerprint for comparison to stored audio fingerprints (Spec. page 3, [0026], lines 1-6).
Adapting Garcia’s hotword trigger suppression techniques to incorporate the audio fingerprint comparison features as detailed by Gruenstein further discloses: in response to detecting the watermark (Garcia’s method of detecting a watermark in received audio, now adapted to correspond to Gruenstein’s detection of an audio signature and extracting an audio fingerprint, i.e. an intermediate embedding from the audio, for comparison to stored audio fingerprints as in Gruenstein, Spec. page 3, [0026], lines 1-6):
determining that the speech transcription feature or intermediate embedding corresponds to one of a plurality of stored speech transcription features or intermediate embeddings (the neural network of Garcia, adapted to incorporate the fingerprint comparer of Gruenstein: Spec. page 3, [0026], lines 6-14; The fingerprint comparer may compute a match score that indicates the likelihood that the audio data matches an audio fingerprint in the fingerprint database, which may contain known audio recordings (e.g., music, TV programs, movies, etc.) that may contain or are associated with hotwords); and 
in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings, suppressing processing of a query included in the audio data (Garcia’s method of using the computing device to suppress an action corresponding to the audio as detailed in Spec. page 2, [0010], lines 21-23, the action potentially comprising a query, page 6, [0058], now adapted to be performed in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings as detailed in Gruenstein, Spec. page 3, [0026], lines 6-14).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Garcia by incorporating the teachings of Gruenstein. Both Garcia and Gruenstein are directed to hotword trigger suppression techniques. Further, Garcia recognizes that a commercial creator may include audio in the commercials to ensure that particular devices respond correctly to audio contained in the commercials (Spec. page 3, [0022], lines 1-5) and Gruenstein details a particular form of included audio, audio fingerprints, implemented in a similar manner. Given the overlap, in particular, the detection of audio intended by commercial creators to elicit a particular response from the computing device of commercial audiences, incorporation of the features of Gruenstein into Garcia would have been predictable to one of ordinary skill in the art at the time of filing. 
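
For illustration only, the sequence of steps mapped above for claim 1 can be sketched as follows. This is a minimal sketch and not an implementation taken from Garcia or Gruenstein: the names `AudioObservation` and `decide`, the 0.8 default threshold, and the exact-match comparison against stored transcriptions are all assumptions made for the example, and the model outputs are treated as already computed.

```python
from dataclasses import dataclass


@dataclass
class AudioObservation:
    hotword_score: float   # hypothetical: probability output of a hotword model
    transcription: str     # hypothetical: text produced by speech recognition
    has_watermark: bool    # hypothetical: result of a watermark detector


def decide(obs, stored_transcriptions, hotword_threshold=0.8):
    """Return 'ignore', 'suppress', or 'process' for one captured utterance."""
    # Step 1: the predicted output must satisfy the hotword threshold.
    if obs.hotword_score < hotword_threshold:
        return "ignore"
    # Step 2: only watermarked audio is checked against the stored entries.
    if obs.has_watermark:
        normalized = obs.transcription.strip().lower()
        stored = {t.strip().lower() for t in stored_transcriptions}
        # Step 3: a match with a stored entry suppresses the query.
        if normalized in stored:
            return "suppress"
    return "process"
```

In this sketch, suppression requires all three conditions together (hotword threshold satisfied, watermark detected, stored entry matched), which mirrors the nested "in response to" structure of the claim language.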

Regarding claim 2, in addition to the elements stated above regarding claim 1, the combination of Garcia and Gruenstein above further teaches wherein the detecting the watermark is in response to determining that the predicted output satisfies the threshold (Garcia, Spec. page 3, [0026], lines 23-26; the audio watermark identifiers process the audio in response to the respective hotworder detecting a hotword, i.e. detecting the watermark is in response to determining that the predicted output satisfies the threshold).

Regarding claim 3, in addition to the elements stated above regarding claim 1, the combination of Garcia and Gruenstein above further teaches wherein the watermark is an audio watermark that is imperceptible to humans (Garcia, Spec. page 1, [0005], lines 3-5; the watermark is an audio watermark that is inaudible to humans).

Regarding claim 4, in addition to the elements stated above regarding claim 1, the combination of Garcia and Gruenstein above further teaches wherein the plurality of stored speech transcription features or intermediate embeddings is stored on the client device (Gruenstein, Spec. page 3, [0033], lines 1-4; the fingerprint database, which is the plurality of stored speech transcription features or intermediate embeddings as detailed above with respect to claim 1, can be stored locally on the computing device).

Regarding claim 8, in addition to the elements stated above regarding claim 1, the combination of Garcia and Gruenstein above further teaches determining whether a current time or date is within an active window of the watermark (Garcia, Spec. page 5, [0046], lines 11-15; the computing device may determine whether to respond to hotwords with any watermark within or outside of Nugget World’s business hours, i.e. the active window of the watermark is determined by the business hours), and 
wherein suppressing processing of the query included in the audio data is further in response to determining that the current time or date is within the active window of the watermark (Garcia, Spec. page 5, [0046], lines 11-15; the device suppresses a response to the hotwords with watermarks if the current time is outside of the active window determined by business hours).
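
For illustration only, the active-window limitation of claim 8 can be sketched as a simple time-of-day check. The function names, the half-open window convention, and the handling of windows that wrap past midnight are assumptions made for the example and are not drawn from Garcia.

```python
from datetime import time


def in_active_window(now, start, end):
    """True if `now` falls in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def suppress_for_watermark(matched, now, start, end):
    # Suppression requires both a stored-entry match and an active window.
    return matched and in_active_window(now, start, end)
```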

Regarding claim 19, the claim is directed to a system comprising: a processor, a computer-readable memory, one or more computer-readable storage media, and program instructions collectively stored on the one or more computer-readable storage media, the program instructions executable to perform features of the claimed method of claim 1. Garcia teaches a system comprising these elements (Spec. page 7, [0065]) for performing the method of claim 1, therefore claim 19 is rejected under the same grounds.

Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein and Bar-Yossef, Ziv, et al. (“Approximating Edit Distance Efficiently”), hereinafter Bar-Yossef.

Regarding claim 5, in addition to the elements stated above regarding claim 1, the combination of Garcia and Gruenstein further teaches wherein determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings (Gruenstein, Spec. page 5, [0051], lines 7-13; the audio fingerprint is compared to the one or more audio fingerprints in the fingerprint database 122 using efficient matching algorithms. The fingerprint comparer 120 may compute a match score that indicates the likelihood that the audio data 104 matches an audio fingerprint in the fingerprint database) comprises computing a match score that indicates the likelihood that the audio data 104 matches an audio fingerprint in the fingerprint database (Gruenstein, Spec. page 5, [0051], lines 16-18; the fingerprint comparer 120 performs this determination by comparing the match score to a threshold match score).
However, the combination does not teach determining that an edit distance between the speech transcription feature and one of the plurality of stored speech transcription features satisfies a threshold edit distance to determine that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings.
Bar-Yossef teaches algorithms to improve the computation of edit distance between two strings (Abstract, page 1). The techniques involve the embedding of edit distance space into Hamming space (Page 2, Col. 1, “Techniques,” lines 1-3) and estimating a Hamming distance between the strings (Page 4, Col. 2, Section 2 “Overview,” lines 19-22), i.e. determining an edit distance between two strings by determining an embedding distance between the embeddings of the strings. 
Adapting the combination of Garcia and Gruenstein to incorporate the teachings of Bar-Yossef for estimating edit distance between strings provides the method according to claim 1, wherein determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings comprises determining that an edit distance between the speech transcription feature and one of the plurality of stored speech transcription features satisfies a threshold edit distance (the method of Gruenstein, Spec. page 5, [0051], lines 7-13; the audio fingerprint is compared to the one or more audio fingerprints in the fingerprint database 122 using the fingerprint comparer 120 to compute a match score that indicates the likelihood that the audio data 104 matches an audio fingerprint in the fingerprint database. The computation of the match score is now adapted to use the algorithm of Bar-Yossef as taught above to determine an edit distance between the audio fingerprint and the audio fingerprints in the fingerprint database. Gruenstein, Spec. page 5, [0051], lines 16-18; the fingerprint comparer 120 performs this determination by comparing the match score to a threshold match score).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Garcia and Gruenstein by incorporating the teachings of Bar-Yossef to provide the claimed invention of claim 5. Gruenstein suggests the use of a matching algorithm for comparing the audio fingerprint to the audio fingerprints in the fingerprint database and Bar-Yossef teaches a particular algorithm for determining edit distance between two strings. Given the overlap, in particular, the use of an algorithm for determining how closely matched two inputs are, incorporation of the features of Bar-Yossef into the combination of Garcia and Gruenstein would have been predictable to one of ordinary skill in the art at the time of filing.
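
For illustration only, the "edit distance satisfies a threshold edit distance" determination of claim 5 can be sketched as follows. The sketch uses the standard exact Levenshtein dynamic program, not the approximation algorithm of Bar-Yossef; the function names and the default threshold of 2 are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def matches_stored(transcription, stored, max_distance=2):
    # A stored entry "corresponds" when its distance is within the threshold.
    return any(edit_distance(transcription, s) <= max_distance for s in stored)
```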

Regarding claim 6, in addition to the elements stated above regarding claim 1, the combination of Garcia, Gruenstein, and Bar-Yossef further teaches wherein determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings comprises determining that an embedding-based distance satisfies a threshold embedding-based distance (the method of Gruenstein, Spec. page 5, [0051], lines 7-13; the audio fingerprint is compared to the one or more audio fingerprints in the fingerprint database 122 using the fingerprint comparer 120 to compute a match score that indicates the likelihood that the audio data 104 matches an audio fingerprint in the fingerprint database. The computation of the match score is now adapted to use the algorithm of Bar-Yossef as taught above with respect to claim 5 to determine an embedding distance between the audio fingerprint and the audio fingerprints in the fingerprint database. Gruenstein, Spec. page 5, [0051], lines 16-18; the fingerprint comparer 120 performs this determination by comparing the match score to a threshold match score).
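
For illustration only, an embedding-based distance compared against a threshold can be sketched with a Hamming distance over equal-length binary embeddings. How the embeddings are produced is outside the sketch; the function names, the list-of-bits representation, and the default threshold of 1 are assumptions made for the example.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length embeddings differ."""
    if len(x) != len(y):
        raise ValueError("embeddings must have equal length")
    return sum(1 for a, b in zip(x, y) if a != b)


def within_embedding_threshold(query, stored, max_distance=1):
    # True when at least one stored embedding is within the threshold.
    return any(hamming_distance(query, s) <= max_distance for s in stored)
```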

Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein and Mahmood et al. (Doc. ID US 2021/0090575 A1), hereinafter Mahmood.

Regarding claim 7, the combination of Garcia and Gruenstein teaches the method according to claim 1 as detailed above for suppressing processing of a query. Gruenstein further teaches that speaker identification may be performed on the audio data (Spec. page 2, [0024], lines 9-11). However, the combination does not teach in response to detecting the watermark: 
using speaker identification on the audio data to determine a speaker vector corresponding to the query included in the audio data; and 
determining that the speaker vector corresponds to one of a plurality of stored speaker vectors, 
wherein suppressing processing of the query included in the audio data is further in response to determining that the speaker vector corresponds to one of the plurality of stored speaker vectors.
Mahmood teaches techniques for a natural language processing system to implement multiple assistants during dialog with one or more users (Abstract). The system also includes a user recognition component (Spec. page 5, [0067]).
Adapting the combination of Garcia and Gruenstein to incorporate the teachings of Mahmood for user recognition provides the method according to claim 1, further comprising, in response to detecting the watermark: 
using speaker identification on the audio data to determine a speaker vector corresponding to the query included in the audio data (perform speaker identification as provided for in Gruenstein, Spec. page 2, [0024], lines 9-11, now adapted to use the user recognition of Mahmood: Spec. page 5, [0068], lines 1-7; a user recognition component compares speech characteristics in audio data to stored speech characteristics of users to identify a speaker. Page 19, [0213], lines 12-14; user recognition is done with user recognition feature vector data); and 
determining that the speaker vector corresponds to one of a plurality of stored speaker vectors (Mahmood, Spec. page 5, [0070], lines 1-3; the user recognition component outputs a single user identifier corresponding to the most likely user that originated the natural language input. [0072], lines 1-3; the user identifier is associated with a user profile in a plurality of user profiles in profile storage), 
wherein suppressing processing of the query included in the audio data is further in response to determining that the speaker vector corresponds to one of the plurality of stored speaker vectors (the method of hotword trigger suppression of Gruenstein including speaker identification performed on the audio data, Spec. page 2, [0024], lines 9-11).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Garcia and Gruenstein by incorporating the teachings of Mahmood to provide the claimed invention of claim 7. Gruenstein suggests the use of speaker identification in hotword trigger suppression and Mahmood teaches a particular technique for identifying a speaker for received audio. Given the overlap, in particular, the use of speaker identification in natural language processing, incorporation of the features of Mahmood into the combination of Garcia and Gruenstein would have been predictable to one of ordinary skill in the art at the time of filing.

Regarding claim 20, the claim is directed to the system according to claim 19 for performing the features of the claimed method of claim 7 and is rejected under the same grounds.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein and Bartosik et al. (US 20070033026 A1), hereinafter Bartosik.

Regarding claim 9, the combination of Garcia and Gruenstein teaches the method according to claim 1 as detailed above for suppressing processing of a query, however the combination does not teach wherein the plurality of stored speech transcription features or intermediate embeddings includes erroneous transcriptions.
Bartosik teaches a speech recognition and correction system which creates a lexicon of alternatives for frequently incorrect utterance transcriptions (Abstract).
Adapting the combination of Garcia and Gruenstein to incorporate the teachings of Bartosik provides the method according to claim 1, wherein the plurality of stored speech transcription features or intermediate embeddings includes erroneous transcriptions (the fingerprint storage of Gruenstein, now adapted to include erroneous transcriptions of the fingerprints as taught by Bartosik in the Spec. page 1, [0003]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Garcia and Gruenstein by incorporating the teachings of Bartosik to provide the claimed invention of claim 9. Both disclosures are directed to the processing of natural language input. Gruenstein is directed to the prevention of a user device performing an action incorrectly due to received input. Similarly, Bartosik is directed to the prevention of erroneous action by providing a list of alternatives to replace incorrectly recognized text (Spec. page 1, [0004]). Furthermore, inclusion of the features of Bartosik would have improved the ability to correctly identify when an action should be suppressed, as Bartosik notes that a ready list of alternatives for incorrectly identified input eases correction by making it so that correction can be done more quickly (Spec. page 1, [0006]).

Claims 10-13 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein and Kim et al. (US 20160077794 A1), hereinafter Kim.

Regarding claim 10, Garcia teaches a computer program product comprising one or more computer-readable storage media having program instructions collectively stored on the one or more computer-readable storage media (Spec. page 7, [0065]), the program instructions executable to: 
receive, via one or more microphones of a client device (Spec. page 3, [0021], lines 7-9; the device includes a microphone), first audio data that captures a first spoken utterance (Spec. page 1, [0008], lines 1-8; a computing device receives audio data capturing an utterance); 
process the first audio data using automatic speech recognition to generate a speech transcription feature or intermediate embedding (Spec. page 2, [0010], lines 7-14; the computing device performs speech recognition on the audio and generates a speech transcription); and 
detect a watermark that is embedded in the first audio data (Spec. page 1, [0008], lines 1-8; the computing device detects that the audio includes an audio watermark).
However, Garcia does not teach in response to detecting the watermark: 
determining that the speech transcription feature or intermediate embedding corresponds to one of a plurality of stored speech transcription features or intermediate embeddings; and 
in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings, modifying a threshold that is indicative of the one or more hotwords being present in audio data.
In a related field of endeavor (hotword trigger suppression to avoid activation due to recorded media audio, Spec. page 1, [0004]), Gruenstein teaches the detection and extraction of an audio signature from received audio and an audio fingerprint for comparison to stored audio fingerprints (Spec. page 3, [0026], lines 1-6).
Adapting Garcia’s hotword trigger suppression techniques to incorporate the audio fingerprint comparison features as detailed by Gruenstein further discloses: in response to detecting the watermark (Garcia’s method of detecting a watermark in received audio, now adapted to correspond to Gruenstein’s detection of an audio signature and extracting an audio fingerprint, i.e. an intermediate embedding from the audio, for comparison to stored audio fingerprints as in Gruenstein, Spec. page 3, [0026], lines 1-6):
determining that the speech transcription feature or intermediate embedding corresponds to one of a plurality of stored speech transcription features or intermediate embeddings (the neural network of Garcia, adapted to incorporate the fingerprint comparer of Gruenstein: Spec. page 3, [0026], lines 6-14; The fingerprint comparer may compute a match score that indicates the likelihood that the audio data matches an audio fingerprint in the fingerprint database, which may contain known audio recordings (e.g., music, TV programs, movies, etc.) that may contain or are associated with hotwords).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Garcia by incorporating the teachings of Gruenstein. Both Garcia and Gruenstein are directed to hotword trigger suppression techniques. Further, Garcia recognizes that a commercial creator may include audio in the commercials to ensure that particular devices respond correctly to audio contained in the commercials (Spec. page 3, [0022], lines 1-5) and Gruenstein details a particular form of included audio, audio fingerprints, implemented in a similar manner. Given the overlap, in particular, the detection of audio intended by commercial creators to elicit a particular response from the computing device of commercial audiences, incorporation of the features of Gruenstein into Garcia would have been predictable to one of ordinary skill in the art at the time of filing. 
Kim teaches systems and processes for dynamically adjusting a speech trigger threshold for triggering a virtual assistant in response to perceived events to minimize missed and false positive triggers (Abstract). 
Further adapting Garcia’s and Gruenstein’s hotword trigger suppression techniques to incorporate the features for adjusting a speech trigger threshold for triggering a virtual assistant as detailed by Kim further discloses: in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings, modify a threshold that is indicative of the one or more hotwords being present in audio data (Kim’s method of modifying a threshold indicating that a sampled audio input includes a spoken command trigger as detailed in Spec. page 1, [0008], now adapted to be performed in response to determining that the speech transcription feature or intermediate embedding corresponds to one of the plurality of stored speech transcription features or intermediate embeddings as detailed in Gruenstein, Spec. page 3, [0026], lines 6-14).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Garcia and Gruenstein by incorporating the teachings of Kim. Garcia, Gruenstein, and Kim are directed to speech triggers for client devices. Gruenstein recognizes that there may be situations in which it is desirable to cancel the suppression of the performance of the operation indicated by the hotword, and suggests that the processing of audio may be adjusted to account for this (Spec. page 5, [0048]). Kim provides for this need by teaching processes that can respond to a change in circumstances by adjusting the threshold of the trigger command recognition as needed. Therefore, it would have been predictable to one of ordinary skill in the art at the time of filing to combine the disclosures to account for the need to modify the threshold for the hotword command recognition.

Regarding claim 11, the combination of Garcia, Gruenstein, and Kim further teaches the computer program product according to claim 10, wherein the program instructions are further executable to: 
receive, via the one or more microphones of the client device, second audio data that captures a second spoken utterance (Gruenstein, Spec. page 5, [0047], lines 21-25; the computing device adjusts the process for subsequently received audio including an utterance of the predefined hotword); 
process the second audio data using one or more machine learning models (Garcia, Spec. page 3, [0025], lines 14-16; the hotworder may use a neural network to process the audio) to generate a predicted output that indicates a probability of one or more hotwords being present in the second audio data (the hotworder of Garcia as detailed in the Spec. page 3, [0025], lines 1-2; the computing device contains a hotworder. Lines 18-21; now adapted to generate a hotword confidence score for the second audio as taught in Gruenstein, Spec. page 5, [0047], lines 21-25, to determine if the audio contains a hotword, i.e. a predicted output that indicates a probability of one or more hotwords being present in the second audio data); 
determine that the predicted output satisfies the modified threshold that is indicative of the one or more hotwords being present in the second audio data (Garcia, Spec. page 3, [0025], lines 18-21; the hotworder adapted to determine that the audio includes a hotword if the hotword confidence score satisfies a modified hotword confidence score threshold, modified as taught by Kim); and 
in response to determining that the predicted output satisfies the modified threshold, process a query included in the second audio data (Gruenstein, Spec. page 5, [0047], lines 21-25; the computing device processes subsequently received audio including an utterance of the predefined hotword).
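For illustration only, and not as a characterization of the implementation of any cited reference, the sequence of limitations mapped above (generate a predicted output, compare it against a modified threshold, then process the query) could be sketched as follows. The class name `HotwordDetector`, the default threshold of 0.8, the adjustment of -0.2, and the reading of "satisfies" as "greater than or equal to" are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HotwordDetector:
    """Hypothetical detector; names and values are assumptions, not from the references."""

    threshold: float = 0.8  # assumed default hotword confidence threshold

    def modify_threshold(self, delta: float) -> None:
        # Clamp to [0, 1] so the threshold remains a valid probability.
        self.threshold = min(1.0, max(0.0, self.threshold + delta))

    def should_process_query(self, predicted_output: float) -> bool:
        # "Satisfies" is assumed here to mean greater than or equal to.
        return predicted_output >= self.threshold


detector = HotwordDetector()
first = detector.should_process_query(0.7)   # compared against the default threshold
detector.modify_threshold(-0.2)              # assumed adjustment, e.g. after a watermark is detected
second = detector.should_process_query(0.7)  # compared against the modified threshold
```

In this sketch the same predicted output of 0.7 is rejected before the adjustment and accepted afterward, which is the behavior the modified-threshold limitation describes.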

Regarding claim 12, the claim is directed to the computer program product according to claim 10 for performing the features of the claimed method of claim 3 and is rejected under the same grounds.

Regarding claim 13, the claim is directed to the computer program product according to claim 10 for performing the features of the claimed method of claim 4 and is rejected under the same grounds.

Regarding claim 17, the combination of Garcia, Gruenstein, and Kim further teaches the computer program product according to claim 10, wherein the program instructions are further executable to determine whether a current time or date is within an active window of the watermark (Garcia, Spec. page 5, [0046], lines 11-15; the computing device may determine whether to respond to hotwords with any watermark within or outside of Nugget World’s business hours, i.e. the active window of the watermark is determined by the business hours), and 
wherein modifying the threshold that is indicative of the one or more hotwords being present in audio data is further in response to determining that the current time or date is within the active window of the watermark (Garcia’s system for determining whether the current time or date is within an active window of the watermark as detailed above, now adapted to modify the threshold that is indicative of the one or more hotwords being present in audio data as taught by Kim in the Spec. page 1, [0008] in response to determining that the current time or date is within the active window of the watermark).
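As a minimal sketch of the active-window limitation discussed above, and under the assumption that the window is a daily time range such as business hours, the check and the resulting threshold selection might look like the following. The function names, the half-open window convention, and the threshold values 0.8 and 0.6 are hypothetical and are not drawn from Garcia or Kim.

```python
from datetime import datetime, time


def within_active_window(now: datetime, start: time, end: time) -> bool:
    """Return True when `now` falls inside the daily window [start, end)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # The window wraps past midnight (for example 22:00 to 02:00).
    return t >= start or t < end


def threshold_for(now: datetime, start: time, end: time,
                  base: float = 0.8, adjusted: float = 0.6) -> float:
    # Use the adjusted threshold only while the watermark's window is active.
    return adjusted if within_active_window(now, start, end) else base


noon = threshold_for(datetime(2022, 3, 15, 12, 0), time(9), time(17))
evening = threshold_for(datetime(2022, 3, 15, 20, 0), time(9), time(17))
```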

Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein, Kim, and Bar-Yossef.

Regarding claim 14, the claim is directed to the computer program product according to claim 10 for performing the features of the claimed method of claim 5 and is rejected under the same grounds.

Regarding claim 15, the claim is directed to the computer program product according to claim 10 for performing the features of the claimed method of claim 6 and is rejected under the same grounds.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein, Mahmood, and Kim.

Regarding claim 16, the combination of Garcia, Gruenstein, and Mahmood as detailed above with respect to claim 7 further teaches the computer program product according to claim 10, wherein the program instructions are further executable to, in response to detecting the watermark: 
use speaker identification on the audio data to determine a speaker vector corresponding to the query included in the audio data (perform speaker identification as provided for in Gruenstein, Spec. page 2, [0024], lines 9-11, now adapted to use the user recognition of Mahmood: Spec. page 5, [0068], lines 1-7; a user recognition component compares speech characteristics in audio data to stored speech characteristics of users to identify a speaker. Page 19, [0213], lines 12-14; user recognition is done with user recognition feature vector data); and 
determine that the speaker vector corresponds to one of a plurality of stored speaker vectors (Mahmood, Spec. page 5, [0070], lines 1-3; the user recognition component outputs a single user identifier corresponding to the most likely user that originated the natural language input. [0072], lines 1-3; the user identifier is associated with a user profile in a plurality of user profiles in profile storage).
Kim teaches systems and processes for dynamically adjusting a speech trigger threshold for triggering a virtual assistant in response to perceived events to minimize missed and false positive triggers (Abstract). 
Further adapting the combination of Garcia, Gruenstein, and Mahmood to incorporate the teachings of Kim further discloses wherein modifying the threshold that is indicative of the one or more hotwords being present in audio data is further in response to determining that the speaker vector corresponds to one of the plurality of stored speaker vectors (Mahmood’s system for user recognition as detailed above, now adapted to modify the threshold that is indicative of the one or more hotwords being present in audio data as taught by Kim in the Spec. page 1, [0008] in response to determining that the speaker vector corresponds to one of the plurality of stored speaker vectors).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Garcia, Gruenstein, and Mahmood by incorporating the teachings of Kim. Garcia, Gruenstein, Mahmood, and Kim are directed to natural language input processing. Gruenstein recognizes that there may be situations in which it is desirable to cancel the suppression of the performance of the operation indicated by the hotword, and suggests that the processing of audio may be adjusted to account for this (Spec. page 5, [0048]). Kim provides for this need by teaching processes that can respond to a change in circumstances by adjusting the threshold of the trigger command recognition as needed. Therefore, it would have been predictable to one of ordinary skill in the art at the time of filing to combine the disclosures to account for the need to modify the threshold for the hotword command recognition.
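By way of illustration only, the determination that a speaker vector corresponds to one of a plurality of stored speaker vectors could be sketched as a similarity comparison. Cosine similarity and the cutoff of 0.9 are assumptions chosen for this sketch; the cited references are not asserted to use this particular measure, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_stored_speaker(vector, stored_vectors, min_similarity=0.9):
    # The vector "corresponds" to a stored vector when any similarity meets the assumed cutoff.
    return any(cosine_similarity(vector, s) >= min_similarity for s in stored_vectors)


stored = [[1.0, 0.0], [0.0, 1.0]]
known = matches_stored_speaker([0.99, 0.05], stored)
unknown = matches_stored_speaker([1.0, 1.0], stored)
```

A positive match in such a sketch would be the condition on which the threshold modification of claim 16 is performed.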

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Garcia in view of Gruenstein, Kim, and Bartosik.

Regarding claim 18, the claim is directed to the computer program product according to claim 10 for performing the features of the claimed method of claim 9 and is rejected under the same grounds.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Tai et al. (Pub. No. US 2020/0098380 A1) teaches a system for embedding and detecting audio watermarks in audio data to enable wakeword suppression or signal transmission between devices in proximity with one another (Abstract).
Salem et al. (Patent No. US 11,100,930 B1) teaches a method for avoiding false wake word triggers from remote devices during communication sessions (Abstract). 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARKER L MAYFIELD whose telephone number is (571)272-4745. The examiner can normally be reached Monday - Friday 7:30 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PARKER L MAYFIELD/
Examiner
Art Unit 2655



/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655