DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
	The information disclosure statement (IDS) submitted on 8/15/2019 has been considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1 - 3, 7 - 9, 12 - 14, 17, 18, 21 - 23, and 27  are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Griffin et al. (US20210020162A1)(hereinafter "Griffin").

	Regarding claim 1, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants, the system comprising a processor and a memory in communication with the processor, the memory storing instructions that, when executed by the processor, configure the computing system to: (Par. 0003:” There are a number of different existing computing devices that utilize virtual assistants in connection with computing devices. For example, Amazon has Alexa, Microsoft has Cortana, and Apple has Siri. These virtual assistants each have a pre-defined set of words or phrases that the associated computing device [via a microphone] will continually be listening for.”, and Par. 0059:”Example system 300 includes at least one processing unit [CPU or processor] 310 and connection 305 that couples various system components including system memory 315, such as read only memory [ROM] 320 and random access memory [RAM] 325 to processor 310. Computing system 300 can include a cache of high-speed memory 312 connected directly with, in close proximity to, or integrated as part of processor 310”).
	receive a request for training a custom phrase spotter executable and an identification of a specific virtual assistant; responsive to receiving the request, receive: (Par. 0009:”Disclosed herein are computer-implemented methods, computer-readable media, and systems for generating and automatically training custom wake words. The generation and training of the custom wake words is first established based on receiving a user input from a user that corresponds to the custom wake word. The user input would include the one or more words that will be spoken by the user in a vicinity of a computing device in order to invoke a virtual assistant associated with the computing device.”).
	one or more positive audio samples corresponding to spoken audio of a custom wake phrase; (Par. 0009:” The user input would include the one or more words that will be spoken by the user in a vicinity of a computing device in order to invoke a virtual assistant associated with the computing device.”).
	train, using the positive audio samples, a model for the custom wake phrase audio; (Par. 0009:” A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples.”).
	and compile the executable, including the model, such that, when deployed on the specific virtual assistant as identified by the identification, the executable recognizes the custom wake phrase. (Par. 0040:” ... there may be a pre-determined number of samples that would need to be used in order adequately generate a wake word detection algorithm that would allow the computing device to recognize the customized wake word having a pre-determined accuracy.”, and Par. 0063:”The storage device 330 can include software services, servers, services, etc., that when the code that defines such software is executed by the processor 310, it causes the system to perform a function.”). Note: Per the applicant's specification as filed, Par. 0034: Generally, any audio that contains the exact phrase of the custom wake phrase is considered a positive sample.
	
	With regard to claim 12, Griffin teaches a computer implemented method for training a custom phrase spotter executable, (Par. 0025:”… the present technology pertains to an improved method of automatically training the machine learning models to recognize custom wake words.”) to perform operations with steps virtually identical to the functions performed in claim 1.

computing devices that utilize virtual assistants in connection with computing devices. For example, Amazon has Alexa, Microsoft has Cortana, and Apple has Siri. These virtual assistants each have a pre-defined set of words or phrases that the associated computing device [via a microphone] will continually be listening for.”, and Par. 0059:” Example system 300 includes at least one processing unit [CPU or processor] 310 and connection 305 that couples various system components including system memory 315, such as read only memory [ROM] 320 and random access memory [RAM] 325 to processor 310. Computing system 300 can include a cache of high-speed memory 312 connected directly with, in close proximity to, or integrated as part of processor 310.”) to perform operations with steps virtually identical to the functions performed in claim 1.

With regard to claim 22, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants, the system comprising a processor and a memory in communication with the processor, the memory storing instructions that, when executed by the processor, configure the computing system to: (Par. 0003:” There are a number of different existing computing devices that utilize virtual assistants in connection with computing devices. For example, Amazon has Alexa, Microsoft has Cortana, and Apple has Siri. These virtual assistants each have a pre-defined set of words or phrases that the associated computing device [via a microphone] will continually be listening for.”, and Par. 0059:” Example system 300 includes at least one processing unit [CPU or processor] 310 and connection 305 that couples various system components including system memory 315, such as read only memory [ROM] 320 and random access memory [RAM] 325 to processor 310. Computing system 300 can include a cache of high-speed memory 312 connected directly with, in close proximity to, or integrated as part of processor 310.”).
	receive a request for training a custom phrase spotter executable and an identification of a specific virtual assistant responsive to receiving the request, (Par. 0009:”Disclosed herein are computer-implemented methods, computer-readable media, and systems for generating and automatically training custom wake words. The generation and training of the custom wake words is first established based on receiving a user input from a user that corresponds to the custom wake word. The user input would include the one or more words that will be spoken by the user in a vicinity of a computing device in order to invoke a virtual assistant associated with the computing device.”).
receive text corresponding to the custom wake phrase; (Par. 0021:” The customize wake word may be entered as text.”).
	search within a corpus of audio samples, stored on a database of the computing system, for one or more stored positive audio samples corresponding to the text; (Par. 0046:”...in additional to generating samples of variations pertaining to the custom wake word, the present technology may also generate other words that are pronounced or at least sound similar to the custom wake word [e.g. white-listed data 180]”, and Par. 0039:”Furthermore, the wake word detection algorithm can also be trained using crowd-sourced data 175 [e.g audio data obtained by other users and stored on a server or in the cloud] or user acquired data 180 [e.g. actual voice samples of individuals reciting the custom wake word].”).
	and train, using the positive audio samples, a model for the custom wake phrase audio; (Par. 0009:” A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples.”).
	and compile the executable, including the model, such that, when deployed on the specific virtual assistant as identified by the identification, the executable recognizes the custom wake phrase. (Par. 0040:” ... there may be a pre-determined number of samples that would need to be used in order adequately generate a wake word detection algorithm that would allow the computing device to recognize the customized wake word having a pre-determined accuracy.”, and Par. 0063:”The storage device 330 can include software services, servers, services, etc., that when the code that defines such software is executed by the processor 310, it causes the system to perform a function.”). Note: Per the applicant's specification as filed, Par. 0034: Generally, any audio that contains the exact phrase of the custom wake phrase is considered a positive sample.

	Regarding claims 2 and 13, Griffin teaches the computing system of claim 1 and the method of claim 12, respectively, further configured to: receive text corresponding to the custom wake phrase; (Par. 0021:” ... many existing virtual assistants utilize a pre-determined set of words or phrases as wake words. In order to use a new or custom wake word, it may be possible to receive a customized word or words [e.g. phrase] via a user interface 145. The customize wake word may be entered as text.”).
search within a corpus of audio samples, stored on a database of the computing system, for one or more stored positive audio samples corresponding to the text; (Par. 0046:”...in additional to generating samples of variations pertaining to the custom wake word, the present technology may also generate other words that are pronounced or at least sound similar to the custom wake word [e.g. white-listed data 180]”, and Par. 0039:”Furthermore, the wake word detection algorithm can also be trained using crowd-sourced data 175 [e.g audio data obtained by other users and stored on a server or in the cloud] or user acquired data 180 [e.g. actual voice samples of individuals reciting the custom wake word].”).
	and include the stored positive audio samples in the training of the model. (Par. 0039:” Furthermore, the wake word detection algorithm can also be trained using crowd-sourced data 175 [e.g. audio data obtained by other users and stored on a server or in the cloud] or user acquired data 180 [e.g. actual voice samples of individuals reciting the custom wake word].... crowd-sourced data 175 and user acquired data 180 may be part of the data set used to train a wake word detection algorithm.”).
	
Regarding claims 3 and 14, Griffin teaches the computing system of claim 1 and the method of claim 12, respectively, further configured to: receive text corresponding to the custom wake phrase; (Par. 0021:” The customize wake word may be entered as text.”).
apply text-to-speech (TTS) to the text to generate a synthesized positive audio sample of the custom wake phrase; (Par. 0032:” The sample generation service 150 may also utilize different text-to-speech services 165….the information used to generate a different variation of how to pronounce the custom wake word from a text version.”).
and include the synthesized positive audio sample in the training of the model. (Par. 0009:”A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples.”, and Par. 0053:”The generated samples [from step 220] may be used to supplement the training of machine learning models when there are not sufficient number of “live” samples.”).

Regarding claim 7, Griffin teaches The computing system of claim 1, further configured to: search, within a corpus of audio samples, stored in a database on the computing system for a stored positive audio sample having an alternate pronunciation of the custom wake phrase but that is an accurate representation of the custom wake phrase; (Par. 0039:”Furthermore, the wake word detection algorithm can also be trained using crowd-sourced data 175 (e.g. audio data obtained by other users and stored on a server or in the cloud) or user acquired data 180 (e.g. actual voice samples of individuals reciting the custom wake word).”, and Par. 0037:”To ensure that a diverse variety of samples are used, a pre-determined number of samples can be associated with each different way/element that influences a variation of how the custom wake word can be pronounced.”, and Par. 0045:”The generated wake word detection algorithm can then be stored in the model database 160. The model database 160 can provide the wake word detection algorithm for a particular custom wake word to other computing devices [e.g. via download from the cloud] in situations where other users would like to implement that custom wake word in connection with their virtual assistant.”).
and, include the stored positive audio sample in the training of the model. (Par. 0022:” ... the machine learning model is capable of learning [on a more efficient and expedient manner] what features and parameters to use in order to detect different pronunciations of the custom wake word.”).

Regarding claim 8, Griffin teaches The computing system of claim 1, wherein the positive audio samples comprise one of: a spoken input provided directly via a developer interface of the computing system; (Par. 0009:”The user input would include the one or more words that will be spoken by the user in a vicinity of a computing device in order to invoke a virtual assistant associated with the computing device.”).
and an audio file provided to the developer interface. (Par. 0021:” Alternatively, the user interface 145 may initiate a recording function that would allow the user to record the user saying the custom wake word using the microphone 130.”).

Regarding claims 9, and 27, Griffin teaches the computing system of claim 1, and 22 respectively  wherein the model for the custom wake phrase audio comprises a neural network receiving input audio features of the positive audio samples and outputting one or more sub phrase units for the input audio features, (Par. 0042:”… a neural network can be used to decompose a spoken phrase while a classifier can determine if the decomposed spoken phrase includes the spoken wake word.”, and Par. 0044:”Using a neural network, the speech can be analyzed to identify important aspects of the speech and to represent these aspects of the speech.”).
and the model further comprises a sub phrase unit sequence detector for detecting the custom wake phrase within the one or more output sub phrase units. (Par. 0009:”A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples. The algorithm can then be deployed to the computing device so that the computing device is able to recognize the wake word when spoken.”, and Par. 0042:” ... a neural network can be used to decompose a spoken phrase while a classifier can determine if the decomposed spoken phrase includes the spoken wake word.”).

Regarding claim 17, Griffin teaches The computing system of claim 12, further configured to: searching within a corpus of audio samples for stored positive audio samples acoustically similar to the received one or more positive audio samples; (Par. 0039:”Furthermore, the wake word detection algorithm can also be trained using crowd-sourced data 175 (e.g. audio data obtained by other users and stored on a server or in the cloud) or user acquired data 180 (e.g. actual voice samples of individuals reciting the custom wake word).”, and Par. 0037:”To ensure that a diverse variety of samples are used, a pre-determined number of samples can be associated with each different way/element that influences a variation of how the custom wake word can be pronounced.”, and Par. 0045:”The generated wake word detection algorithm can then be stored in the model database 160. The model database 160 can provide the wake word detection algorithm for a particular custom wake word to other computing devices [e.g. via download from the cloud] in situations where other users would like to implement that custom wake word in connection with their virtual assistant.”, and Par. 0046:” the present technology may also generate other words that are pronounced or at least sound similar to the custom wake word (e.g. white-listed data 180).”).
(Par. 0022:” ... the machine learning model is capable of learning [on a more efficient and expedient manner] what features and parameters to use in order to detect different pronunciations of the custom wake word.”).

Regarding claim 18, Griffin teaches The method of claim 12, wherein the model for the custom wake phrase comprises: a neural network receiving input audio features of the positive audio samples and outputting one or more sub phrase units for the input audio features; (Par. 0042:”… a neural network can be used to decompose a spoken phrase while a classifier can determine if the decomposed spoken phrase includes the spoken wake word.”, and Par. 0044:”Using a neural network, the speech can be analyzed to identify important aspects of the speech and to represent these aspects of the speech.”).
and a sub phrase unit sequence detector for detecting the custom wake phrase within the one or more output sub phrase units. (Par. 0009:”A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples. The algorithm can then be deployed to the computing device so that the computing device is able to recognize the wake word when spoken.”, and Par. 0042:” ... a neural network can be used to decompose a spoken phrase while a classifier can determine if the decomposed spoken phrase includes the spoken wake word.”).
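For illustration only, and not as a characterization of Griffin or of the claims, the two-stage arrangement recited above (a network emitting sub phrase units, followed by a sequence detector) can be sketched as follows. The frame-level labels, the blank symbol, and the function names are assumptions made for the sketch; neither the claims nor the cited paragraphs specify them.

```python
from itertools import groupby

BLANK = "_"  # assumed "no unit" symbol emitted between sub phrase units


def collapse(frame_units):
    """Merge consecutive repeated labels, then drop blanks."""
    return [unit for unit, _ in groupby(frame_units) if unit != BLANK]


def contains_phrase(frame_units, target_units):
    """Return True if the target unit sequence occurs contiguously
    in the collapsed output of the (hypothetical) network."""
    target = list(target_units)
    if not target:
        return False
    units = collapse(frame_units)
    n = len(target)
    return any(units[i:i + n] == target for i in range(len(units) - n + 1))


# Hypothetical per-frame network output for an utterance.
frames = ["_", "HH", "HH", "_", "EY", "EY", "_", "K", "AH", "AH", "M", "_"]
detected = contains_phrase(frames, ["HH", "EY"])
```

A deployed detector would typically operate on per-frame probabilities rather than hard labels; the hard-label form is used here only to keep the sketch short.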

Regarding claim 23, Griffin teaches the computing system of claim 22 further configured to: apply text-to-speech (TTS) to the text to generate a synthesized positive audio sample of the custom wake phrase; (Par. 0032:” The sample generation service 150 may also utilize different text-to-speech services 165….the information used to generate a different variation of how to pronounce the custom wake word from a text version.”).
and include the synthesized positive audio sample in the training of the model. (Par. 0009:”A machine learned model is trained in order to output an algorithm that is used to detect the custom wake word by using the generated samples.”, and Par. 0053:”The generated samples [from step 220] may be used to supplement the training of machine learning models when there are not sufficient number of “live” samples.”).


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


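The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.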
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 4, 15, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claims 1, 12, and 22, respectively, in further view of Meacham et al. (US9886954B1) (hereinafter “Meacham”).

	With regard to claims 4, 15, and 26, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants.
	With regard to claims 4, and 15 Griffin does not teach the computing system of claim 1, and 12 respectively, further configured to: responsive to receiving the request, receive one or more negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase; and include the negative audio samples in the training of the model.
	Meacham teaches responsive to receiving the request, receive one or more negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase; (Col. 21 lines 26 - 31:"Machine learning model 510 can be trained with a positive results with confidence scores that indicate the ambient audio stream includes a particular type of audio characteristic and a set of negative results with confidence scores that indicate that the ambient audio stream does not include a particular type of audio characteristic").
	and include the negative audio samples in the training of the model. (Col. 19 lines 10 - 12:"Any number of positive audio samples and negative audio samples may be used to train the machine learning model").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Meacham to receive one or more negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase and include the negative audio samples in the training of the model in order to adapt automatically to a changing audio environment when users are using hearing support equipment, as evidenced by Meacham (see Col. 1 lines 25 - 30).

	With regard to claim 26, Griffin does not teach the computing system of claim 22 further configured to:  subsequent to the deploying of the model, receive feedback from a developer, indicative of the model for the phrase spotter executable recognizing incorrect audio samples as the custom wake phrase; dynamically re-train the model by including the incorrect audio samples as negative samples to generate an updated model.
	Meacham teaches subsequent to the deploying of the model, receive feedback from a developer, indicative of the model for the phrase spotter executable recognizing incorrect audio samples as the custom wake phrase; ("… models may be further refined by receiving user feedback that indicates whether the machine learning model correctly identified a particular source of a sound contained within the ambient audio stream.").
	dynamically re-train the model by including the incorrect audio samples as negative samples to generate an updated model. (Col. 20 lines 1 - 12:" … machine learning model 504 may be trained to determine whether input ambient audio stream 520 includes a siren [incorrect audio samples], machine learning model 506 may be trained to determine whether input ambient audio stream 520 includes speech, and machine learning model 508 may be trained to determine whether input ambient audio stream 520 includes a female or male speaker. The machine learning models can be trained to determine whether input ambient audio stream 520 includes other types of sounds [incorrect audio samples], e.g., baby crying, engine noise, siren, etc. The machine learning models 504, 506, 508 are trained using ambient sound streams from all users associated with an active acoustic filter.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Meacham to receive feedback from a developer, indicative of the model for the phrase spotter executable recognizing incorrect audio samples as the custom wake phrase, and to dynamically re-train the model by including the incorrect audio samples as negative samples to generate an updated model, in order to adapt automatically to a changing audio environment when users are using hearing support equipment, as evidenced by Meacham (see Col. 1 lines 25 - 30).
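For illustration only, the re-training limitation discussed above (folding falsely accepted samples back into the training set as negatives) can be sketched with a toy nearest-centroid classifier. The classifier, the two-dimensional feature vectors, and all names are assumptions chosen for brevity; they are not the model of Griffin, Meacham, or the application.

```python
from statistics import fmean


def train_centroids(samples):
    """samples: list of (feature_vector, label); label 1 = wake phrase, 0 = not.
    Returns one mean vector (centroid) per label."""
    centroids = {}
    for label in {lbl for _, lbl in samples}:
        vectors = [vec for vec, lbl in samples if lbl == label]
        centroids[label] = [fmean(dim) for dim in zip(*vectors)]
    return centroids


def predict(centroids, vector):
    """Return the label whose centroid is closest to the vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def retrain_with_feedback(samples, false_accepts):
    """Add reported false accepts as negative samples and re-train."""
    updated = samples + [(vec, 0) for vec in false_accepts]
    return updated, train_centroids(updated)


# Hypothetical features: two positives, one negative.
samples = [([1.0, 1.0], 1), ([1.2, 0.8], 1), ([5.0, 5.0], 0)]
model = train_centroids(samples)
reported = [2.5, 2.5]                       # sample a developer flags as wrongly accepted
before = predict(model, reported)
samples, model = retrain_with_feedback(samples, [reported])
after = predict(model, reported)
```

In this toy data the flagged sample is accepted before re-training and rejected afterwards, while the original positives remain accepted.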


Claims 5, 16, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claims 1, 12, and 22, respectively, in further view of Smith et al. (US 20200090646A1) (hereinafter “Smith”).

	With regard to claims 5, 16, and 24, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants.
	Griffin does not teach search within a corpus of audio samples, stored on a database of the computing system, for one or more negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase; and include the negative audio samples in the training of the model.
	Smith teaches search within a corpus of audio samples, stored on a database of the computing system, for one or more negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase; (Par. 0143:” when the audio of a commercial advertising LEXUS automobiles is output in the vicinity of the NMD 503 with the wake-word engine 570 trained to spot ‘Alexa,’ the word ‘Lexus’ spoken in the commercial is considered a false wake word. As another example, when the audio of a TV news coverage of an election is output in the vicinity of the NMD 503 with the wake-word engine 570 trained to spot ‘Alexa,’ the word ‘Election’ spoken in that news coverage is considered a false wake word.”).
	and include the negative audio samples in the training of the model. (Par. 0143:” ... a false positive can occur when the wake-word engine 570 identifies in detected sound that trained to spot.”). Note: a false positive is a negative sample.
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Smith to search for one or more stored negative audio samples having audible similarities to the positive audio samples but that are not the custom wake phrase, and to include the stored negative audio samples in the training of the model as negative samples, in order to tune a wake-word engine to a higher sensitivity level to accommodate a wider range of dialectical and speech pattern variations for a given wake word, despite the possibility of this leading to an increase in false positives, as evidenced by Smith (see Par. 0026).
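For illustration only, a corpus search for confusable negative samples of the kind discussed above (e.g. "Lexus" against "Alexa") can be sketched as follows. Transcript string similarity is used as a crude stand-in for audible similarity; the corpus layout, file paths, threshold, and function name are hypothetical and are not drawn from Smith or the application.

```python
from difflib import SequenceMatcher


def find_confusable_negatives(wake_phrase, corpus, threshold=0.5):
    """Return corpus entries whose transcript resembles the wake phrase
    without being the wake phrase itself (candidate negative samples)."""
    target = wake_phrase.lower()
    negatives = []
    for entry in corpus:
        transcript = entry["transcript"].lower()
        if transcript == target:
            continue  # exact phrase: a positive sample, not a negative
        similarity = SequenceMatcher(None, target, transcript).ratio()
        if similarity >= threshold:
            negatives.append(entry)
    return negatives


# Hypothetical stored corpus: transcript plus the path of the audio sample.
corpus = [
    {"transcript": "alexa", "path": "samples/0001.wav"},
    {"transcript": "lexus", "path": "samples/0002.wav"},
    {"transcript": "election", "path": "samples/0003.wav"},
    {"transcript": "weather", "path": "samples/0004.wav"},
]
negatives = find_confusable_negatives("Alexa", corpus)
```

A phoneme-level or acoustic-embedding comparison would track audible similarity more closely than spelling does; with this spelling-based proxy and threshold, "election" is not selected even though Smith treats it as a false wake word.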

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claim 2, in further view of Rubin et al. (US20140012586A1) (hereinafter “Rubin”).

	With regard to claim 6, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants.
	With regard to claim 6, Griffin does not teach the computing system of claim 2, further configured to: generate a phoneme representation for the custom wake phrase, in dependence upon the text; search, within the database, for a phonetically similar wake phrase sharing phonetic features with the phoneme representation and retrieve, from the database a stored positive audio sample corresponding to the phonetically similar wake phrase; and, utilize the stored positive audio sample in the training of the model.
Rubin teaches generate a phoneme representation for the custom wake phrase, in dependence upon the text; (Par. 0024:” To identify possible pronunciations [phoneme] of the word "pizza",  for example, the hotword strength evaluation engine may reference the word in a dictionary or pronunciation guide, based on the transcription 116 [text]”).
search, within the database, for a phonetically similar wake phrase sharing phonetic features with the phoneme representation and retrieve, from the database a stored positive audio sample corresponding to the phonetically similar wake phrase; (Par. 0024:” In the present example, it may be determined that although only one official American English pronunciation [phoneme] of the word ‘pizza’ is found in the dictionary or pronunciation guide, there is some variation in how the word is pronounced by American speakers. For example, some American speakers may use a native Italian-speaker pronunciation of the word ‘pizza’. Based on a quantity of pronunciations criteria, for example, one of the hotword strength evaluation engines 120 may determine that a low to moderate quantity of pronunciations exist for the candidate hotword ‘pizza’, and thus may produce a corresponding feature score 124c [e.g., a high or medium score]”).
and, utilize the stored positive audio sample in the training of the model. (Par. 0031:”Further, in the present example, one of the hotword strength evaluation engines 120 may evaluate the transcription 136 and/or data provided by the training examples data store 122 to identify a quantity of pronunciations specified for the candidate hotword.”, and Par. 0050:” Referring to FIG. 1, for example, recorded instances [audio samples] of various individuals [e.g., spoken [audio] various words and phrases may maintained by the training examples data store 122.”, and Par. 0025:”To generate the hotword suitability score, for example, the hotword score generator 128 may use logistic regression or an alternative classifier to train a model that estimates a confidence value for the candidate hotword [e.g., the word ‘pizza’].”).
	Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Rubin to generate a phoneme representation for the custom wake phrase, search for a phonetically similar wake phrase sharing phonetic features with the phoneme representation, retrieve from the database a stored positive audio sample corresponding to the phonetically similar wake phrase, and utilize the stored positive audio sample in the training of the model in order to evaluate the speech data or a transcription of the candidate hotword using one or more predetermined criteria, and to provide a representation of the hotword suitability score for display to the user, as evidenced by Rubin (see Par. 005).

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claims 1 and 12, respectively, in further view of Kracun et al. (WO2020005202A1) (hereinafter “Kracun”).

	With regard to claims 10 and 19, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants.

	Kracun teaches the computing system of claim 1, wherein the custom wake phrase audio comprises a first wake phrase audio and a second wake phrase audio, the model comprising a neural network receiving input audio features of the positive audio samples of both the first and the second wake phrase audio and outputting one or more sub phrase units for the input audio features, (Par. 0025:”In some examples, the hotword-aware model is trained on an audio representation (e.g., acoustic features) of the hotword such as a sequence or string of the hotword.”, and Par. 0030:”By training the hotword detector model to detect the presence of synthesized speech in audio input data, the hotword detector may advantageously use the hotword detector model to detect the presence of synthesized speech through an analysis of acoustic features of received audio input data without transcribing or semantically interpreting the audio input data”, and Par. 0048:” Additionally or alternatively, the hotword detector model 220 is a neural network.”, and Par. 0029:” ... the waveform generator may generate the output audio signal by using a neural network [e.g., based on WaveNet] to output an audio sequence of synthesized phonemes that represent the text input data.”).
output from one of the devices [e.g., the smart speaker] that is directed toward the user contains one or more words or sub-words that make up a hotword assigned to one of the other devices (e.g., the user’s tablet) in the environment”, and Par. 0036:”The hotword-aware model 320 may be trained on any combination of hotwords 130 assigned to nearby devices ...”, and Par. 0055:”In the case of the hotword-aware model 320, the data sets and results sets may be audio samples or text samples associated with the hotword 130, such as a phrase, a word, a sub-word, a text-to-speech sequence ... ”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Kracun to include the custom wake phrase audio comprising a neural network receiving input audio features of the positive audio samples of both the first and the second wake phrase audio and outputting one or more sub phrase units for the input audio features in order to discern between an utterance of human speech directed at the system and an utterance of synthesized speech output from a nearby device not directed at the system, as evidenced by Kracun (See Par. 002).

Claims 11, 20, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claims 1, 12, and 22, respectively, in further view of Shahid et al. (US20200349925A1) (hereinafter “Shahid”).

	With regard to claims 11, 20, and 28, Griffin teaches a computing system for training custom phrase spotter executables for virtual assistants.
	With regard to claims 11, 20, and 28, Griffin does not teach the computing system of claims 1, 12, and 22, respectively, wherein the custom wake phrase audio comprises a plurality of wake phrase audio, the model comprising a recurrent neural network receiving input audio features of the positive audio samples of each of the plurality of wake phrase audio and outputting one or more hidden audio features, the model configured to detect a presence of any of the plurality of wake phrase audio.
	Shahid teaches wherein the custom wake phrase audio comprises a plurality of wake phrase audio, (Par. 0042:” wherein the custom wake word verification model further includes a general acoustic model [AM] that operates on the audio samples or the features extracted from the audio samples to predict a series of phonemes corresponding thereto.”, and Par. 0043:”… a confidence that the custom wake word was uttered.”).
	the model comprising a recurrent neural network receiving input audio features of the positive audio samples of each of the plurality of wake phrase audio and outputting one or more hidden audio features, (Par. 0047:”… the AM 330 can include a recurrent neural network [RNN] trained using a connectionist temporal classification [CTC] neural network [NN]. CTC refers to outputs and scoring and is independent of underlying NN structure. The RNN can include long short-term memory [LSTM] units.”, and Par. 0092:”There is nothing that limits the number of wake words that can be chosen by the user 130.”, and Par. 0049:”The AM 330 can [receive] audio features 342 and produce a series of likelihood vectors that the audio features 342 correspond to phonemes.”).
	the model configured to detect a presence of any of the plurality of wake phrase audio. (Par. 0045:” The system 300 can be implemented by the wake word model engine 142, such as using the wake word verifier 144. The system 300 operates to verify whether the audio [from which audio features 342 are extracted] includes an utterance of a wake word 344.”, and Par. 0063:”FIG. 4 illustrates, by way of example, a diagram of an embodiment of a system 400 for wake word verification. The system 400 can be implemented by the wake word model engine 142. In some embodiments, the AM 330 and the LM 334 can be implemented using a recurrent neural network transducer (RNNT). The system 400 is sometimes called an RNNT.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Shahid to include a plurality of wake phrase audio, the model comprising a recurrent neural network receiving input audio features of the positive audio samples of each of the plurality of wake phrase audio and outputting one or more hidden audio features in order to verify the detection of a wake word at the device when an utterance is made, where the message can include audio samples or features extracted from the audio samples, as evidenced by Shahid (See Par. 0004).

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Griffin as applied to claim 22, in further view of Wang et al. (US 20190027138 A1) (hereinafter “Wang”).


	With regard to claim 25, Griffin does not teach the computing system of claim 22, further configured to: receive input, from a developer indicating a modification request to modify the model; responsive to the modification request, search within the corpus of audio samples, stored on the database of the computing system, for one or more additional stored positive audio samples corresponding to an additional custom wake phrase; and include the additional positive audio sample in the training of the model.
	Wang teaches the computing system of claim 22, further configured to: receive input, from a developer indicating a modification request to modify the model; (Par. 0028:” One of the possible commands is a customization command to define a new wake-up utterance for the wake-up command. The default wake-up utterance might be “Computer” and now the user wants to change the utterance to “Gort.” The command logic 156 executes the customization command for users. When a user requests to customize a wake-up utterance for the hub 104, the user utters the customization command followed by the new wake-up utterance corresponding to the wake-up command.”, and Par. 0030:” The command logic 156 may update the model using the user's additional utterance of the wake-up command and/or updated training data”).
	responsive to the modification request, search within the corpus of audio samples, stored on the database of the computing system, for one or more additional stored positive audio samples corresponding to an additional custom wake phrase; (Par. 0030:” The wake-up utterance model includes features representing characteristics of the user's utterance of the wake-up command. The command logic 156 may update the model using the user's additional utterance of the wake-up command and/or updated training data. The model is associated with the user and stored in the command store 158.”, and Par. 0031:” In some implementations, after a voice print has been created for a user, the command logic 156 further updates the voice print using additional samples that are available as the user interacts with the command hub 104”).
	and include the additional positive audio sample in the training of the model. (Par. 0030:” The command logic 156 may update the model using the user's additional utterance of the wake-up command and/or updated training data. The model is associated with the user and stored in the command store 158.”, and Par. 0030:” In some embodiments, the model includes an utterance model that represents the user's utterance of the wake-up command.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Griffin in view of Wang to receive input from a developer indicating a modification request to modify the model; and include the additional positive audio sample in the training of the model in order to receive utterances from users and convert the utterances to commands from a vocabulary of predetermined commands, which includes a customization command to define a new wake-up utterance corresponding to a wake-up command, as evidenced by Wang (See Par. 0003).




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689.  The examiner can normally be reached on Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 






/D.A./Examiner, Art Unit 2656
/Paras D Shah/Primary Examiner, Art Unit 2659
03/22/2021