Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and claim amendments, see Applicant’s remarks filed 10/11/2022, with respect to the rejection of claims 21-40 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779).
The applicant further contends:
II. Rejections under 35 U.S.C. § 103 Applicant respectfully traverses the rejection of claims 21-26, 29, 31-32, 38 and 39 under 35 U.S.C. § 103 as allegedly unpatentable in view of Dai and Gan for the following reasons. The Office Action fails to establish a prima facie case of obviousness for at least the reason that the Office Action has neither properly determined the scope and content of the prior art nor properly ascertained the differences between the prior art and the claimed combination.
Nevertheless, solely to advance prosecution and without acquiescing to the rejection, Applicant amends independent claims 21, 39, and 40. 
The Office admits that Dai does not teach or suggest the claim elements of a "training command" and "based on the training command, training a speech model," and instead relies on Gan to teach or suggest at least these claim elements. Office Action at 6-7. Applicant respectfully disagrees. Gan is directed to voice activation technology using a wake up word and operation commands. Gan at Abstract. The Office states that Gan's disclosure of a "first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of the automatic speech recognition training" teaches or suggests the "training command" of the present application. See Office Action at 7. Applicant respectfully disagrees. Gan's disclosure requires both a "first voice command and the preset wake-up word," which does not teach or suggest the "training command" as claimed. Emphasis added. See Gan at page 4. 

	The examiner disagrees. The limitation merely recites a “training command” and does not recite any boundaries that narrow this broad term. As per MPEP 2111, the claim is given its broadest reasonable interpretation in light of the specification, without reading limitations from the specification into the claim. Given the breadth of the claim, a training command is interpreted as merely a command that requests training, and Gan discloses such a limitation via the “first voice command and the preset wake-up word”, which triggers automatic speech recognition training, as indicated in both the applicant’s remarks and Gan.
Further, proposed amended claim 21 recites in part, "based on the training command, training a speech model, using training data with audio signals comprising speech and audio data labeled as containing a passphrase, to determine a match between a received voice signal and the processed audio data when the speech model is deployed on the user device," which neither Dai nor Gan teaches or suggests. The Office relies on Dai's teaching of "the established model corresponding to the user-defined wake-up phrase may be trained and stored in the cloud model library," to teach or suggest "training a speech model." Dai ¶ [0313]; see also Office Action at 5. Applicant respectfully submits that while Dai recites a training model, it does not teach or suggest how the model is trained. In particular, Dai does not recite the claim element of "using training data with audio signals comprising speech and audio data labeled as containing a passphrase," to train the data as recited in proposed amended claim 21. For at least this additional reason, the rejection should be withdrawn. 

The claimed limitation of “based on the training command, training a speech model, using training data with audio signals comprising speech and audio data labeled as containing a passphrase, to determine a match between a received voice signal and the processed audio data when the speech model is deployed on the user device” has been amended and changes the scope of the claim. Consideration of such limitation is found in the office action below.
For at least these reasons, proposed amended independent claim 21 is allowable over Dai in view of Gan. Proposed amended independent claims 39 and 40 although different in scope, includes similar limitations and are allowable for similar reasons. Claims 22-38 each depend from independent claim 1 and are allowable at least due to their dependency from an allowable base claim. The Office rejected claims 22-38 in view of other cited references Devries, Khoury, Park, Argenti, Benkreira, Wang, and Teo. None of these cited references cure the deficiencies of Dai and Gan. Thus, claims 22-38, which each depend from independent claim 21 are allowable at least due to their dependency from an allowable base claim. 
	
	The examiner disagrees. Please see the rebuttal above and office action below for explanation.
	
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-26, 29, 31-32, 38, and 39 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779).
Claim 21, Dai et al discloses 
at least one memory storing instructions (paragraphs 501-504); and 
one or more processors configured to execute the instructions to perform operations (paragraphs 501-504) comprising: 
receiving a speech model request from a user device (Fig. 6, label 407 discloses the voice signal the user inputs. Submitting a voice signal indicates a request to the electronic apparatus for a speech model, including verification and the user request. An example of the speech model process is in paragraph 278, in which a received voice signal is matched to user-defined voice data with pre-set instructions pre-stored in the electronic device by the user. When a match is determined, the electronic apparatus performs the pre-set instructions in order to perform the task as requested.), 
the speech model request comprising audio data having an original audio data format (Paragraph 278,316 discloses the speech model request is received in the form of a voice signal or original audio data format.); 
processing the audio data by converting the audio data from the original audio data format into a different audio data format (Paragraphs 349-354 discloses embodiment 21, wherein embodiment 21 further discloses the matching of the voice signal with the one or more preset instructions (paragraph 316,307,308) comprises further instructions. Paragraph 352 discloses “comparing text data obtained by converting the voice signal with the user-defined text data in the first preset instruction and determining that a second determination result is MATCH if a similarity degree therebetween exceeds a second threshold; …” Since the text data is obtained by converting the voice signal, the text data is a data format of the audio or voice signal.); 
training a speech model (Paragraphs 313 and 314 disclose that the model in the cloud model library or established library may be trained and stored.), using training data with audio signals comprising speech (Paragraph 209 discloses “a training module configured for training a model to be used in the preset-command-unrelated voiceprint authentication using voice data of a specific scene to optimize the model.” Paragraph 210 discloses “The voice data of the specific scene may comprise frequently-used phrases and sentences in a scene where a voice engine is to be used.” These paragraphs disclose that the training data, i.e., the voice data of a specific scene used to train a model for voiceprint authentication, includes audio signals comprising speech.), 
to determine a match between a received voice signal and the processed audio data when the speech model is deployed on the user device (Paragraphs 315 and 316 disclose matching the voice signal from the user, or processed audio data, against the wake-up phrases pre-stored in the electronic apparatus when the speech model is deployed on the user device, wherein the wake-up phrases pre-stored in the electronic apparatus are received voice signals or voice data of the user’s phrases per a model. Paragraphs 307 and 308 disclose “the user inputs the user-defined wake-up phrase via an application in the electronic apparatus for setting the preset instructions using the user-defined voice data. In particular, the user inputs the user-defined wake-up phrase.” This indicates that the user-defined wake-up phrase as stored in the local model library and cloud model library corresponds to a received voice signal. Paragraph 311 discloses that the model corresponding to the user-defined wake-up phrase is transmitted from the server to a local model library in the electronic apparatus (paragraph 310). The local model library is used to select or confirm the user-defined wake-up phrase that is provided or recommended by the local model library to be used for comparison as indicated above. This indicates that the comparison is performed when the speech model is at the electronic apparatus or user device.); and 
transmitting the trained speech model to the user device (Paragraph 310,311 discloses transmission of the speech model to the user device.).
Dai et al discloses training data for training a model for authentication comprising audio signals including speech (paragraph 209-210), but fails to disclose the training data comprises speech and audio data labeled as containing a passphrase. 
Piersol et al discloses speech recognition matching received feature vectors to language phonemes and words as known in the stored acoustic models and language models (paragraph 32), wherein the training data comprises speech and audio data labeled as containing a passphrase (paragraph 92 discloses training of the machine learning models using training examples of sample utterance audio along with labeled ground truths about utterance beginnings, utterance conclusions, existence of wakewords or passphrase, etc. This disclosure indicates that the training data, or training examples of sample utterance audio, comprises speech or utterances and audio data labeled as containing a passphrase, i.e., labeled ground truths of the existence of wakewords.). It would be obvious to one skilled in the art before the effective filing date of the application to modify the training data as disclosed by Dai et al with the training data as disclosed by Piersol et al so as to optimize the learning model for better speech recognition ability.
Dai et al discloses receiving a voice request or command (Fig. 6, label 407) and training a speech model as a result of the voice request or command (paragraph 314,313), but fails to disclose the received voice request or command comprises a training command and training the speech model is based on the training command.
Gan et al discloses voice data processing that receives a voice request or command (page 4 discloses “in step 101, the receiving user, based on the first voice, command and the preset wake-up word …”) comprising audio data having an original audio data format (page 4 discloses “in step 101, …” receiving a user command via voice, i.e., in an original audio data format.) and a training command (page 4 discloses “receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of automatic speech recognition training and identifying the wake-up word …” This is reiterated on page 13. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition.” This indicates that the voice request includes a training command.), and training the speech model based on the training command (Page 4 discloses “In this embodiment, for step 101, intelligent voice conversational terminal receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of the automatic speech recognition training …”. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition … request of automatic speech training and identification based on the starting default client end up words, training the automatic speech recognition model based on the preset wake-up word, wherein the request comprises the preset wake-up word and a first voice command.”). 
It would be obvious to one skilled in the art before the effective filing date of the application to modify the received voice command or request as disclosed by Dai et al to include a training command and to train the model according to the training command as disclosed by Gan et al, so as to improve recognition performance by training the model and to improve the user’s experience by responding correctly to the user’s voice request or command.
Claim 22, Dai et al discloses wherein the speech model is a stored speech model previously trained to match voice signals. (paragraph 310,311 discloses the speech model is stored in the cloud model library and the local model library. Paragraph 313 discloses the established model may be trained and stored in the cloud model library.)  
Claim 23, Dai et al discloses the operations further comprising generating the speech model prior to training the trained speech model. (Paragraph 313 discloses the model corresponding to the user-defined wake-up phrase is established or generated if the wake-up phrase does not exist in the cloud model library. The established model is trained and stored. This indicates that the speech model or established model is generated prior to training the established model.)
Claim 24, Dai et al discloses the operations further comprising storing the speech model. (paragraph 310,311 discloses the speech model is stored in the cloud model library and the local model library.)  
Claim 25, Dai et al discloses 
wherein the speech model request is received at a storage system component of the authentication system (Paragraph 316 discloses the input voice signal is matched with one of the pre-stored wake-up phrases stored in the electronic apparatus. The pre-stored wake-up phrase is stored in the local model library with the established model (paragraph 313,315,314, Fig. 6, label 407,406). Such paragraphs indicate that the input voice signal or the speech model request is received at a storage system component or element of the electronic apparatus since the local model library with the pre-stored wake-up phrases stored in the library is compared with the input voice signal.), and 
the storage system component or element performs the processing of the audio data (Fig. 6, label 407,406 the processing of the audio data or audio data in the voice signal such as matching the pre-stored wake-up phrase with the input voice signal. Such is performed by the storage system component (local model library with electronic apparatus performing matching).).  
Claim 26, Dai et al discloses the operations further comprising transmitting an alert from the storage system component to an authentication module of the authentication system (Fig. 6, label 407 matches the input voice signal with a pre-stored wake-up phrase. The wake-up phrase acts as an alert to the system to determine whether the input voice signal is from an authorized user. Paragraph 316 discloses that matching the pre-stored wake-up phrase and the input voice signal indicates authentication of the voice signal. Label 407 indicates the authentication module.), the alert comprising the processed audio data (Fig. 6, label 407, the input voice signal matches with the wake-up phrases.).
Claim 29, Dai et al discloses wherein transmitting the match result comprises transmitting the updated speech model (paragraph 313,314 disclose training the established model prior to storage and transmission of the trained established model.).  
Claim 31, Dai et al discloses wherein the speech model comprises at least one of a recurrent neural network model, a hidden Markov model, a discriminative learning model, a Bayesian learning model, a structured sequence learning model, or an adaptive learning model (paragraph 313,314 discloses the established model is trained and stored. This indicates that the established model is an adaptive learning model.).  
Claim 32, Dai et al discloses wherein the request comprises user data (Paragraphs 316, 351, and 352 disclose that the voice signal or the request comprises user data, i.e., data pertaining to the user’s input in the voice signal.) and the speech model is a previously trained speech model associated with the user data (Paragraphs 313 and 314 disclose that the speech model is a previously trained model; for example, when a model is not found in the cloud or local model library, a model is established, and the established model is trained and stored to be transmitted to the electronic apparatus’ local model library.).
Claim 38, Dai et al discloses wherein the speech model comprises an algorithm to recognize a known speaker (paragraph 308 discloses the user defined wake up phrase input by the user is verified by a predetermined verification strategy. Such is part of the correlation of the user defined wake up phrase input by the user and corresponding operation or speech model.). 
Claim 39, Dai et al discloses 
receiving a speech model request from a user device (Fig. 6, label 407 discloses the voice signal the user inputs. Submitting a voice signal indicates a request to the electronic apparatus for a speech model, including verification and the user request. An example of the speech model process is in paragraph 278, in which a received voice signal is matched to user-defined voice data with pre-set instructions pre-stored in the electronic device by the user. When a match is determined, the electronic apparatus performs the pre-set instructions in order to perform the task as requested.), 
the speech model request comprising audio data having an original data format (Paragraph 278,316 discloses the speech model request is received in the form of a voice signal.); 
processing the audio data by converting the audio data from the original audio data format into a different audio data format (Paragraphs 349-354 discloses embodiment 21, wherein embodiment 21 further discloses the matching of the voice signal with the one or more preset instructions (paragraph 316,307,308) comprises further instructions. Paragraph 352 discloses “comparing text data obtained by converting the voice signal with the user-defined text data in the first preset instruction and determining that a second determination result is MATCH if a similarity degree therebetween exceeds a second threshold; …” Since the text data is obtained by converting the voice signal, the text data is a data format of the audio or voice signal.); 
training a speech model (Paragraphs 313 and 314 disclose that the model in the cloud model library or established library may be trained and stored.), using training data with audio signals comprising speech (Paragraph 209 discloses “a training module configured for training a model to be used in the preset-command-unrelated voiceprint authentication using voice data of a specific scene to optimize the model.” Paragraph 210 discloses “The voice data of the specific scene may comprise frequently-used phrases and sentences in a scene where a voice engine is to be used.” These paragraphs disclose that the training data, i.e., the voice data of a specific scene used to train a model for voiceprint authentication, includes audio signals comprising speech.), 
to determine a match between a received voice signal and the processed audio data when the speech model is deployed on the user device (Paragraphs 315 and 316 disclose matching the voice signal from the user, or processed audio data, against the wake-up phrases pre-stored in the electronic apparatus when the speech model is deployed on the user device, wherein the wake-up phrases pre-stored in the electronic apparatus are received voice signals or voice data of the user’s phrases per a model. Paragraphs 307 and 308 disclose “the user inputs the user-defined wake-up phrase via an application in the electronic apparatus for setting the preset instructions using the user-defined voice data. In particular, the user inputs the user-defined wake-up phrase.” This indicates that the user-defined wake-up phrase as stored in the local model library and cloud model library corresponds to a received voice signal. Paragraph 311 discloses that the model corresponding to the user-defined wake-up phrase is transmitted from the server to a local model library in the electronic apparatus (paragraph 310). The local model library is used to select or confirm the user-defined wake-up phrase that is provided or recommended by the local model library to be used for comparison as indicated above. This indicates that the comparison is performed when the speech model is at the electronic apparatus or user device.); and 
transmitting the trained speech model to the user device (Paragraph 310,311 discloses transmission of the speech model to the user device.).
Dai et al discloses training data for training a model for authentication comprising audio signals including speech (paragraph 209-210), but fails to disclose the training data comprises speech and audio data labeled as containing a passphrase. 
Piersol et al discloses speech recognition matching received feature vectors to language phonemes and words as known in the stored acoustic models and language models (paragraph 32), wherein the training data comprises speech and audio data labeled as containing a passphrase (paragraph 92 discloses training of the machine learning models using training examples of sample utterance audio along with labeled ground truths about utterance beginnings, utterance conclusions, existence of wakewords or passphrase, etc. This disclosure indicates that the training data, or training examples of sample utterance audio, comprises speech or utterances and audio data labeled as containing a passphrase, i.e., labeled ground truths of the existence of wakewords.). It would be obvious to one skilled in the art before the effective filing date of the application to modify the training data as disclosed by Dai et al with the training data as disclosed by Piersol et al so as to optimize the learning model for better speech recognition ability.
Dai et al discloses receiving a voice request or command (Fig. 6, label 407) and training a speech model as a result of the voice request or command (paragraph 314,313), but fails to disclose the received voice request or command comprises a training command and training the speech model is based on the training command.
Gan et al discloses voice data processing that receives a voice request or command (page 4 discloses “in step 101, the receiving user, based on the first voice, command and the preset wake-up word …”) comprising audio data having an original audio data format (page 4 discloses “in step 101, …” receiving a user command via voice, i.e., in an original audio data format.) and a training command (page 4 discloses “receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of automatic speech recognition training and identifying the wake-up word …” This is reiterated on page 13. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition.” This indicates that the voice request includes a training command.), and training the speech model based on the training command (Page 4 discloses “In this embodiment, for step 101, intelligent voice conversational terminal receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of the automatic speech recognition training …”. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition … request of automatic speech training and identification based on the starting default client end up words, training the automatic speech recognition model based on the preset wake-up word, wherein the request comprises the preset wake-up word and a first voice command.”). 
It would be obvious to one skilled in the art before the effective filing date of the application to modify the received voice command or request as disclosed by Dai et al to include a training command and to train the model according to the training command as disclosed by Gan et al, so as to improve recognition performance by training the model and to improve the user’s experience by responding correctly to the user’s voice request or command.

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), further in view of Devries et al (US Patent No.: 10515637).
Claim 27, Dai et al and Gan et al fail to disclose the operations further comprising destroying processed audio data stored on the storage system component.
Devries et al discloses the operations further comprising destroying processed audio data stored on the storage system component. (Col. 4, lines 18-25 disclose deleting speech processing data associated with commands.) It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al in view of Gan et al by incorporating deletion of speech processing data so as to remove any data that is no longer needed, thereby clearing space in the database for further data that is needed by the processor.

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), further in view of Khoury et al (US Publication No.: 20180226079), and further in view of Park et al (US Publication No.: 20190371300).
Claim 28, Dai et al discloses the operations further comprising: 
receiving a matching request from the user device (Fig. 6, label 407), the matching request comprising additional audio data (Paragraph 350 discloses matching the voice signal with one or more preset instructions such as corresponding user defined text data comprises comparing audio data containing the voice signal and comparing text data obtained by converting the voice signal with the user defined text data.); 
	generating a match result based on the processed additional audio data (Fig. 6, label 407 compares the audio data with the user defined text data.); and 
transmitting the match result to the user device (paragraph 316 discloses transmission of the match result to the user device, which performs the wake-up operation. Similarly, Fig. 1, label accept, yes, wake up and perform corresponding operation indicates that the matching result is transmitted to the user device. It would be obvious to one skilled in the art before the effective filing date of the application for the match result as shown in Fig. 6, label 407, and discussed in paragraph 649 to be transmitted to the user device to perform the operation as shown in Fig. 1, label accept, wake up and perform corresponding operation, so as to refuse or affirm the wake-up operation and request from the user, hence improving the security of the user device by performing only operations requested by an authorized user and improving the user experience with speech recognition upon a voice command or request.).
Dai et al fails to disclose updating the speech model based on the match result.
Khoury et al discloses updating the speech model based on the matched result (Fig. 3, label 312 as the model. Paragraph 53 discloses the test voiceprint received and compared with the voice model.). It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by updating the speech model based on the matched result as disclosed by Khoury et al so as to compensate for changes in the voice by continuously updating the speech or voice model according to whether the matched output indicates the speaker is genuine or authentic, hence improving the ability of the voice or speech model to accurately determine whether the speaker is genuine.
	Dai et al discloses audio data in the voice signal (paragraph 351), but fails to disclose processing the additional audio data by converting the additional audio data into the different audio data format.
	Park et al discloses processing the additional audio data by converting the additional audio data into the different audio data format (paragraph 88 discloses converting a user voice signal into audio data or a digital signal.). It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by generating audio data in the voice signal by conversion as disclosed by Park et al so as to output the audio data needed for the matching process as described in paragraph 351 and to improve the security of the user device, which performs actions or tasks as requested in the input voice signal from a genuine or authentic speaker.

Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), and further in view of Argenti et al (US Publication No.: 20170308401).
Claim 30, Dai et al discloses processing the audio data of the authentication system (Fig. 6, labels 406 and 407 process the audio data at the data management module on the server by transmitting the audio data to the wake-up performance verification model to verify the performance of the model.), but fails to disclose that such processing is performed by a container instance.
Argenti et al discloses processing captured audio by a container instance (paragraph 106 discloses “capture audio information using a microphone attached to the first mobile device 1008 and provide captured audio information to the first companion container instance 1020 for processing by the software function 1024”.).
It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by incorporating a container instance as disclosed by Argenti et al so as to isolate computing functionalities without the overhead associated with starting and maintaining virtual machines for running separate user space instances (paragraph 37).

Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), and further in view of Benkreira et al (US Publication No.: 20200013037).
Claim 33, Dai et al discloses that the authentication system (Fig. 6) comprises at least one scalable cloud service (Paragraph 290 discloses a cloud server as the predetermined apparatus associated with the electronic device, which participates in authentication of the user as per Fig. 6, label 402. A scalable cloud service is a type of cloud server that provides cloud service. Given the disclosure of a cloud server, it would be obvious to one skilled in the art before the effective filing date of the application for the cloud server to provide scalable cloud service, since a cloud server provides cloud service and scalable cloud service is a type of cloud service.), but fails to disclose at least one scalable cloud service configured to generate and terminate container instances.
Benkreira et al discloses the authentication system (Fig. 4 shows the process of the server performing bill processing. Label 402 starts the process following an authentication process (paragraph 81).) comprising at least one scalable cloud service configured to generate and terminate container instances (paragraph 98 discloses that the group billing server 106 is hosted on a cloud compute service, wherein a scalable cloud service is a type of cloud compute service, and discloses terminating a container instance upon completion of a job and generating container instances by providing instructions to the container instance.). It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al’s cloud server with termination and generation of container instances as disclosed by Benkreira et al so as to provide security benefits to system 100 by protecting personal or financial information (paragraph 65).

Claims 34, 36, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), and further in view of Wang et al (US Publication No.: 20190027138).
Claim 34, Dai et al discloses training the model (Paragraphs 313 and 314 disclose that the model in the cloud model library or established library may be trained and stored.) using training data (Paragraphs 312-314 disclose training the model using the user’s utterance of the wake-up phrase.), but fails to disclose the training data comprising the processed audio data and audio data associated with a plurality of individuals.
Wang et al discloses a speech model is trained using the training data comprising the processed audio data and audio data associated with a plurality of individuals (paragraph 19 discloses speech model in the form of a phones model of user utterances. The model is trained using training data such as utterances of the wake-up utterance by users. Paragraph 19 further discloses the wake-up utterances are processed such as speech or phrase recognition.).
It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by training the speech model with data as disclosed by Wang et al so as to improve accuracy of the speech model used to authenticate or verify the user and increase security.
Claim 36, Dai et al fails to disclose storing the processed audio data as reference data associated with a user.
Wang et al discloses storing the processed audio data as reference data associated with a user (Fig. 2, label 226 stores the word that is received when requesting a speech model such as uttering the wake-up command. Paragraph 19 discloses analyzes user’s voice input in reference to the model to detect whether a user has uttered the wake-up utterance, wherein the model is trained based on wake-up utterances by users and each model is associated to a user and associated wake-up utterance.).
	It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by storing the processed audio data as disclosed by Wang et al so as to improve accuracy of the speech model used to authenticate or verify the user and increase security.
Claim 40, Dai et al discloses 
at least one memory storing instructions (paragraphs 501-504); and 
one or more processors configured to execute the instructions to perform operations (paragraphs 501-504) comprising: 
receiving a speech model request from a user device (Fig. 6, label 407 discloses the voice signal the user inputs. By submitting a voice signal, such indicates a request for the electronic apparatus for a speech model including verification and user request. An example of the speech model process is in paragraph 278 in which a voice signal received is matched to a user-defined voice data with pre-set instructions pre-stored in the electronic device by the user. When a match is determined, the electronic apparatus performs the pre-set instructions in order to perform that task as requested.), the speech model request comprising user data and audio data having an original data format (Paragraph 278,316 discloses the speech model request is received in the form of a voice signal.); 
processing the audio data by converting the audio data from the original audio data format into a different audio data format (Paragraphs 349-354 discloses embodiment 21, wherein embodiment 21 further discloses the matching of the voice signal with the one or more preset instructions (paragraph 316,307,308) comprises further instructions. Paragraph 352 discloses “comparing text data obtained by converting the voice signal with the user-defined text data in the first preset instruction and determining that a second determination result is MATCH if a similarity degree therebetween exceeds a second threshold;…” Since the text data is obtained by converting the voice signal, the text data is a data format of the audio or voice signal.); 
retrieving a stored speech model associated with the user data (Paragraph 316 discloses “the user inputs a voice signal to wake up the electronic apparatus. The voice signal is first compared with the wake-up phrases pre-stored in the electronic apparatus …” This indicates that the stored speech model is retrieved, wherein such speech model is associated with the user data such as voice signal including wake-up phrase.); 
training the stored speech model (Paragraphs 313 and 314 disclose that the model in the cloud model library or established library may be trained and stored.), using training data with audio signals comprising speech (Paragraph 209 discloses “a training module configured for training a model to be used in the preset-command-unrelated voiceprint authentication using voice data of a specific scene to optimize the model.” Paragraph 210 discloses “The voice data of the specific scene may comprise frequently-used phrases and sentences in a scene where a voice engine is to be used.” These paragraphs disclose that the training data, or voice data of a specific scene, used to train a model for voiceprint authentication includes audio signals including speech.), 
to determine a match between a received voice signal and the processed audio data when the speech model is deployed on the user device (Paragraphs 315 and 316 disclose matching the voice signal from the user, or processed audio data, against the wake-up phrases pre-stored in the electronic apparatus when the speech model is deployed on the user device, wherein the wake-up phrases pre-stored in the electronic apparatus are a received voice signal or voice data of the user’s phrases per a model. Paragraphs 307 and 308 disclose “the user inputs the user-defined wake-up phrase via an application in the electronic apparatus for setting the preset instructions using the user-defined voice data. In particular, the user inputs the user-defined wake-up phrase.” This indicates that the user-defined wake-up phrase as stored in the local model library and cloud model library corresponds to a received voice signal. Paragraph 311 discloses that the model corresponding to the user-defined wake-up phrase is transmitted from the server to a local model library in the electronic apparatus (see also paragraph 310). This local model library is used to select or confirm the user-defined wake-up phrase that is provided or recommended by the local model library to be used for comparison as indicated above. This indicates that the comparison is performed when the speech model is at the electronic apparatus or user device.), and 
storing the trained speech model (Paragraph 314,313 discloses the model in the cloud model library or established library may be trained and stored.); 
transmitting the trained speech model to the user device (Paragraph 310,311 discloses transmission of the speech model to the user device.).
Dai et al discloses training data for training a model for authentication comprising audio signals including speech (paragraph 209-210), but fails to disclose the training data comprises speech and audio data labeled as containing a passphrase. 
Piersol et al discloses speech recognition matching received feature vectors to language phonemes and words as known in the stored acoustic models and language models (paragraph 32), wherein the training data comprises speech and audio data labeled as containing a passphrase (paragraph 92 discloses training of the machine learning models using training examples of sample utterance audio along with labeled ground truths about utterance beginnings, utterance conclusions, existence of wakewords or passphrase, etc. Such disclosure indicates the training data or training examples of sample utterance audio comprises speech or utterance and audio data labeled as containing a passphrase or labeled ground truths of the existence of wakewords.) It would be obvious to one skilled in the art before the effective filing date of the application to modify the training data as disclosed by Dai et al with training data as disclosed by Piersol et al so to optimize the learning model for better ability for speech recognition.
Dai et al discloses receiving a voice request or command (Fig. 6, label 407) and training a speech model as a result of the voice request or command (paragraph 314,313), but fails to disclose the received voice request or command comprises a training command and training the speech model is based on the training command.
Gan et al discloses voice data processing receiving a voice request or command (page 4 discloses “in step 101, the receiving user, based on the first voice, command and the preset wake-up word …”) comprising audio data having an original audio data format (page 4 discloses “in step 101, …” receiving a user command via voice, i.e., in an original audio data format.) and a training command (page 4 discloses “receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of automatic speech recognition training and identifying the wake-up word …” This is reiterated on page 13. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition.” This indicates that the voice request includes a training command.), and that training the speech model is based on the training command (Page 4 discloses “In this embodiment, for step 101, intelligent voice conversational terminal receiving user of the first voice command, and based on the first voice command and the preset wake-up word to the server end sending the starting default request of the automatic speech recognition training …”. Page 13 discloses “receiving a first voice command of the user, based on the first voice command and the preset wake-up word to the server end sends the request of starting the preset wake-up word automatic speech recognition training and recognition … request of automatic speech training and identification based on the starting default client end up words, training the automatic speech recognition model based on the preset wake-up word, wherein the request comprises the preset wake-up word and a first voice command.”). 
It would be obvious to one skilled in the art before the effective filing date of the application to modify the received voice command or request as disclosed by Dai et al to include a training command and to train the model according to the training command as disclosed by Gan et al so as to improve recognition performance by training the model and improve the user’s experience by responding correctly to the user’s voice request or command.
Dai et al fails to disclose storing the trained speech model and the processed audio data and the training data comprising the processed audio data and audio data associated with a plurality of individuals.
Wang et al discloses a speech model is trained using the training data comprising the processed audio data and audio data associated with a plurality of individuals (paragraph 19 discloses speech model in the form of a phones model of user utterances. The model is trained using training data such as utterances of the wake-up utterance by users. Paragraph 19 further discloses the wake-up utterances are processed such as speech or phrase recognition.); and 
	storing the trained speech model and the processed audio data (Fig. 2, label 226 stores the word that is received when requesting a speech model such as uttering the wake-up command.).
	It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by training the speech model with data as disclosed by Wang et al and storing such trained speech model and the processed audio data as disclosed by Wang et al so as to improve accuracy of the speech model used to authenticate or verify the user and increase security.

	Claim 35 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), and further in view of Teo et al (US Publication No.: 20200097980).
	Claim 35, Dai et al discloses that training the model includes optimizing a model parameter (paragraph 313 discloses that the model corresponding to the user-defined wake-up phrase is trained and stored; the recommendation of a wake-up phrase being based on the training result of model optimization indicates that training is performed to optimize a model parameter.), but fails to disclose that training includes optimizing a hyperparameter.
	Teo et al discloses “a data warehouse 310 stores information regarding each individual user which can be used as features to train models” (paragraph 54), wherein a hyperparameter is used to indicate the optimized or desired result variable for which each model is being trained (paragraph 54). Paragraph 55 discloses “… performance of each model may be retained for monitoring purposes to evaluate the efficiency of the model. Models may be re-trained periodically based upon observed performance and individual models may be tuned to obtain the desired behavior.” It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al’s training of the model by incorporating optimizing a hyperparameter as disclosed by Teo et al so as to obtain desired behavior of the model, hence improving efficiency of the model.

Claim 37 is rejected under 35 U.S.C. 103 as being unpatentable over Dai et al (US Publication No.: 20150142438) in view of Piersol et al (US Publication No.: 20200279552), further in view of Gan et al (CN Patent No.: CN109147779), and further in view of Khoury et al (US Publication No.: 20180226079).
Claim 37, Dai et al discloses that training the speech model is performed by the speech module (paragraphs 313 and 314 disclose that the established model is trained and that training is performed.).
Dai et al fails to disclose the operations include transmitting, from an authentication module of the authentication system to a speech module of the authentication system, instructions to train the speech model. 
Khoury et al discloses the operations include transmitting, from an authentication module of the authentication system to a speech module of the authentication system, instructions to train the speech model (Fig. 3, labels 334, 332, and 330 as the authentication module, which transmits the update, label 340, to update or train the speech model, label 312, wherein label 312 is considered the speech module.). It would be obvious to one skilled in the art before the effective filing date of the application to modify Dai et al by updating the speech model based on the matched result as disclosed by Khoury et al so as to compensate for changes in the voice by continuously updating the speech or voice model according to whether the matched output indicates the speaker is genuine or authentic, hence improving the ability of the voice or speech model to accurately determine whether the speaker is genuine.



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LINDA WONG/Primary Examiner, Art Unit 2655