Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 1/28/2022 have been fully considered but they are not persuasive. 
The applicant contends:
Amended independent claim 1 recites: A user authentication method, performed by an electronic apparatus, of authenticating a user based on an utterance input, the user authentication method comprising: 
obtaining an input audio signal based on the utterance input of the user; 
obtaining, from the input audio signal, at least one audio signal of an utterance section and at least one audio signal of a non-utterance section, wherein the at least one audio signal of the utterance section is generated based on a speech of the user, and the at least one audio signal of the non-utterance section is not generated based on the speech of the user; 
generating environment information indicating an environment in which the utterance input is received, based on the at least one audio signal of the non-utterance section; 
obtaining a result of a comparison between the generated environment information and registration environment information indicating an environment in which a registration utterance input included in a previously registered registration audio signal of the user is received, wherein the registration audio signal comprises at least one audio signal of a registration utterance section generated based on a previous speech of the user; 
adjusting an authentication criterion for authenticating the user based on the result of the comparison between the generated environment information and registration environment information; and 
authenticating the user based on the adjusted authentication criterion and the input audio signal, wherein the adjusting of the authentication criterion comprises: 
calculating a first average energy value of the at least one audio signal of the utterance section; 
calculating a second average energy value of the at least one audio signal of the registration utterance input; and 
adjusting a threshold corresponding to the authentication criterion based on a result of a comparison between the first average energy value and the second average energy value.

Applicant respectfully submits that claim 1 is patentable because the combined references do not disclose or suggest each and every element of the claim. For example, Applicant respectfully submits that Huang fails to disclose or suggest "obtaining, from the input audio signal, at least one audio signal of an utterance section and at least one audio signal of a non-utterance section, wherein the at least one audio signal of the utterance section is generated based on a speech of the user, and the at least one audio signal of the non-utterance section is not generated based on the speech of the user; obtaining a result of a comparison between the generated environment information and registration environment information indicating an environment in which a registration utterance input included in a previously registered registration audio signal of the user is received, wherein the registration audio signal comprises at 
"wherein the adjusting of the authentication criterion comprises "calculating a first average energy value of the at least one audio signal of the utterance section; calculating a second average energy value of the at least one audio signal of the registration utterance input; and adjusting a threshold corresponding to the authentication criterion based on a result of a comparison between the first average energy value and the second average energy value" as claimed. 

The examiner disagrees. The office action below clearly correlates the prior art with the recited limitations highlighted in the applicant’s remarks. Please see the office action below.
As discussed in Applicant's previous remarks, at best, Huang only discloses selecting context using SNR (speech to noise ratio). (See Huang, paragraphs [0027], [0069] and FIG. 4). In particular, Applicant submits that Huang discloses inputting noise type level (411) and SNR (420) to a context identification unit, and selecting context corresponding to SNR closest to SNR (420). 
However, Applicant submits that SNR is just a number that identifies how much noise is generated in the surrounding environment, and is independent of the energy value of the voice signal caused by the user's utterance. Furthermore, Applicant submits that the first average energy value and the second average energy value are related to energy value of signals, and are not related to noise. 
In response to these arguments, on pages 4-5 of the Office Action, the Examiner asserts that: 
The examiner disagrees. The recited limitation states "calculating a second average energy value of the at least one audio signal of the registration utterance" as opposed to "the voice signal caused by the user's utterance" as indicated in the applicant's remarks. The "at least one audio signal of the registration utterance" is a registered utterance input "included in a previously registered registration audio signal of the user".

The examiner disagrees. The applicant’s remarks are similar to those made in the previous response. The previous office action addresses such remarks regarding the SNR. As indicated in the previous office action, the claimed language merely recites “average energy value”, specifically, “average energy value of the at least one audio signal of the utterance section” and “average energy value of the at least one audio signal of the registration utterance input”. Such limitation merely recites a value of the energy, specifically average energy. Such limitation does not indicate that the “average energy value”, whether it be “a first average energy value” or “a second average energy value”, excludes the noise and/or speech signal. Furthermore, the previous office action as well 
Without conceding to the merits of the Examiner's assertions, Applicant has amended claim 1 to specify that "the at least one audio signal of the utterance section is generated based on a speech of the user" and that "at least one audio signal of a registration utterance section generated based on a previous speech of the user", as set forth above.

	The independent claims have been amended. The newly added limitations “the at least one audio signal of the utterance section is generated based on a speech of the user” and “at least one audio signal of a registration utterance section generated based on a previous speech of the user” are addressed in the office action below.
Accordingly, Applicant submits that nothing in Huang discloses determining an average energy value of an audio signal generated based on a speech of a user, and an average energy value of an audio signal generated based on a previous speech of the user, comparing the two average energy values, and adjusting an authentication criterion based on the comparison. 
Therefore, Huang fails to disclose "obtaining, from the input audio signal, at least one audio signal of an utterance section and at least one audio signal of a non-utterance section, wherein the at least one audio signal of the utterance section is generated based on a speech of the user, and the at least one audio signal of the non-utterance section is not generated based on the speech of the user; obtaining a result of a comparison between the generated environment information and registration environment information indicating an environment in which a registration utterance input included in a previously registered registration audio signal of the user is received, wherein the registration audio signal comprises at least one audio signal of a registration utterance section generated based on a previous speech of the user; [and] adjusting an authentication criterion for authenticating the user based on the result of the comparison between the generated environment information and registration environment information" "wherein the adjusting of the authentication criterion comprises "calculating a first average energy value of the at least one audio signal of the utterance section; calculating a second average energy value of the at least one audio signal of the registration utterance input; and adjusting a threshold corresponding to the authentication criterion based on a result of a comparison between the first average energy value and the second average energy value" as recited in claim 1. Accordingly, Applicant respectfully submits that claim 1 is patentable because each and every element of the claim is not disclosed or suggested by the combined references. 

	The examiner disagrees. Such remarks are merely statements without reasoning and explanation as to why the applicant does not believe the prior art discloses the 
Regarding independent claims 12 and 20, Applicant respectfully submits that these claims are patentable for at least similar reasons as those provided above with reference to claim 1. 

	The examiner disagrees. Please see the rebuttal above.
Regarding dependent claims 2-11 and 13-19, Applicant respectfully submits that these claims are patentable for at least the reasons set forth above due to their respective dependencies. 

	Such claims depend from their respective independent claims. Please see the office action below and the rebuttal above. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al (US Publication No.: 20180293988) in view of Boregowda et al (US Publication No.: 20120010887).

	Preamble: A user authentication method, performed by an electronic apparatus, of authenticating a user based on an utterance input (Fig. 1 shows an electronic apparatus performing the user authentication method shown in Figs. 2-4.), the user authentication method comprising:
	obtaining an input audio signal based on the utterance input of the user (Fig. 2, label 201 receives an input audio signal based on the user speaking, label 101. Paragraph 31 defines the audio input signal.);
	obtaining, from the input audio signal (label 111), at least one audio signal of an utterance section and at least one audio signal of a non-utterance section (Fig. 2, label microphone obtains audio data, label AD. Paragraph 34 discloses that label 211 is a conversion of the audio input signal, which indicates utterance or speech and noise (paragraph 31).),
wherein the at least one audio signal of the utterance section is generated based on a speech of the user (Fig. 1, label 101,111,201,211 shows that the user’s speech is received. Paragraph 31 discloses that the audio signal includes noise and speech. Fig. 4, label 406,412 indicates the speech or signal-to-noise ratio, which indicates that the audio signal includes speech or signal.), and the at least one audio signal of the non-utterance section is not generated based on the speech of the user (Fig. 1, label 101,111,201,211, Fig. 4, label 406,412. Paragraph 31 discloses that the audio signal includes noise and speech.);
	generating environment information indicating an environment in which the utterance input is received, based on the at least one audio signal of the non-utterance section (Fig. 3, label 308, Fig. 4, label 402 determines the environment of the audio data based on the noise of the audio signal, label 111,211 of Fig. 2.);

	  adjusting an authentication criterion for authenticating the user based on the result of the comparison (Fig. 3, label 308,312 generates ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304. Label 308, Fig. 4, label 404 performs the comparison as indicated above.) between the generated environment information (Fig. 4, label 404 determines the generated environment information or noise or background noise information.) and registration environment information (Fig. 4, label 408 indicates the registered environment information.); and
	authenticating the user based on the adjusted authentication criterion and the input audio signal. (Fig. 3, label 304 authenticates the speaker based on the adjusted 
	wherein the adjusting of the authentication criterion (Fig. 3, label 308,312 generates ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304.) comprises:
calculating a first average energy value of the at least one audio signal of the utterance section (Fig. 4, label SNR 420);
calculating a second average energy value of the at least one audio signal of the registration utterance (Paragraph 27 discloses a context database that includes context score distributions indexed by context and having associated data with each context entry. The context database may include a development corpus for different combinations of noise types and/or SNRs.); and
adjusting the threshold (Fig. 3, label 306,308, Fig. 4, label 414) based on a result of a comparison between the first average energy value and the second average energy value (Paragraph 69 discloses “the context is selected by first determining the noise type label and as a sub-class, then selecting the context with the closest SNR to the SNR 420 and within the noise class of noise type label 411.”). 
	Huang et al discloses a registration utterance input included in a previously registered registration audio signal of the user is received (Paragraph 59 discloses “a number of training (or pretraining) or learning trials are performed to provide samples of audio signals or data of noise recordings for each specific desired noise type. By one form, the learning trails include the capture of acoustic waves in a controlled environment with intentionally set and known levels or contexts. … The acoustic environment in which acoustic waves are created and captured, and that affect the audio 
	Boregowda et al discloses the registration audio signal comprises at least one audio signal of a registration utterance section generated based on a previous speech of the user (Paragraph 57 discloses voice training data that is stored in a storage device such as a personal speech profile data (PSPD) storage device. “The PSPD storage device is configured to store voice training data (e.g., in voice training data storage block 254), which is based on speech captured during previous speech/voice sessions (e.g., the initial training sessions and/or subsequent sessions).” This indicates that the registration audio signal, such as the speech profile data of the voice training data shown in Fig. 2, label 250, comprises at least one audio signal of a registration utterance section, such as speech captured during previous speech/voice sessions, wherein such audio signal is based on previous speech of the user, such as profile data of users.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the storage of training trials with samples of audio as disclosed by Huang et al by incorporating training data of profiles of users of the electronic apparatus as disclosed by Boregowda et al, in order to readily detect the user’s voice and thereby improve authentication or verification of a speech input as coming from the authentic person and prevent impersonation.
	Claim 2, Huang et al discloses wherein the obtaining of the at least one audio signal of the utterance section and the at least one audio of the non-utterance section (Fig. 2, label microphone obtains audio data, label AD. Paragraph 34 discloses that label 211 is a conversion of the audio input signal, which indicates utterance or speech and noise (paragraph 31).) comprises:

	extracting an audio feature of the plurality of frames (Paragraph 48 discloses a feature vector for each unit or sample of the captured audio.); and 
based on the extracted audio feature (Paragraph 48), distinguishing frames corresponding to the utterance section from frames corresponding to the non-utterance section among the plurality of frames (Paragraph 48 discloses that a feature vector for each unit or sample of the captured audio is used to determine probabilities for each or individual noise types. Paragraph 46 discloses separating the voice signal from the noise signal.).
Claim 3, Huang et al discloses wherein the environment information is generated based on an audio feature of the frames corresponding to the non-utterance section (Paragraph 62 discloses label 404 outputs the indication of the type of noise in the input audio. Label 408 includes the noise type models or audio data for each sample of audio data and for each or individual implemented noise classes.).
Claim 4, Huang et al discloses wherein the environment information comprises information about a plurality of situations (Paragraph 62 discloses noise type models for each sample of audio data and for each or individual implemented noise classes.), 
information indicating the situation in which the audio signal is received (Paragraph 59 discloses that the noise types are noise from different types of situations, wherein each noise type indicates the situation in which the audio signal is received, such as traffic, etc.), and 
information about a plurality of vectors corresponding to the plurality of situations (Paragraph 60 discloses the noise type model 408 may include a large dataset of instances of different noise classes such as that a mixture model or other type of 
Claim 5, Huang et al discloses wherein the authenticating of the user (Fig. 2-4) comprises:
obtaining the previously registered registration audio signal (Fig. 4, label 408, paragraph 60 discloses label 408 may be pretrained with recorded and noise types sampled from the audio data.);
obtaining the audio signal of the registration utterance section from the registration audio signal (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. Such obtained noise samples or obtained clean speech samples are used to generate each specific desired noise type.); and
authenticating the user by comparing the at least one audio signal of the utterance section and the audio signal of the registration utterance section (Fig. 3, label 308 and Fig. 4, label 402 compare the at least one audio signal of the utterance section (Fig. 3, label audio signal) with the noise type models, which are generated based on the audio signal of the registration utterance section, or obtained noise samples and clean speech samples (paragraph 59).).
Claim 6, Huang et al discloses wherein the registration audio signal (Fig. 4, label 408 outputs the noise type model, which includes noise types affecting the audio data (paragraph 59).) comprises at least one audio signal of a registration utterance section and at least one audio signal of a registration non-utterance section (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. 
wherein the at least one audio signal of the registration non-utterance section is used to generate registration environment information indicating a situation in which the utterance input corresponding to the registration audio signal is received (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. Paragraph 60 discloses that the recorded noise type audio signals are used to train a noise type model. The noise type model may include situations that can occur during a user utterance, such as a babble (cafeteria) noise class, a side noise class, etc. (paragraph 60)).
Claim 7, Huang et al discloses wherein the adjusting of the authentication criterion comprises adjusting a threshold corresponding to a similarity between the at least one audio signal of the utterance section and the at least one audio signal of the registration utterance section (Fig. 3, label 306 is adjusted to generate a threshold indicating the similarity between the audio signal, label audio signal and stored attributes of the audio signal, such as noise type model, Fig. 4, label 408.).
Claim 8, Huang et al discloses wherein the adjusting of the threshold comprises:
selecting the threshold from a preset threshold table (paragraph 104 discloses determining a threshold by “looking up a pre-calculated threshold value on a table including associated possible confidence indicator values”.) based on a length of the utterance section and a length of the registration utterance section (Fig. 3, label 306 adjusts the threshold based on label 312. Fig. 4, label 402 shows the parameters used to adjust the threshold. The noise of the audio signal is based on the noise found in the audio signal, which includes noise and speech (Fig. 3, label 301 outputs noise and 
adjusting the threshold based on a result of a comparison between the environment information and the registration environment information (Fig. 4, label 404, paragraph 62 discloses comparing the environment information, noise from label 211 and registered environment information, label 408. Label 306 of Fig. 3 adjusts the threshold based on the comparison.).
Claim 9, Huang et al discloses wherein the selected threshold is adjusted (Fig. 3, label 306) based on a similarity between a vector corresponding to the environment information and a vector corresponding to the registration environment information (paragraph 62 discloses the comparison of the audio data 211 and the noise type models or audio data model 408, wherein “the noise type model … may include a large dataset of instances of different noise classes …”. Paragraph 48 discloses the comparison is performed using a sequence of short samples of the input audio signal.).
Claim 11, Huang et al discloses wherein the threshold is adjusted based on at least one of a parameter value previously set based on a timbre of a voice of the user or a parameter value previously set based on a characteristic of the electronic apparatus (Paragraph 27 discloses context score distributions which indicates various combinations of noise types and/or SNRs. Each combination or context includes audio signal data that forms the context, a context score distribution, and a context index 
Claim 12, Huang et al discloses
a microphone (Fig. 2, label microphone);
a memory (Fig. 11, label 1106); and 
at least one processor (Fig. 11, label 1102),
wherein the at least one processor (Fig. 11, label 1102) is configured to:
obtain an input audio signal based on an utterance input of a user received through the microphone (Fig. 2, label microphone. Fig. 2, label 201 receives an input audio signal based on the user speaking, label 101. Paragraph 31 defines the audio input signal.),
distinguish, from the input audio signal, at least one audio signal of an utterance section and at least one audio signal of a non-utterance section (Fig. 2, label microphone obtains audio data, label AD. Paragraph 34 discloses that label 211 is a conversion of the audio input signal, which indicates utterance or speech and noise (paragraph 31). Fig. 3, label 301, paragraph 46), 
wherein the at least one audio signal of the utterance section is generated based on a speech of the user (Fig. 1, label 101,111,201,211 shows that the user’s speech is received. Paragraph 31 discloses that the audio signal includes noise and speech. Fig. 4, label 406,412 indicates the speech or signal-to-noise ratio, which indicates that the audio signal includes speech or signal.), and the at least one audio signal of the non-utterance section is not 
generate environment information indicating an environment in which the utterance input is received (Fig. 3, label 308, Fig. 4, label 402 determines the environment of the audio data based on the noise of the audio signal, label 111,211 of Fig. 2.),
obtain a result of a comparison (Fig. 4, label 404. Paragraph 62 discloses label 404 compares the audio data 211 to noise type models or audio data model 408 for each sample of audio data 211 and for each or individual implemented noise classes.) between the generated environment information (Fig. 4, label 404 determines the generated environment information or noise or background noise information.) and registration environment information (Fig. 4, label 408 indicates the registered environment information.) indicating an environment in which a registration utterance input corresponding to a previously registered registration audio signal of the user is received (Paragraph 59 discloses “The noise types may be any internal or external factor that an SR system can classify and that may create or influence the acoustic environment in which acoustic waves are created and captured and that affect the audio data. …” Paragraph 60 discloses recorded noise type audio signals are used to train a noise type model, label 408.),
based on a result of the comparison between the generated environment information (Fig. 4, label 404 determines the generated environment information or noise or background noise information.) and registration environment information (Fig. 4, label 408 indicates the registered environment information.), adjust an authentication criterion for authenticating the user (Fig. 3, label 306 adjusts the 
authenticate the user based on the adjusted authentication criterion and the input audio signal (Fig. 3, label 304, Fig. 2, label 210),
wherein the at least one processor is further configured to:
calculate a first average energy value of the at least one audio signal of the utterance section (Fig. 4, label SNR 420);
calculate a second average energy value of the at least one audio signal of the registration utterance (Paragraph 27 discloses a context database that includes context score distributions indexed by context and having associated data with each context entry. The context database may include a development corpus for different combinations of noise types and/or SNRs.); and
adjust a threshold (Fig. 3, label 306,308, Fig. 4, label 414) corresponding to the authentication criterion (Fig. 3, label 308,312 generates ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304.) based on a result of a comparison between the first average energy value and the second average energy value (Paragraph 69 discloses “the context is selected by first determining the noise type label and as a sub-class, then selecting the context with the closest SNR to the SNR 420 and within the noise class of noise type label 411.”). 
Huang et al discloses a registration utterance input included in a previously registered registration audio signal of the user is received (Paragraph 59 discloses “a number of training (or pretraining) or learning trials are performed to provide samples of audio signals or data of noise recordings for each specific desired noise type. By one form, the learning trails include the capture of acoustic waves in a controlled environment with 
	Boregowda et al discloses the registration audio signal comprises at least one audio signal of a registration utterance section generated based on a previous speech of the user (Paragraph 57 discloses voice training data that is stored in a storage device such as a personal speech profile data (PSPD) storage device. “The PSPD storage device is configured to store voice training data (e.g., in voice training data storage block 254), which is based on speech captured during previous speech/voice sessions (e.g., the initial training sessions and/or subsequent sessions).” This indicates that the registration audio signal, such as the speech profile data of the voice training data shown in Fig. 2, label 250, comprises at least one audio signal of a registration utterance section, such as speech captured during previous speech/voice sessions, wherein such audio signal is based on previous speech of the user, such as profile data of users.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the storage of training trials with samples of audio as disclosed by Huang et al by incorporating training data of profiles of users of the electronic apparatus as disclosed by Boregowda et al, in order to readily detect the user’s voice and thereby improve authentication or verification of a speech input as coming from the authentic person and prevent impersonation.
Claim 13, Huang et al discloses
the at least one processor is further configured to: 
	split the input audio signal into a plurality of frames (paragraph 48 discloses noise is generated from samples or frames of the captured audio.);

based on the extracted audio feature (Paragraph 48), distinguish frames corresponding to the utterance section from frames corresponding to the non-utterance section among the plurality of frames (Paragraph 48 discloses that a feature vector for each unit or sample of the captured audio is used to determine probabilities for each or individual noise types. Paragraph 46 discloses separating the voice signal from the noise signal.).
Claim 14, Huang et al discloses wherein the at least one processor is further configured to generate the environment information based on an audio feature of the frames corresponding to the non-utterance section (Paragraph 62 discloses label 404 outputs the indication of the type of noise in the input audio. Label 408 includes the noise type models or audio data for each sample of audio data and for each or individual implemented noise classes.).
Claim 15, Huang et al discloses wherein the at least one processor is further configured to: 
obtain the previously registered registration audio signal (Fig. 4, label 408, paragraph 60 discloses label 408 may be pretrained with recorded and noise types sampled from the audio data.);
obtain the audio signal of the registration utterance section from the registration audio signal (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. Such obtained noise samples or obtained clean speech samples are used to generate each specific desired noise type.); and

Claim 16, Huang et al discloses wherein the registration audio signal (Fig. 4, label 408 outputs the noise type model, which includes noise types affecting the audio data (paragraph 59).) comprises at least one audio signal of a registration utterance section and at least one audio signal of a registration non-utterance section (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. Such obtained noise samples or obtained clean speech samples are used to generate each specific desired noise type.), and
wherein the at least one audio signal of the registration non-utterance section is used to generate registration environment information indicating a situation in which the utterance input corresponding to the registration audio signal is received (Paragraph 59 discloses that obtained noise samples are separated from obtained clean speech samples. Paragraph 60 discloses that the recorded noise type audio signals are used to train a noise type model. The noise type model may include situations that can occur during a user utterance, such as a babble (cafeteria) noise class, a side noise class, etc. (paragraph 60)).
Claim 17, Huang et al discloses wherein the adjusting of the authentication criterion comprises adjusting a threshold corresponding to a similarity between the at least one audio signal of the utterance section and the at least one audio signal of the 
Claim 18, Huang et al discloses wherein the adjusting of the threshold comprises:
select the threshold from a preset threshold table (paragraph 104 discloses determining a threshold by “looking up a pre-calculated threshold value on a table including associated possible confidence indicator values”.) based on a length of the utterance section and a length of the registration utterance section (Fig. 3, label 306 adjusts the threshold based on label 312. Fig. 4, label 402 shows the parameters used to adjust the threshold. The noise of the audio signal is based on the noise found in the audio signal, which includes noise and speech (Fig. 3, label 301 outputs noise and speech). This indicates that the threshold is adjusted based on a length of the utterance spoken by the user (Fig. 1, labels 101, 111). Paragraph 59 discloses the context database used to determine the context for adjusting the threshold; Fig. 3, label 306 is trained based on obtained noise samples separated from obtained clean speech samples. This indicates that adjustment of the threshold is based on the length of the registration utterance section.); and
adjust the threshold based on a result of a comparison between the environment information and the registration environment information (Fig. 4, label 404; paragraph 62 discloses comparing the environment information (noise from label 211) with the registered environment information (label 408). Label 306 of Fig. 3 adjusts the threshold based on the comparison.).
Claim 19, Huang et al discloses wherein the at least one processor is further configured to adjust the threshold (Fig. 3, label 306) based on a similarity between a 
Claim 20, Huang et al discloses:
obtaining an input audio signal based on an utterance input of a user (Fig. 2, label 201 receives an input audio signal based on the user speaking, label 101. Paragraph 31 defines the audio input signal.); 
obtaining, from the input audio signal (label 111), at least one audio signal of an utterance section and at least one audio signal of a non-utterance section (Fig. 2, the microphone obtains audio data, label AD. Paragraph 34 discloses that label 211 is a conversion of the audio input signal, which includes utterance or speech and noise (paragraph 31).), 
wherein the at least one audio signal of the utterance section is generated based on a speech of the user (Fig. 1, labels 101, 111, 201, 211 show that the user’s speech is received. Paragraph 31 discloses that the audio signal includes noise and speech. Fig. 4, labels 406, 412 indicate the speech or signal-to-noise ratio, which indicates that the audio signal includes speech or signal.), and the at least one audio signal of the non-utterance section is not generated based on the speech of the user (Fig. 1, labels 101, 111, 201, 211; Fig. 4, labels 406, 412. Paragraph 31 discloses that the audio signal includes noise and speech.);
	generating environment information indicating an environment in which the utterance input is received, based on the at least one audio signal of the non-utterance 
	obtaining a result of a comparison (Fig. 4, label 404. Paragraph 62 discloses that label 404 compares the audio data 211 to the noise type models or audio data models 408 for each sample of audio data 211 and for each implemented noise class.) between the generated environment information (Fig. 4, label 404 determines the generated environment information or noise or background noise information.) and registration environment information (Fig. 4, label 408 indicates the registered environment information.) indicating an environment in which a registration utterance input corresponding to a previously registered registration audio signal of the user is received (Paragraph 59 discloses “The noise types may be any internal or external factor that an SR system can classify and that may create or influence the acoustic environment in which acoustic waves are created and captured and that affect the audio data. …” Paragraph 60 discloses that recorded noise type audio signals are used to train a noise type model, label 408.);
	adjusting an authentication criterion for authenticating the user based on the result of the comparison (Fig. 3, labels 308, 312 generate ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304. Label 308 and Fig. 4, label 404 perform the comparison as indicated above.) between the generated environment information (Fig. 4, label 404 determines the generated environment information or noise or background noise information.) and registration environment information (Fig. 4, label 408 indicates the registered environment information.); and

wherein the adjusting of the authentication criterion (Fig. 3, label 308,312 generates ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304.) further comprises:
calculating a first average energy value of the at least one audio signal of the utterance section (Fig. 4, label 420 (SNR));
calculating a second average energy value of the at least one audio signal of the registration utterance (Paragraph 27 discloses that a context database includes context score distributions indexed by context and having associated data with each context entry. The context database may include a development corpus for different combinations of noise types and/or SNRs.); and
adjusting a threshold (Fig. 3, labels 306, 308; Fig. 4, label 414) corresponding to the authentication criterion (Fig. 3, labels 308, 312 generate ratios used by the threshold generation unit, 306, to generate a threshold for authenticating the speaker at label 304.) based on a result of a comparison between the first average energy value and the second average energy value (Paragraph 69 discloses “the context is selected by first determining the noise type label and as a sub-class, then selecting the context with the closest SNR to the SNR 420 and within the noise class of noise type label 411.”). 
Huang et al discloses a registration utterance input included in a previously registered registration audio signal of the user is received (Paragraph 59 discloses “a number of training (or pretraining) or learning trials are performed to provide samples of audio 
	Boregowda et al discloses the registration audio signal comprises at least one audio signal of a registration utterance section generated based on a previous speech of the user (Paragraph 57 discloses voice training data that is stored in a storage device, such as a personal speech profile data (PSPD) storage device: “The PSPD storage device is configured to store voice training data (e.g., in voice training data storage block 254), which is based on speech captured during previous speech/voice sessions (e.g., the initial training sessions and/or subsequent sessions).” This indicates that the registration audio signal, such as the speech profile data of the voice training data shown in Fig. 2, label 250, comprises at least one audio signal of a registration utterance section, such as speech captured during previous speech/voice sessions, wherein such audio signal is based on previous speech of the user, such as the profile data of users.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the storage of training trials with samples of audio as disclosed by Huang et al by incorporating training data of profiles of users of the electronic apparatus as disclosed by Boregowda et al, so as to readily detect the user’s voice, thereby improving authentication or verification of a speech input as coming from the authentic person and preventing impersonators.


Allowable Subject Matter
Claims 21-22 are allowed.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LINDA WONG/Primary Examiner, Art Unit 2655