Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 24 February 2021 have been fully considered but they are not persuasive.
Initially, it is noted that previously pending claim 3 merely required determining whether “a recognition function” should be activated according to environmental sensor data. Previously pending claim 1 recited a step of receiving a result of recognizing an identity, as well as a step of recognizing a voice. Since the previously claimed method merely received a result of recognizing an identity, but implemented the step of “recognizing a voice”, it was assumed that “the recognition function” of previously pending claim 3 referred to the “recognizing a voice” step, since receiving a result of recognizing an identity is not a “recognition function”.
Claim 1 as amended now requires judging an environmental state and activating an identity recognition function in response to the environmental state meeting a predetermined condition. As noted above, the previously pending claims merely required determining whether “a recognition function” should be activated, rather than specifically requiring activating an identity recognition function.
In light of this, Applicant’s characterization of Welbourne in view of amended claim 1 is reasonable because Welbourne discloses activating the voice recognition function in response to recognizing a face via the facial recognition module. However, Brunelli et al. (U.S. Patent 5,412,738) discloses a similar identity recognition method that specifically includes an activation subsystem (Fig. 2, attention module 9) that monitors environmental video data and judges whether a certain amount of change is detected above a predetermined threshold.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-8, 12, 14-19, and 23-25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Welbourne et al. (U.S. Patent No. 10,178,301, hereinafter “Welbourne”), in view of Brunelli et al. (U.S. Patent 5,412,738, hereinafter “Brunelli”).
	In regard to claim 1, Welbourne discloses a voice recognition method based on identity recognition (Fig. 2B), the method comprising:
	receiving a result of recognizing an identity (a speaker recognition module 112 receives an identity label 216-1, column 4, lines 53-58);
	determining an acoustic feature set corresponding to the recognized identity based on the result (speaker recognition module 112 utilizes the identity label 216-1 to select a trained speaker recognition model, column 4, lines 58-62 and column 2, lines 13-31); and
	recognizing a voice corresponding to the recognized identity from received sound based on the determined acoustic feature set (the speaker recognition module 112 performs speaker recognition 
	While Welbourne discloses judging an environmental state (Fig. 13, a confidence score is compared to a threshold based on environmental video data, column 17, lines 26-36), Welbourne does not disclose activating an identity recognition function in response to the environmental state meeting a predetermined condition.
	Brunelli discloses a voice recognition method based on identity recognition (verification of an identity, column 1, lines 9-46), the method comprising:
	judging an environmental state (Fig. 2, an attention module 9 monitors the environment to detect the presence of a person, column 6, lines 14-34); and
	activating an identity recognition function in response to the environmental state meeting a predetermined condition (checks are made to determine whether a captured image has stabilized and is consistent with the approximate dimensions of a face, and when this condition is met, visual recognition module 12 is activated, column 6, lines 35-42).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include the attention module of Brunelli to activate the identity recognition function disclosed by Welbourne, because this would “prevent recognition from starting unnecessarily, simply as a result of an object or a person passing or stopping momentarily in front of the system”, as taught by Brunelli (column 5, lines 3-18).

	In regard to claim 3, while Welbourne discloses the step of judging an environmental state comprises receiving sensor data from at least one environmental sensor (video data is received, column 17, lines 19-25), Welbourne does not disclose determining whether the environmental state meets the predetermined condition for activating an identity recognition function.
	Brunelli discloses the step of judging an environmental state comprises:
	receiving sensor data from at least one environmental sensor (Fig. 2, an attention module 9 monitors the environment to detect the presence of a person, column 6, lines 14-34); and
	determining whether the environmental state meets the predetermined condition (checks are made to determine whether a captured image has stabilized and is consistent with the approximate dimensions of a face, and when this condition is met, visual recognition module 12 is activated, column 6, lines 35-42).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include the attention module of Brunelli to receive sensor data from at least one environmental sensor and determine whether the environmental state met the predetermined condition, because this would “prevent recognition from starting unnecessarily, simply as a result of an object or a person passing or stopping momentarily in front of the system”, as taught by Brunelli (column 5, lines 3-18).

	In regard to claim 4, Welbourne discloses the step of receiving a result of recognizing an identity comprises:
	receiving the result from an external identification apparatus (a server 108 performs the facial recognition, column 3, lines 7-14 and column 20, lines 25-39),
	wherein the external identification apparatus recognizes the identity with at least one of the following techniques: face recognition, iris recognition, fingerprint recognition, password recognition, and login information recognition (facial recognition, column 3, lines 14-25).



	In regard to claim 6, Welbourne discloses the step of establishing a corresponding acoustic feature set for the identity comprises:
	receiving voice corresponding to the identity (capture audio associated with the user, column 6, lines 24-28);
	extracting acoustic features corresponding to the identity from the received voice (the audio data is used to update the speaker recognition database, column 6, lines 28-36); and
	establishing an acoustic feature set corresponding to the identity according to the extracted acoustic features (the audio data is associated with the identity in the speaker recognition database, column 6, lines 24-36).

	In regard to claim 7, Welbourne discloses the step of establishing a corresponding acoustic feature set for the identity comprises:
	receiving voice corresponding to the identity (capture audio associated with the user, column 6, lines 24-28);
	transmitting the received voice to a server (server 108 performs speaker recognition, column 3, lines 7-14 and column 20, lines 25-39);
	receiving an acoustic feature set corresponding to the identity from the server (the audio data is associated with the identity in the speaker recognition database, column 6, lines 24-36).



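	In regard to claim 8, Welbourne discloses the step of recognizing a voice corresponding to the recognized identity from received sound comprises:
	extracting acoustic features from the received sound (speaker recognition module 112 processes the audio data using conventional speaker recognition techniques, column 3, lines 26-38 and column 4, lines 58-62);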
	matching the extracted acoustic features with the acquired acoustic feature set corresponding to the recognized identity (using various machine learning techniques, the audio is compared to the speaker recognition model for the identity, column 7, lines 3-30); and
	recognizing the received sound as the voice corresponding to the recognized identity if the extracted acoustic features match the determined acoustic feature set (the distance between the audio data and the speaker recognition model determines the recognized identity, column 7, lines 3-30 and column 4, lines 53-63).

	In regard to claim 12, Welbourne discloses a voice recognition apparatus based on identity recognition (Fig. 2B), the voice recognition apparatus comprising:
	a processor (Fig. 17A, 1704);
	a memory having instructions stored thereon (Fig. 17A, 1706), which when executed by the processor, cause the processor to:
	receive a result of recognizing an identity (a speaker recognition module 112 receives an identity label 216-1, column 4, lines 53-58);
	determine an acoustic feature set corresponding to the recognized identity based on the result (speaker recognition module 112 utilizes the identity label 216-1 to select a trained speaker recognition model, column 4, lines 58-62 and column 2, lines 13-31); and

	While Welbourne discloses judging an environmental state (Fig. 13, a confidence score is compared to a threshold based on environmental video data, column 17, lines 26-36), Welbourne does not disclose activating an identity recognition function in response to the environmental state meeting a predetermined condition.
	Brunelli discloses a voice recognition apparatus based on identity recognition (verification of an identity, column 1, lines 9-46), the voice recognition apparatus comprising a processor to:
	judge an environmental state (Fig. 2, an attention module 9 monitors the environment to detect the presence of a person, column 6, lines 14-34); and
	activate an identity recognition function in response to the environmental state meeting a predetermined condition (checks are made to determine whether a captured image has stabilized and is consistent with the approximate dimensions of a face, and when this condition is met, visual recognition module 12 is activated, column 6, lines 35-42).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include the attention module of Brunelli to activate the identity recognition function disclosed by Welbourne, because this would “prevent recognition from starting unnecessarily, simply as a result of an object or a person passing or stopping momentarily in front of the system”, as taught by Brunelli (column 5, lines 3-18).

	In regard to claim 14, while Welbourne discloses the step of judging an environmental state comprises receiving sensor data from at least one environmental sensor (video data is received, column 
	Brunelli discloses causing the processor to:
	receive sensor data from at least one environmental sensor (Fig. 2, an attention module 9 monitors the environment to detect the presence of a person, column 6, lines 14-34); and
	determine whether the environmental state meets the predetermined condition (checks are made to determine whether a captured image has stabilized and is consistent with the approximate dimensions of a face, and when this condition is met, visual recognition module 12 is activated, column 6, lines 35-42).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to include the attention module of Brunelli to receive sensor data from at least one environmental sensor and determine whether the environmental state met the predetermined condition, because this would “prevent recognition from starting unnecessarily, simply as a result of an object or a person passing or stopping momentarily in front of the system”, as taught by Brunelli (column 5, lines 3-18).

	In regard to claim 15, Welbourne discloses the instructions, when executed by the processor, cause the processor to:
	receive the result from an external identification apparatus (a server 108 performs the facial recognition, column 3, lines 7-14 and column 20, lines 25-39),
	wherein the external identification apparatus recognizes the identity with at least one of the following techniques: face recognition, iris recognition, fingerprint recognition, password recognition, and login information recognition (facial recognition, column 3, lines 14-25).


	In regard to claim 16, Welbourne discloses the instructions, when executed by the processor, cause the processor to:
	create an identity and establish a corresponding acoustic feature set for the identity (when a new identity is determined, captured audio data is used as training examples for the speaker recognition database, column 6, lines 24-36).

	In regard to claim 17, Welbourne discloses the instructions, when executed by the processor, cause the processor to:
	receive voice corresponding to the identity (capture audio associated with the user, column 6, lines 24-28);
	extract acoustic features corresponding to the identity from the received voice (the audio data is used to update the speaker recognition database, column 6, lines 28-36); and
	establish an acoustic feature set corresponding to the identity according to the extracted acoustic features (the audio data is associated with the identity in the speaker recognition database, column 6, lines 24-36).

	In regard to claim 18, Welbourne discloses the instructions, when executed by the processor, cause the processor to:
	receive voice corresponding to the identity (capture audio associated with the user, column 6, lines 24-28);
	transmit the received voice to a server (server 108 performs speaker recognition, column 3, lines 7-14 and column 20, lines 25-39);
	receive an acoustic feature set corresponding to the identity from the server (the audio data is associated with the identity in the speaker recognition database, column 6, lines 24-36).

	In regard to claim 19, Welbourne discloses the instructions, when executed by the processor, cause the processor to:
	extract acoustic features from the received sound (speaker recognition module 112 processes the audio data using conventional speaker recognition techniques, column 3, lines 26-38 and column 4, lines 58-62);
	match the extracted acoustic features with the acquired acoustic feature set corresponding to the recognized identity (using various machine learning techniques, the audio is compared to the speaker recognition model for the identity, column 7, lines 3-30); and
	recognize the received sound as the voice corresponding to the recognized identity if the extracted acoustic features match the determined acoustic feature set (the distance between the audio data and the speaker recognition model determines the recognized identity, column 7, lines 3-30 and column 4, lines 53-63).

	In regard to claim 23, Welbourne discloses storing data required for processing voice corresponding to the created identity in a database in association with the created identity (a speaker recognition database is updated for the created identity, column 6, lines 24-36).

	In regard to claim 24, Welbourne discloses at least one of:
	a sound input apparatus configured to receive sound from outside and convert the sound into an electrical signal (microphones, column 3, lines 26-38);
	an identification apparatus configured to identify the identity (speaker recognition module and/or facial recognition module, column 3, lines 14-25); and


	In regard to claim 25, while Welbourne discloses determining whether the sensor data reaches a predetermined threshold (Fig. 13, a confidence score is compared to a threshold based on environmental video data, column 17, lines 26-36), Welbourne does not disclose determining that the environmental state meets the predetermined condition for activating the identity recognition function in response to this.
	Brunelli discloses determining whether the environmental state meets the predetermined condition according to the sensor data comprises: 
	determining whether the sensor data reaches a predetermined threshold (determining whether a certain amount of change is detected above a predetermined threshold, column 6, lines 14-34); and 
	determining that the environmental state meets the predetermined condition in response to determining that the sensor data reaches the predetermined threshold (when the amount of change is above the predetermined threshold, visual recognition module 12 is activated, column 6, lines 35-42).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to determine whether the sensor data reached a predetermined threshold and determine that the environmental state met the predetermined condition in response to this, because this would “prevent recognition from starting unnecessarily, simply as a result of an object or a person passing or stopping momentarily in front of the system”, as taught by Brunelli (column 5, lines 3-18).
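	For illustration only, the threshold-gated activation discussed in regard to claim 25 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from Brunelli or Welbourne: the names AttentionGate, condition_met, and run_identity_recognition, the numeric threshold, and the reading of "reaches" as "greater than or equal to" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttentionGate:
    """Hypothetical gate that decides whether recognition may start."""

    threshold: float  # assumed predetermined threshold; the value is illustrative

    def condition_met(self, change_amount: float) -> bool:
        # "Reaches" is read here as greater than or equal to (an assumption).
        return change_amount >= self.threshold


def run_identity_recognition(
    gate: AttentionGate,
    change_amount: float,
    recognize: Callable[[], str],
) -> Optional[str]:
    """Call `recognize` only when the gate's condition is met, else return None."""
    if gate.condition_met(change_amount):
        return recognize()
    return None
```

	In this sketch the recognizer is never invoked while the measured change stays below the threshold, which corresponds to the stated motivation of not starting recognition unnecessarily.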



Claim(s) 9-11 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Welbourne, in view of Brunelli, and further in view of Strope et al. (U.S. Patent No. 8,185,392, hereinafter “Strope”).
In regard to claims 9 and 20, Welbourne discloses calculating a score of the extracted acoustic features based on the determined acoustic feature set corresponding to the recognized identity (a “score” indicating how closely the input audio matches a trained model is determined, column 7, lines 3-30); and
	determining that the extracted features match the acoustic feature set corresponding to the recognized identity when the calculated probability is greater than a first threshold, wherein the first threshold is a probability threshold indicating whether the recognized identity is the identity corresponding to the extracted acoustic features (when the confidence score associated with the identity is greater than a threshold, the identity is output as a label for the input audio, column 15, lines 32-41).
	Welbourne and Brunelli do not expressly disclose the score is a maximum likelihood probability.
	Strope discloses a voice recognition method using extracted acoustic features that comprises:
	calculating a maximum likelihood probability of the extracted acoustic features (a likelihood that input voice features match an acoustic model is calculated, column 3, lines 26-36).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to utilize a maximum likelihood probability as the matching score because a maximum likelihood probability, by mathematical definition, determines a probability distribution such that under the assumed statistical model, the observed data is most probable. In the context of voice recognition, the maximum likelihood probability reflects the likelihood that a model correctly recognized an input voice, as taught by Strope (column 3, lines 26-36).

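	In regard to claim 10, Welbourne discloses updating the acoustic feature set corresponding to the recognized identity with the extracted acoustic features (the speaker recognition database is updated based on the audio data, column 4, lines 53-62). However, Welbourne and Brunelli do not disclose updating the acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold.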
	Strope discloses updating an acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold (audio data that has a likelihood above a first threshold and below a second threshold is selected to update an acoustic model, column 3, lines 37-59).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to update the acoustic feature set when the calculated probability is greater than the first threshold and less than a second threshold, because the audio data with a likelihood above a first threshold is likely to be correct, while audio data with a likelihood above the second threshold would not need further refinement, as taught by Strope (column 3, lines 37-59).
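	For illustration only, the two-threshold selection discussed in regard to claim 10 can be sketched as follows. This is a hedged sketch, not code from Strope: the function name should_update_feature_set, its parameters, and the use of strict inequalities at both bounds are assumptions.

```python
def should_update_feature_set(
    probability: float,
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Return True when the probability lies strictly between the two thresholds.

    Hypothetical helper: above the first threshold the match is treated as
    likely correct; at or above the second threshold no update is performed.
    """
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be below second_threshold")
    return first_threshold < probability < second_threshold
```

	Under these assumptions, a probability at or below the first threshold is not used for updating because the match is not accepted, and a probability at or above the second threshold is not used because no further refinement is needed.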

	In regard to claim 11, Welbourne and Brunelli do not disclose recognizing an operation to be performed from the voice corresponding to the recognized identity.
	Strope discloses an operation to be performed from the voice corresponding to the recognized identity (voice queries are recognized and cause a search operation to be performed, column 1, lines 33-47).
	It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to recognize an operation to be performed from the voice corresponding to the recognized identity, because it would allow the acoustic models to be improved by estimating the accuracy of the transcription based on the user’s actions following the operation, as taught by Strope (column 1, lines 20-32).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Colmenarez et al., Brady et al., and Ganong et al. disclose additional systems that activate an identity recognition function in response to environmental conditions.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616.  The examiner can normally be reached on W: 12-4, Th: 9-3, F: 9-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






BLA 3/26/21
/BRIAN L ALBERTALLI/Primary Examiner, Art Unit 2656