DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Response to Arguments
Applicant's arguments filed 05/06/2022 have been fully considered but they are not persuasive. Regarding the arguments on page 4 of the Remarks, Examiner notes that Hasegawa appears to teach the limitations of the claims, even if they are not performed in the same way as in the present disclosure. Applicant argues that “even without going through a separate speaker registration process for the speaker recognition, the speaker recognition is possible only with the user voice inputs accumulated according to the voice recognition function”. However, the claims do not recite that a speaker registration process is prohibited. Examiner further notes that, as amended, “providing” a model and “generating” a model are interpreted differently.
Regarding the arguments on pages 4-5 of the Remarks, Examiner notes that Hasegawa para [0037] states that “the speaker identification device 100 generates a first identification model as a result of the learning processing.” The first identification model is therefore the updated model that results when the learned model is updated. Further, Examiner notes that Kajarekar is relied upon to teach updating the models based on a threshold level of similarity.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-8, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hasegawa (US 2021/0398540 A1), in view of Kajarekar et al. (US 2013/0144414 A1), hereinafter referred to as Kajarekar.

Regarding claim 1, Hasegawa teaches:
An electronic apparatus, comprising: 
a processor (Fig. 3, element 301, para [0062], where a CPU is used) configured to
perform a voice recognition function corresponding to each of a plurality of first user voice inputs to a microphone (para [0034], where a learned model is generated from learning data, associating voice inputs with speaker labels, and para [0067], where a microphone is used), 
identify utterance characteristics of each of the plurality of first user voice inputs (para [0042], where voice characteristics are acquired and used for classification),
obtain a plurality of voice groups in which the plurality of first user voice inputs are classified according to the identified utterance characteristics (para [0042], where speech is classified into one or more groups based on voice characteristics),
select a voice group corresponding to a predetermined user among the classified plurality of voice groups (para [0031], [0042], where speech sections are classified into groups on the basis of voice characteristics, and assigned an ID), the selected voice group including most data of the first user voice inputs among the plurality of voice groups (para [0116], where the speaker who has spoken most frequently is identified as the speaker), 
provide a speaker model corresponding to the predetermined user based on the utterance characteristics of the selected voice group (para [0034], [0042], where the identification model or learned model is learned from voice information, and where speech is classified based on the voice characteristics), the provided speaker model corresponding to an initial speaker model for the predetermined user (para [0035-37], where first learning data is used to train the model), 
perform speaker recognition of the user and the voice recognition function for a second user voice input to the microphone based on the provided speaker model (para [0039], where speaker identification is performed on conversation voice information),
update the provided speaker model corresponding to the predetermined user based on the second user voice input (para [0035-37], where the learned model is updated using learning data from the known and unknown speakers), and
perform the speaker recognition of the user and the voice recognition function for a third user voice input to the microphone based on the updated speaker model (para [0038-39], where speaker recognition is performed using the updated model).
Hasegawa does not teach:
update the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold, 
Kajarekar teaches:
update the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold (para [0030], [0049], where models with similarity above a threshold are merged, which is interpreted as updating the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Hasegawa by applying the model merging of Kajarekar (Kajarekar para [0049]) to the models of Hasegawa (Hasegawa para [0042]) in order to integrate two groups of speaker models together (Kajarekar para [0030]).

Regarding claim 6, Hasegawa in view of Kajarekar teaches:
The electronic apparatus of claim 1, wherein the processor is further configured to: 
identify whether the speaker model corresponding to the predetermined user is provided (Hasegawa para [0036-37], where learning data corresponding to other speakers is categorized as second learning data); and 
provide the speaker model corresponding to the predetermined user when the speaker model corresponding to the predetermined user is not provided (Hasegawa para [0037], where the identification model including the dummy speakers is generated).  

Regarding claim 7, Hasegawa in view of Kajarekar teaches:
The electronic apparatus of claim 1, wherein the processor is configured to:
provide a plurality of speaker models for the user (Kajarekar para [0049], where multiple speaker models are generated and similar models are treated as corresponding to the same user);
identify similarity of utterance characteristics between the plurality of speaker models (Kajarekar para [0049], where speaker models are compared for similarity); and 
merge two or more speaker models having the similarity equal to or greater than a threshold (Kajarekar para [0049], where similar speaker models are merged).  

Regarding claim 8, Hasegawa teaches:
A control method of an electronic apparatus, comprising: 
performing a voice recognition function corresponding to each of a plurality of first user voice inputs to a microphone (para [0034], where a learned model is generated from learning data, associating voice inputs with speaker labels, and para [0067], where a microphone is used); 
identifying utterance characteristics of each of the plurality of first user voice inputs (para [0042], where voice characteristics are acquired and used for classification), and obtaining a plurality of voice groups in which the plurality of first user voice inputs are classified according to the identified utterance characteristics (para [0042], where speech is classified into one or more groups based on voice characteristics),
selecting a voice group corresponding to a predetermined user among the classified plurality of voice groups (para [0031], [0042], where speech sections are classified into groups on the basis of voice characteristics, and assigned an ID), the selected voice group including most data of the first user voice inputs among the plurality of voice groups (para [0116], where the speaker who has spoken most frequently is identified as the speaker); and 
providing a speaker model corresponding to the predetermined user based on the utterance characteristics of the selected voice group (para [0034], [0042], where the identification model or learned model is learned from voice information, and where speech is classified based on the voice characteristics), the provided speaker model corresponding to an initial speaker model for the predetermined user (para [0035-37], where first learning data is used to train the model), 
performing speaker recognition of the user and the voice recognition function for a second user voice input to the microphone based on the provided speaker model (para [0039], where speaker identification is performed on conversation voice information),
updating the provided speaker model corresponding to the predetermined user based on the second user voice input (para [0035-37], where the learned model is updated using learning data from the known and unknown speakers), and
performing the speaker recognition of the user and the voice recognition function for a third user voice input to the microphone based on the updated speaker model (para [0038-39], where speaker recognition is performed using the updated model).
Hasegawa does not teach:
updating the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold, 
Kajarekar teaches:
updating the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold (para [0030], [0049], where models with similarity above a threshold are merged, which is interpreted as updating the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Hasegawa by applying the model merging of Kajarekar (Kajarekar para [0049]) to the models of Hasegawa (Hasegawa para [0042]) in order to integrate two groups of speaker models together (Kajarekar para [0030]).

Regarding claim 13, Hasegawa in view of Kajarekar teaches:
The control method of claim 8, wherein the providing of the speaker model includes: 
identifying whether the speaker model corresponding to the predetermined user is provided (Hasegawa para [0036-37], where learning data corresponding to other speakers is categorized as second learning data); and 
providing the speaker model corresponding to the predetermined user when the speaker model corresponding to the predetermined user is not provided (Hasegawa para [0037], where the identification model including the dummy speakers is generated).  

Regarding claim 14, Hasegawa in view of Kajarekar teaches:
The control method of claim 8, wherein the updating of the speaker model includes:
providing a plurality of speaker models for the user (Kajarekar para [0049], where multiple speaker models are generated and similar models are treated as corresponding to the same user);
identifying similarity of utterance characteristics between the plurality of speaker models (Kajarekar para [0049], where speaker models are compared for similarity); and 
merging two or more speaker models having the similarity equal to or greater than a threshold (Kajarekar para [0049], where similar speaker models are merged).  

Regarding claim 15, Hasegawa teaches:
A non-transitory recording medium stored with a computer program (Fig. 3 element 305, para [0062], where a recording medium is used) including a code performing a control method of an electronic apparatus as a computer-readable code, wherein the control method of the electronic apparatus includes: 
performing a voice recognition function corresponding to each of a plurality of first user voice inputs to a microphone (para [0034], where a learned model is generated from learning data, and para [0067], where a microphone is used); 
identifying utterance characteristics of each of the plurality of first user voice inputs (para [0042], where voice characteristics are acquired and used for classification), and obtaining a plurality of voice groups in which the plurality of first user voice inputs are classified according to the identified utterance characteristics (para [0042], where speech is classified into one or more groups based on voice characteristics),
selecting a voice group corresponding to a predetermined user among the plurality of voice groups (para [0031], [0042], where speech sections are classified into groups on the basis of voice characteristics, and assigned an ID), the selected voice group including most data of the first user voice inputs among the plurality of voice groups (para [0116], where the speaker who has spoken most frequently is identified as the speaker);  
providing a speaker model corresponding to the predetermined user based on the utterance characteristics of the selected voice group (para [0034], [0042], where the identification model or learned model is learned from voice information, and where speech is classified based on the voice characteristics), the provided speaker model corresponding to an initial speaker model for the predetermined user (para [0035-37], where first learning data is used to train the model), 
performing speaker recognition of the user and the voice recognition function for a second user voice input to the microphone based on the provided speaker model (para [0039], where speaker identification is performed on conversation voice information);
updating the provided speaker model corresponding to the predetermined user based on the second user voice input (para [0035-37], where the learned model is updated using learning data from the known and unknown speakers), and
performing the speaker recognition of the user and the voice recognition function for a third user voice input to the microphone based on the updated speaker model (para [0038-39], where speaker recognition is performed using the updated model).
Hasegawa does not teach:
updating the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold, 
Kajarekar teaches:
updating the provided speaker model corresponding to the predetermined user based on the second user voice input whose similarity with the provided speaker model is equal to or greater than a threshold (para [0030], [0049], where models with similarity above a threshold are merged, which is interpreted as updating the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Hasegawa by applying the model merging of Kajarekar (Kajarekar para [0049]) to the models of Hasegawa (Hasegawa para [0042]) in order to integrate two groups of speaker models together (Kajarekar para [0030]).

Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Hasegawa, in view of Kajarekar, and further in view of Fleizach et al. (US 9,495,129 B2), hereinafter referred to as Fleizach.

Regarding claim 4, Hasegawa in view of Kajarekar teaches:
The electronic apparatus of claim 1
Hasegawa in view of Kajarekar does not teach:
wherein the utterance characteristics include at least one of tone, strength, and speed of the plurality of input first user voice inputs.
Fleizach teaches:
wherein the utterance characteristics include at least one of tone, strength, and speed of the plurality of input first user voice inputs (col. 17 line 60 - col. 18 line 16, where voice characteristics may include pitch, speed, and volume).  
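The prior art contained a device which differed from the claimed device by the substitution of some components with other components; the substituted components and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable. Specifically, Hasegawa in view of Kajarekar classifies voice inputs according to voice characteristics (Hasegawa para [0042]), and Fleizach shows that pitch, speed, and volume were known voice characteristics (Fleizach col. 17 line 60 - col. 18 line 16).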

Regarding claim 11, Hasegawa in view of Kajarekar teaches:
The control method of claim 8
Hasegawa in view of Kajarekar does not teach:
wherein the utterance characteristics include at least one of tone, strength, and speed of the plurality of input first user voice inputs.
Fleizach teaches:
wherein the utterance characteristics include at least one of tone, strength, and speed of the plurality of input first user voice inputs (col. 17 line 60 - col. 18 line 16, where voice characteristics may include pitch, speed, and volume).  
The prior art contained a method which differed from the claimed method by the substitution of some steps with other steps; the substituted steps and their functions were known in the art; one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable. Specifically, Hasegawa in view of Kajarekar classifies voice inputs according to voice characteristics (Hasegawa para [0042]), and Fleizach shows that pitch, speed, and volume were known voice characteristics (Fleizach col. 17 line 60 - col. 18 line 16).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2017/0069327 A1 para [0099] teaches a speaker model being updated with the verification utterance when the verification utterance is above a threshold similarity score.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658