DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Response to Arguments
Applicant's arguments filed 01/29/2022 have been fully considered but they are not persuasive. Regarding the arguments on page 10 of the Remarks, Examiner notes that the terms “acoustic unit”, “acoustic sub-unit”, and “pronunciation unit” are not clearly defined in the Specification and can thus be interpreted broadly. In Aronowitz para [0026], the term “text” may include words, and para [0098] of the Specification appears to give examples in which the “pronunciation units” are words. Examiner also notes that Comerford’s feature vectors correspond to the “acoustic features”, the 65 codewords correspond to the “acoustic sub-features”, and the spoken words correspond to the “acoustic units”.
Regarding the arguments on pages 10-11 of the Remarks, Applicant has not shown how Comerford fails to teach the claimed limitations. Comerford teaches generating feature vectors, for example using mel cepstra. The mel cepstra are the voice parameters, and the mel cepstral feature vectors teach the machine-identifiable voice feature vectors.
Regarding the arguments on pages 11-12 of the Remarks, Examiner notes that it is unclear whether the claim is intended to recite name, image, AND job title, or name, image, OR job title. For compact .

Claim Objections
Claims 1 and 5 are objected to because of the following informalities: the limitation “wherein the user information comprises a user’s name, a user image, a user’s job title” does not specify “and” or “or” among the three pieces of information, which makes the scope unclear. Appropriate correction is required.
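The difference in scope between the two readings can be illustrated with a minimal Python sketch. The field names (name, image, job_title) and both helper functions are hypothetical and do not appear in the claims or in any cited reference; the sketch only contrasts a conjunctive reading with a disjunctive one.

```python
# Hypothetical field names standing in for the three recited items.
FIELDS = ("name", "image", "job_title")


def satisfies_and(user_info: dict) -> bool:
    """Conjunctive reading: all three items must be present."""
    return all(user_info.get(field) is not None for field in FIELDS)


def satisfies_or(user_info: dict) -> bool:
    """Disjunctive reading: any one of the three items is enough."""
    return any(user_info.get(field) is not None for field in FIELDS)


# Illustrative records only; the values are made up.
name_only = {"name": "A. Speaker"}
full_profile = {"name": "A. Speaker", "image": "photo.png", "job_title": "Engineer"}
```

Under the disjunctive reading, a record holding only a name meets the limitation; under the conjunctive reading, only a record holding all three items does.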

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 9-10, 13, 15-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Comerford et al. (US 6,107,935 A), hereinafter referred to as Comerford, in view of Aronowitz (US 2015/0187356 A1), and further in view of Gannu et al. (US 2012/0262533 A1), hereinafter referred to as Gannu.

Regarding claim 1, Comerford teaches:
A user identification method, comprising: 
extracting one or more acoustic features from acquired voice (col. 8 lines 13-45, where feature vectors are generated from the speech input); 
acquiring user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, or a user’s job title (col. 8 lines 13-45, where speaker recognition is performed using the feature vectors, and col. 11 lines 23-43, where the user states their name for identity determination);  
outputting the user information based upon the user information matching the one or more acoustic features (col. 8 lines 46-61, where feedback of the recognition is given to the user); 
when determining that the acquired voice is a voice of a new user (col. 8 lines 46-61, where a new user is incorrectly recognized and the new user corrects the result, and col. 12 lines 35-43, line 61 – col. 13 line 12, where a new installation of a software package begins with speaker enrollment), generating a prompt message for the new user to input new user information (col. 8 lines 46-61, where the new user marks a menu on a display, and enrolls speaker-specific information, and col. 12 line 61 – col. 13 line 12, where the user is prompted to speak to the system, the new user information being the voice of the speaker);  
storing the one or more acoustic features and corresponding user information in a preset file (col. 3 line 56 - col. 4 line 11, col. 11 lines 23-28, where the information relating to the enrollment is stored in a database) by:
dividing the one or more acoustic features into a plurality of acoustic sub-features based on acoustic units (col. 8 lines 26-45, 62 - col. 9 line 15, where about 65 codewords are determined from the feature vectors); and 

receiving an operation command that broadcasts a piece of textual content with an acoustic feature matching a user who has input his/her voice (col. 11 lines 23-63, where a user attempts to access a voice dialing system, and col. 7 lines 53-55, where the system is text independent); 
acquiring acoustic sub-features corresponding to the user who has input his/her voice (col. 8 lines 13-45, where feature vectors are generated from the speech input); 
determining a voice corresponding to the piece of textual content based on the acoustic sub-features corresponding to the user who has input his/her voice (col. 8 lines 13-45, where speaker recognition is performed using the feature vectors); and 
said determining that the acquired voice is a voice of a new user is based on lacking user information corresponding to the plurality of acoustic sub-features in the preset file (col. 12 line 61 – col. 13 line 12, where it is determined that there are no previous acoustic models in the file),
wherein the extracting one or more acoustic features from acquired voice comprises:
parameterizing the acquired voice into a plurality of voice parameters (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice); and 
converting the parameterized voice into machine-identifiable voice feature vectors (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice), wherein the voice parameters comprise one or more of pitch periods, Linear Predictive Coefficients (LPC), impulse response of a sound channel, self-correlation coefficients, sound channel area functions, LPCC features, MFCC features, Perceptual Linear Predictive (PLP), or difference cepstrum (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice).
Comerford does not teach:
acquiring user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, a user’s job title;
dividing the one or more acoustic features into a plurality of acoustic sub-features based on acoustic units by segmenting textual content, wherein the acoustic unit is a pronunciation unit; and 
outputting the textual content using the voice determined;
wherein 
the acoustic units comprise a plurality of pronunciation sub-units;
the determining a voice corresponding to the textual content comprises organizing the plurality of pronunciation sub-units into the voice determined corresponding to the outputting the textual content; 
Aronowitz teaches:
dividing the one or more acoustic features into a plurality of acoustic sub-features based on acoustic units (Comerford col. 8 lines 26-45, 62 - col. 9 line 15, where about 65 codewords are determined from the feature vectors) by segmenting textual content, wherein the acoustic unit is a pronunciation unit (Aronowitz para [0075], where splicing is used to break up the acoustic unit into sub-units using the text); and 
outputting the textual content using the voice determined (para [0042-43], where a development set consisting of synthesized samples of speech is provided or output);
wherein 
the acoustic units comprise a plurality of pronunciation sub-units (para [0028], where the X, Y, and Z represent one or more phrases or words, where the pronunciation sub-units are interpreted as words or phrases according to para [0098] of Applicant’s Specification);
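the determining a voice corresponding to the textual content comprises organizing the plurality of pronunciation sub-units into the voice determined corresponding to the outputting the textual content (para [0048], where the utterances are synthesized, which is considered organizing the sub-units);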
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Comerford by using the synthesis of Aronowitz (Aronowitz para [0048]) using the voice samples of Comerford (Comerford col. 8 lines 13-45), in order to perform speaker verification to verify the identity of a target speaker (Aronowitz para [0045]).
Gannu teaches:
acquiring user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, a user’s job title (para [0036], where voice is used to identify a speaker and their image, name, and job title are displayed);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Comerford in view of Aronowitz by using the profile retrieval of Gannu (Gannu para [0036]) in the user identification of Comerford in view of Aronowitz (Comerford col. 8 lines 13-45) by using a speaker’s voice to identify them, in order to provide a graphics overlay in meetings that provides relevant information to peers (Gannu para [0036]).
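For orientation only, the identify, prompt, and store sequence recited in claim 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method of Comerford, Aronowitz, Gannu, or the Applicant: the Euclidean distance measure, the threshold value, the in-memory list standing in for the preset file, and all function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(features, preset_file, threshold=1.0):
    """Return the stored user info closest to `features`, or None when no
    enrolled vector is within `threshold` (treated here as a new user)."""
    best_info, best_dist = None, float("inf")
    for stored_features, info in preset_file:
        d = distance(features, stored_features)
        if d < best_dist:
            best_info, best_dist = info, d
    return best_info if best_dist <= threshold else None


def enroll(features, info, preset_file):
    """Store the feature vector together with its user information."""
    preset_file.append((list(features), info))


# Hypothetical data: an empty preset file and one made-up feature vector.
preset_file = []
voice = [0.2, 1.1, -0.4]
if identify(voice, preset_file) is None:
    # A real system would prompt the new user for information at this point.
    enroll(voice, {"name": "A. Speaker"}, preset_file)
```

After enrollment, a second call to identify with the same vector returns the stored information, while a distant vector is again treated as a new user.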

Regarding claim 5, Comerford teaches:
A user identification apparatus, applied to an electronic device comprising: 
a feature extracting circuit, configured to extract one or more acoustic features from acquired voice (col. 8 lines 13-45, where feature vectors are generated from the speech input); 
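an information acquiring circuit, configured to acquire user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, or a user’s job title (col. 8 lines 13-45, where speaker recognition is performed using the feature vectors, and col. 11 lines 23-43, where the user states their name for identity determination);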
an information outputting circuit, configured to output the user information based upon the information acquiring circuit acquiring the user information matching the one or more acoustic features (col. 8 lines 46-61, where feedback of the recognition is given to the user);  
a determining circuit, configured to determine that the acquired voice is a voice of a new user when the user information matching the one or more acoustic features extracted is not acquired (col. 8 lines 46-61, where a new user is incorrectly recognized and the new user corrects the result, and col. 12 lines 35-43, line 61 – col. 13 line 12, where a new installation of a software package begins with speaker enrollment); 
a prompting circuit, configured to generate a prompt message for the new user to input new user information (col. 8 lines 46-61, where the new user marks a menu on a display, and enrolls speaker-specific information, and col. 12 line 61 – col. 13 line 12, where the user is prompted to speak to the system, the new user information being the voice of the speaker); and 
a storage circuit, configured to store the one or more acoustic features and corresponding user information in a preset file, when the user information input by the new user based on the prompt message generated by the prompting circuit is received (col. 3 line 56 - col. 4 line 11, col. 11 lines 23-28, where the information relating to the enrollment is stored in a database);
wherein the storage circuit comprises: 
a dividing sub-circuit, configured to divide the acoustic feature into a plurality of acoustic sub-features based on acoustic units (col. 8 lines 26-45, 62 - col. 9 line 15, where about 65 codewords are determined from the feature vectors); and 
a storage sub-circuit, configured to store the plurality of sub-features and corresponding user information in a preset file, wherein the preset file includes user information of each user who has input 
wherein the apparatus further comprises:  
a receiving circuit, configured to receive an operation command that broadcasts a piece of text content with an acoustic feature matching a user who has input his/her voice (col. 11 lines 23-63, where a user attempts to access a voice dialing system, and col. 7 lines 53-55, where the system is text independent); 
an acquiring circuit, configured to acquire acoustic sub-features corresponding to the user who has input his/her voice (col. 8 lines 13-45, where feature vectors are generated from the speech input); 
a voice determining circuit, configured to determine a voice corresponding to the piece of textual content based on the acoustic sub-features corresponding to the user who has input his/her voice (col. 8 lines 13-45, where speaker recognition is performed using the feature vectors); and 
the determining circuit is configured to determine that the acquired voice is a voice of a new user is based on lacking user information corresponding to the plurality of acoustic sub-features in the preset file (col. 12 line 61 – col. 13 line 12, where it is determined that there are no previous acoustic models in the file),
wherein the feature extracting circuit is further configured to:
parameterize the acquired voice into a plurality of voice parameters (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice); and 
convert the parameterized voice into machine-identifiable voice feature vectors (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice), wherein the voice parameters comprise one or more of pitch periods, Linear Predictive Coefficients (LPC), impulse response of a sound channel, self-correlation coefficients, sound channel area functions, LPCC features, MFCC features, Perceptual Linear Predictive (PLP), or difference cepstrum (Comerford col. 8 lines 13-25, where mel cepstral feature vectors are determined from the input voice).
Comerford does not teach:
an information acquiring circuit, configured to acquire user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, a user’s job title;
a dividing sub-circuit, configured to divide the acoustic feature into a plurality of acoustic sub-features based on acoustic units by segmenting textual content, wherein the acoustic unit is a pronunciation unit; and
a voice outputting circuit, configured to output the textual content using the voice determined;
wherein 
the acoustic units comprise a plurality of pronunciation sub-units;
the voice determining circuit is configured to determine the voice corresponding to the textual content by organizing the plurality of pronunciation sub-units into the voice determined corresponding to the outputting the textual content;
Aronowitz teaches:
a dividing sub-circuit, configured to divide the acoustic feature into a plurality of acoustic sub-features based on acoustic units by segmenting textual content, wherein the acoustic unit is a pronunciation unit (para [0075], where splicing is used to break up the acoustic unit into sub-units using the text); and
a voice outputting circuit, configured to output the textual content using the voice determined (para [0042-43], where a development set consisting of synthesized samples of speech is provided or output);
wherein 

the voice determining circuit is configured to determine the voice corresponding to the textual content by organizing the plurality of pronunciation sub-units into the voice determined corresponding to the outputting the textual content (para [0048], where the utterances are synthesized, which is considered organizing the sub-units, and para [0028], where the X, Y, and Z represent one or more phrases or words, where the pronunciation sub-units are interpreted as words or phrases according to para [0098] of Applicant’s Specification);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Comerford by using the synthesis of Aronowitz (Aronowitz para [0048]) using the voice samples of Comerford (Comerford col. 8 lines 13-45), in order to perform speaker verification to verify the identity of a target speaker (Aronowitz para [0045]).
Gannu teaches:
an information acquiring circuit, configured to acquire user information matching the one or more acoustic features, wherein the user information comprises a user’s name, a user image, a user’s job title (para [0036], where voice is used to identify a speaker and their image, name, and job title are displayed);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Comerford in view of Aronowitz by using the profile retrieval of Gannu (Gannu para [0036]) in the user identification of Comerford in view of Aronowitz (Comerford col. 8 lines 13-45) by using a speaker’s voice to identify them, in order to provide a graphics overlay in meetings that provides relevant information to peers (Gannu para [0036]).

Regarding claim 9, Comerford in view of Aronowitz and Gannu teaches:
An apparatus implementing the user identification method according to claim 1, comprising: 
a processing circuit (Comerford col. 7 lines 34-52, where a processor is used); and 
memory configured to store instructions executable by the processing circuit (Comerford col. 7 lines 34-52, where memory is used), 
wherein the processing circuit is configured to: 
implement operations of the user identification method (see rejection of claim 1).  

Regarding claim 10, Comerford in view of Aronowitz and Gannu teaches:
The apparatus according to claim 9, wherein the memory comprises a non-transitory computer-readable storage medium having computer instructions stored therein for execution by the processing circuit (Comerford col. 7 lines 34-52, where memory is used).  

Regarding claim 13, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 12, wherein the processing circuit is further configured to identify the user in a voice call through the apparatus (Comerford col. 11 lines 23-63, where a user attempts to access a voice dialing system and is verified by speaker recognition).  

Regarding claim 15, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 13, wherein the voice call is through one or more communication applications (Comerford col. 11 lines 7-22, where the voice dialing system is interpreted as a communication application).  

Regarding claim 16, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 15, wherein the instructions further comprise: 
determining whether the acquired voice is a voice of a new user based upon that the user information matching the acoustic feature being not acquired (Comerford col. 8 lines 46-61, where a new user is incorrectly recognized and the new user corrects the result); and 
generating a prompt message for the new user to input user information (Comerford col. 8 lines 46-61, where the new user marks a menu on a display, and enrolls speaker-specific information).  

Regarding claim 17, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 16, wherein the instructions further comprise storing the one or more acoustic features and corresponding user information in a preset file, when user information input by a 

Regarding claim 19, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 12, wherein the processing circuit is further configured to identify the user for security applications (Comerford col. 11 lines 44-63, where the system determines if the user is authorized).  

Regarding claim 20, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 19, wherein the security applications comprise user authentication to provide proper authorization to execute user commands (Comerford col. 11 lines 44-63, where the system determines if the user is authorized to access the voice dialing system).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Comerford in view of Aronowitz and Gannu, and further in view of Binder et al. (US 2014/0222436 A1), hereinafter referred to as Binder.

Regarding claim 14, Comerford in view of Aronowitz and Gannu teaches:
The apparatus of claim 13,
Comerford in view of Aronowitz and Gannu does not teach:
wherein the voice call is a telephone call, and wherein the apparatus comprises mobile terminal.
Binder teaches:
wherein the voice call is a telephone call, and wherein the apparatus comprises mobile terminal (para [0047], where the device is a phone).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2019/0007377 A1 para [0054] teaches sending a name, job title, and picture to an emergency contact; US 9,992,642 B1 col. 6 lines 5-27 teaches sending bio information such as a profile image, name, and job title to another user.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
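The timing rule in the preceding paragraph can be worked through with a short Python sketch. It is an illustration only, not an official calculation: the function names and the example dates are hypothetical, and the sketch ignores weekends, federal holidays, and any extension of time under 37 CFR 1.136(a).

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailed, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period as described in this action."""
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    end = three_months
    # If the first reply came within two months and the advisory action is
    # mailed after the three-month date, the period ends on the advisory date.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= add_months(mailed, 2)
            and advisory_mailed > three_months):
        end = advisory_mailed
    # The period never runs past six months from the mailing date.
    return min(end, six_months)


# Hypothetical mailing date, for illustration only.
mailed = date(2022, 3, 1)
default_end = reply_period_end(mailed)
late_advisory_end = reply_period_end(mailed, date(2022, 4, 15), date(2022, 6, 20))
```

With the hypothetical mailing date above, the default period ends three months after mailing, and an advisory action mailed after that date (following a first reply within two months) moves the end of the period to the advisory mailing date, subject to the six-month cap.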
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658