DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes a claim limitation that does not use the word “means,” but is nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation is: “a sound receiver, configured to collect” in claim 8.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation recites sufficient structure to perform the claimed function so as to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Objections
Claim 11, and therefore claim 12 which depends therefrom, are objected to because of the following informalities: claim 11 recites “calculating a numbers of the clusters” but should recite “calculate a number of the clusters.” Appropriate correction is required.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6-8, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al., "SCAN: Learning Speaker Identity from Noisy Sensor Data," 2017 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), 2017, pp. 67-78 (herein “Lu”) in view of Jain et al., (US 2021/0390959 A1, herein “Jain”).
Regarding claims 1 and 8, Lu teaches [a voice recognition method – claim 1/an electronic device – claim 8], comprising (Lu page 68, left column, a technique for identifying a speaker from observations of acoustic data/human voiceprints, where page 72 teaches the audio collection application runs on an Android based smartphone): 
[a sound receiver configured to – claim 8 (Lu page 72, Android smartphone, specifically a Motorola Nexus 6 which, as being a smartphone, would have an input microphone)] collect[ing] a plurality of voice signals (Lu pages 72-73, fig. 8, speaker diarization data processing task including receiving raw audio signals of participants in a meeting, where the audio is segmented into short clips respectively of utterances of single speakers (voice signals)); 
[a processor, electrically connected to the sound receiver and configured for: - claim 8 (Lu page 72, Android smartphone, specifically a Motorola Nexus 6, which would have a processor connected to its microphone)] extracting voiceprint features of each of the voice signals (Lu pages 72-73, fig. 8, for each short clip, i-vectors are extracted from MFCC features, the MFCC extraction occurring before the i-vector development); 
performing a data process on the voiceprint features, to convert the voiceprint features into a N-dimensional matrix, and N is an integer greater than or equal to 2 (Lu pages 72-73, fig. 8, i-vectors are developed for each short clip, the raw i-vectors having about 500 dimensions (an N-dimensional matrix with N = 500)); 
performing a feature normalization process on the N-dimensional matrix to obtain a plurality of voiceprint data (Lu pages 72-73, fig. 8, i-vectors are processed with PCA-based dimensionality reduction technique to reduce the i-vectors to only the most variable 200 dimensions); 
classifying the voiceprint data to generate a clustering result (Lu pages 72-73, fig. 8, the audio clips are grouped (classifying) into local clusters based on their i-vector features, also disclosed in sections 3 and 4 as intra-context clustering, where page 69-71 discuss the clustering results from the intra-context clustering).
While Lu teaches on page 72, upper left column that the end result of the processing disclosed includes making an entry into a database mapping a user’s identity to the global cluster from the input voice clips that are processed as disclosed above, Lu does not disclose “finding out a centroid of each cluster,” and thus, does not explicitly teach “finding out a centroid of each cluster according to the clustering result, and registering the voiceprint data adjacent to each of the centroid.”
Jain teaches finding out a centroid of each cluster according to the clustering result, and registering the voiceprint data adjacent to each of the centroid (Jain paras. 69, and 73-74, an enrolled voiceprint is associated with a speaker identifier associated with a predetermined centroid and a threshold, which is updated and stored in an enrolled user database).
Therefore, taking the teachings of Lu and Jain together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the centroid determination and storage with voiceprint information for an enrolled user as disclosed in Jain at least because doing so would prevent false rejections of a speaker recognition algorithm (see Jain para. 12).
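For illustration only, the combined limitation of finding a centroid of each cluster and registering the voiceprint data adjacent to that centroid can be sketched as follows. This is a minimal sketch and is not code from Lu or Jain: the function names `centroid` and `register_nearest`, the use of Euclidean distance, the reading of "adjacent" as "closest to the centroid," and the sample vectors are all assumptions made for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def register_nearest(voiceprints, labels):
    """Map each cluster label to its centroid and the index of its closest voiceprint."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    registry = {}
    for label, members in clusters.items():
        c = centroid([voiceprints[i] for i in members])
        # Assumption: "adjacent to the centroid" is taken as the minimum Euclidean distance.
        nearest = min(members, key=lambda i: math.dist(voiceprints[i], c))
        registry[label] = {"centroid": c, "registered_index": nearest}
    return registry


# Hypothetical 2-dimensional voiceprint data with cluster labels from a prior clustering step.
voiceprints = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.2],
               [5.0, 5.0], [6.0, 5.0], [5.4, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
registry = register_nearest(voiceprints, labels)
```

In this sketch the registry keeps one representative voiceprint per cluster; an enrollment database of the kind cited from Jain would additionally store a speaker identifier and a threshold.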
Regarding claims 6 and 13, Lu does not explicitly teach the limitations of claims 6 or 13. Jain teaches [wherein the step of classifying the voiceprint data further comprises: - claim 6/wherein when the processor is classifying the voiceprint data, the processor further – claim 13] dynamically adjust[ing/s] a classifying threshold value according to the voiceprint features to classify the voiceprint data to generate the clustering result (Jain paras. 141 and 163, thresholds for the voiceprints are adjusted for voice variations, where the thresholds are used with adapted centroids (clustering result) and for speaker recognition (classifying the voiceprint data)).
Therefore, taking the teachings of Lu and Jain together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the centroid determination and storage with voiceprint information for an enrolled user as disclosed in Jain at least because doing so would prevent false rejections of a speaker recognition algorithm (see Jain para. 12).
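For illustration only, a classifying threshold that is adjusted according to the voiceprint features might look like the sketch below. Jain is cited above only for adjusting thresholds for voice variations; the specific rule used here (mean enrollment distance plus `k` standard deviations), the parameter `k`, and the function names `adaptive_threshold` and `accept` are hypothetical and are not taken from Jain.

```python
import math
import statistics


def adaptive_threshold(embeddings, centroid, k=2.0):
    """Hypothetical rule: mean distance to the centroid plus k population standard deviations."""
    dists = [math.dist(e, centroid) for e in embeddings]
    return statistics.mean(dists) + k * statistics.pstdev(dists)


def accept(sample, centroid, threshold):
    """Classify a sample as belonging to the speaker if it lies within the threshold."""
    return math.dist(sample, centroid) <= threshold


center = [0.0, 0.0]
# A speaker whose enrollment embeddings vary little, and one whose embeddings vary more.
tight = adaptive_threshold([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], center)
wide = adaptive_threshold([[1.0, 0.0], [0.0, 3.0], [-1.0, 0.0], [0.0, -3.0]], center)
```

Under this assumed rule, a speaker with greater voice variation receives a larger threshold, so the same test sample can be accepted for one speaker and rejected for another.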
Regarding claims 7 and 14, Lu does not explicitly teach the limitations of claims 7 and 14. Jain teaches [wherein the step of registering the voiceprint data adjacent to the centroid further comprises: - claim 7/wherein the processor further – claim 14] record[ing/s] the voiceprint data adjacent to each of the centroid and an identification number of the voiceprint data [to complete the register – claim 14 only] (Jain paras. 129, and 158-160, fig.11B, speaker enrollment database stores centroid, threshold with the speaker embedding (voiceprint data) and speaker ID (identification number)).
Therefore, taking the teachings of Lu and Jain together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the centroid determination and storage with voiceprint information for an enrolled user as disclosed in Jain at least because doing so would prevent false rejections of a speaker recognition algorithm (see Jain para. 12).
Claims 2 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Jain, as set forth above regarding claims 1 and 8, from which claims 2 and 9 respectively depend, further in view of Qiao, (CN 109637547 A, with reference to the provided English Machine Translation (herein “Qiao”)).
Regarding claim 2, while Lu teaches after the step of classifying the voiceprint data to generate the clustering result, further comprising (Lu page 72, upper left column, after the processing including the clustering, making an entry into a database mapping a user’s identity to the global cluster (clustering result) from the input voice clips), Lu does not teach the remainder of the limitations of claim 2.
Qiao teaches performing a gender recognition process on the voiceprint data to obtain a gender data of each of the voiceprint data, and updating the clustering result according to the gender data (Qiao Abstract, pages 5-6 and Claims 1 and 3, before clustering (thus affecting the clustering result in an update) determine gender information of the audio data using a pre-trained gender identification model (process), and then dividing the audio data with the same sex information collected, then performing the clustering).
Therefore, taking the teachings of Lu and Qiao together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the gender determination as disclosed in Qiao at least because doing so would improve the accuracy of the extracted vocal print characteristic which improves the cluster accuracy (see Qiao page 6).
Regarding claim 9, while Lu teaches after generating the clustering result, the processor further configures to (Lu page 72, upper left column, after the processing including the clustering, making an entry into a database mapping a user’s identity to the global cluster (clustering result) from the input voice clips), Lu does not teach the remainder of claim 9.
Qiao teaches perform a gender recognition process on the voiceprint data to obtain a gender data of each of the voiceprint data, and to update the clustering result according to the gender data (Qiao Abstract, pages 5-6 and Claims 1 and 3, before clustering (thus affecting the clustering result in an update) determine gender information of the audio data using a pre-trained gender identification model (process), and then dividing the audio data with the same sex information collected, then performing the clustering).
Therefore, taking the teachings of Lu and Qiao together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the gender determination as disclosed in Qiao at least because doing so would improve the accuracy of the extracted vocal print characteristic which improves the cluster accuracy (see Qiao page 6).
Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Jain, as set forth above regarding claims 1 and 8 from which claims 3 and 10 respectively depend, further in view of Ma et al., (US 11,189,263 B2, herein “Ma”).
Regarding claim 3, while Lu teaches wherein the step of performing the data process on the voiceprint feature (Lu pages 72-73, fig. 8, i-vectors are developed for each short clip, the raw i-vectors having about 500 dimensions (an N-dimensional matrix with N = 500)), Lu does not teach further comprises: using a t-distributed stochastic neighbor embedding (t-SNE) method to obtain the N-dimensional matrix.
Ma teaches further comprises: using a t-distributed stochastic neighbor embedding (t-SNE) method to obtain the N-dimensional matrix (Ma col. 7, ll. 22-42, dimensionality reduction on the i-vector is performed using t-SNE).
Therefore, taking the teachings of Lu and Ma together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the t-SNE processing as disclosed in Ma at least because doing so would allow for reducing dimensionality of a voice feature vector thus reducing required calculations in the clustering while also not needing to train the dimension reducing model in advance as t-SNE is an unsupervised process (see Ma col. 7, ll. 29-38).
Regarding claim 10, Lu does not explicitly teach the limitations of claim 10. Ma teaches the processor further configures to use a t-distributed stochastic neighbor embedding (t-SNE) method to perform a dimensionality reduction process to obtain the N-dimensional matrix (Ma col. 7, ll. 22-42, col. 22, ll. 53-60, a processor performing disclosed voice data processing including dimensionality reduction on the i-vector is performed using t-SNE).
Therefore, taking the teachings of Lu and Ma together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the t-SNE processing as disclosed in Ma at least because doing so would allow for reducing dimensionality of a voice feature vector thus reducing required calculations in the clustering while also not needing to train the dimension reducing model in advance as t-SNE is an unsupervised process (see Ma col. 7, ll. 29-38).
Claims 4 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Jain, as set forth above regarding claims 1 and 8 from which claims 4 and 11 respectively depend, further in view of Lerato L, Niesler T (2015) Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering. PLoS ONE 10(10): e0141756. doi:10.1371/journal.pone.0141756 (herein “Lerato”).
Regarding claims 4 and 11, Lu teaches the voiceprint data as disclosed above, and wherein the processor is classifying the voiceprint data, the processor further configured to perform (Lu page 72, Android smartphone, specifically a Motorola Nexus 6, which would have a processor), but does not explicitly teach the remainder of the limitations of claims 4 and 11. Lerato teaches [wherein the step of classifying the data further comprises: - claim 4] calculating a number of the clusters and adjacent slopes by an elbow method according to the data (Lerato pages 10-11, in an “L” method of clustering acoustic segments, the shape of a graph plotting cluster similarity versus number of clusters features a “knee” (elbow), which is used to determine an optimum number of clusters); 
generating the clustering result of classification by a hierarchical clustering algorithm according to the number of the clusters when the slope changes suddenly; and generating the clustering result of classification by the hierarchical clustering algorithm when the slope does not change suddenly (Lerato pages 10-11, the “L” method (corresponding to claimed elbow method) considers a best fit line Lc, a line where the “slope changes suddenly” and a line Rc, where the “slope does not change suddenly” and the clusters are separated by these two lines/regions of the graph).
Therefore, taking the teachings of Lu and Lerato together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the L method processing as disclosed in Lerato at least because doing so would allow for optimal clustering of data without needing a ground truth (thus unsupervised) (see Lerato page 10).
Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Lu in view of Jain in view of Lerato, as set forth above regarding claims 4 and 11 from which claims 5 and 12 respectively depend, further in view of Ye et al., (US 2018/0144742 A1 herein “Ye”).
Regarding claims 5 and 12, Lu does not explicitly teach the limitations of claims 5 and 12. Ye teaches wherein the hierarchical clustering algorithm is a balanced iterative reducing and clustering using hierarchies (BIRCH) method (Ye para. 65, clustering of voice data using the BIRCH hierarchical-based clustering algorithm).
Therefore, taking the teachings of Lu and Ye together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speaker identification technique and smartphone running the software to implement same of Lu to include the BIRCH clustering method disclosed in Ye at least because doing so would provide a way to remove outlier data (data greater than a preset distance from the center of a cluster) and thus reduce data computation requirements (see Ye paras 65-66).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Khoury et al., US 2021/0326421 A1, directed towards an unsupervised machine learning method of enrolling users based on their voice prints to identify various characteristics of the users.
Cai et al., US 2020/0294509 A1, directed towards clustering voice features and obtaining a voice print model therefrom.
Krupka et al., US 2019/0341055 A1, directed towards voice identification enrollment using clustering of voiceprints.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656