Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/17/21 has been entered.
 
This Office action is in response to the correspondence filed on 12/17/21 regarding application 16/692,696, in which claims 1, 11, and 20 were amended. Claims 1-20 are pending in the application and have been considered.

Response to Arguments
The arguments on pages 7-13 regarding the 35 U.S.C. 103 rejections of claims 1-3, 5-13, and 15-20 based on Jumelle, Do, Huang, and Dong have been considered but are moot in view of the new grounds for rejection.  

Claim Rejections - 35 USC § 103
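In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.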
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 5, 6, 11, 12, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Jumelle et al. (“Speaker Clustering With Neural Networks and Audio Processing”. AIPCloud – http://aipcloud.io {maxime, taqiyeddine} @aipcloud.io, November 2017) in view of Wang et al. (“Phoneme-Specific Speech Separation”. ICASSP 2016, pages 146-150).

Consider claim 1, Jumelle discloses a method of processing voice data of a speech, the method comprising: extracting a speaker feature vector from the voice data of the speech received from a recording of a speaker (a speaker in a recording, Abstract, page 1, ii. Feature extraction extracts the Mel spectrogram, which is a feature vector, page 2); generating a speaker feature map by positioning the extracted speaker feature vector at a specific position on a multi-dimensional vector space (one second long snippets data points from the unknown input data are linearly separable, Fig 4, page 4); forming a plurality of clusters indicating features of voices of a plurality of speakers by grouping at least one speaker feature vector positioned on the speaker feature map (Fig 4, page 4, groups, page 6, Support Vector Clustering, Fig 6); and classifying the plurality of speakers according to the plurality of clusters (each color corresponds to a different speaker, Fig 4, page 4, Speaker identification, page 5). 

Jumelle does not specifically mention extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data.
Wang discloses extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data (the concatenated feature vector for acoustic modeling of 4026 dimensions includes multiple features representing various frequency ranges and bands, see section 2.4 Acoustic Modeling, pages 147-148, which are extracted for recognition using a phoneme-specific trained model, and hence “for each phoneme”, section 2.3, page 147, from a corpus of utterances during the experiment, section 3 page 148).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data in order to improve results in low SNR conditions and reverberant environments, as suggested by Wang (Section 1. Introduction, page 146).


Consider claim 11, Jumelle discloses an electronic device for processing voice data of a speech, the electronic device comprising: receiving the speech from a recording of a speaker (a speaker in a recording, Abstract, page 1); at least one processor (a processor is implicit in calculating features such as a Mel spectrogram, page 2) configured to: extract a speaker feature vector from the voice data of the speech received (ii. Feature extraction extracts the Mel spectrogram, which is a feature vector, page 2), generate a speaker feature map by positioning the extracted speaker feature vector at a specific position on a multi-dimensional vector space (one second long snippets data points from the unknown 
Jumelle does not specifically mention extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data. 
Wang discloses extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data (the concatenated feature vector for acoustic modeling of 4026 dimensions includes multiple features representing various frequency ranges and bands, see section 2.4 Acoustic Modeling, pages 147-148, which are extracted for recognition using a phoneme-specific trained model, and hence “for each phoneme”, section 2.3, page 147, from a corpus of utterances during the experiment, section 3 page 148).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data for reasons similar to those for claim 1.


Consider claim 20, Jumelle discloses controlling an electronic device to receive a speech from a recording of a speaker (a speaker in a recording, Abstract, page 1); extract a speaker feature vector from the voice data of the speech (ii. Feature extraction extracts the Mel spectrogram, which is a feature 
Jumelle does not specifically mention extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data, and a computer program product comprising a non-transitory computer-readable recording medium comprising instructions. 
Wang discloses extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data (the concatenated feature vector for acoustic modeling of 4026 dimensions includes multiple features representing various frequency ranges and bands, see section 2.4 Acoustic Modeling, pages 147-148, which are extracted for recognition using a phoneme-specific trained model, and hence “for each phoneme”, section 2.3, page 147, from a corpus of utterances during the experiment, section 3 page 148). The experiments performed by Wang in section 3, page 148, also strongly suggest or imply the use of a computer, and therefore Wang is considered to suggest a computer program product comprising a non-transitory computer-readable recording medium comprising instructions.
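It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by extracting a speaker feature vector from the voice data of the speech received from a speaker based on a frequency feature value of each of a plurality of frequency ranges for each phoneme of the voice data for reasons similar to those for claim 1.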
It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by including a computer program product comprising a non-transitory computer-readable recording medium comprising instructions which control an electronic device, in order to be able to run the experiments described by Wang (section 3 page 148). 

Consider claim 2, Jumelle discloses the extracting of the speaker feature vector comprises: training a deep neural network (DNN) model by applying the speaker feature vector extracted from the voice data to the DNN model as an input and applying a feature value with respect to an index of the speaker as an output (the CNN is trained on the TIMIT corpus dataset, and is trained as the feature extractor which operates on the Mel spectrogram, page 1 Introduction, pages 2-3, Feature extraction and Architecture of the CNN); and extracting the speaker feature vector by using a result of the training (outputs of specific layers are used by the Multi Layer Perceptrons to classify, page 1 Introduction).

Consider claim 5, Jumelle discloses the forming of the plurality of clusters comprises forming the plurality of clusters by grouping the at least one speaker feature vector positioned on the speaker feature map within a predetermined distance from each other (one second long snippets data points from the unknown input data are linearly separable, Fig 4, page 4). 

Consider claim 6, Jumelle discloses the forming of the plurality of clusters comprises: training a deep neural network (DNN) model by applying the speaker feature vector to the DNN model as an input value and applying a similarity index among the plurality of speakers as an output value (the CNN is 

Consider claim 12, Jumelle discloses the at least one processor is further configured to train a deep neural network (DNN) model by applying the speaker feature vector extracted from the voice data to the DNN model as an input and applying a feature value with respect to an index of a speaker as an output (the CNN is trained on the TIMIT corpus dataset, and is trained as the feature extractor which operates on the Mel spectrogram, page 1 Introduction, pages 2-3, Feature extraction and Architecture of the CNN) and extract the speaker feature vector by using a result of the training (outputs of specific layers are used by the Multi Layer Perceptrons to classify, page 1 Introduction). 


Consider claim 15, Jumelle discloses the forming of the plurality of clusters comprises forming the plurality of clusters by grouping the at least one speaker feature vector positioned on the speaker feature map within a predetermined distance from each other (one second long snippets data points from the unknown input data are linearly separable, Fig 4, page 4). 

Consider claim 16, Jumelle discloses the forming of the plurality of clusters comprises: training a deep neural network (DNN) model by applying the speaker feature vector to the DNN model as an input value and applying a similarity index among the plurality of speakers as an output value (the CNN is trained on the TIMIT corpus dataset, and is trained as the feature extractor which operates on the Mel .

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Jumelle et al. (“Speaker Clustering With Neural Networks and Audio Processing”. AIPCloud – http://aipcloud.io {maxime, taqiyeddine} @aipcloud.io, November 2017) in view of Wang et al. (“Phoneme-Specific Speech Separation”. ICASSP 2016, pages 146-150), in further view of Do et al. (2012/0271632).

Consider claim 3, Jumelle discloses receiving speech identification information (page 2, the label I denotes the speaker identifier); extracting the speaker feature vector from the voice data of the speech (ii. Feature extraction extracts the Mel spectrogram, which is a feature vector, page 2); mapping the extracted speaker feature vector to a cluster from among the plurality of clusters on the speaker feature map (Fig 4, page 4, groups, page 6, Support Vector Clustering, Fig 6); and registering the identification information to the cluster from among the plurality of clusters (clusters with each color corresponds to a different speaker, page 4, Fig 4).
Jumelle and Wang do not specifically mention receiving, from a user, speech inputting identification information.
Do discloses receiving, from a user, speech inputting identification information (capturing an utterance from a speaker with a microphone, [0014], which is used to identify the speaker, [0030]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by receiving, from a user, speech inputting identification information in order to help determine “who” spoke, as suggested by Do ([0001]).


Consider claim 13, Jumelle discloses receiving speech identification information (page 2, the label I denotes the speaker identifier), wherein the at least one processor is further configured to: extract the speaker feature vector from the voice data of the speech (ii. Feature extraction extracts the Mel spectrogram, which is a feature vector, page 2); map the extracted speaker feature vector to a cluster from among the plurality of clusters on the speaker feature map (Fig 4, page 4, groups, page 6, Support Vector Clustering, Fig 6); and register the identification information to the cluster from among the plurality of clusters (clusters with each color corresponds to a different speaker, page 4, Fig 4). 
Jumelle does not specifically mention a speech inputter.
Do discloses a speech inputter (a microphone, [0014]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle by including a speech inputter for reasons similar to those for claim 3.

Claims 7, 8, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jumelle et al. (“Speaker Clustering With Neural Networks and Audio Processing”. AIPCloud – http://aipcloud.io {maxime, taqiyeddine} @aipcloud.io, November 2017) in view of Wang et al. (“Phoneme-Specific Speech Separation”. ICASSP 2016, pages 146-150), in further view of Huang et al. (2016/0180852).

Consider claim 7, Jumelle and Wang do not, but Huang discloses predicting a relationship among the plurality of speakers corresponding to the plurality of clusters based on distance and directionality information among the plurality of clusters on the speaker feature map (clusters detected from the location information may roughly indicate the presence of the speakers and their positions relative to one another, i.e. “a relationship”, [0090]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle and Wang by predicting a relationship among the plurality of speakers corresponding to the plurality of clusters based on distance and directionality information among the plurality of clusters on the speaker feature map in order to deal better with overlapped speech, as suggested by Huang ([0006]).

Consider claim 8, Jumelle and Wang do not, but Huang discloses predicting a state of a speaker of the speech based on a relative positional relationship among a plurality of speaker feature vectors in one cluster of the plurality of clusters (clusters detected from the location information may roughly indicate the presence of the speakers, [0090]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle and Wang by predicting a state of a speaker of the speech based on a relative positional relationship among a plurality of speaker feature vectors in one cluster of the plurality of clusters for reasons similar to those for claim 7.

Consider claim 17, Jumelle and Wang do not, but Huang discloses the controller predicts a relationship among the plurality of speakers corresponding to the plurality of clusters based on distance and directionality information among the plurality of clusters on the speaker feature map (clusters detected from the location information may roughly indicate the presence of the speakers and their positions relative to one another, i.e. “a relationship”, [0090]).
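It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle and Wang by predicting a relationship among the plurality of speakers corresponding to the plurality of clusters based on distance and directionality information among the plurality of clusters on the speaker feature map for reasons similar to those for claim 7.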

Claims 9, 10, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jumelle et al. (“Speaker Clustering With Neural Networks and Audio Processing”. AIPCloud – http://aipcloud.io {maxime, taqiyeddine} @aipcloud.io, November 2017) in view of Wang et al. (“Phoneme-Specific Speech Separation”. ICASSP 2016, pages 146-150), in further view of Huang et al. (2016/0180852), in further view of Dong et al. (2019/0124441).

Consider claim 9, Jumelle and Wang do not, but Huang discloses the predicting of the state of the speaker comprises: recognizing directionality by comparing a position of a first feature vector of the speech received from the user with a position of a second feature vector in the cluster on the speaker feature map, wherein the first feature vector is mapped to the cluster (clusters detected from the location information may roughly indicate the presence of the speakers, [0090]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle and Wang such that the predicting of the state of the speaker comprises: recognizing directionality by comparing a position of a first feature vector of the speech received from the user with a position of a second feature vector in the cluster on the speaker feature map, wherein the first feature vector is mapped to the cluster in order to deal better with overlapped speech, as suggested by Huang ([0006]).
Jumelle, Wang, and Huang do not specifically mention predicting a health state of the user based on the recognized directionality. 
Dong discloses predicting a health state of the user (health status information, [0019]).


Consider claim 10, Jumelle, Wang, and Huang do not, but Dong discloses outputting a message corresponding to the predicted health state of the user from among pre-stored messages (displaying health guidance instructions, [0021]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle, Wang, and Huang by outputting a message corresponding to the predicted health state of the user from among pre-stored messages for reasons similar to those for claim 9.


Consider claim 18, Jumelle and Wang do not, but Huang discloses the predicting of the state of the speaker comprises: recognizing directionality by comparing a position of a first feature vector of the speech received from the user with a position of a second feature vector in the cluster on the speaker feature map, wherein the first feature vector is mapped to the cluster (clusters detected from the location information may roughly indicate the presence of the speakers, [0090]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle and Wang such that the predicting of the state of the speaker comprises: recognizing directionality by comparing a position of a first feature vector of the speech received from the user with a position of a second feature vector in the cluster on the speaker feature map, wherein the first feature vector is mapped to the cluster in order to deal better with overlapped speech, as suggested by Huang ([0006]).
Jumelle, Wang, and Huang do not specifically mention predicting a health state of the user based on the recognized directionality. 
Dong discloses predicting a health state of the user (health status information, [0019]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle, Wang, and Huang by predicting a health state of the user based on the recognized directionality, for reasons similar to those for claim 9.

Consider claim 19, Jumelle, Wang, and Huang do not, but Dong discloses outputting a message corresponding to the predicted health state of the user from among pre-stored messages (displaying health guidance instructions, [0021]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Jumelle, Wang, and Huang by outputting a message corresponding to the predicted health state of the user from among pre-stored messages for reasons similar to those for claim 9.

Allowable Subject Matter
Claims 4 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of the examiner’s reasons for indicating subject matter allowable over the prior art:

The prior art of record does not disclose “…generating a distinct sentence including phonemes, for which a difference in the frequency feature value between a plurality of speakers is equal to or greater than a predetermined critical value”. While Katz-Oz et al. (2018/0131692) discloses generating a sentence (which is inherently “including phonemes”) for authenticating a user’s identity ([0110]-[0111]) using a threshold ([0084]), the invention as a whole outlined by dependent claims 4 and 14, including the limitations of parent claims 1 and 11, is not considered to have been obvious to one of ordinary skill in the art before the filing date of the claimed invention, even in view of Jumelle, Wang, and Katz-Oz.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
9,792,900 Kaskari discloses generation of phoneme-experts for speech recognition
Haytham Fayek “Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What’s In-Between” April 21, 2016 Blog Post retrieved from “https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html” discloses computation of a Mel Spectrogram and frequency ranges of Mel scale filterbanks (see page 4)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jesse Pullias whose telephone number is 571/270-5135. The examiner can normally be reached on M-F 8:00 AM - 4:30 PM. The examiner’s fax number is 571/270-6135.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571/272-7516.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).


/Jesse S Pullias/
Primary Examiner, Art Unit 2655
2/16/22