DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 5/18/2022 have been fully considered but they are not persuasive.  Applicant essentially argues that Dimitriadis fails to explicitly disclose processing of “a wake-up voice” to extract “voiceprint features” for a clustering process (REMARKS, pages 11-15).  At the outset, examiner recognizes that Dimitriadis’ system processes all kinds of audio signals available to it, regardless of audio content or the intended use of the audio content, such as to “wake up the electronic device”.  As discussed in previous communications, Dimitriadis’ process involves segmenting the incoming audio signal, which may contain speech, speech commands, or wake words of multiple speakers and/or background signals, into frames.  The frames are then processed to extract features or “voiceprint features” used to classify the frames as belonging to specific speakers or audio classes based on similarities between the extracted features (see figures 4-5 and/or paragraphs 39-50; this is a well-understood technique in the speech processing field).  Although Dimitriadis does not teach processing of “a wake-up voice”, Dimitriadis’ processing method is INDEPENDENT of audio content or the intended use of the audio content, and is capable of processing “a wake-up voice” signal.  Examiner recognizes that the claimed “voiceprint features” are merely audio features extracted from the audio frames for the purpose of classifying the audio frames as belonging to specific speakers or audio classes based on similarities between extracted features of the frames.  Although Dimitriadis’ application is not related to processing of “a wake-up voice” signal, Prasad was relied upon for teaching processing of “a wake-up voice” signal.  For the above reasons, examiner maintains all previous grounds of rejection.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 19-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitriadis et al. (USPG 2018/0166067, hereinafter referred to as Dimitriadis) in view of Prasad et al. (USPN 9697828, hereinafter referred to as Prasad).

Regarding claims 1 and 19-20, Dimitriadis discloses a method, device, and nontransitory computer readable medium for user registration by means of a wake-up voice, which is applied to an electronic device, comprising: a housing, a processor, a memory, a circuit board and a power circuit; wherein, the circuit board is arranged inside space surrounded by the housing; the processor and the memory are arranged on the circuit board; the power circuit supplies power to each of circuits or components of the electronic device; the memory stores executable program codes; the processor executes a program corresponding to the executable program codes by reading the executable program codes stored in the memory (figures 12-13 and/or paragraphs 73-86), so as to cause the processor to perform operations of:
after obtaining a voice of a user each time (the training process in figure 5 is an on-going process, meaning the system continues to train every time it receives data; absent specifics in the claim, “a wake-up voice” is interpreted as any audio data), extracting and storing a first voiceprint feature corresponding to the wake-up voice (figure 5, audio data 400 represented by frames f1-fN; acoustic features AF1-AFN are extracted from each audio frame);
clustering the stored first voiceprint features to divide the stored first voiceprint features into at least one category, wherein, each of the at least one category comprises at least one first voiceprint feature, which belongs to the same user (figure 5, clustering; also see paragraphs 44-45, clustering based on how similar the extracted feature in a frame is to that in other frames; clustering based on “means and variance” of the frames);
assigning one category identifier to each of the at least one category (figure 5 and/or paragraphs 45-46, cluster identifier or CI), wherein one user corresponds to one category identifier (paragraph 46, “if there are four clusters 510 and the sound sources include a first speaker, a second speaker, silence, and music, then each cluster identifier corresponds to a different one of the first speaker, the second speaker, silence, and music”); and 
storing each category identifier in correspondence to at least one first voiceprint feature corresponding to this category identifier to complete invisible user registration (figure 5, storing by training RNN based on the cluster identifier; also see paragraph 50);
wherein, clustering the stored first voiceprint features to divide the stored first voiceprint features into at least one category, comprises: 
calculating a similarity between every two of the stored first voiceprint features by a clustering algorithm based on a similarity weight of each of to-be matched attributes of the stored first voiceprint features (paragraphs 43-44, clustering based on distance or similarity measure), wherein the attributes include at least one of a vibration frequency, a vibration period and amplitude of a sound wave spectrum (paragraph 42, “audio feature 504 of a frame 402 is a perceptual linear prediction (PLP) feature, or parameter, and its time derivative.  The PLP feature of a frame 402 estimates the critical band spectral resolution, the equal loudness curve, and the intensity-loudness power law reflected within the audio of the frame 402.  The PLP features of the frames 402 can be extracted via linear discriminant analysis (LDA) of the frames 402”); and 
dividing all of the first voiceprint features into at least one category based on the calculated similarities  (paragraph 44, clustering based on distance or similarity measure). 
Dimitriadis fails to explicitly disclose obtaining a wake-up voice, wherein the wake-up voice is used for waking up the electronic device.  However, Prasad teaches after obtaining a wake-up voice of a user each time, extracting and storing a first voiceprint feature corresponding to the wake-up voice, wherein the wake-up voice is used for waking up the electronic device (col. 11, lines 6-50; updating the wake-up word every time a wake word is detected; each wake word is used to activate a device).
Since Dimitriadis and Prasad are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of updating wake word each time a wake word is detected in an audio signal in order to improve wake word detection accuracy.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
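For illustration only, the clustering limitation discussed above (a weighted similarity between every two stored features, followed by division into categories with one identifier each) can be sketched as follows.  This is a minimal sketch under stated assumptions and is neither the method of Dimitriadis nor applicant's implementation: the attribute names, the weights, the threshold, and the functions `similarity` and `cluster` are all hypothetical, and the attribute values are assumed to be normalized to comparable scales.

```python
from itertools import combinations

# Hypothetical attribute weights; neither the claims nor the references give values.
WEIGHTS = {"frequency": 0.5, "period": 0.3, "amplitude": 0.2}


def similarity(a, b, weights=WEIGHTS):
    """Weighted similarity in (0, 1]; 1.0 means all attributes are identical."""
    dist = sum(w * abs(a[k] - b[k]) for k, w in weights.items())
    return 1.0 / (1.0 + dist)


def cluster(features, threshold=0.8):
    """Single-link grouping: features whose pairwise similarity meets the
    threshold end up in the same category. Returns one category id per feature."""
    parent = list(range(len(features)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every two stored features and merge the similar ones.
    for i, j in combinations(range(len(features)), 2):
        if similarity(features[i], features[j]) >= threshold:
            parent[find(i)] = find(j)

    # Assign one category identifier per group, in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(features))]
```

Under these assumptions, two stored features that differ only slightly in one attribute receive the same category identifier, while a clearly different feature receives a new one.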


Claims 3-4 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitriadis in view of Prasad, and further in view of Divakaran et al. (USPG 2017/0160813, hereinafter referred to as Divakaran).

Regarding claims 3-4 and 23-24, Dimitriadis further discloses after obtaining a service instruction voice of a user each time, extracting and storing a second voiceprint feature corresponding to the service instruction voice (figure 5, audio data 400 can be short or long and can contain multiple utterances; a subsequent utterance can be considered a subsequent time the voice is received; absent specifics in the claim, “a service instruction voice” is considered any audio data); matching the second voiceprint feature with each first voiceprint feature in each of the at least one category (figure 5, audio data 400 can be short or long and can contain multiple utterances; a voiceprint feature extracted from a subsequent utterance is compared with a voiceprint feature extracted from a previous utterance as shown in figure 5 in order to determine how similar they are and to classify them based on the similarity or distance; also see paragraphs 44-46); and storing a category identifier of a successfully matched first voiceprint feature in correspondence to this service type (see paragraphs 44-46 and/or figure 5, clustering similar frames).
Dimitriadis fails to explicitly disclose, however, Divakaran teaches determining a service type corresponding to the service instruction voice (paragraphs 36 and 137, weather information and direction to restaurant examples); wherein, determining a service type corresponding to the service instruction voice comprises: identifying the service instruction voice to obtain service instruction voice identification information (paragraphs 36 and 137, weather information and direction to restaurant examples); performing semantic analysis on the service instruction voice identification information (paragraphs 36 and 137, weather information and direction to restaurant examples); and determining the service type corresponding to the service instruction voice based on a result of the semantic analysis (paragraphs 36 and 137, weather information and direction to restaurant examples).
Since Dimitriadis and Divakaran are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of applying semantic analysis to determine user intent in the spoken text.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Claims 7 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitriadis in view of Prasad, and further in view of Fisher et al. (USPN 6246987, hereinafter referred to as Fisher).

Regarding claims 7 and 27, Dimitriadis fails to explicitly disclose, however, Fisher teaches after obtaining a user registration instruction, acquiring a wake-up voice sample for N times in succession to obtain N wake-up voice samples, and outputting a request for requesting a second user identifier, wherein, N is an integer greater than 1 (process in figure 3, steps 62-76 and/or col. 5, line 27 to col. 6, line 67); receiving voice information fed back by the user for the request for requesting the second user identifier, and performing voice identification on the voice information to obtain voice identification information corresponding to the voice information (figure 3, steps 78-88 and/or col. 6, lines 21-62); and determining the voice identification information as the second user identifier, and storing the second user identifier in correspondence to voiceprint features of the obtained N wake-up voice samples, respectively (figure 3, steps 78-88 and/or col. 6, lines 21-62, storing in database 30).  
Since Dimitriadis and Fisher are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of enrolling new user in order to perform speaker identification and/or authentication.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Claims 8 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitriadis in view of Prasad, further in view of Tripp et al. (USPG 2006/0020457, hereinafter referred to as Tripp), and further in view of Fisher.

Regarding claims 8 and 28, Dimitriadis further discloses wherein, the electronic device is an intelligent device, and the method further comprises: obtaining the wake-up voice of the user by: detecting voice information in real time (figure 1, audio input 400).  Dimitriadis fails to explicitly disclose, however, Tripp teaches after detecting voice information input by a user, when a silence duration reaches a preset voice pause duration, determining the voice information input by the user as target to-be-identified voice information (paragraph 28, determining speech events, such as start and stop speech events, based on “configurable duration”); performing voice identification on the target to-be-identified voice information to obtain target voice identification information (paragraph 28, based on detected speech event, speech data within detected speech event is classified as belonging to one speaker).
Since Dimitriadis and Tripp are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of detecting speech events.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
The modified Dimitriadis still fails to explicitly disclose, however, Fisher further teaches when the target voice identification information is the same as a preset wake-up word, determining the target to-be-identified voice information as the wake-up voice (process in figure 4, speaker identification/verification by comparing spoken utterance against speaker models to determine a match; also see speaker-independent recognition process in step 106 for recognizing password or digits).
Since Dimitriadis and Fisher are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of matching spoken utterance against templates to determine if there is a match.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).
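For illustration only, the limitations of claims 8 and 28 discussed above (treating the input as complete once the silence duration reaches a preset pause duration, then comparing the recognized text with a preset wake-up word) can be sketched as follows.  This is a minimal sketch under stated assumptions and is not the method of Tripp or Fisher: the per-frame voiced/unvoiced flags are assumed to come from some upstream voice activity detector, and the frame length, pause duration, wake-up word, and function names are hypothetical.

```python
def end_of_utterance(voiced_flags, frame_ms=10, pause_ms=300):
    """Return the index one past the last voiced frame once the silence that
    follows speech reaches pause_ms, or None if that has not happened yet."""
    needed = pause_ms // frame_ms  # number of consecutive silent frames required
    silent_run = 0
    last_voiced = None
    for i, voiced in enumerate(voiced_flags):
        if voiced:
            last_voiced = i
            silent_run = 0
        elif last_voiced is not None:
            # Count silence only after some speech has been detected.
            silent_run += 1
            if silent_run >= needed:
                return last_voiced + 1
    return None


def is_wake_up_voice(recognized_text, wake_word="hello device"):
    """True when the recognized text equals the preset (hypothetical) wake-up word."""
    return recognized_text.strip().lower() == wake_word
```

In this sketch, leading silence before any speech is ignored, so the pause timer starts only after voice information has been detected.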

Claims 9 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Dimitriadis in view of Prasad, further in view of Tripp, further in view of Fisher, and further in view of Beckhardt et al. (USPG 2018/0108351, hereinafter referred to as Beckhardt).

Regarding claims 9 and 29, the combination of Dimitriadis, Tripp, and Fisher discloses wherein the method further comprises: obtaining the wake-up voice of the user by: receiving a wake-up voice sent by the intelligent device (see claim 8 above); wherein, after detecting voice information input by a user, when a silence duration reaches a preset voice pause duration, the intelligent device determines the voice information input by the user as target to-be-identified voice information (see claim 8 above); performs voice identification on the target to-be-identified voice information to obtain target voice identification information (see claim 8 above); determines the target to-be-identified voice information as the wake-up voice when the target voice identification information is the same as a preset wake-up word (see claim 8 above).  The combination of Dimitriadis, Tripp, and Fisher still fails to explicitly disclose, however, Beckhardt teaches the electronic device is a cloud server communicatively connected to an intelligent device (figure 5 and paragraphs 21, 28, 93, 123, and 149, sending wakeword speech to the cloud server); and sending the wake-up voice to the cloud server (paragraphs 21, 28, 93, 123, and 149, sending wakeword speech to the cloud server).
Since the modified Dimitriadis and Beckhardt are analogous in the art because they are from the same field of endeavor, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to use the known technique of sending the wakeword speech to a cloud server.  One of ordinary skill in the art would have recognized that the results of the combination were predictable since the use of that known technique provides the rationale to arrive at a conclusion of obviousness. See KSR International Co. v. Teleflex Inc., 82 USPQ2d 1385 (U.S. 2007).

Allowable Subject Matter
Claims 5-6 and 25-26 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Sidi et al. (USPG 2015/0025887) teach blind diarization of recorded calls with an arbitrary number of speakers.  McLaren et al. (USPG 2016/0283185) teach a semi-supervised speaker diarization method.  Sundaranrajan (USPG 2017/0125024) teaches a method for identifying words and speakers in continuous speech.  Kane (USPG 2008/0091425) teaches a voice print recognition method for voice identification and matching.  These references are considered pertinent to the claimed invention.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUYEN X VO whose telephone number is (571)272-7631.  The examiner can normally be reached on M-F, 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HUYEN X VO/
Primary Examiner, Art Unit 2656