DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 09/27/2022 have been fully considered but are not persuasive. Regarding the arguments on pages 8-9 of the Remarks, Examiner notes that the previously cited Fan reference, in view of the newly applied Kim reference, teaches the argued limitations.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 20 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 20 recites the limitation "the competing speaker" in line 7. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 13, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fan et al. (US 10,923,111 B1), hereinafter referred to as Fan, in view of Kim et al. (US 2016/0093304 A1), hereinafter referred to as Kim.

Regarding claim 1, Fan teaches:
A system comprising: 
a processing unit (Fig. 11 element 1104, col. 28 lines 6-16, where a processor is used); and 
a memory storage device including program code (Fig. 11 element 1106, col. 28 lines 6-16, where memory is used) that when executed by the processing unit enables the system to: 
a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (col. 22 lines 17-39, where audio feature vectors of the beginning of an utterance are included, and col. 2 line 53 - col. 3 line 6, where the first portion corresponds to the target speaker); 
a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker (col. 22 lines 17-39, where audio feature vectors of an utterance are included, and col. 2 line 53 - col. 3 line 6, where the other portions correspond to other speakers); 
identify a new speech signal that includes mixed audio from a plurality of different speakers including at least the target speaker and the competing speaker (col. 13 line 63 - col. 14 line 16, where the second portion contains speech from a first and second speaker);
determine a mixed speaker speech vector representing a frame of the new speech signal (col. 3 line 66 – col. 4 line 31, where a second encoded feature vector represents the second portion); 
determine a respective similarity between each of the target speaker speech vectors of the plurality of target speaker speech vectors and the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, where similarity is determined between the first and second portions of audio using the feature vectors); 
generate a target speaker bias vector that is based on a combination of the plurality of target speaker speech vectors by deterministically combining each of the target speaker speech vectors of the plurality of target speaker speech vectors into the target speaker bias vector based on the respective similarity between each of the target speaker speech vectors and the mixed speaker speech vector with the target speaker speech vectors having more similarity to the mixed speaker speech vector being given more weight in the target speaker bias vector than target speaker speech vectors having less similarity to the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, col. 3 line 66 - col. 4 line 31, and col. 17 lines 1-18, where an attention mechanism applies weights based on similarity to an encoded feature vector based on the first and second portions of speech); and 
determine a respective similarity between each of the competing speaker speech vectors of the plurality of competing speaker speech vectors and the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, where similarity is determined between the first and second portions of audio using the feature vectors); and
generate a competing speaker bias vector that is based on a combination of the plurality of competing speaker speech vectors by deterministically combining each of the competing speaker speech vectors of the plurality of competing speaker speech vectors into the competing speaker bias vector based on the respective similarity between each of the competing speaker speech vectors and the mixed speaker speech vector with the competing speaker speech vectors having more similarity to the mixed speaker speech vector being given more weight in the competing speaker bias vector than competing speaker speech vectors having less similarity to the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, col. 3 line 66 - col. 4 line 31, and col. 17 lines 1-18, where an attention mechanism applies weights based on similarity to an encoded feature vector based on the first and second portions of speech), 
extract a frame of speech of the target speaker from the frame of the new speech signal based on a combination of (i) the target speaker bias vector, (ii) the competing speaker bias vector, and (iii) the frame of the new speech signal (col. 2 line 53 - col. 3 line 6, where the weighted output is used to determine output data including speech from the desired speaker without other speech/noise).  
Fan does not teach:
access a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker;
access a competing speaker profile comprising a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker;
Fan also does not teach determining similarity using a competing speaker vector instead of the target speaker vector.
Kim teaches:
access a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each target speaker speech vector of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (para [0038], where a speaker profile is generated and i-vectors representing speaker utterances are used);
access a competing speaker profile comprising a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker (para [0038], where a speaker profile is generated and i-vectors representing speaker utterances are used, and para [0049], where profiles for multiple speakers are created);
Kim also teaches that the audio input can be compared with some or all of the speaker profiles to determine the identity of the speaker (Kim para [0049], where comparison is performed with multiple speaker profiles).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Fan by using the multiple speaker profiles of Kim (Kim para [0049]) in the weighting of Fan (Fan col. 2 line 53 – col. 3 line 6), in order to determine whether or not to trigger a virtual assistant (Kim para [0015]).
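
For illustration only, the similarity-weighted combination recited in claim 1 (a bias vector in which profile vectors more similar to the mixed speaker speech vector receive more weight) can be sketched as below. The sketch is not taken from Fan, Kim, or the application: the use of cosine similarity, the softmax normalization, the two-dimensional vectors, and all function and variable names are assumptions chosen to make the weighting step concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn similarity scores into positive weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bias_vector(profile_vectors, mixed_vector):
    """Deterministically combine profile vectors, weighting each by its similarity
    to the mixed speaker speech vector."""
    sims = [cosine_similarity(v, mixed_vector) for v in profile_vectors]
    weights = softmax(sims)
    dim = len(mixed_vector)
    bias = [sum(w * v[i] for w, v in zip(weights, profile_vectors)) for i in range(dim)]
    return bias, weights


# Hypothetical per-frame profile vectors and one mixed-frame vector.
target_profile = [[1.0, 0.0], [0.0, 1.0]]
competing_profile = [[-1.0, 0.0], [0.0, -1.0]]
mixed = [1.0, 0.0]

target_bias, target_weights = bias_vector(target_profile, mixed)
competing_bias, competing_weights = bias_vector(competing_profile, mixed)
```

In this sketch the same routine is applied once to the target speaker profile and once to the competing speaker profile, mirroring the parallel "determine a respective similarity" and "generate a ... bias vector" limitations; the extraction step that consumes both bias vectors is not modeled.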

Regarding claim 6, Fan in view of Kim teaches:
The system of claim 1, wherein a contribution of one of the target speaker speech vectors to the target speaker bias vector is directly proportional to the similarity of the one of the target speaker speech vectors to the mixed speaker speech vector representing the frame of the speech signal (Fan col. 2 line 53 - col. 3 line 6, where weights are assigned based on similarity, and col. 20 lines 17-21, where the weights indicating similarity are directly proportional to the encoded features).
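
As a hedged illustration of the "directly proportional" contribution recited in claim 6, the sketch below weights each profile vector by its similarity divided by the sum of all similarities. This is one possible reading, not a statement of what Fan or the application implements: the non-negative similarity scores, the sum normalization, and the names used are all assumptions.

```python
def proportional_bias_vector(profile_vectors, similarities):
    """Combine profile vectors so that each vector's weight is directly
    proportional to its (assumed non-negative) similarity score."""
    if len(profile_vectors) != len(similarities):
        raise ValueError("one similarity score is required per profile vector")
    if any(s < 0 for s in similarities):
        raise ValueError("this sketch assumes non-negative similarity scores")
    total = sum(similarities)
    if total == 0:
        raise ValueError("at least one similarity score must be positive")
    weights = [s / total for s in similarities]
    dim = len(profile_vectors[0])
    bias = [sum(w * v[i] for w, v in zip(weights, profile_vectors)) for i in range(dim)]
    return bias, weights


# Hypothetical values: the second vector is three times as similar as the first,
# so it receives three times the weight.
bias, weights = proportional_bias_vector([[2.0, 0.0], [0.0, 4.0]], [1.0, 3.0])
```

With these hypothetical inputs the weights are 0.25 and 0.75, so the second vector contributes three times as much to the bias vector as the first.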

Regarding claim 13, Fan in view of Kim teaches:
The method of claim 19, wherein a contribution of one of the target speaker speech vectors to the target speaker bias vector is directly proportional to the similarity of the one of the target speaker speech vectors to the mixed speaker speech vector representing the speech signal (Fan col. 2 line 53 - col. 3 line 6, where weights are assigned based on similarity, and col. 20 lines 17-21, where the weights indicating similarity are directly proportional to the encoded features).

Regarding claim 19, Fan teaches:
A method comprising: 
a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (col. 22 lines 17-39, where audio feature vectors of the beginning of an utterance are included, and col. 2 line 53 - col. 3 line 6, where the first portion corresponds to the target speaker); 
a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker (col. 22 lines 17-39, where audio feature vectors of an utterance are included, and col. 2 line 53 - col. 3 line 6, where the other portions correspond to other speakers); 
identifying a new speech signal that includes mixed audio from a plurality of different speakers including at least the target speaker and the competing speaker (col. 13 line 63 - col. 14 line 16, where the second portion contains speech from a first and second speaker);
determining a mixed speaker speech vector representing a frame of the new speech signal (col. 3 line 66 – col. 4 line 31, where a second encoded feature vector represents the second portion); 
determining a respective similarity between each of the target speaker speech vectors of the plurality of target speaker speech vectors and the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, where similarity is determined between the first and second portions of audio using the feature vectors); 
generating a target speaker bias vector that is based on a combination of the plurality of target speaker speech vectors by deterministically combining each of the target speaker speech vectors of the plurality of target speaker speech vectors into the target speaker bias vector based on the respective similarity between each of the target speaker speech vectors and the mixed speaker speech vector with the target speaker speech vectors having more similarity to the mixed speaker speech vector being given more weight in the target speaker bias vector than target speaker speech vectors having less similarity to the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, col. 3 line 66 - col. 4 line 31, and col. 17 lines 1-18, where an attention mechanism applies weights based on similarity to an encoded feature vector based on the first and second portions of speech); and 
determining a respective similarity between each of the competing speaker speech vectors of the plurality of competing speaker speech vectors and the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, where similarity is determined between the first and second portions of audio using the feature vectors); and
generating a competing speaker bias vector that is based on a combination of the plurality of competing speaker speech vectors by deterministically combining each of the competing speaker speech vectors of the plurality of competing speaker speech vectors into the competing speaker bias vector based on the respective similarity between each of the competing speaker speech vectors and the mixed speaker speech vector with the competing speaker speech vectors having more similarity to the mixed speaker speech vector being given more weight in the competing speaker bias vector than competing speaker speech vectors having less similarity to the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, col. 3 line 66 - col. 4 line 31, and col. 17 lines 1-18, where an attention mechanism applies weights based on similarity to an encoded feature vector based on the first and second portions of speech), 
extracting a frame of speech of the target speaker from the frame of the new speech signal based on a combination of (i) the target speaker bias vector, (ii) the competing speaker bias vector, and (iii) the frame of the new speech signal (col. 2 line 53 - col. 3 line 6, where the weighted output is used to determine output data including speech from the desired speaker without other speech/noise).  
Fan does not teach:
accessing a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each target speaker speech vector of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker;
accessing a competing speaker profile comprising a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker;
Fan also does not teach determining similarity using a competing speaker vector instead of the target speaker vector.
Kim teaches:
accessing a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each target speaker speech vector of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (para [0038], where a speaker profile is generated and i-vectors representing speaker utterances are used);
accessing a competing speaker profile comprising a plurality of competing speaker speech vectors configured to represent speech characteristics of a competing speaker, each of the plurality of competing speaker speech vectors representing a respective and different frame of speech of the competing speaker (para [0038], where a speaker profile is generated and i-vectors representing speaker utterances are used, and para [0049], where profiles for multiple speakers are created);
Kim also teaches that the audio input can be compared with some or all of the speaker profiles to determine the identity of the speaker (Kim para [0049], where comparison is performed with multiple speaker profiles).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Fan by using the multiple speaker profiles of Kim (Kim para [0049]) in the weighting of Fan (Fan col. 2 line 53 – col. 3 line 6), in order to determine whether or not to trigger a virtual assistant (Kim para [0015]).

Regarding claim 20, Fan teaches:
A method comprising: 
a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (col. 22 lines 17-39, where audio feature vectors of the beginning of an utterance are included, and col. 2 line 53 - col. 3 line 6, where the first portion corresponds to the target speaker); 
identifying a new speech signal that includes mixed audio from a plurality of different speakers including at least the target speaker and the competing speaker (col. 13 line 63 - col. 14 line 16, where the second portion contains speech from a first and second speaker);
determining a mixed speaker speech vector representing a frame of the new speech signal (col. 3 line 66 – col. 4 line 31, where a second encoded feature vector represents the second portion); 
determining a respective similarity between each of the target speaker speech vectors of the plurality of target speaker speech vectors and the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, where similarity is determined between the first and second portions of audio using the feature vectors); 
generating a target speaker bias vector that is based on a combination of the plurality of target speaker speech vectors by deterministically combining each of the target speaker speech vectors of the plurality of target speaker speech vectors into the target speaker bias vector based on the respective similarity between each of the target speaker speech vectors and the mixed speaker speech vector with the target speaker speech vectors having more similarity to the mixed speaker speech vector being given more weight in the target speaker bias vector than target speaker speech vectors having less similarity to the mixed speaker speech vector (col. 2 line 53 - col. 3 line 6, col. 3 line 66 - col. 4 line 31, and col. 17 lines 1-18, where an attention mechanism applies weights based on similarity to an encoded feature vector based on the first and second portions of speech); and 
extracting a frame of speech of the target speaker from the frame of the new speech signal based on a combination of (i) the target speaker bias vector and (ii) the frame of the new speech signal (col. 2 line 53 - col. 3 line 6, where the weighted output is used to determine output data including speech from the desired speaker without other speech/noise).  
Fan does not teach:
accessing a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each target speaker speech vector of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker;
Kim teaches:
accessing a target speaker profile comprising a plurality of target speaker speech vectors that each represent speech characteristics of a target speaker, each target speaker speech vector of the plurality of target speaker speech vectors representing a respective and different frame of speech of the target speaker (para [0038], where a speaker profile is generated and i-vectors representing speaker utterances are used);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Fan by using the multiple speaker profiles of Kim (Kim para [0049]) in the weighting of Fan (Fan col. 2 line 53 – col. 3 line 6), in order to determine whether or not to trigger a virtual assistant (Kim para [0015]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2020/0243094 A1 para [0309] teaches a diarizer using speaker identification to identify voices of various people; US 2017/0270919 A1 para [0134] teaches user profiles storing feature vectors corresponding to audio data.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/
Primary Examiner, Art Unit 2658