Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
1.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

2.	Claims 1-8 are rejected under 35 U.S.C. 103 as being unpatentable over Eddington, JR (2017/0309292) in view of Church et al. (2019/0156832).
As to claim 1, Eddington teaches a computing device, comprising: a processor; and a memory holding instructions executable by the processor (Fig. 1B, [0073]) to: receive an audio signal containing utterances spoken by a person (Fig. 1B, sensor array time domain signal inputs 1A, [0041]); extract magnitude features and phase information features from the signal (Figs. 1B and 5B, [0074, 0078]); input the magnitude features and the phase information features into a speaker location and speaker identification neural network (Fig. 5b items 52 and 54, [0074-0077]), wherein 
Church teaches utterances spoken by multiple persons (abstract, [0004]); determining whether there are more speaker changes in the data; processing the vocal quality and characteristics data at the locations where the speaker change value from the meta-information indicated a possible speaker change; and updating any speaker changes, which are stored in the data store ([0060-0062]). It would have been obvious to utilize the magnitude and phase features taught by Eddington as part of the meta-information in determining a change in the person speaking, in order to indicate that a speaker change occurred at a location.
It would have been obvious before the effective filing date of the claimed invention to incorporate Church's teachings of detecting a change in the person speaking into Eddington's teachings of utilizing both the magnitude features and the phase information features, for the purpose of indicating that a speaker change occurred at a location.
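By way of illustration only, the "magnitude features and phase information features" discussed above can be pictured as the per-bin magnitude and phase of a short-time spectrum. The following is a minimal sketch under stated assumptions: the function name `magnitude_phase_features`, the naive DFT, and the 16-sample test frame are hypothetical and are not taken from Eddington, Church, or the claims.

```python
import cmath
import math


def magnitude_phase_features(frame):
    """Return (magnitudes, phases) for the non-negative frequency bins of one frame.

    Hypothetical helper: a naive DFT stands in for whatever transform a real
    front end would use.
    """
    n = len(frame)
    magnitudes, phases = [], []
    for k in range(n // 2 + 1):
        # DFT coefficient for bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(coeff))       # magnitude feature
        phases.append(cmath.phase(coeff))   # phase information feature (radians)
    return magnitudes, phases


# Usage: a cosine occupying bin 2 of a 16-sample frame (assumed test signal).
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
mags, phases = magnitude_phase_features(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
```

In this sketch both feature vectors would be supplied together to a downstream model; how the references' networks actually consume them is not shown here.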

As to claim 3, Eddington teaches inputting the magnitude features and the phase information features into a speaker location and speaker identification neural network (Fig. 5B items 52 and 54, [0074-0077]), wherein the neural network utilizes both the magnitude features and the phase information features ([0044, 0074, 0078]); and Church teaches determining whether there are more speaker changes in the data, processing the vocal quality and characteristics data at the locations where the speaker change value from the meta-information indicated a possible speaker change, and updating any speaker changes, which are stored in the data store ([0060-0062]).
As to claim 4, Church teaches processing to create a digital audio stream of voices from at least two different speakers, processing resulting in the spoken words to which a speaker turn detection process is applied to identify a number of speaker segments with each speaker segment ending at a word boundary (abstract, [0004]), 
As to claim 5, Church teaches the computing device of claim 1, wherein the instructions are executable to generate a transcription of a conversation between two or more persons of the multiple persons ([0046-0047, 0051]).
As to claim 6, Church teaches the computing device of claim 5, wherein the joint speaker location and speaker identification neural network is configured to identify the two or more persons of the multiple persons, and the instructions are executable to include in the transcription notations indicating an identity of one or more of the persons in the conversation (Fig. 8 and related text, items 800, 810, and 820, [0046, 0051, 0063-0064]).
As to claim 7, Eddington teaches the computing device of claim 1, wherein the joint speaker location and speaker identification neural network is trained using enrollment utterances (Fig. 5B items 52 and 54, [0005, 0012, 0042, 0074-0077]).  Eddington and Church do not explicitly disclose each of the enrollment utterances comprises both speaker vocal characteristics and speaker location information that are used to train the joint speaker location and speaker identification neural network. However, Eddington teaches using an array of sensors to improve the reception when the sensor signals are filtered using a weighted sum designed to amplify the target 
.

3.	Claims 9-16 are rejected under 35 U.S.C. 103 as being unpatentable over Eddington, JR (2017/0309292) in view of Foote (2002/0122113).
As to claim 9, Eddington teaches at a computing device, a method comprising: receiving an audio signal of an utterance spoken by a user (Fig. 1B, sensor array time domain signal inputs 1A, [0041]); extracting magnitude features and phase information features from the audio signal (Figs. 1B and 5B, [0074, 0078]); inputting the magnitude features and the phase information features into a speaker location and speaker identification neural network (Fig. 5b items 52 and 54, [0074-0077]); receiving from the speaker location and speaker identification neural network location information of the user (Fig. 5B, the outputs of the units 52 and 54). Eddington does not explicitly discuss utilizing the user location information to track a changing location of the user.
Foote teaches camera operations that zoom and pan to follow the speaker ([0072]); setting appropriate camera pan/zoom parameters to capture a podium speaker, where motion above the podium signals the appropriate camera to move to a location preset to capture the podium speaker ([0148]); and relying on phase differences to estimate the direction and distance of an acoustic source, with the microphone having the highest average magnitude indicating the rough direction of the acoustic source ([0159]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Foote into the teachings of Eddington for the purpose of 
As to claim 10, Foote teaches the method of claim 9 wherein tracking the changing location of the user is performed without utilizing information from an enrollment utterance of the user ([0072, 0148, 0159, and 0162], where Foote discusses camera operations that zoom and pan to follow the speaker, setting appropriate camera pan/zoom parameters to capture a podium speaker, and relying on phase differences to estimate the direction and distance of an acoustic source, with the microphone having the highest average magnitude indicating the rough direction of the acoustic source).
As to claim 11, Eddington and Foote do not explicitly discuss the method of claim 9, further comprising utilizing information from an enrollment utterance of the user in addition to the user location information to track the changing location of the user. However, Eddington teaches that the multi-level classification architecture uses more detailed sensor information locally while streaming lower bandwidth local classification data over a network for higher performance by providing automatic speech recognition and human speaker identification ([0077]); and Foote teaches camera operations that zoom and pan to follow the speaker ([0072]). It would have been obvious that speech recognition and human speaker identification can be utilized to identify who is speaking, and to incorporate the teachings of Foote to track the changing location of the user, for example, by capturing a podium speaker's movement.
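By way of illustration only, the two localization cues cited from Foote ([0159]), namely the microphone with the highest average magnitude and the phase difference between microphones, can be sketched as follows. This is a minimal sketch under stated assumptions: the two-microphone far-field geometry, the speed-of-sound constant, and the function names are hypothetical and are not taken from Foote or Eddington.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def loudest_microphone(channels):
    """Index of the channel with the highest average absolute amplitude.

    Hypothetical helper for the 'rough direction' cue.
    """
    averages = [sum(abs(x) for x in ch) / len(ch) for ch in channels]
    return max(range(len(channels)), key=averages.__getitem__)


def angle_from_phase_difference(delta_phi, freq_hz, spacing_m):
    """Far-field arrival angle in radians from broadside for one microphone pair.

    Assumes a single tone at freq_hz and a spacing below half a wavelength,
    so the phase difference is unambiguous.
    """
    delay = delta_phi / (2 * math.pi * freq_hz)       # inter-microphone delay (s)
    ratio = SPEED_OF_SOUND * delay / spacing_m        # equals sin(angle)
    ratio = max(-1.0, min(1.0, ratio))                # clamp against rounding
    return math.asin(ratio)


# Usage: three short channels, and a 1 kHz tone arriving 30 degrees off broadside
# at a pair of microphones spaced 0.1 m apart (all values assumed).
rough = loudest_microphone([[0.1, -0.1], [0.5, -0.4], [0.2, 0.2]])
true_delay = 0.1 * math.sin(math.radians(30)) / SPEED_OF_SOUND
estimated = angle_from_phase_difference(2 * math.pi * 1000 * true_delay, 1000, 0.1)
```

Repeating such an estimate frame by frame would yield a changing direction that a camera or display could follow; that tracking loop is not shown here.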

As to claim 13, Foote teaches the method of claim 12, further comprising following the user with the moveable camera as the changing location of the user relative to the computing device changes ([0072, 0148]).
As to claim 14, Foote teaches the method of claim 9, further comprising directing a moveable display of the computing device toward one location of the changing location of the user ([0072, 0148]).
As to claim 15, Foote teaches the method of claim 14, further comprising following the user with the moveable display as the changing location of the user relative to the computing device changes ([0072, 0148]).
As to claim 16, Eddington teaches the method of claim 9, wherein the joint speaker location and speaker identification neural network is trained using a plurality of utterances from a plurality of persons (Fig. 5B items 52 and 54, [0005, 0012, 0042, 0074-0077]). Eddington and Foote do not explicitly discuss that each utterance of the plurality of utterances comprises both speaker vocal characteristics and speaker location information that are used to train the joint speaker location and speaker identification neural network. However, Eddington teaches using an array of sensors to improve the reception when the sensor signals are filtered using a weighted sum designed to amplify the target signal by weighting time delay differences of the signal arrival ([0005]), and an EME configured to process SLE outputs and MCAEC outputs to characterize active sources as transducer or non-transducer, to calculate enclosure .
Allowable Subject Matter
4.	Claims 17-20 are allowed.

Double Patenting
5.	The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 10,580,414 in view of Church et al. (2019/0156832). Although the claims at issue are not identical, they are not patentably distinct from each other because all the claimed limitations recited in the present application are transparently found in U.S. Patent No. 10,580,414 with obvious wording variations.

U.S. Patent Application 16/802,993
U.S. Patent 10,580,414
A computing device, comprising:
A computing device, comprising:
a processor; and




receive a multi-channel audio signal containing utterances spoken by a user;
extract magnitude features and phase information features from the signal;
extract magnitude features and phase information features from the signal;
Input the magnitude features and the phase information features into a joint speaker location and speaker identification neural network, wherein the neural network utilizes both the magnitude features and the phase information features to determine a change in the person speaking;
Input the magnitude features and the phase information features into a joint speaker location and speaker identification neural network, wherein the joint speaker location and speaker identification neural network is trained using a plurality of utterances from a plurality of persons, wherein each utterance of the plurality of utterances comprises both speaker vocal characteristics and speaker spatial information that are used to train the joint speaker location and speaker identification neural network;
receive, from the joint speaker location and speaker identification neural network, 


compare the user embedding to a plurality of enrollment embeddings extracted from the plurality of utterances that are each associated with an identity of a corresponding person;
based at least on the comparisons, match the user to an identity of one of the persons; and
Output the identity of the person.


Claim 1 of U.S. Patent No. 10,580,414 does not teach utilizing both the magnitude features and the phase information features to determine a change in the person speaking. Church teaches determining whether there are more speaker changes in the data, processing the vocal quality and characteristics data at the locations where the speaker change value from the meta-information indicated a possible speaker change, and updating any speaker changes, which are stored in the data store ([0060-0062]). It would have been obvious to utilize magnitude and phase feature information in determining a change in the person speaking. It would have been obvious before the effective filing date of the claimed invention to incorporate Church's teachings of detecting a change in the person speaking into the utilization of both the magnitude features and the 
The examiner also notes that claim 9 of the ‘993 Application respectively corresponds to Claim 12 of the ‘414 patent.
Conclusion
6.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Quynh H. Nguyen whose telephone number is (571)272-7489. The examiner can normally be reached Monday-Friday, 7AM-3PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any response to this action should be mailed to:
                        Commissioner of Patents and Trademarks
                        P.O. Box 1450
                        Alexandria, VA  22313-1450

Or faxed to:

                    (571) 273-8300, for formal communications intended for entry and for 
                          Informal or draft communications, please label “PROPOSED” or “DRAFT.”
                             
 Hand-delivered responses should be brought to: 

                         Customer Service Window 
                         Randolph Building 
                         401 Dulany Street 
                         Alexandria, VA 22314



/QUYNH H NGUYEN/
Primary Examiner, Art Unit 2652