Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
The amendments filed on June 14th, 2022 have been entered. Examiner amendments have been entered for original/previously presented claims 1-15, along with new claims 29-33, which are presented as system claims corresponding to method claims 1-5. Applicant’s amendments to the specification and to the claims have overcome each objection previously set forth in the Non-Final Office Action mailed March 14th, 2022.

Response to Arguments
Applicant’s arguments (see para. 5 on pg. 11 through para. 1 on pg. 14 of Applicant’s arguments/remarks, filed on June 14th, 2022) with respect to independent claim 1 have been fully considered and are persuasive. The rejections of claims 1-15 have been withdrawn.
	It is further noted that Falkson does not anticipate the feature of “in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model:
processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output; and
determining whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user;…”.
EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Scott W. Higdon on July 29th, 2022.

The application has been amended as follows: 
1.	(Previously Presented)  A method implemented by one or more processors, the method comprising:
receiving, from a client device and via a network, an automated assistant request that includes:
a text independent (TI) speaker embedding for a particular user of the client device, and
audio data that captures spoken input of the particular user, wherein the audio data is captured via one or more microphones of the client device;
determining that the TI speaker embedding was generated using an outdated version of a TI speaker recognition model;
in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model:
processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output; and
determining whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user;
in response to determining the particular user spoke the spoken input:
performing one or more actions that are based on the audio data; 
processing previous audio data, that captures previous spoken input of the particular user, using an updated version of the TI speaker recognition model to generate an updated speaker embedding; and
transmitting, to the client device, the updated speaker embedding for the particular user to cause the client device to locally store the updated speaker embedding for transmission with future automated assistant requests. 

2.	(Original)  The method of claim 1, wherein the one or more processors that implement the method are at one or more computing devices that are remote from the client device, and further comprising:
in response to transmitting the updated speaker embedding for the particular user to the client device:
deleting all instances of the updated speaker embedding from the one or more computing devices.

3.	(Previously Presented)  The method of claim 1, wherein processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output comprises:
processing an additional portion of the audio data, that is in addition to an invocation phrase portion of the audio data, using the outdated version of the TI speaker recognition model to generate the TI output.

4.	(Previously Presented)  The method of claim 1, wherein processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output comprises:
processing an invocation phrase portion of the audio data and an additional portion of the audio data using the outdated version of the TI speaker recognition model to generate the TI output.

5.	(Previously Presented)  The method of claim 1, wherein processing the previous audio data that captures previous spoken input of the particular user using an updated version of the TI speaker recognition model to generate an updated speaker embedding comprises:
processing a plurality of instances of the previous audio data using the updated version of the TI speaker recognition model to generate the updated speaker embedding, wherein each of the instances of the previous audio data captures previous spoken input of the particular user.

6.	(Previously Presented)  The method of claim 5, further comprising:
selecting the plurality of instances of the previous audio data based on the plurality of instances of the previous audio data satisfying one or more criteria.

7.	(Original)  The method of claim 6, wherein the one or more criteria include one or multiple of: a length criterion for each of the plurality of instances of the previous audio data; and a diversity criterion for the plurality of instances of the previous audio data. 

8.	(Previously Presented)  The method of claim 5, further comprising:
replacing an instance of previous audio data in the instances of the previous audio data with the audio data that captures the spoken input of the particular user by:
determining a length of each instance of previous audio data in the plurality of previous audio data;
determining a length of the audio data that captures the spoken input of the particular user;
comparing the length of the audio data with the length of each instance of previous audio data; and
in response to determining, based on the comparing, that the audio data is longer than one or more instances of previous audio data, replacing the instance of previous audio data with the shortest length with the audio data.

9.	(Previously Presented)  The method of claim 1, wherein determining that the TI speaker embedding was generated using an outdated version of the TI speaker recognition model is based at least in part on a version identifier for the TI speaker embedding, the version identifier included in the automated assistant request.

10.	(Previously Presented)  The method of claim 1, wherein performing one or more actions that are based on the audio data comprises:
controlling one or more peripheral devices based on the audio data.

11.	(Previously Presented)  The method of claim 1, wherein performing one or more actions that are based on the audio data comprises:
generating responsive content that is customized for the particular user and that is based on the audio data; and
causing the client device to render output based on the responsive content.

12.	(Original)  The method of claim 11, wherein generating the updated speaker embedding is completed subsequent to causing the client device to render output based on the responsive content.

13.	(Previously Presented)  The method of claim 1, wherein the automated assistant request further includes a text dependent (TD) user measure generated locally at the client device using a TD speaker recognition model stored locally at the client device and using a TD speaker embedding stored locally at the client device, the TD speaker embedding being for the particular user, and wherein determining whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user further comprises:
determining a TI user measure by comparing the TI output with the speaker embedding; and
determining whether the particular user spoke the spoken input using both the TD user measure and the TI user measure.

14.	(Original)  The method of claim 13, wherein determining whether the particular user spoke the spoken input using both the TD user measure and the TI user measure comprises:
determining a particular user probability measure which indicates the probability the particular user spoke the spoken input by combining the TD user measure and the TI user measure; and
determining whether the particular user spoke the spoken input by determining whether the particular user probability measure satisfies a threshold.

15.	(Original)  The method of claim 11, wherein in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model further comprises:
determining that a first user profile and a second user profile are associated with the automated assistant request;
responsive to determining that the first user profile and the second user profile are associated with the automated assistant request:
initiating generating of first responsive content that is customized for the first user and that is in response to the spoken input;
initiating generating of second responsive content that is customized for the second user and that is in response to the spoken input;
prior to completion of generating the first responsive content and the second responsive content, processing at least the portion of audio data using the TI speaker recognition model to generate TI output;
determining whether the particular user is the first user, and whether the particular user spoke the spoken input by comparing the TI output with a speaker embedding of the first user; and
wherein in response to determining that the particular user spoke the spoken input further comprises transmitting, to the client device, the first responsive content without transmitting the second responsive content to the client device.

16-28.	(Canceled)  

29.	(New) A collection of one or more automated assistant servers comprising:
memory storing instructions, and 
one or more processors that execute the instructions to:
receive, from a client device and via a network, an automated assistant request that includes:
a text independent (TI) speaker embedding for a particular user of the client device, and
audio data that captures spoken input of the particular user, wherein the audio data is captured via one or more microphones of the client device;
determine that the TI speaker embedding was generated using an outdated version of a TI speaker recognition model;
in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model:
process at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output; and
determine whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user;
in response to determining the particular user spoke the spoken input:
perform one or more actions that are based on the audio data; 
process previous audio data, that captures previous spoken input of the particular user, using an updated version of the TI speaker recognition model to generate an updated speaker embedding; and
transmit, to the client device, the updated speaker embedding for the particular user to cause the client device to locally store the updated speaker embedding for transmission with future automated assistant requests. 

30.	(New)  The collection of one or more automated assistant servers of claim 29, wherein in executing the instructions one or more of the processors are further to:
in response to transmitting the updated speaker embedding for the particular user to the client device:
delete all instances of the updated speaker embedding from the collection of one or more automated assistant servers.

31.	(New)  The collection of one or more automated assistant servers of claim 29, wherein in processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output one or more of the processors are to:
process an additional portion of the audio data, that is in addition to an invocation phrase portion of the audio data, using the outdated version of the TI speaker recognition model to generate the TI output.

32.	(New)  The collection of one or more automated assistant servers of claim 29, wherein in processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output one or more of the processors are to:
process an invocation phrase portion of the audio data and an additional portion of the audio data using the outdated version of the TI speaker recognition model to generate the TI output.

33.	(New)  The collection of one or more automated assistant servers of claim 29, wherein in processing the previous audio data that captures previous spoken input of the particular user using an updated version of the TI speaker recognition model to generate an updated speaker embedding one or more of the processors are to:
process a plurality of instances of the previous audio data using the updated version of the TI speaker recognition model to generate the updated speaker embedding, wherein each of the instances of the previous audio data captures previous spoken input of the particular user.
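Dependent claims 8 and 14 recite two concrete rules that can be illustrated briefly. The Python sketch below is a hedged illustration only, not the applicant's implementation: the `Utterance` type, the use of seconds as the length unit, the weighted-average combination of the TD and TI user measures, the 0.5 weight, and the 0.7 threshold are all assumptions, since claim 14 requires only that the two measures be combined and the result compared with a threshold.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record for one stored instance of previous audio data.
    utterance_id: str
    duration_s: float  # length of the instance in seconds (assumed unit)


def replace_shortest(previous: list[Utterance], new: Utterance) -> list[Utterance]:
    """Claim 8 sketch: compare the new audio's length with each stored instance and,
    if it is longer than at least one of them, replace the shortest stored instance."""
    if not previous:
        return list(previous)  # nothing to replace
    shortest_idx = min(range(len(previous)), key=lambda i: previous[i].duration_s)
    updated = list(previous)  # copy so the caller's list is not mutated
    # Being longer than any stored instance implies being longer than the shortest one.
    if new.duration_s > previous[shortest_idx].duration_s:
        updated[shortest_idx] = new
    return updated


def speaker_verified(
    td_measure: float,
    ti_measure: float,
    td_weight: float = 0.5,  # assumed weighting of the TD measure
    threshold: float = 0.7,  # assumed decision threshold
) -> bool:
    """Claim 14 sketch: combine the TD and TI user measures into a single
    probability-like score and test whether it satisfies the threshold."""
    probability = td_weight * td_measure + (1.0 - td_weight) * ti_measure
    return probability >= threshold
```

A weighted average is only one possible way to combine the measures; any rule that yields a single score compared against a threshold would fit the claim language equally well.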




REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance: 
Previously presented/original claims 1-15 and new claims 29-33 are allowed. 
Claim 1 is allowable because it recites a method implemented by one or more processors, the method comprising:
receiving, from a client device and via a network, an automated assistant request that includes:
a text independent (TI) speaker embedding for a particular user of the client device, and
audio data that captures spoken input of the particular user, wherein the audio data is captured via one or more microphones of the client device;
determining that the TI speaker embedding was generated using an outdated version of a TI speaker recognition model;
in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model:
processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output; and
determining whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user;
in response to determining the particular user spoke the spoken input:
performing one or more actions that are based on the audio data; 
processing previous audio data, that captures previous spoken input of the particular user, using an updated version of the TI speaker recognition model to generate an updated speaker embedding; and
transmitting, to the client device, the updated speaker embedding for the particular user to cause the client device to locally store the updated speaker embedding for transmission with future automated assistant requests. 
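The server-side flow recited in claim 1 can be summarized in a minimal, non-authoritative Python sketch. Everything concrete in it is an assumption for illustration: the model registry keyed by version strings, the cosine-similarity comparison, the 0.8 threshold, and the averaging of previous-utterance embeddings are hypothetical and are not taken from the application or the cited references. What the sketch shows is the branch treated above as distinguishing: when the request's embedding is outdated, the audio is first scored with the same outdated model, and only after a successful match is an updated embedding generated with the updated model.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: a TI model maps raw audio samples to an embedding vector.
Embedding = list[float]
TIModel = Callable[[list[float]], Embedding]


@dataclass
class AssistantRequest:
    ti_embedding: Embedding  # speaker embedding stored on and sent by the client device
    embedding_version: str   # identifier of the model version that produced it (cf. claim 9)
    audio: list[float]       # audio data capturing the spoken input


@dataclass
class VerificationResult:
    verified: bool
    updated_embedding: Optional[Embedding]  # would be transmitted back to the client


def cosine(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle_request(
    req: AssistantRequest,
    models: dict[str, TIModel],
    current_version: str,
    previous_audio: list[list[float]],
    threshold: float = 0.8,  # assumed similarity threshold
) -> VerificationResult:
    outdated = req.embedding_version != current_version
    # Score with the model version that produced the stored embedding so that the
    # TI output and the embedding are comparable; when outdated, this is the old model.
    ti_output = models[req.embedding_version](req.audio)
    if cosine(ti_output, req.ti_embedding) < threshold:
        return VerificationResult(verified=False, updated_embedding=None)

    # ... actions based on the audio data would be performed here ...

    updated: Optional[Embedding] = None
    if outdated and previous_audio:
        # Re-enroll: embed each previous utterance with the updated model and average
        # the results (averaging is an assumed way of forming the updated embedding).
        outputs = [models[current_version](clip) for clip in previous_audio]
        updated = [sum(col) / len(outputs) for col in zip(*outputs)]
    return VerificationResult(verified=True, updated_embedding=updated)
```

By contrast, the flow attributed to Falkson above would correspond to skipping the outdated-model scoring step and replacing the voiceprint as soon as the version mismatch is detected.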

Falkson et al. (US Pub. No. 2018/0201226 A1), hereinafter Falkson, teaches voice biometric pre-purchase enrollment for autonomous vehicles. More specifically, a method and system for authenticating a user of a vehicle are described. In one embodiment, a user profile is associated with a user of a vehicle. The user profile is used to compare a sample of the voice of the user with a database onboard the vehicle as well as a database stored on a network connected server in communication with the vehicle. A user profile may be associated with a number of vehicles. Users may have a number of levels of permissions associated with a number of different vehicles. A user may be enabled to access a number of vehicle features by providing a voice sample via microphones in and around the vehicle. A voiceprint associated with the user may be improved based upon audio collected by microphones in and around the vehicle, see abstract.
	Moreover, regarding claim 1, Falkson teaches a method implemented by one or more processors (Para. 229, the systems and methods of this disclosure can be implemented in conjunction with a special purpose computer, a programmed microprocessor; paras. 227-228 discuss that changes, additions, and omissions to this sequence can occur without materially affecting the operation of the disclosed embodiments, configuration, and aspects. Furthermore, figures 20A, 21A, and 22 relate to collecting voice samples and identifying the user, either in parallel or in series; the figures give scenarios for when to update between the client device and the server, where either or both may have a biometric analysis system, receive audio samples to identify a user, and update voiceprints at an indicated period of time, see paras. 142 and 144), the method comprising:
receiving, from a client device and via a network, an automated assistant request that includes (Para. 218-219, At step 2202, the vehicle system may receive an audio sample… For example, a passenger in a backseat may make an oral command to an AI assistant. Such a request may be recorded as an audio sample… At step 2203, the audio sample may be sent via a network connection to a server):
a text independent (TI) speaker embedding for a particular user of the client device (Para. 205, At step 2102, after starting the method, a user ID and a voiceprint version may be sent from a vehicle to a network connected server. The data may be transmitted in the form of a packet via a network connection. Para. 135 indicates, Alternatively, a voice biometric system may be set up to analyze a voice of a user speaking any words or phrases and identify and verify that speaker, i.e., text independent. Figure 22 shows receiving an audio sample (element 2202) from the client device, i.e., the vehicle with onboard biometric analysis, which sends the user ID and voiceprint version to the server as in figure 21A, element 2102), and
audio data that captures spoken input of the particular user, wherein the audio data is captured via one or more microphones of the client device (Para. 218, At step 2202, the vehicle system may receive an audio sample. This may occur when a voice of a user of the vehicle, including a driver or a passenger, or a user of an application as discussed herein, or a caller calling into an entity associated with the vehicle manufacturer, is received via a microphone associated with the system);
determining that the TI speaker embedding was generated using an outdated version of a TI speaker recognition model (Paras. 207-208, At step 2105, the server may determine whether the voiceprint version stored in the master database on the server is newer, i.e., the speaker embedding is compared and determined to be of a previous version);
However, in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model, Falkson proceeds to element 2106 of figure 21A, “Servicer sends updated voiceprint file to vehicle,” i.e., it automatically updates the voiceprint; it does not decide to use the outdated version. A connection was initially drawn from element 2206 of figure 22, “User Voiceprint Updated?”, along the “NO” branch to element 2209, “End”; however, there is no clear connection between figure 21A and figure 22 from which to conclude that the outdated version would be used if the user voiceprint was not updated. Furthermore, para. 175 indicates that, during operation of the system, the voiceprints may constantly or occasionally be updated based on recently received audio samples; however, it does not indicate that a determination is made that the model is outdated and that the system then proceeds to use the outdated version. Rather, the outdated version is used until the system is updated, without any such determination. While the system may update periodically or when indicated, figure 21A proceeds to update the model and the voiceprint automatically without using the outdated version.

Srinivasan et al. (US Pat. No. 11,004,454 B1), hereinafter Srin, teaches a voice profile updating configuration. More specifically, techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data, see abstract.
Specifically, Srin teaches clustering components that update voice profiles, which may be done offline or periodically (see col. 21, lines 60-65, and col. 22, lines 25-32). It would not have been obvious to one of ordinary skill in the art to combine the teachings of Srin with those of Falkson, as neither indicates subject matter in which a determination is made that the speaker embedding was generated using the outdated version of the TI speaker recognition model and, in response, the audio data is processed using the outdated version. Both Falkson and Srin automatically update the speaker embeddings and profiles, or do so on occasion while using the outdated version; however, neither indicates that such a determination is made.

Sundararajan et al. (US Pub. No. 2017/0069312), hereinafter Sunda, teaches a method and system for remotely training and commanding a speech recognition system in a cockpit via a carry-on device in a connected aircraft. Specifically, a method for implementing a speaker-independent speech recognition system with reduced latency is provided. The method includes capturing voice data at a carry-on-device from a user during a pre-flight check-in performed by the user for an upcoming flight; extracting features associated with the user from the captured voice data at the carry-on-device; uplinking the extracted features to the speaker-independent speech recognition system onboard the aircraft; and adapting the extracted features with an acoustic feature model of the speaker-independent speech recognition system, see abstract.
	Specifically, para. 17 teaches, with respect to FIG. 1, “… inside the cockpit 200 of the aircraft 175 and communicatively coupled to the speaker-independent speech recognition system 220 in the cockpit 200 of the aircraft 175. The carry-on-device 100, the external storage device 180, and a speech-enabled-cockpit system/device 250 form a system 10 for recognizing speech of a user in the aircraft 175 with reduced latency and reduced training. In one implementation of this embodiment, the system 10 includes the carry-on-device 100 and the speech-enabled-cockpit system/device 250 and there is no external storage device 180.” Paras. 22-25 detail how the carry-on device extracts features from the speaker during the pre-flight checklist in order to uplink them, i.e., to update the speaker-independent model onboard the aircraft. Therefore, Sunda improves latency by performing this process before the onboard control system of the aircraft is accessed, rather than by determining that the model is outdated and, in response, still using the outdated version. Because the feature is not taught, it would be unreasonable to rely on a motivation to combine.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Claims 2-15 are allowable as they are dependent on claim 1. 

Claim 29 is allowable as it is directed to a system corresponding to method claim 1. 

Claims 30-33 are allowable as they are dependent on claim 29.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN E AMAYA HERNANDEZ whose telephone number is (571)272-2484. The examiner can normally be reached Monday - Friday 7:30 am - 3:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/J.E.A./Examiner, Art Unit 2655       
/ANDREW C FLANDERS/Supervisory Patent Examiner, Art Unit 2655