DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the
first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
The last line of para. 23 recites “audio data audio data”; “audio data” is repeated.
Line 15 of para. 60 recites “one or more users may associated with the client
device.” Add “be” after “may” or replace “may” with “are”.
Line 3 of para. 74 recites “references ins one or more segments…”; delete the “s” from “ins”.
Appropriate correction is required.

Claim Objections
Claim 8 is objected to because of the following informalities:
“audio data” is repeated in line 8 of claim 8.
Appropriate correction is required.




Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35
U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form
the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 3-5, 8-13, and 16-17 are rejected under 35 U.S.C. 102(a)(1) as being
anticipated by Falkson et al., hereinafter Falkson (US Pub. No. 2018/0201226 A1).
Regarding claim 1, A method implemented by one or more processors (Para. 229, the systems and methods of this disclosure can be implemented in conjunction with a special purpose computer, a programmed microprocessor, where para. 227-228 discusses changes, additions, and omissions to this sequence can occur without materially affecting the operation of the disclosed embodiments, configuration, and aspects. Furthermore, figures 20A, 21A, and 22 are figures that relate to the collection of voice samples and to identify the user; however, either in parallel or in series as seen by the figures where scenarios are given on when to update between the client device and the server where either or both are allowed to have a 
receiving, from a client device and via a network, an automated assistant request that includes (Para. 218-219, At step 2202, the vehicle system may receive an audio sample… For example, a passenger in a backseat may make an oral command to an AI assistant. Such a request may be recorded as an audio sample… At step 2203, the audio sample may be sent via a network connection to a server):
a text independent (TI) speaker embedding for a particular user of the client
device (Para. 205, At step 2102, after starting the method, a user ID and a voiceprint version may be sent from a vehicle to a network connected server. The data may be transmitted in the form of a packet via a network connection where para. 135 indicates, Alternatively, a voice biometric system may be set up to analyze a voice of a user speaking any words or phrases and identify and verify that speaker i.e. text independent, Figure 22 explains to receive audio sample, element 2202, specifically from the client device i.e. vehicle with onboard biometric analysis that sends the userID and voiceprint version to the server as in 21A, element 2102), and
audio data that captures spoken input of the particular user, wherein the audio
data is captured via one or more microphones of the client device (Para. 218, At step 2202, the vehicle system may receive an audio sample. This may occur when a voice of a user of the vehicle, including a driver or a passenger, or a user of an application as discussed herein, or a caller calling into an entity associated with the vehicle manufacturer, is received via a microphone associated with the system);

in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model (Para. 209, At step 2106, in which the server has a newer version than the vehicle, the server may send an updated voiceprint file, along with the voiceprint version to the vehicle. The vehicle may receive the updated voiceprint file and voiceprint version and update its own onboard database; however, para. 204 indicates the method may begin and run periodically at particular times, or upon user demand, user preferences, or upon detecting a change in a voiceprint ID version, and para. 221 indicates the server may determine whether the voiceprint was updated. If not, the method may end at step 2209):
processing at least a portion of the audio data using the outdated version of the
TI speaker recognition model to generate TI output (Para. 221, indicates at step 2206, the server may determine whether the voiceprint was updated. If not, the method may end at step 2209, where onboard analysis may still be used i.e. outdated version of the TI speaker recognition model generate TI output as analysis result para. 197, figure 20A);
determining whether the particular user spoke the spoken input by comparing
the TI output with the speaker embedding of the particular user (Para. 198, at step 2006 the verification decision may be made based solely on the onboard generated match score i.e. comparing the TI output with the speaker embedding of the particular user e.g. match onboard);

performing one or more actions that are based on the audio data (Para. 221 and
143, interior and exterior audio may be recorded as audio samples at any time, such as during phone calls, during the initiation of phone calls (e.g. “call mother”), during other voice commands (e.g. “give me directions to mother's”), during voice commands from outside the car (e.g. “unlock the car”), via an application input or 1800 number (e.g. “unlock the car for mother”));
processing previous audio data, that captures previous spoken input of the
particular user, using an updated version of the TI speaker recognition model to generate an updated speaker embedding (Para. 221 and 182-183, At step 2205, the audio sample may be further processed by the server in order to attempt to update the associated voiceprint file. At step 2206, the server may determine whether the voiceprint was updated where para. 182-183 discuss voice biometric analysis system as to generate the updated speaker embeddings where it may be present on a network connected server, see para 132 where the server may contain a newer version i.e. model used in biometric analysis system); and
transmitting, to the client device, the updated speaker embedding for the
particular user to cause the client device to locally store the updated speaker embedding for transmission with future automated assistant requests (Para. 209, At step 2106, in which the server has a newer version than the vehicle, the server may send an updated voiceprint file, 

Regarding claim 3, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
 wherein processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output comprises:
processing an additional portion of the audio data, that is in addition to an invocation phrase portion of the audio data, using the outdated version of the TI speaker recognition model to generate the TI output (Para. 208-210, At step 2105, the server may determine whether the voiceprint version stored in the master database on the server is newer… At step 2106, in which the server has a newer version than the vehicle, the server may send an updated voiceprint file, along with the voiceprint version to the vehicle. The vehicle may receive the updated voiceprint file and voiceprint version and update its own onboard database…to end at step 2109 where, para. 134 indicates, the voice biometric analysis may be one or more of text-dependent and/or text-independent i.e. in this case text-independent as the additional portion of the audio data is after the password or passphrase to be spoken i.e. text-dependent, after password or passphrase is the additional portion considered text-independent such as speaking any words or phrases, see para. 134).

Regarding claim 4, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein processing at least a portion of the audio data using the outdated version of the TI speaker recognition model to generate TI output comprises:
processing an invocation phrase portion of the audio data and an additional portion of the audio data using the outdated version of the TI speaker recognition model to generate the TI output (Para. 208-210, At step 2105, the server may determine whether the voiceprint version stored in the master database on the server is newer… At step 2106, in which the server has a newer version than the vehicle, the server may send an updated voiceprint file, along with the voiceprint version to the vehicle. The vehicle may receive the updated voiceprint file and voiceprint version and update its own onboard database…to end at step 2109 where, para. 134 indicates, the voice biometric analysis may be one or more of text-dependent and/or text-independent i.e. in this case text-independent as the additional portion of the audio data is after the password or passphrase to be spoken i.e. text-dependent, after password or passphrase is the additional portion considered text-independent such as speaking any words or phrases, see para. 134).




Regarding claim 5, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein processing the previous audio data that captures previous spoken input of the particular user using an updated version of the TI speaker recognition model to generate an updated speaker embedding comprises:
processing a plurality of instances of the previous audio data using the updated version of the TI speaker recognition model to generate the updated speaker embedding (Para. 144, The voice biometric analysis authentication server 1380 may be operable to perform voice biometric analysis and voice recognition analysis based on audio samples transmitted from the vehicle 1310 via the network 1320 i.e. processing audio samples plural where audio samples have instances i.e. frames where para. 175 indicates that the voiceprints may constantly or occasionally be updated based on recently received audio samples as through the updated biometric analysis system containing TI speaker recognition model as recognized by para. 134), wherein each of the instances of the previous audio data captures previous spoken input of the particular user (Para. 175, based on recently received audio samples, past tense; therefore, received where audio samples have frames i.e. instances of the previous audio data).

Regarding claim 8, Falkson teaches the method of claim 5 (see claim 5 above), additionally, Falkson discloses:
further comprising:
replacing an instance of previous audio data in the instances of the previous audio data with the audio data that captures the spoken input of the particular user (Lines 46- 59 on col. 4, 
determining a length of each instance of previous audio data in the plurality of
previous audio data (Lines 31 – 38 on col. 17, determines length of training utterances may be stored in audio files as different playback lengths in the range of 0.5-10 seconds);
determining a length of the audio data that captures the spoken input of the
particular user (Lines 5- 12 on col. 18, determines batch for a first training cycle may specify a total of 8 speakers to be represented in the batch with 7 utterances of 1.5 seconds each for each speaker i.e. determining a length of the audio data that captures the spoken input of particular speaker);
comparing the length of the audio data with the length of each instance of
previous audio data (Lines 1-5 on col. 18, the training system determines training criteria for a batch of training samples that is to be used in a particular training cycle. The training criteria can include, for example, a total number of speakers to be represented in the 
in response to determining, based on the comparing, that the audio data is
longer than one or more instances of previous audio data, replacing the instance of previous audio data with the shortest length with the audio data audio data (Lines 1-12 on col. 18, A response to the training criteria with a longer utterance of 1.5 seconds, it replaces with a second batch with the shortest length of 0.5 seconds each).
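
For illustration only, the length-based replacement recited in claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation taken from the application or from the cited reference: the function name `replace_shortest_if_longer` and the representation of each audio instance as a sequence of samples (so that "length" is the sample count) are hypothetical.

```python
from typing import List, Sequence


def replace_shortest_if_longer(
    previous: List[Sequence[float]],
    new_audio: Sequence[float],
) -> List[Sequence[float]]:
    """Return a copy of `previous` with its shortest instance replaced by
    `new_audio` when `new_audio` is longer than at least one stored instance.

    Assumption (hypothetical): length is measured as the number of samples.
    """
    updated = list(previous)
    if not updated:
        return updated

    # Determine the length of each stored instance and of the new audio data.
    lengths = [len(instance) for instance in updated]
    new_length = len(new_audio)

    # Compare the new length against every stored length.
    if any(new_length > length for length in lengths):
        # Replace the stored instance with the shortest length.
        shortest_index = lengths.index(min(lengths))
        updated[shortest_index] = new_audio
    return updated
```

In this sketch the stored set keeps a fixed size, and new audio that is no longer than every stored instance leaves the set unchanged.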

Regarding claim 9, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein determining that the TI speaker embedding was generated using an outdated version of the TI speaker recognition model is based at least in part on a version identifier for the TI speaker embedding, the version identifier included in the automated assistant request (Para. 182-183, Upon collecting the audio sample, at step 1905, the audio sample may be processed by a voice biometric analysis system. This processing step may be performed via a third-party server or onboard a vehicle via an onboard voice biometric analysis system…Furthermore, at step 2105, the server may determine whether the voiceprint version stored in the master database on the server is newer. This determination may be made by determining a time and date the voiceprint was created or modified based on the version number or a timestamp i.e. version of server compared to version of onboarding included in the automated assistant request; that is how it is determined to be outdated).
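
For illustration only, a version-identifier check of the kind recited in claim 9 can be sketched as follows. This is a minimal sketch under stated assumptions: the `AssistantRequest` type, its field names, and the use of monotonically increasing integer version identifiers are hypothetical and are not drawn from the application or from the cited reference.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AssistantRequest:
    """Hypothetical request carrying an embedding and its version identifier."""
    speaker_embedding: Tuple[float, ...]
    embedding_version: int  # assumption: larger integer means newer model
    audio: bytes


def is_embedding_outdated(request: AssistantRequest, current_model_version: int) -> bool:
    """Return True when the embedding in the request was generated by a model
    version older than the version currently held by the server."""
    return request.embedding_version < current_model_version
```

A timestamp could stand in for the integer identifier; the comparison would be the same.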

Regarding claim 10, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein performing one or more actions that are based on the audio data comprises:
controlling one or more peripheral devices based on the audio data (Para. 136 and 143, For example, a passenger in the rear, passenger-side seat may speak a command to a vehicle artificial intelligence (“AI”) assistant…For example, an owner of the vehicle may set a number of user preferences, such as disabling the ability for a passenger in the backseat to adjust the radio, to control the air conditioner, etc. Alternatively, the system may detect a speaker is sitting in the driver's seat and give only that voice the ability to activate certain driving control features; such commands are also present in para. 143, such as during phone calls, during the initiation of phone calls (e.g. “call mother”), during other voice commands (e.g. “give me directions to mother's”), during voice commands from outside the car (e.g. “unlock the car”), via an application input or 1800 number (e.g. “unlock the car for mother”)).

Regarding claim 11, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein performing one or more actions that are based on the audio data comprises:
generating responsive content that is customized for the particular user and that is based on the audio data (Para. 143, Audio may be collected during operation of the vehicle and at any time any voice command is given to the vehicle. Furthermore, interior and exterior audio may be recorded as audio samples at any time, such as during phone calls, during the initiation of phone calls (e.g. “call mother”), during other voice commands (e.g. “give me directions to 
causing the client device to render output based on the responsive content (Para. 187, client device may be vehicle which the interface, AI assistant equipped, greets the user recognized as responsive content or application downloaded on phone, see para. 192 with devices 1600).

Regarding claim 12, Falkson teaches the method of claim 11 (see claim 11 above), additionally, Falkson discloses:
 wherein generating the updated speaker embedding is completed subsequent to causing the client device to render output based on the responsive content (At step 2205, the audio sample may be further processed by the server in order to attempt to update the associated voiceprint file. At step 2206, the server may determine whether the voiceprint was updated. If not, the method may end at step 2209 i.e. where the voiceprint was updated or not it may still render output…Furthermore, para 175 indicates the voiceprints may constantly or occasionally be updated based on recently received audio samples. A change in a voiceprint file may be reflected by a version number or timestamp. In some cases, a vehicle may periodically or by request send a packet containing information reflecting such a change i.e. updates not needed every single time and may occur with using the previous version). 

Regarding claim 13, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein the automated assistant request further includes a text dependent (TD) user measure generated locally at the client device using a TD speaker recognition model stored locally at the client device and using a TD speaker embedding stored locally at the client device (Para. 218-219, At step 2202, the vehicle system may receive an audio sample i.e. automated assistant request received; furthermore, Para. 131-134, The voice biometric analysis may be one or more of text-dependent and/or text-independent, where voiceprint created may be a spectrogram e.g. a measure of signal strength over time, A voice print may be used by a voice recognition software program, either onboard the vehicle, on a network connected server, or both, to one or more of identify, verify, or authenticate a user's voice; furthermore, A voice print may be a spectrogram of an audio sample of a voice of a user and acts as a digitized and stored model or template of a person's speech), the TD speaker embedding being for the particular user (Para. 131-134, A voice print may be used by a voice recognition software program, either onboard the vehicle, on a network connected server, or both, to one or more of identify, verify, or authenticate a user's voice), and wherein determining whether the particular user spoke the spoken input by comparing the TI output with the speaker embedding of the particular user further comprises:
determining a TI user measure by comparing the TI output with the speaker embedding (Para. 194, the onboard processor may compare the analysis results with voiceprints associated with user profiles stored onboard the vehicle and generate an ‘onboard match score’ at step 
determining whether the particular user spoke the spoken input using both the TD user measure and the TI user measure (Para. 134, The voice biometric analysis may be one or more of text-dependent and/or text-independent. For example, a user may set up a password or passphrase to be spoken. Alternatively, a voice biometric system may be set up to analyze a voice of a user speaking any words or phrases and identify and verify that speaker. For example, a user may simply utter any command to an AI assistant associated with the vehicle and that utterance may be used to identify and verify the identity of the speaker).
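
For illustration only, a determination that uses both a TD user measure and a TI user measure can be sketched as follows. This is a minimal sketch under stated assumptions: neither the claims as quoted above nor the cited paragraphs specify how the two measures are combined, so the cosine-similarity comparison, the weighted average, the default weight, and the threshold are all hypothetical choices, as are the function names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_spoke_input(
    td_user_measure: float,
    ti_output: Sequence[float],
    ti_speaker_embedding: Sequence[float],
    td_weight: float = 0.5,   # hypothetical weighting
    threshold: float = 0.7,   # hypothetical decision threshold
) -> bool:
    """Decide whether the particular user spoke the input using both measures."""
    # TI user measure: compare the TI output with the stored speaker embedding.
    ti_user_measure = cosine_similarity(ti_output, ti_speaker_embedding)
    # Combine the locally generated TD measure with the TI measure.
    combined = td_weight * td_user_measure + (1.0 - td_weight) * ti_user_measure
    return combined >= threshold
```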

Regarding claim 16, A method implemented by one or more processors (Para. 229, the systems and methods of this disclosure can be implemented in conjunction with a special purpose computer, a programmed microprocessor, where para. 227-228 discusses changes, additions, and omissions to this sequence can occur without materially affecting the operation of the disclosed embodiments, configuration, and aspects. Furthermore, figures 20A, 21A, and 22 are figures that relate to the collection of voice samples and to identify the user; however, either in parallel or in series as seen by the figures where scenarios are given on when to update between the client device and the server where either or both are allowed to have a biometric analysis system and receive audio samples to identify a user and update voiceprints at an indicated period of time, see para. 142 and 144), the method comprising:
receiving, from a client device and via a network, an automated assistant request that includes (Para. 218-219, At step 2202, the vehicle system may receive an audio sample… For 
audio data that captures spoken input of a user, wherein the audio data is
captured at one or more microphones of the client device (Para. 218-219, At step 2202, the vehicle system may receive an audio sample… For example, a passenger in a backseat may make an oral command to an AI assistant. Such a request may be recorded as an audio sample… At step 2203, the audio sample may be sent via a network connection to a server), and
a text dependent (TD) user measure generated locally at the client device using a
TD speaker recognition model stored locally at the client device and using a TD speaker embedding stored locally at the client device, the TD speaker embedding being for a particular user (Para. 131-134, The voice biometric analysis may be one or more of text-dependent and/or text-independent, where voiceprint created may be a spectrogram e.g. a measure of signal strength over time, A voice print may be used by a voice recognition software program, either onboard the vehicle, on a network connected server, or both, to one or more of identify, verify, or authenticate a user's voice; furthermore, A voice print may be a spectrogram of an audio sample of a voice of a user and acts as a digitized and stored model or template of a person's speech);
processing at least a portion of the audio data using a text independent (TI) speaker recognition model to generate TI output (Para. 134, The voice biometric analysis may be one or more of text-dependent and/or text-independent… and a voiceprint is a model where para. 194 indicates the vehicle's onboard processor may begin processing the audio sample onboard the 
determining a TI user measure by comparing the TI output with a TI speaker embedding that is associated with the automated assistant request, and that is for the particular user (Para. 194, the onboard processor may compare the analysis results with voiceprints associated with user profiles stored onboard the vehicle and generate an ‘onboard match score’ at step 2004 i.e. determining measure of comparison from analysis results with voiceprints associated with users);
determining whether the particular user spoke the spoken input using both the TD user measure and the TI user measure (Para. 134, The voice biometric analysis may be one or more of text-dependent and/or text-independent. For example, a user may set up a password or passphrase to be spoken. Alternatively, a voice biometric system may be set up to analyze a voice of a user speaking any words or phrases and identify and verify that speaker. For example, a user may simply utter any command to an AI assistant associated with the vehicle and that utterance may be used to identify and verify the identity of the speaker);
in response to determining the spoken input is spoken by the particular user (Para. 221, If, on the other hand, the voiceprint was updated, the master database stored on the server may be updated at step 2207 reflecting the new voiceprint and a new voiceprint version ID may be generated. At this point, the server may optionally send an updated voiceprint to all vehicles associated with the associated User ID. At step 2209 the method may end):
generating responsive content that is responsive to the spoken input and that is

transmitting the responsive content to the client device to cause the client
device to render output based on the responsive content (Para. 142, the vehicle 1310 may comprise one or more of an onboard voice biometric analysis system 1311, an onboard database 1312, a transmitter 1313, a receiver 1314, and a microphone 1315… where the receiver 1314 may be operable to receive data from the network 1320 i.e. as to respond to “unlock the car for mother” as in para. 143 or for example, if the speaker is identified as a user of the vehicle with driving permissions, the driver seat may move to a correct position and the mirrors may move accordingly as in para. 187).

Regarding claim 17, Falkson teaches the method of claim 16 (see claim 16 above), additionally, Falkson discloses:
wherein the automated assistant request received from the client device via the network further includes the TI speaker embedding for the particular user (Para. 205, At step 2102, after starting the method, a user ID and a voiceprint version may be sent from a vehicle to a network connected server. The data may be transmitted in the form of a packet via a network connection where para. 135 indicates, Alternatively, a voice biometric system may be 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35
U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness
rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



The factual inquiries for establishing a background for determining obviousness under
35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
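3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.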


This application currently names joint inventors. In considering patentability of the
claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Falkson in view of
Schwindt et al., hereinafter Schwindt (US Pat. No. 10,957,330).
Regarding claim 2, Falkson teaches the method of claim 1 (see claim 1 above), additionally, Falkson discloses:
wherein the one or more processors that implement the method are at one or more computing devices that are remote from the client device, and further comprising:
in response to transmitting the updated speaker embedding for the particular user to the client device (Para. 209-210, in response to sending an updated voiceprint file to the vehicle/client device, the server may send the updated voiceprint file to all associated vehicles):
Falkson fails to explicitly disclose:
deleting all instances of the updated speaker embedding from the one or more

In a related field of endeavor (e.g. systems and methods for control of vehicles; furthermore, use of voiceprints for authorization of commands, see abstract), Schwindt discloses that, at the conclusion of the flight, the reference voiceprint can be discarded such as being deleted from the system. In this manner, laws or other regulations prohibiting or limiting the storage of biometric information may be satisfied, see lines 18-22 on col. 5. This relates to privacy and regulatory rules on storing biometrics in a system, and therefore to deleting that information from the computing device.
Modifying Falkson’s method implemented by one or more processors to include the features of Schwindt discloses:
deleting all instances of the updated speaker embedding from the one or more
computing devices (e.g. Falkson’s method implemented by one or more processors, now modified by Schwindt where rather than storing the information or sending it to other vehicles associated it deletes the information, see lines 18 – 22 on col. 5, in response to transmitting the information to the client device).
It would have been obvious to one of ordinary skill in the art at the time the invention
was filed to apply the teachings of Schwindt to the method of Falkson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example systems and methods for control of vehicles; furthermore, use of voiceprints for authorization of commands. Further, doing so would have provided the users of Falkson, with the added benefits of meeting laws or other regulations prohibiting or limit the . 

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Falkson in view
of Cilingir et al., hereinafter Cilingir (US Pub. No. 2018/0366124 A1).
Regarding claim 6, Falkson teaches the method of claim 5 (see claim 5 above), further comprising:
Falkson fails to explicitly disclose:
selecting the plurality of instances of the previous audio data based on the plurality of instances of the previous audio data satisfying one or more criteria.
In a related field of endeavor (e.g. training and using a speaker verification model, see abstract), Cilingir discloses a context determination and training merit evaluation circuit 302 that is configured to measure context data associated with collected TI speech utterances from a user in a given context, see para. 21. In some embodiments, the context data may include, for example, a location of the collection of the TI speech utterances, a date of the collection, properties of the microphone used for the collection, SNR, noise characteristics, reverberation characteristics, and health and emotional state of the user. Circuit 302 is also configured to estimate a training merit value associated with the TI speech utterances. The training merit value, which indicates the suitability of the speech utterances for model training purposes, is based on a speech quality analysis and a speaker state analysis, as will be described below. If the training merit value is not acceptable, as tested at operation 304, then the utterance is discarded for training purposes at operation 306, see para. 21.

selecting the plurality of instances of the previous audio data based on the plurality of instances of the previous audio data satisfying one or more criteria (e.g. Falkson’s method implemented by one or more processors, now modified by Cilingir to select the plurality of instances of the previous audio data based on it satisfying one or more criteria as seen in para. 21); and
It would have been obvious to one of ordinary skill in the art at the time the invention
was filed to apply the teachings of Cilingir to the method of Falkson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two
disclosures, for example training with previous audio data to improve the speaker verification model. Further, doing so would have provided the users of Falkson with the added benefit of improved TI speaker recognition compared to existing methods that require the user to dedicate an extended period of time for enrollment/training and which do not adequately account for context and environmental factors, as recognized by Cilingir, see para. 15, i.e. more efficient use of time, where data is recorded according to these metrics so that the model can be trained appropriately without the user spending an extended period of time training it.




Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Falkson in view of Cilingir and further in view of Srinivasan et al. (US Pat. No. 11,004,454 B1) hereinafter Srin.
Regarding claim 7, Falkson in view of Cilingir teaches the method of claim 6 (see claim 6 above), additionally, using the same factual inquiries for establishing a background for determining obviousness as in claim 6, wherein the one or more criteria include one or multiple of: 
The modification of Falkson’s method implemented by one or more processors to include the features of Cilingir (as detailed above in the rejection of claim 6) further discloses:
a diversity criterion for the plurality of instances of the previous audio data (e.g. Falkson’s method implemented by one or more processors, now modified by Cilingir to include a diversity criterion for each of the plurality of instances of the previous audio data, see para. 24: the adaptation merit value may also be based on one or more of the following factors: an estimate of improvement of the TI speaker recognition model due to adaptation based on the additional TI speech utterances; an estimate of the quality of the user identity measurements 120; the increase in phoneme diversity that would result from the additional utterances; and the prediction accuracy of the current model on the new utterances (the lower the accuracy, the greater the difference between the new and old utterances, which increases the importance of the new utterances), i.e. diversity is a merit, in other words a criterion for selecting the previous audio data to be used).
Falkson in view of Cilingir fails to disclose:
a length criterion for each of the plurality of instances of the previous audio data 
In a related field of endeavor (e.g. speaker recognition and voice profiles, see abstract), Srin teaches that the filtering component 1015 may filter out user recognition feature vectors that do not have a sufficient length. For example, the filtering component 1015 may identify one or more user recognition feature vectors (in the received user recognition feature vector data 1010/740) that have lengths that do not satisfy a minimum threshold length, and may remove those user recognition feature vectors from downstream processing. The lengths of the user recognition feature vectors and the minimum threshold length may be measured in time (e.g., seconds), amount of audio frames, or some other length measurement, see lines 50-60 on col. 22.
Modifying Falkson’s method implemented by one or more processors in view of Cilingir to include the features of Srin discloses:
a length criterion for each of the plurality of instances of the previous audio data (e.g. Falkson’s method implemented by one or more processors, now modified by Srin to include a length criterion for each of the plurality of instances of the previous audio data, see lines 50-60 on col. 22);
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Srin to the method of Falkson in view of Cilingir. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the disclosures.
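As a minimal sketch of the length criterion attributed to Srin (lines 50-60 on col. 22), instances whose length does not satisfy a minimum threshold can be removed before downstream processing. The tuple layout, the function name, and the one-second default are assumptions for illustration; the reference states only that length may be measured in time, audio frames, or some other measurement.

```python
from typing import List, Tuple

# Hypothetical representation: (instance identifier, duration in seconds).
AudioInstance = Tuple[str, float]


def filter_by_length(instances: List[AudioInstance],
                     min_seconds: float = 1.0) -> List[AudioInstance]:
    # Keep only instances whose duration satisfies the minimum threshold
    # length; shorter instances are removed from downstream processing.
    return [(name, dur) for name, dur in instances if dur >= min_seconds]
```

The same shape applies if length is counted in audio frames rather than seconds; only the unit of the threshold changes.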

Claims 14 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over
Falkson in view of Sharifi et al. hereinafter Sharifi (US Pat. No. 9,711,148).
Regarding claim 14, Falkson teaches the method of claim 13 (see claim 13 above): wherein determining whether the particular user spoke the spoken input using both the TD user measure and the TI user measure comprises:
Falkson fails to explicitly disclose:
determining a particular user probability measure which indicates the probability the particular user spoke the spoken input by combining the TD user measure and the TI user measure; and
determining whether the particular user spoke the spoken input by determining whether the particular user probability measure satisfies a threshold.
In a related field of endeavor (e.g. identifying a speaker based on dual models, see lines 31-39 on col. 2), Sharifi further discloses the speaker classifier 134 receives confidence data 
Modifying Falkson’s method implemented by one or more processors to include the features of Sharifi discloses:

determining whether the particular user spoke the spoken input by determining whether the particular user probability measure satisfies a threshold (e.g. Falkson’s method implemented by one or more processors, now modified by Sharifi to include determining whether the particular user spoke the spoken input by determining if the particular user probability measure i.e. confidence and likelihood satisfies a given threshold, see Lines 9 – 15 on col. 10).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Sharifi to the method of Falkson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example using both text-dependent and text-independent models for speaker identification. Further, doing so would have provided the users of Falkson with the added benefit of enhanced accuracy, in particular with the combination of two types of models and during the initial uses of the speaker identification system, see Lines 28 – 30 on col. 1 and Lines 45-48 on col. 6. 

Claim 18 is a method claim with similar subject matter as presented in claim 14 and is rejected under the same grounds as claim 14. 

Regarding claim 19, Falkson in view of Sharifi teaches the method of claim 18 (see claim 18 above), using the same factual inquiries for establishing a background for determining obviousness as in claims 14 and 18. 
Falkson fails to explicitly disclose:
wherein combining the TD user measure and the TI user measure comprises utilizing a first weight for the TD user measure in the combining and utilizing a second weight for the TI user measure in the combining.
In a related field of endeavor (e.g. identifying a speaker based on dual models, see lines 31-39 on col. 2), Sharifi further discloses the speaker classifier 134 may perform a weighted average of the confidence levels to obtain a final, combined, confidence level. For example, the confidence level associated with the text-dependent model may be weighted more heavily than the confidence level associated with the text-independent model (e.g., the confidence level of the text-dependent model could be multiplied by a weight of 0.75 while the confidence level of the text-independent model could be multiplied by weight of 0.25). Alternatively, the confidence level associated with the text-independent model may be weighted more heavily than the confidence level associated with the text-dependent model (e.g., the confidence level of the text-dependent model could be multiplied by a weight of 0.25 while the confidence level of the text-independent model could be multiplied by weight of 0.75), see Lines 27-42 on col. 9.

 wherein combining the TD user measure and the TI user measure comprises utilizing a first weight for the TD user measure in the combining and utilizing a second weight for the TI user measure in the combining (e.g. Falkson’s method implemented by one or more processors, now modified by Sharifi to include combining the TD and TI user measure where it comprises utilizing a first weight for the TD user measure and second weight for the TI user measure in the combination, see Lines 27-42 on col. 9).
In addition to the benefits to Falkson noted for claim 14, the weights are able to reflect higher accuracies in models depending on the situation/environment, i.e. in the beginning stages of the speaker identification system, the text-dependent model may be more accurate than the text-independent model due to the number of samples present, see lines 43 – 61 on col. 9. 
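For illustration only, the weighted combination and threshold test relied upon for claims 14 and 18-19 can be sketched as follows. The 0.75/0.25 weights are the example values quoted from Sharifi (Lines 27-42 on col. 9); the function names and the 0.5 default threshold are assumptions for this sketch and do not come from either reference.

```python
def combined_confidence(td_measure: float, ti_measure: float,
                        td_weight: float = 0.75,
                        ti_weight: float = 0.25) -> float:
    # Weighted average of the text-dependent (TD) and text-independent (TI)
    # confidence levels, using the example weights quoted from Sharifi.
    return td_weight * td_measure + ti_weight * ti_measure


def is_particular_user(td_measure: float, ti_measure: float,
                       threshold: float = 0.5,  # assumed value
                       td_weight: float = 0.75,
                       ti_weight: float = 0.25) -> bool:
    # The particular user is deemed to have spoken the input when the
    # combined measure satisfies the threshold.
    score = combined_confidence(td_measure, ti_measure, td_weight, ti_weight)
    return score >= threshold
```

With a TD measure of 0.8 and a TI measure of 0.4, weighting the TD model more heavily gives about 0.7, while reversing the weights gives about 0.5, so the same pair of measures can fall on either side of a given threshold depending on the weights chosen.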

Claims 15 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Falkson in
view of Koishida et al. hereinafter Koishida (US Pub. No. 2018/0233142).
Regarding claim 15, Falkson teaches the method of claim 11 (see claim 11 above). Additionally, Falkson discloses:
wherein in response to determining that the speaker embedding was generated using the outdated version of the TI speaker recognition model further comprises:
determining that a first user profile and a second user profile are associated with the automated assistant request (Para. 193, After receiving the audio sample, the vehicle system may transmit, or attempt to transmit, the audio sample to a network connected server via an 
responsive to determining that a first user profile and the second user profile are associated with the automated assistant request (Para. 187, a user may approach a car and from ten yards away may speak “unlock the doors,” and the vehicle may respond by detecting the audio command, collecting an audio sample of the voice, sending the audio sample to a network connected server, processing the audio sample to identify and verify the speaker, receive a match score from the network, determine based on the results whether the doors should be unlocked, and/or unlock the doors. The vehicle may also greet the user by name and change a number of vehicle settings based on the identity of the speaker):
initiating generating of first responsive content that is customized for the first user and that is in response to the spoken input (Para. 187, In terms of the example of “unlock the car for mother”, processing the audio sample to identify and verify the speaker, receive a match score from the network, determine based on the results whether the doors should be unlocked, and/or unlock the doors, i.e. a response customized according to the first user profile and the spoken input to unlock the doors);
initiating generating of second responsive content that is customized for the

prior to completion of generating the first responsive content and the second responsive content, processing at least the portion of audio data using the TI speaker recognition model to generate TI output (Para. 134, The voice biometric analysis may be one or more of text-dependent and/or text-independent… and a voiceprint is a model where para. 194 indicates the vehicle's onboard processor may begin processing the audio sample onboard the vehicle, performing a voice biometric analysis at step 2003 as to output analysis results i.e. TI output; furthermore, in order to determine how to generate the first and second responsive content it must process using the voice biometric analysis may be one or more of text-dependent and/or text-independent, where voiceprint created may be a spectrogram e.g. a measure of signal strength over time, A voice print may be used by a voice recognition software program, either onboard the vehicle, on a network connected server, or both, to one or more of identify, verify, or authenticate a user's voice as seen in para. 131-134);
determining whether the particular user is the first user, and whether the particular user spoke the spoken input by comparing the TI output with a speaker embedding of the first user (Para. 194, the onboard processor may compare the analysis results i.e. TI output with voiceprints associated with user profiles stored onboard the vehicle and generate an ‘onboard match score’ at step 2004 where para. 131 indicates, such users may be registered with a user profile comprising a voice print i.e. user profiles contain a voice print); and 

wherein in response to determining that the particular user spoke the spoken input further comprises transmitting, to the client device, the first responsive content without transmitting the second responsive content to the client device.
In a related field of endeavor (e.g. entity tracker and providing responsive content through interactions, see para. 2), Koishida further discloses in such an example, the device setting alternatively or additionally may authorize the first user to receive high-value information based on an organizational relationship between the first user and the high value information, and may authorize the other user to receive filtered content that does not include high-value information based on a different organizational relationship between the other user and the high-value information i.e. identification of the users is present and a response is demonstrated due to that identification. In such an example, the instructions alternatively or additionally may be further executable to receive a request from another user in the environment, identify content that the other user is authorized to receive from the intelligent assistant computer, and responsive to identifying the content, lower the blocking threshold. In such an example, the instructions alternatively or additionally may be executable to stop blocking subsequent responses to another user responsive to receiving a command from the first user instructing the intelligent assistant computer to respond to the other user, see para. 138. 
Modifying Falkson’s method implemented by one or more processors to include the features of Koishida discloses:

It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Koishida to the method of Falkson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both generating responses according to identification of spoken inputs. Further, doing so would have provided the users of Falkson with the added benefit of enabling natural user interface experiences, see para. 2; furthermore, privacy settings are included to determine how to output selected interpersonal content, with permissions indicating which users are allowed to access interpersonal content, as seen in para. 99, i.e. the added benefit of security in relation to privacy. 

Regarding claim 23, A method implemented by one or more processors (Para. 229, the systems and methods of this disclosure can be implemented in conjunction with a special purpose computer, a programmed microprocessor, where para. 227-228 discusses changes, additions, and omissions to this sequence can occur without materially affecting the operation of the disclosed embodiments, configuration, and aspects. Furthermore, figures 20A, 21A, and 
receiving, from a client device and via a network, an automated assistant request that includes (Para. 218-219, At step 2202, the vehicle system may receive an audio sample… For example, a passenger in a backseat may make an oral command to an AI assistant. Such a request may be recorded as an audio sample… At step 2203, the audio sample may be sent via a network connection to a server):
audio data that captures spoken input, wherein the audio data is captured at
one or more microphones of the client device (Para. 218, At step 2202, the vehicle system may receive an audio sample. This may occur when a voice of a user of the vehicle, including a driver or a passenger, or a user of an application as discussed herein, or a caller calling into an entity associated with the vehicle manufacturer, is received via a microphone associated with the system);
determining that a first user profile and a second user profile are associated with the automated assistant request (Para. 193, After receiving the audio sample, the vehicle system may transmit, or attempt to transmit, the audio sample to a network connected server via an onboard communication system at step 2008. Such a network connected server may be enabled to perform a voice biometric analysis of the audio sample and compare the analysis with voiceprints for any users associated with the vehicle or other voiceprints registered with 
responsive to determining that the first user profile and the second user profile are associated with the automated assistant request (Para. 187, a user may approach a car and from ten yards away may speak “unlock the doors,” and the vehicle may respond by detecting the audio command, collecting an audio sample of the voice, sending the audio sample to a network connected server, processing the audio sample to identify and verify the speaker, receive a match score from the network, determine based on the results whether the doors should be unlocked, and/or unlock the doors. The vehicle may also greet the user by name and change a number of vehicle settings based on the identity of the speaker):
initiating generating of first responsive content that is customized for the first user and that is responsive to the spoken input (Para. 187, In terms of the example of “unlock the car for mother”, processing the audio sample to identify and verify the speaker, receive a match score from the network, determine based on the results whether the doors should be unlocked, and/or unlock the doors, i.e. a response customized according to the first user profile and the spoken input to unlock the doors);
initiating generating of second responsive content that is customized for a second user and that is responsive to the spoken input (Para. 187, In terms of the example of “unlock the car for mother”, unlocking the door for the second profile, i.e. mother, and determining the permissions associated with that user profile, i.e. the driver seat may move to a correct position and the mirrors may move accordingly);

determining that the first user spoke the spoken input by comparing a first user speaker embedding corresponding to the first user profile and the TI output (Para. 194, the onboard processor may compare the analysis results with voiceprints associated with user profiles stored onboard the vehicle and generate an ‘onboard match score’ at step 2004 where para. 131 indicates, such users may be registered with a user profile comprising a voice print i.e. user profiles contain a voice print);
in response to determining the first user spoke the spoken input (Para. 221, If, on the other hand, the voiceprint was updated, the master database stored on the server may be updated at step 2207 reflecting the new voiceprint and a new voiceprint version ID may be generated. At this point, the server may optionally send an updated voiceprint to all vehicles 
Falkson fails to explicitly disclose:
transmitting, to the client device, the first responsive content without
transmitting the second responsive content to the client device.
In a related field of endeavor (e.g. entity tracker and providing responsive content through interactions, see para. 2), Koishida further discloses in such an example, the device setting alternatively or additionally may authorize the first user to receive high-value information based on an organizational relationship between the first user and the high value information, and may authorize the other user to receive filtered content that does not include high-value information based on a different organizational relationship between the other user and the high-value information. In such an example, the instructions alternatively or additionally may be further executable to receive a request from another user in the environment, identify content that the other user is authorized to receive from the intelligent assistant computer, and responsive to identifying the content, lower the blocking threshold. In such an example, the instructions alternatively or additionally may be executable to stop blocking subsequent responses to another user responsive to receiving a command from the first user instructing the intelligent assistant computer to respond to the other user, see para. 138. 
Modifying Falkson’s method implemented by one or more processors to include the features of Koishida discloses:

It would have been obvious to one of ordinary skill in the art at the time the invention was filed to apply the teachings of Koishida to the method of Falkson. Doing so would have been predictable to one of ordinary skill in the art given the similar nature between the two disclosures, for example both generating responses according to identification of spoken inputs. Further, doing so would have provided the users of Falkson with the added benefit of enabling natural user interface experiences, see para. 2; furthermore, privacy settings are included to determine how to output selected interpersonal content, with permissions indicating which users are allowed to access interpersonal content, as seen in para. 99, i.e. the added benefit of security in relation to privacy. 




Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s
disclosure.
Ge et al. hereinafter Ge (US Pub. No. 2018/0158463 A1) discloses a method for

Xie (US Pub. No. 2019/0341041 A1) discloses a method of an electronic apparatus that includes obtaining first voice data and second voice data. The method includes allowing a background speaker model that is generated based on voice signals of a plurality of speakers to be trained based on the second voice data. The method includes determining whether the first voice data is voice data corresponding to a first registered speaker associated with the electronic apparatus. The method includes allowing a first registered speaker model that is configured to recognize a voice of the first registered speaker to be trained based on the first voice data and the background speaker model, based on determining that the first voice data is voice data corresponding to the first registered speaker, see abstract. Specifically, by pre-processing the audio used for training the models (e.g. de-noising, noise reduction, and the like), the quality of the first voice data and the second voice data may be improved, and thus, the quality of speaker model training may also be improved, see para. 57.

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to JONATHAN E AMAYA HERNANDEZ whose telephone number is (571)272-2484. The examiner can normally be reached Monday - Thursday 7:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andy Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be
obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.E.A./
Examiner, Art Unit 2655
/ANDREW C FLANDERS/
Supervisory Patent Examiner, Art Unit 2655