DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Response to Amendment
In response to the Office Action mailed 12/17/2020, applicant has submitted an amendment filed 3/17/2021.
Claims 1-3, 6-10, 12-18, and 20 have been amended.  Claims 4, 5, 11, and 19 have been cancelled.
Response to Arguments
As per the 103 rejections and allowable subject matter:
Applicant did not incorporate the allowable subject matter of claims 5, 11, and 19 into the independent claims, because what was indicated to be allowable was the combination of all limitations in claims 5, 11, and 19, and their respective independent claims, where the independent claims, as currently amended, do not incorporate all of the limitations of the previous independent claims, and where the limitations added to the independent claims do not include all of the limitations of claims 5, 11, and 19.
Applicant incorporated where a response is provided to a user by a closest device into the independent claims.  The Office Action mailed 12/17/2020 stated that this feature was taught by the prior art (see page 50, citing 2017/0083285, see also page 52, citing 2016/0329051).
Claim 5 (and similarly claims 11 and 19) further included: 
in one device also passively enrolls the user in a plurality of speaker ID devices in different locations (one of the plurality of speaker ID devices is the speaker ID device of claims 1, 7, and 13)
and
where a message intended for the user is proactively delivered by the closest one of the plurality of speaker ID devices (i.e. claims 5, 11, and 19 did not recite where a response to a user speech is provided by the closest device, claims 5, 11, and 19 recited where a message for the user is proactively delivered [i.e. without being triggered by any prompting or speech input from the user])
Therefore, since the current independent claims do not include all of the limitations of the claims that were previously indicated to be directed to allowable subject matter, the previous indication of allowable subject matter does not apply to the present independent claims, and new prior art rejections necessitated by amendment are presented below.

As per Claims 13-18 and 20:
Applicant deleted “including one or more non-transitory machine readable mediums encoded with” from the preamble of claim 13 (lines 1-2).  Claims 13-18 and 20 are now directed to a non-statutory computer-program/software per se.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
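(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.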

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-3, 6, 7-10, 12, 13-18, and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 

As per Claims 1, 7, and 13, the original Specification (the original Specification of Parent Application 15/457,738, hereafter original Specification) does not describe “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” in the sense that the first command phrase sample and the one or more previously detected command phrase samples themselves (i.e. not data derived from the first command phrase sample and the one or more previously detected command phrase samples) are used to identify the user as a speaker of a second user speech sample.
The original Specification (paragraph numbers corresponding to the numbering for the Specification of Parent Application 15/457,738) describes the following:
“when one of the occupants, say speaker 140, utters a keyword/command phrase (or command or phrase or speech sample) 150 (such as "Alexa, what is the square root of 173?"), the device 120 learns to recognize and identify speaker 140 using the command 150 and other such commands, phrases, or speech samples” (paragraph 21, which does not describe that multiple commands/phrases/samples are used to identify the speaker of any one of the samples, as opposed to analyzing each of the commands/phrases/samples to determine their respective speaker)
“In some embodiments, the user's command phrase samples are saved for possible future use (e.g., re-enrolling the user with a larger set of command phrase samples). At this point, further speech samples by the user will be identified by the text-independent speaker identification circuit as coming from the user and without need to actively enroll the user” (paragraph 36, which describes where speech samples are saved for future enrollment purposes, not for use in the identifying of a speaker of any one sample)
“Example 5 includes the subject matter of any of Examples 1 through 4, further including identifying, by the text-independent speaker ID circuit, the user as the speaker of a second speech sample spoken by the user after the enrolling of the user in the text-independent speaker ID circuit” (paragraph 67, which describes identifying the user as a speaker of a second speech sample but does not describe that multiple command phrase samples are used in the identifying of the user of the singular second speech sample)

“The learning includes an enrollment (or training) period during which a person is added to a set or database of known speakers by analyzing specific samples of speech known to have come from that person. During the enrollment (which may be, for example, one or more minutes, such as two or three minutes, of normal speech from the person), characteristics (e.g., qualities, signatures, distinctive features) are identified and saved by the system (for example, in a database on a nonvolatile storage system, such as a disk drive or a solid-state drive) for later retrieval and comparison with new samples of speech that may come from the person” (paragraph 7, which indicates that samples are used, during enrollment, to derive characteristics that are used in identifying a speaker of new samples of speech, not where the samples themselves are used to identify the speaker)
Therefore, while the original Specification has written description for “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” in claims 1, 7, and 13 in the sense that the identifying of the user as a speaker of a second user speech sample is based on information that is based on the first command phrase sample and the one or more previously detected command phrase samples (i.e. a speaker of a sample is identified based on enrollment data that is based on the command phrase samples), the original Specification does not have written description for “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” in claims 1, 7, and 13 in the sense that the samples themselves are used in identifying the speaker of the second user speech sample. Therefore, in this sense (that the samples themselves are used in identifying the speaker of the second user speech sample), claims 1, 7, and 13 include new matter.

Claim 2 recites “determining that the command phrase sample threshold is satisfied if the first command phrase sample includes at least one minute of speech and the one or more of the previously detected command phrase samples includes at least one minute of speech” where “the first command phrase sample includes at least one minute of speech” and “determining that the command phrase sample threshold is satisfied if the first command phrase sample includes at least one minute of speech and the one or more of the previously detected command phrase samples includes at least one minute of speech” are not described in the original Specification (the original Specification of Parent Application 15/457,738, hereafter original Specification).
As per “the first command phrase sample includes at least one minute of speech”, the original Specification describes where a user speech sample includes a wake/keyword phrase sample and a command phrase sample, where it is unusual for a user to speak for a minute in order to issue an instruction to a system (usually it takes at most a few seconds, and Applicant’s Specification also describes that a command phrase takes about 3 seconds [see paragraph 19 of this Application’s Specification]).
Additionally, as per “determining that the command phrase sample threshold is satisfied if the first command phrase sample includes at least one minute of speech and the one or more of the previously detected command phrase samples includes at least one minute of speech”, the original Specification appears to describe sufficient sampling as the command phrase samples collectively adding up to one minute or more of speech, and not where sufficient sampling is determined when the first command phrase sample and the set of previous command phrase samples are each one minute or more of speech (i.e. the original Specification appears to only describe where sufficient command phrase sampling is based on one condition, where the first command phrase sample and the previous command phrase samples collectively add up to one minute or more of speech, not based on two conditions, where one condition is that the first command phrase sample is one minute or more of speech, and the other condition is that the previously detected command phrase samples are one minute or more of speech).
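For illustration only, the difference between the two readings of the threshold discussed above can be sketched as follows. This is a minimal sketch and not a description of Applicant's implementation: the function names, the representation of samples as durations in seconds, and the example values are all hypothetical.

```python
# Illustrative sketch only. Samples are represented by their durations in
# seconds; all names and values here are hypothetical assumptions.
ONE_MINUTE = 60.0


def threshold_collective(first_sample, previous_samples):
    """Single condition: all command phrase samples together total one minute or more."""
    return first_sample + sum(previous_samples) >= ONE_MINUTE


def threshold_separate(first_sample, previous_samples):
    """Two conditions: the first sample alone AND the previous samples alone
    each amount to one minute or more."""
    return first_sample >= ONE_MINUTE and sum(previous_samples) >= ONE_MINUTE


# A roughly 3-second command plus twenty earlier 3-second commands.
first = 3.0
previous = [3.0] * 20

print(threshold_collective(first, previous))
print(threshold_separate(first, previous))
```

Under these assumed values the collective reading is satisfied while the two-condition reading is not, because a single command of a few seconds cannot by itself amount to one minute of speech.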

Claims 8 and 14 include the same issue as claim 2.

The dependent claims include the issues of their respective parent claims.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-3, 6, 7-10, 12, 13-18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.

As per Claims 1, 7, and 13, as discussed in the 112(a) rejections above, while the original Specification has written description for “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” in claims 1, 7, and 13 in the sense that the identifying of the user as a speaker of a second user speech sample is based on information that is based on the first command phrase sample and the one or more previously detected command phrase samples (i.e. a speaker of a sample is identified based on enrollment data that is based on the command phrase samples), the original Specification does not have written description for “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” in claims 1, 7, and 13 in the sense that the samples themselves are used in identifying the speaker of the second user speech sample. Therefore, in this sense (that the samples themselves are used in identifying the speaker of the second user speech sample), claims 1, 7, and 13 include new matter.
As a result, it is not clear if “identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” is supposed to be interpreted as where identifying the user as a speaker of a second user speech sample is based on the command phrase samples themselves (which is not supported) or where identifying the user as a speaker of a second user speech sample is based in some indirect way on the command phrase samples (which is supported).

Due to the written description issue in claims 2, 8, and 14, it is also unclear if Applicant meant to claim where “the command phrase sample threshold is satisfied if the first command phrase sample includes at least one minute of speech and the one or more previously detected command phrase samples include at least one minute of speech” (i.e. where there are two conditions where both of the first command phrase sample and the one or more previously detected command phrase samples must each be one minute or more of speech).

Claim 12 recites “the command phrase sample threshold to enroll a second user in the text-independent speaker identification program” (first/”determine if the…” limitation of the “text-independent enrollment circuit”) which lacks antecedent basis as a complete phrase.  Claim 7 recites “a command phrase sample threshold to enroll a user in a text-independent speaker identification program”.  No part of claims 7 and 12 preceding “the command phrase sample threshold to enroll a second user in the text-independent speaker identification program” establishes that any command phrase sample threshold is for enrolling the second user.

Claim 20 includes the same issue as claim 12.

The dependent claims include the issues of their respective parent claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 13-18 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter.

Claims 13-18 and 20 are directed to non-statutory software per se.  Claim 13 was amended to recite “A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least:…” (“including non-transitory machine readable mediums” was deleted) and thus requires no more than software instructions to meet claim 13’s limitations.  Claims 14-18 and 20 similarly further define the instructions without introducing any hardware elements into the claims and are therefore also directed to non-statutory software per se.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Foerster et al. (US 2016/0104483), hereafter Foerster, in view of Kim et al. (US 2016/0077794), hereafter Kim, Sharifi et al. (US 9,711,148), hereafter Sharifi, Johnson et al. (US 9,548,979), hereafter Johnson, and Alvarez Guevara (US 2017/0025125), hereafter A-G.

As per Claim 1, Foerster suggests a method comprising:… identifying the user as a speaker of a second user speech sample…, the second user speech sample obtained by a first speaker identification device or a second speaker identification device, determining a first… and a second… and causing a closest one of the first speaker identification device or the second speaker identification device to… (paragraphs 3, 5, 19, 22-28, 39, 40, 42, 45, 46, 48, 53; [all paragraphs and Figures are cited for each limitation with “key” paragraphs and Figures pertaining to each limitation identified below, i.e. all other paragraphs and Figures not specifically referenced for any particular limitation are eligible to provide context and additional support]
“a method comprising:… identifying the user as a speaker of a second user speech sample…, the second user speech sample obtained by a first speaker identification device or a second speaker identification device”: Figures 1, 3; Paragraphs 3, 5, 19, 39, 40, 42, 45, 46, 48; one of the multiple computing devices can be interpreted as “a first speaker identification device” that performs speaker identification on audio data received by the microphone of the “first speaker identification device”, where the audio data is suggested to be a “sample” of speech spoken by the user [see e.g. “OK 
“determining a first… and a second… and causing a closest one of the first speaker identification device or the second speaker identification device to…”: Paragraphs 3, 5, 22-28: The “first speaker identification device”/first-computing-device determines a “first” loudness score, and “the second speaker identification device”/second-computing-device determines a “second” loudness score, and, based on the loudness score of the closest “first speaker identification device” being the highest, “the first speaker identification device” processes the audio data “second user speech sample”, where the “second user speech sample” is a [HOTWORD] [QUERY] input including a question)
Foerster does not, but Kim suggests a method comprising:… identifying the user as a speaker of a second user speech sample…, the second user speech sample obtained by a first speaker identification device or a second speaker identification device, determining a first proximity of the user to the first speaker identification device and a second proximity of the user the second speaker identification device; comparing the first proximity and the second proximity; and causing a closest one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Paragraphs 20-21, 39, 43, 65;
Paragraph 43 describes where a user can utter a trigger phrase and a command or question in sequence without waiting for a confirmation [e.g. “Hey Siri, what’s the weather today?”, similar to the [HOTWORD][QUERY] utterance in Foerster] and then a virtual assistant responds to the question [where the response is at least suggested to be an accurate answer to the question].  Paragraphs 20-21 more specifically describes an example where a response to a question is a natural language answer to the user.
Paragraph 65 describes determining relative user proximity to various devices to ensure that the nearest device is most likely to trigger, and distances between users and multiple devices can be used to determine which device should trigger upon detection of a trigger phrase [where determining “which device should trigger” at least suggests that one of a plurality of devices should trigger/react-to-the-speech-input while other devices do not] and where two devices can compare determined distances to determine that a user may be nearer to a device.  Paragraph 42 describes examples of triggers [e.g. hey Siri] which are comparable to [HOTWORD]s in Foerster.
Kim thus suggests where the processing of the [HOTWORD][QUERY] “second speech sample” [where the QUERY is a question] by the closest/”first speaker identification device” in Foerster leads to a response to the question in the “second 
Kim also suggests where, instead of using loudness scores, Foerster’s computing devices determine respective distances to the speaker/user and then compare the distances in order to determine that the “first speaker identification device” is the closest, thereby determining that the “first speaker identification device” and not the other “speaker identification devices” should process the “second user speech sample” audio data and output a response to the question in the “second user speech sample”)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speech question processing with another because the prior art teaches the claimed invention except for the substitution of spoken question processing which does not necessarily provide, to a user, a response to a spoken question with spoken question processing which does.  Kim teaches that spoken question processing which provides, to a user, a response to a spoken question was known in the art.  One of ordinary skill in the art could have substituted one type of speech question processing with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the 
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of closest device determination with another because the prior art teaches the claimed invention except for the substitution of closest device determination which does not necessarily include computing devices comparing determined proximities between the computing devices and a user with closest device determination which does.  Kim teaches that closest device determination which includes computing devices comparing determined proximities between the computing devices and a user was known in the art.  One of ordinary skill in the art could have substituted one type of closest device determination with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing a response to the [QUESTION] to the user (as per Kim) where the closest computing device is determined by the computing devices comparing determined distances/proximities between each device and the user (as per Kim)
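For illustration only, the substitution discussed above (selecting the responding device by comparing determined distances instead of by comparing loudness scores) can be sketched as follows. This is a hedged sketch and is not taken from Foerster or Kim: the function names, device identifiers, and numeric values are hypothetical.

```python
# Illustrative sketch only. Neither function is the implementation of any
# cited reference; names, device ids, and values are hypothetical.

def closest_by_loudness(loudness_scores):
    """Select the device reporting the highest loudness score."""
    return max(loudness_scores, key=loudness_scores.get)


def closest_by_proximity(distances):
    """Select the device with the smallest determined user-to-device distance."""
    return min(distances, key=distances.get)


# Hypothetical measurements for two devices hearing the same utterance.
loudness = {"first_device": 0.82, "second_device": 0.41}
distance_m = {"first_device": 1.5, "second_device": 4.0}

print(closest_by_loudness(loudness))
print(closest_by_proximity(distance_m))
```

In this sketch both selection rules identify the same device, and only the selected device would go on to process the utterance and output the response.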
Foerster, in view of Kim, do not, but Sharifi suggests a method comprising: parsing a first user speech sample associated with a user into a first… phrase sample and a first command phrase sample;… in a speaker identification program; identifying the user as a speaker of a second user speech sample based on the first command phrase sample…, the second user speech sample obtained by a first speaker identification device or a second speaker identification device, determining a first proximity of the user to the first speaker identification device and a second proximity of the user the second speaker identification device; comparing the first proximity and the second proximity; and causing a closest one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
The cited portions of Sharifi suggest using a multi-word query/command [QUERY] portion “parsed” from a [KEYWORD][QUERY] utterance to train a text-independent speaker identification model for a registered user [see particularly col. 10, line 63 – col. 11, line 10] which suggests that future text-independent speaker identifications of future [KEYWORD] [QUERY] utterances are performed using the “updated”/additionally-trained text-independent model.
Sharifi thus suggests where the speaker identification performed on audio data in the computing devices in Foerster [where the audio data is suggested by Foerster to be [HOTWORD][QUERY] utterances] use text-independent models that are trained based on previous command/query [QUERY] portions “parsed” from [HOTWORD][QUERY] utterances, and thus suggests “parsing a first user speech sample associated with a user into a first… phrase sample and a first command phrase sample;… in a speaker identification program; identifying the user as a speaker of a second user speech sample based on the first command phrase sample…,” [i.e. parsing a previous [HOTWORD][QUERY] utterance spoken by a user into a multi-word [HOTWORD] portion and a multi-word command [QUERY] portion so that each portion can be processed with corresponding process[es], training the user’s text-independent speaker identification model in the “first speaker identification device’s” “speaker identification program” using the multi-word command [QUERY] portion of the previous [HOTWORD][QUERY] utterance and then identifying the user as the speaker of the “current” [HOTWORD][QUERY] utterance “second user speech sample” based on the text-independent speaker identification model which is based on the previous utterance’s multi-word command [QUERY] portion, i.e. which is “based on the first command phrase sample”]
For the “parsing a first user speech sample associated with a user into a first wake phrase sample and a first command phrase sample” limitation in particular:
In Sharifi, Col. 7, lines 3-14 describe identifying a keyword portion of the audio signal 122.  Col. 7, lines 15-27 and col. 8, lines 16-40 describe where the text-dependent analyzer analyzes a keyword portion of the audio signal 122 and where the text-independent analyzer module analyzes “only the portion of the audio signal 122 subsequent to the portion of the audio signal that corresponds to the keyword” and in other embodiments “may analyze the entire audio signal 122”.  Col. 10, line 63 – col. 11, line 10 describes where additional training of the text-independent model can be performed using “the remainder of the audio signal 122” [at least suggested to be the 
These portions at least suggest where the speaker recognition engine distinguishes the two [KEYWORD] and [QUERY] components from each other and has the two components as a keyword portion and a remaining/subsequent/query portion such that the relevant analyses/training can be performed with the appropriate portion[s].  These portions thus suggest “parsing a first user speech sample associated with a user into a first… phrase sample and a first command phrase sample” [receiving a multi-word keyword and multi-word command utterance from the registered user that spoke the registration phrase, identifying the keyword, and determining a multi-word keyword portion and a multi-word command portion])
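For illustration only, the separation of an utterance into a keyword portion and a remaining query portion, as discussed above, can be sketched as follows. This is a simplified sketch operating on already-transcribed words, which is an assumption made for readability; the cited portions of Sharifi describe analysis of portions of an audio signal. The function name and the example keyword are hypothetical.

```python
# Illustrative sketch only. It operates on a list of transcribed words rather
# than on an audio signal; the function name and keyword are hypothetical.

def parse_utterance(words, keyword):
    """Split an utterance into (keyword portion, remaining query portion).

    Returns None when the utterance does not begin with the keyword.
    """
    n = len(keyword)
    leading = [w.lower() for w in words[:n]]
    if leading != [k.lower() for k in keyword]:
        return None
    return words[:n], words[n:]


utterance = "hey assistant what is the weather today".split()
result = parse_utterance(utterance, ["hey", "assistant"])
print(result)
```

The keyword portion would be passed to the text-dependent analysis, and the remaining portion would be available for text-independent analysis or additional training.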
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker identification function with another because the prior art teaches the claimed invention except for the substitution of a speaker identification function which does not include training a text-independent speaker identification model based on a multi-word command portion parsed from an utterance including a multi-word hotword/keyword and a multi-word command/query and which does not include performing, based on the trained text-independent speaker identification model, speaker identification on a multi-word-hotword/keyword-and-multi-word-command/query utterance with a speaker 
a method comprising: parsing a first user speech sample associated with a user into a first… phrase sample and a first command phrase sample; determining if a command phrase sample threshold is satisfied based on the first command phrase sample and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enrolling the user in a speaker identification program; identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples, the second user speech sample obtained by a first speaker identification device or a second speaker identification device, determining a first proximity of the user to the first speaker identification device and a second proximity of the user the second speaker identification device; comparing the first proximity and the second proximity; and causing a closest one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (col. 2, lines 45-57; col. 3, line 52 - col. 4, line 20; col. 12, lines 17-48; Col. 5, lines 26-42; Col. 17, lines 46-61;
Johnson describes, in Col. 3, line 52 – col. 4, line 20, where a user’s spoken multi-word command can be used to create or enhance an existing voice print or voice profile of the user’s voice [similar to how Sharifi trains a text-independent model based on the query part of an utterance whose speaker is identified, where the query is suggested to be a multi-word command in some embodiments] and where a user may 
Johnson thus suggests where, instead of optionally training the identified user’s text-independent model based on only the multi-word command portion of a “current” utterance [the most recent utterance that can be used to train the text-independent model, not the “current” utterance that is analyzed using the trained text-independent model] in response to identifying the “current” utterance’s speaker [i.e. the identified user of the most recent utterance that can be used to train the text-independent model], the training of the text-independent speaker identification model [described in Sharifi] collects utterances of the same multi-word command over time [as the identified user uses the system] and determines if the “current” utterance’s multi-word command portion and previous utterances of the same multi-word command spoken by the same user are collectively of sufficient quantity or quality, and passively enrolls a text-
Johnson thus suggests “determining if a command phrase sample threshold is satisfied based on the first command phrase sample and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enrolling the user in a speaker identification program; identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” [where the “first speaker identification device” computing device, prior to performing speaker identification on the “second user speech sample” using a text-independent speaker identification model, determines if the “first command phrase sample” which is a multi-word command [QUERY] portion parsed from a previous [HOTWORD] [QUERY] utterance and other previous utterances of the same multi-word command [QUERY] spoken by the same user are of sufficient quality and quantity, and passively enrolls a text-independent model that is generated based on those utterances of the multi-word command [QUERY] when the utterances of the multi-word command are of sufficient quantity and quality, where the speaker identification performed on the “second user speech sample” uses the passively-enrolled text-independent model that is based on the “first command phrase sample” and based on “the one or more previously detected command phrase samples”]
Applicant’s Specification [paragraph 37] describes the concept of saving command phrase samples “for possible future use [e.g. re-enrolling the user with a 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of spoken-command-based speaker identification update with another because the prior art teaches the claimed invention except for the substitution of spoken-command-based speaker identification update which does not enroll speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity with spoken-command-based speaker identification update which does.  Johnson teaches that spoken-command-based speaker identification update which enrolls speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity was known in the art.  One of ordinary skill in the art could have substituted one type of spoken-command-based speaker identification update with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing a response to the [QUESTION] to the user (as per Kim) where the closest computing device is 
Foerster, in view of Kim, Sharifi, and Johnson, do not, but A-G suggests a method comprising: parsing a first user speech sample associated with a user into a first wake phrase sample and a first command phrase sample (paragraphs 3-4;
A-G [in paragraphs 3-4] describes where an utterance can include both a command [e.g. “DRIVE HOME”] and a hotword [OK COMPUTER], where a hotword “wakes a device up from a sleep state” [and is thus a “wake phrase”].
A-G suggests where the [HOTWORD/KEYWORD] in the previous [HOTWORD][QUERY] utterance including a multi-word [HOTWORD/KEYWORD] and a 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speech phrase with another because the prior art teaches the claimed invention except for the substitution of a speech phrase which is not necessarily a wake phrase with a speech phrase which is.  A-G teaches that speech phrases that are wake phrases were known in the art.  One of ordinary skill in the art could have substituted one type of speech phrase with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing a response to the [QUESTION] to the user (as per Kim) where the closest computing device is determined by the computing devices comparing determined distances/proximities between each device and the user (as per Kim) where the speaker identification function includes a training function that trains a text-independent speaker identification model based on a multi-word [COMMAND] portion parsed from an utterance including a multi-word [HOTWORD/KEYWORD] and a multi-word [COMMAND], and where the speaker identification function performs speaker identification on the [HOTWORD] [QUESTION] utterance based on the trained text-independent speaker identification model (as per Sharifi) where the training of the text-

As per Claim 6, Sharifi suggests wherein the user is a first user and further including parsing a third user speech sample associated with a second user into a second wake phrase sample and a second command phrase sample, determining if the command phrase sample threshold is satisfied based on the second command phrase sample and one or more previously detected command phrase samples associated with the second user, and in response to determining that the command phrase sample threshold is satisfied, enrolling the second user in the speaker identification program (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Foerster, Kim, Sharifi, Johnson, and A-G suggest the limitations of claim 1 for the reasons discussed in the rejection of claim 1.
Claim 6 is directed to steps of claim 1 being performed for a second person.
Sharifi, in col. 5, lines 5-28 describes where any of multiple different users may issue a query or command, and col. 7, line 64 – col. 8, line 15 describes where 
Therefore, Sharifi suggests where the same steps of claim 1 suggested by Foerster, Kim, Sharifi, Johnson, and A-G are performed for a different/“second” user)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker identification processing with another because the prior art teaches the claimed invention except for the substitution of speaker identification processing which is not necessarily performed for each of a plurality of people with speaker identification processing which is.  Sharifi suggests that speaker identification processing which is performed for each of a plurality of people was known in the art.  One of ordinary skill in the art could have substituted one type of speaker identification processing with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing a response to the [QUESTION] to the user (as per Kim) where the closest computing device is determined by the computing devices comparing determined distances/proximities between each device and the user (as per Kim) where the speaker identification function includes a training function that trains a text-independent .

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Foerster, in view of Kim, Sharifi, Johnson, and A-G, as applied to claim 1 above, and further in view of Rockenbeck et al. (US 2002/0116190), hereafter Rockenbeck.

As per Claim 3, Foerster, in view of Kim, Sharifi, Johnson and A-G do not, but Rockenbeck suggests further including determining that the command phrase sample threshold is satisfied in response to the first command phrase sample and the previously detected command phrase samples including at least twenty command phrase samples (“model adapter 326 determines if there has been enough speech to warrant adapting the acoustic model. In one embodiment, five minutes of 
Rockenbeck, like Sharifi, describes adapting/training a model, and, like Johnson, describes where updating is performed if enough speech has been collected.
Rockenbeck more specifically describes where “enough [speech] to warrant performing another adaptation of [a] model” is “five minutes of speech”.
Rockenbeck thus suggests where [in the combination discussed in the rejection of claim 1] sufficient quantity or quality of utterances of the same multi-word command spoken by the same user are 5 minutes of speech of the same multi-word command.
5 minutes is “at least one minute of speech”.
Given the amount of time needed to utter a multi-word keyword phrase and a multi-word command [a few seconds, see e.g. Johnson where the command example is “Check account balance”], 5 minutes of the same multi-word command is also suggested to be more than 20 samples [since 20 samples in 5 minutes would require each sample to last 15 seconds, which is a long time to utter a single query].
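For illustration only, the arithmetic behind the 20-sample comparison can be sketched as follows. The 5-minute figure comes from Rockenbeck; the 3-second utterance length is an assumed value standing in for "a few seconds", and the function names are hypothetical.

```python
import math

# Rockenbeck's example amount of speech that warrants adaptation: five minutes.
ADAPTATION_SPEECH_SECONDS = 5 * 60
# Sample count recited in claim 3.
CLAIMED_MIN_SAMPLES = 20


def seconds_per_sample(sample_count, total_seconds=ADAPTATION_SPEECH_SECONDS):
    """Average length each sample must have to fill total_seconds."""
    return total_seconds / sample_count


def samples_needed(seconds_per_utterance, total_seconds=ADAPTATION_SPEECH_SECONDS):
    """Number of utterances of a given length needed to reach total_seconds."""
    return math.ceil(total_seconds / seconds_per_utterance)


# Exactly 20 samples would each have to last 15 seconds.
print(seconds_per_sample(CLAIMED_MIN_SAMPLES))
# With an assumed 3-second command, far more than 20 samples are needed.
print(samples_needed(3))
```

Under the assumed 3-second utterance, five minutes of speech corresponds to 100 samples, which exceeds the twenty samples recited in the claim.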
Rockenbeck thus suggests “determining that the command phrase sample threshold is satisfied in response to the first command phrase sample and the previously detected command phrase samples including at least twenty command phrase samples” [where the “first speaker identification device” computing device, prior to performing speaker identification on the “second user speech sample” using a text-independent speaker identification model, determines if the “first command phrase 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one criterion for sufficient samples with another because the prior art teaches the claimed invention except for the substitution of a criterion for sufficient samples which is not necessarily at least 20-samples/one-minute with a criterion for sufficient samples which is.  Rockenbeck suggests that a criterion for sufficient samples which is at least 20-samples/one-minute was known in the art.  One of ordinary skill in the art could have substituted one criterion for sufficient samples with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing a response to the [QUESTION] to the user (as per Kim) where the closest computing device is determined by the computing .

Claims 7, 10, 12, 13, 16, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Foerster et al. (US 2016/0104483), hereafter Foerster, in view of Kim et al. (US 2016/0077794), hereafter Kim, Sharifi et al. (US 9,711,148), hereafter Sharifi, and Johnson et al. (US 9,548,979), hereafter Johnson.

As per Claim 7, Foerster suggests A speaker identification device comprising:… and a… speaker identification circuit to: identify the user as a speaker of a second user speech sample… ; determine a first… and a second…; and instruct a closest one of the… or the second speaker identification device to… (paragraphs 3, 5, 19, 22-28, 39, 40, 42, 45, 46, 48, 53; [all paragraphs and Figures are cited for each limitation with “key” paragraphs and Figures pertaining to each limitation identified below, i.e. all other paragraphs and Figures not specifically referenced for any particular limitation are eligible to provide context and additional support]
“A speaker identification device comprising:… and a… speaker identification circuit to: identify the user as a speaker of a second user speech sample…”: Figures 1, 3; Paragraphs 3, 5, 19, 39, 40, 42, 45, 46, 48, 53; one of the multiple computing devices can be interpreted as “a first speaker identification device” that performs speaker identification on audio data received by the microphone of the “first speaker identification device”, where the audio data is suggested to be a “sample” of speech spoken by the user [see e.g. “OK computer” by the user holding the computing device 106 in his/her hand] where any other one of the multiple computing devices which also performs speaker identification can be interpreted as “a second speaker identification device”, where the “second user speech sample” is a [HOTWORD] [QUERY] input including a question.  Paragraph 53 describes hardware and circuitry implementations, which suggests where each of the functions in the computing device is implemented using a respective circuit, where a circuit implementing the speaker identification function and other circuits that perform other functions can collectively be interpreted as “a… speaker identification circuit” 
“determine a first… and a second…; and instruct a closest one of the… or the second speaker identification device to…”: Paragraphs 3, 5, 22-28, 53: The “first 
Foerster does not, but Kim suggests A speaker identification device comprising:…a delivery circuit; and a… speaker identification circuit to: identify the user as a speaker of a second user speech sample…; determine a first proximity of the user to the speaker identification device and a second proximity of the user to a second speaker identification device; perform a comparison of the first proximity and the second proximity; and instruct a closest one of the delivery circuit or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Paragraphs 20-21, 39, 43, 65;
Paragraph 43 describes where a user can utter a trigger phrase and a command or question in sequence without waiting for a confirmation [e.g. “Hey Siri, what’s the weather today?”, similar to the [HOTWORD][QUERY] utterance in Foerster] and then a 
Paragraph 65 describes determining relative user proximity to various devices to ensure that the nearest device is most likely to trigger, and distances between users and multiple devices can be used to determine which device should trigger upon detection of a trigger phrase [where determining “which device should trigger” at least suggests that one of a plurality of devices should trigger/react-to-the-speech-input while other devices do not] and where two devices can compare determined distances to determine that a user may be nearer to a device.  Paragraph 42 describes examples of triggers [e.g. hey Siri] which are comparable to [HOTWORD]s in Foerster.
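As a minimal sketch of the distance comparison described above, assuming each device has already estimated its distance to the user (the device names, the distance values, and the function names are hypothetical and are not taken from Kim):

```python
from typing import Dict


def closest_device(distances_m: Dict[str, float]) -> str:
    """Return the id of the device with the smallest estimated user distance."""
    if not distances_m:
        raise ValueError("at least one device distance is required")
    return min(distances_m, key=distances_m.get)


def should_trigger(device_id: str, distances_m: Dict[str, float]) -> bool:
    """A device reacts to the trigger phrase only if it is the nearest one."""
    return closest_device(distances_m) == device_id


# Hypothetical distance estimates, in meters, shared between two devices.
estimates = {"phone": 0.4, "tablet": 3.2}
print(closest_device(estimates))
print(should_trigger("tablet", estimates))
```

In this sketch only the nearest device triggers, while the other device stays silent, which corresponds to determining "which device should trigger".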
Kim thus suggests where the processing of the [HOTWORD][QUERY] “second speech sample” [where the QUERY is a question] by the closest/“first speaker identification device” in Foerster leads to a response to the question in the “second speech sample” being output by a “delivery circuit” of the closest computing device “first speaker identification device” that processes the “second speech sample”.
Kim also suggests where, instead of using loudness scores, Foerster’s computing devices determine respective distances to the speaker/user and then compare the distances in order to determine that the “first speaker identification device” is the closest, thereby determining that the “first speaker identification device” and not 
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speech question processing with another because the prior art teaches the claimed invention except for the substitution of spoken question processing which does not necessarily provide, to a user via a delivery circuit, a response to a spoken question with spoken question processing which does.  Kim teaches that spoken question processing which provides, to a user via a delivery circuit, a response to a spoken question was known in the art.  One of ordinary skill in the art could have substituted one type of speech question processing with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing, via a delivery circuit, a response to the [QUESTION] to the user (as per Kim)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of closest device determination with another because the prior art teaches the claimed invention except for the substitution of closest device determination which does not necessarily include computing devices comparing determined proximities between the computing devices 
Foerster, in view of Kim, do not, but Sharifi suggests A speaker identification device comprising: a text-independent… circuit to:… in the text-independent speaker identification program, a delivery circuit; and a text-independent speaker identification circuit to: identify the user as a speaker of a second user speech sample based on the first command phrase sample…; determine a first proximity of the user to the speaker identification device and a second proximity of the user to a second speaker identification device; perform a comparison of the first proximity and the second proximity; and instruct a closest one of the delivery circuit or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
The cited portions of Sharifi suggest using a multi-word query/command [QUERY] portion of a [KEYWORD][QUERY] utterance to train a text-independent speaker identification model for a registered user [see particularly col. 10, line 63 – col. 11, line 10] which suggests that future text-independent speaker identifications of future [KEYWORD] [QUERY] utterances are performed using the “updated”/additionally-trained text-independent model.  Col. 13, lines 5-31 further more specifically describe hardware and circuitry implementations [suggesting a “text-independent… circuit” to perform the training of the text-independent speaker identification model]
Sharifi thus suggests where the speaker identification performed on audio data in the computing devices in Foerster [where the audio data is suggested by Foerster to be [HOTWORD][QUERY] utterances] use text-independent models that are trained based on previous command/query [QUERY] portions of [HOTWORD][QUERY] utterances, where the training of the text-independent models is performed by a “text-independent” model training “circuit” and thus suggests “a text-independent… circuit to:… in the text-independent speaker identification program,… and a text-independent speaker identification circuit to: identify the user as a speaker of a second user speech sample based on the first command phrase sample…” [i.e. training, using a text-independent speaker identification model training circuit, the user’s text-independent speaker identification model used in the “first speaker identification device’s” “speaker 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker identification function with another because the prior art teaches the claimed invention except for the substitution of a speaker identification function which does not include training, using a text-independent model training circuit, a text-independent speaker identification model based on a multi-word command portion of an utterance including a multi-word hotword/keyword and a multi-word command/query and which does not include performing, based on the trained text-independent speaker identification model, speaker identification on a multi-word-hotword/keyword-and-multi-word-command/query utterance with a speaker identification function which does.  Sharifi suggests that a speaker identification function which includes training, using a text-independent model training circuit, a text-independent speaker identification model based on a multi-word command portion of an utterance including a multi-word hotword/keyword and a multi-word command/query and which includes performing, based on the trained text-independent speaker identification model, speaker identification on a multi-word-hotword/keyword-and-multi-word-command/query utterance was known in the art.  One 
Foerster, in view of Kim and Sharifi, do not, but Johnson suggests A speaker identification device comprising: a text-independent enrollment circuit to: determine if a command phrase sample threshold to enroll a user in a text-independent speaker identification program is satisfied based on a first command phrase sample in a first user speech sample associated with the user and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enroll the user in the text-independent speaker identification program, a delivery circuit; and a text-independent speaker identification circuit to: identify the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples; determine a first proximity of the user to the speaker identification device and a second proximity of the user to a second speaker identification device; perform a comparison of the first proximity and the second proximity; and instruct a closest one of the delivery circuit or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (col. 2, lines 45-57; col. 3, line 52 - col. 4, line 20; col. 12, lines 17-48; Col. 5, lines 26-42; Col. 17, lines 46-61;
Johnson describes, in Col. 3, line 52 – col. 4, line 20, where a user’s spoken multi-word command can be used to create or enhance an existing voice print or voice profile of the user’s voice [similar to how Sharifi trains a text-independent model based on the query part of an utterance whose speaker is identified, where the query is suggested to be a multi-word command in some embodiments] and where a user may be “automatically enrolled or registered in the voice biometric authentication program after a certain number and quality of samples are collected and used to generate the characteristics, voice profile, and/or voice print. In some embodiments the number and quality of samples must meet or exceed an enrollment threshold”.  Col. 2, lines 45-57 describes passively enrolling a user in the authentication program based on samples that are collected and analyzed, where passive enrollment is where the user is enrolled 
Johnson thus suggests where, instead of optionally training the identified user’s text-independent model based on only the multi-word command portion of a “current” utterance [the most recent utterance that can be used to train the text-independent model, not the “current” utterance that is analyzed using the trained text-independent model] in response to identifying the “current” utterance’s speaker [i.e. the identified user of the most recent utterance that can be used to train the text-independent model], the training of the text-independent speaker identification model [described in Sharifi] collects utterances of the same multi-word command over time [as the identified user uses the system] and determines if the “current” utterance’s multi-word command portion and previous utterances of the same multi-word command spoken by the same user are collectively of sufficient quantity or quality, and passively enrolls a text-independent model that is generated based on those utterances of the multi-word command when the utterances of the multi-word command are of sufficient quantity and quality.
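The threshold-gated passive enrollment discussed above can be illustrated with a minimal sketch. The class names, the quality score, and the numeric thresholds below are assumptions chosen for illustration; they are not values or interfaces disclosed by Johnson or Sharifi.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CommandSample:
    user_id: str
    command_text: str
    quality: float  # hypothetical quality score in the range 0..1


@dataclass
class PassiveEnroller:
    min_samples: int = 20      # assumed quantity threshold
    min_quality: float = 0.5   # assumed per-sample quality threshold
    samples: Dict[Tuple[str, str], List[CommandSample]] = field(default_factory=dict)
    enrolled: Set[str] = field(default_factory=set)

    def add_sample(self, sample: CommandSample) -> bool:
        """Store a command sample and enroll the user once enough usable
        samples of the same command from the same user have accumulated."""
        key = (sample.user_id, sample.command_text)
        self.samples.setdefault(key, []).append(sample)
        usable = [s for s in self.samples[key] if s.quality >= self.min_quality]
        if len(usable) >= self.min_samples:
            self.enrolled.add(sample.user_id)
        return sample.user_id in self.enrolled


enroller = PassiveEnroller(min_samples=3)
for q in (0.9, 0.8, 0.2, 0.7):
    print(enroller.add_sample(CommandSample("user1", "check account balance", q)))
```

The user is never prompted to enroll in this sketch; enrollment happens as a side effect of ordinary use once the quantity and quality criteria are both met.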
Johnson thus suggests “a text-independent enrollment circuit to: determine if a command phrase sample threshold to enroll a user in a text-independent speaker identification program is satisfied based on a first command phrase sample in a first user speech sample associated with the user and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enroll the user in the text-independent speaker identification program,… and a text-independent speaker identification circuit to: identify the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples” [where the “first speaker identification device” computing device, prior to performing speaker identification on the “second user speech sample” using a text-independent speaker identification model, determines, using the text-independent model training circuit, if the “first command phrase sample” which is a multi-word command [QUERY] portion of a previous [HOTWORD] [QUERY] utterance and other previous utterances of the same multi-word command [QUERY] spoken by the same user are of sufficient quality and quantity, and passively enrolls, using the text-independent model training circuit, a text-independent model that is generated based on those utterances of the multi-word command [QUERY] when the utterances of the multi-word command are of sufficient quantity and quality, where the speaker identification performed, by the text-independent speaker identification circuit discussed above in the portions of this rejection of claim 7 based on Foerster and Sharifi, on the “second user speech sample” uses the passively-enrolled text-independent model that is based on the “first command phrase sample” and based on “the one or more previously detected command phrase samples”]

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of spoken-command-based speaker identification update with another because the prior art teaches the claimed invention except for the substitution of spoken-command-based speaker identification update which does not enroll speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity with spoken-command-based speaker identification update which does.  Johnson teaches that spoken-command-based speaker identification update which enrolls speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity was known in the art.  One of ordinary skill in the art could have substituted one type of spoken-command-based speaker identification update with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the 

As per Claim 10, Foerster, in view of Kim, do not, but Sharifi suggests a text-dependent enrollment circuit to enroll the user in a text-dependent speaker identification program using one or more samples of a keyword phrase from the user (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;

Sharifi thus suggests where the “first speaker identification device” further includes “a text-dependent enrollment circuit to enroll the user in a text-dependent speaker identification program using one or more samples of a keyword phrase from the user” [the “first speaker identification device” further includes a hardware circuit that uses a multi-word keyword portion of the user’s registration phrase to generate the user’s text-dependent model and that registers/“enrolls” the user’s text-dependent model into “a text-dependent speaker identification program” of the “first speaker identification device”])
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the 
	
As per Claim 12, Foerster, in view of Kim, do not, but Sharifi suggests wherein the user is a first user and the text-independent enrollment circuit is to: determine if the command phrase sample threshold to enroll a second user in the text-independent speaker identification program is satisfied based on a second command phrase sample associated with the second user and one or more previously detected command phrase samples associated with the second user, and in response to determining that the command phrase sample threshold is satisfied, enroll the second user in the text-independent speaker identification program (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Foerster, Kim, Sharifi, and Johnson suggest the limitations of claim 7 for the reasons discussed in the rejection of claim 7.
Claim 12 is directed to functions of claim 7 being performed for a second person.

Therefore, Sharifi suggests where the same functions of claim 7 suggested by Foerster, Kim, Sharifi, and Johnson are performed for a different/“second” user)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker identification processing with another because the prior art teaches the claimed invention except for the substitution of speaker identification processing which is not necessarily performed for each of a plurality of people with speaker identification processing which is.  Sharifi suggests that speaker identification processing which is performed for each of a plurality of people was known in the art.  One of ordinary skill in the art could have substituted one type of speaker identification processing with another to obtain the predictable results of a system including a plurality of computing devices, where each computing device receives a [HOTWORD] [QUESTION] utterance from a user, where each computing device performs, using a speaker identification function, speaker identification on the [HOTWORD] [QUESTION] utterance, and causes the closest computing device to process the [HOTWORD] [QUESTION] utterance (as per Foerster) where the processing of the [HOTWORD] [QUESTION] utterance includes providing, via a delivery circuit, a response to the [QUESTION] to the user (as per Kim) where the closest computing device is determined by the computing devices comparing 

As per Claim 13, Foerster suggests A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least:… identify the user as a speaker of a second user speech sample…, the second user speech sample obtained by one of a first speaker identification device or a second speaker identification device; determine a first… and a second… and cause a closest one of the one of the first speaker identification device or the second speaker identification device to… (paragraphs 3, 5, 19, 22-28, 39, 40, 42, 45, 46, 48, 53; [all paragraphs and Figures are cited for each 
“A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least:… identify the user as a speaker of a second user speech sample…, the second user speech sample obtained by one of a first speaker identification device or a second speaker identification device”: Figures 1, 3; Paragraphs 3, 5, 19, 39, 40, 42, 45, 46, 48; one of the multiple computing devices can be interpreted as “a first speaker identification device” that performs speaker identification on audio data received by the microphone of the “first speaker identification device”, where the audio data is suggested to be a “sample” of speech spoken by the user [see e.g. “OK computer” by the user holding the computing device 106 in his/her hand] where any other one of the multiple computing devices which also performs speaker identification can be interpreted as “a second speaker identification device”, where the “second user speech sample” is a [HOTWORD] [QUERY] input including a question.  Figure 3 and paragraphs 39, 40, 42, 45, 46, and 48 describe where computing devices “that can be used to implement the techniques described here” can include instructions and a processor that executes the instructions to “perform one or more methods, such as those described above” [which suggests where the functions of the computing devices are implemented using a computer program product comprising instructions executed by a processor]
“determining a first… and a second…; and causing a closest one of the first speaker identification device or the second speaker identification device to …”: 
Foerster does not, but Kim suggests A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least:… identify the user as a speaker of a second user speech sample…, the second user speech sample obtained by one of a first speaker identification device or a second speaker identification device; determine a first proximity of the user to the first speaker identification device and a second proximity of the user to the second speaker identification device, perform a comparison of the first proximity and the second proximity; and cause a closest one of the one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Paragraphs 20-21, 39, 43, 65;
Paragraph 43 describes where a user can utter a trigger phrase and a command or question in sequence without waiting for a confirmation [e.g. “Hey Siri, what’s the weather today?”, similar to the [HOTWORD][QUERY] utterance in Foerster] and then a virtual assistant responds to the question [where the response is at least suggested to 
Paragraph 65 describes determining relative user proximity to various devices to ensure that the nearest device is most likely to trigger, and distances between users and multiple devices can be used to determine which device should trigger upon detection of a trigger phrase [where determining “which device should trigger” at least suggests that one of a plurality of devices should trigger/react-to-the-speech-input while other devices do not] and where two devices can compare determined distances to determine that a user may be nearer to a device.  Paragraph 42 describes examples of triggers [e.g. hey Siri] which are comparable to [HOTWORD]s in Foerster.
Kim thus suggests where the processing of the [HOTWORD][QUERY] “second speech sample” [where the QUERY is a question] by the closest/”first speaker identification device” in Foerster leads to a response to the question in the “second speech sample” being output by the closest computing device “first speaker identification device” that processes the “second speech sample”
Kim also suggests where, instead of using loudness scores, Foerster’s computing devices determine respective distances to the speaker/user and then compare the distances in order to determine that the “first speaker identification device” is the closest, thereby determining that the “first speaker identification device” and not the other “speaker identification devices” should process the “second user speech sample” audio data and output a response to the question in the “second user speech sample”)
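For illustration only, the distance comparison attributed to Kim above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any cited reference: the names `DeviceReading`, `closest_device`, and `should_respond`, and the use of a distance in meters, are hypothetical and do not appear in Foerster or Kim.

```python
from dataclasses import dataclass


@dataclass
class DeviceReading:
    """Hypothetical record of one device's estimate of its distance to the user."""
    device_id: str
    distance_m: float  # assumed unit: meters (not specified by the references)


def closest_device(readings):
    """Return the id of the device with the smallest estimated distance."""
    if not readings:
        raise ValueError("at least one device reading is required")
    return min(readings, key=lambda r: r.distance_m).device_id


def should_respond(own, others):
    """A device outputs the response only if it is the closest of all devices."""
    return closest_device([own, *others]) == own.device_id


first = DeviceReading("first speaker identification device", 1.2)
second = DeviceReading("second speaker identification device", 3.5)
print(closest_device([first, second]))
```

Tie-breaking between equally distant devices is deliberately left out of this sketch; a practical system would need a rule for that case.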

	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of closest device determination with another because the prior art teaches the claimed invention except for the substitution of closest device determination which does not necessarily include computing devices comparing determined proximities between the computing devices and a user with closest device determination which does.  Kim teaches that closest device determination which includes computing devices comparing determined proximities between the computing devices and a user was known in the art.  One of 
Foerster, in view of Kim, do not, but Sharifi suggests A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least: …in the speaker identification program, identify the user as a speaker of a second user speech sample based on the first command phrase sample…, the second user speech sample obtained by one of a first speaker identification device or a second speaker identification device; determine a first proximity of the user to the first speaker identification device and a second proximity of the user to the second speaker identification device, perform a comparison of the first proximity and the second proximity; and cause a closest one of the one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, 
The cited portions of Sharifi suggest using a multi-word query/command [QUERY] portion of a [KEYWORD][QUERY] utterance to train a text-independent speaker identification model for a registered user [see particularly col. 10, line 63 – col. 11, line 10] which suggests that future text-independent speaker identifications of future [KEYWORD] [QUERY] utterances are performed using the “updated”/additionally-trained text-independent model.
Sharifi thus suggests where the speaker identification performed on audio data in the computing devices in Foerster [where the audio data is suggested by Foerster to be [HOTWORD][QUERY] utterances] use text-independent models that are trained based on previous command/query [QUERY] portions of [HOTWORD][QUERY] utterances, and thus suggests “…in the speaker identification program, identify the user as a speaker of a second user speech sample based on the first command phrase sample…” [i.e. training the user’s text-independent speaker identification model in the “first speaker identification device’s” “speaker identification program” using a previous multi-word command [QUERY] portion of a previous [HOTWORD][QUERY] utterance and then identifying the user as the speaker of the “current” [HOTWORD][QUERY] utterance “second user speech sample” based on the text-independent speaker identification model which is based on the previous utterance’s multi-word command [QUERY] portion, i.e. which is “based on the first command phrase sample”])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker 
Foerster, in view of Kim and Sharifi, do not, but Johnson suggests A computer program product comprising instructions that, when executed by one or more processors, cause the one or more processors to at least: determine if a command phrase sample threshold to enroll a user in a speaker identification program is satisfied based on a first command phrase sample in a first user speech sample associated with the user and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enrolling the user in the speaker identification program, identify the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples,  the second user speech sample obtained by one of a first speaker identification device or a second speaker identification device; determine a first proximity of the user to the first speaker identification device and a second proximity of the user to the second speaker identification device, perform a comparison of the first proximity and the second proximity; and cause a closest one of the one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (col. 2, lines 45-57; col. 3, line 52 - col. 4, line 20; col. 12, lines 17-48; Col. 5, lines 26-42; Col. 17, lines 46-61;
Johnson describes, in Col. 3, line 52 – col. 4, line 20, where a user’s spoken multi-word command can be used to create or enhance an existing voice print or voice profile of the user’s voice [similar to how Sharifi trains a text-independent model based on the query part of an utterance whose speaker is identified, where the query is suggested to be a multi-word command in some embodiments] and where a user may be “automatically enrolled or registered in the voice biometric authentication program after a certain number and quality of samples are collected and used to generate the characteristics, voice profile, and/or voice print. In some embodiments the number and quality of samples must meet or exceed an enrollment threshold”.  Col. 2, lines 45-57 describes passively enrolling a user in the authentication program based on samples that are collected and analyzed, where passive enrollment is where the user is enrolled without actively responding to a request for information [which suggests where the automatic enrollment in col. 3, line 52 – col. 4, line 20 is passive because it is automatic and in response to collecting a number and quality of samples].  Col. 12, lines 17-48 further describes where a user may be passively enrolled by speaking a particular command a certain number of times [i.e. enrollment based on multiple utterances of the same particular command spoken by the same user].
Johnson thus suggests where, instead of optionally training the identified user’s text-independent model based on only the multi-word command portion of a “current” utterance [the most recent utterance that can be used to train the text-independent 
Johnson thus suggests “determine if a command phrase sample threshold to enroll a user in a speaker identification program is satisfied based on a first command phrase sample in a first user speech sample associated with the user and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enrolling the user in the speaker identification program, identify the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples,” [where the “first speaker identification device” computing device, prior to performing speaker identification on the “second user speech sample” using a text-independent speaker identification model, determines if the “first command phrase sample” which is a multi-word command [QUERY] portion of a previous [HOTWORD] [QUERY] utterance and other previous utterances of the same 
Applicant’s Specification [paragraph 37] describes the concept of saving command phrase samples “for possible future use [e.g. re-enrolling the user with a larger set of command phrase samples]”.  “Enrolling” is, therefore, not limited to only adding a new user that the system previously could not identify, and also includes updating/training data that is used to identify an already-registered user.)
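For illustration only, the threshold-based passive enrollment discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `PassiveEnroller`, the parameter `min_samples`, and the default value of 3 are hypothetical, and the sketch counts only the number of samples, whereas Johnson refers to both the number and the quality of samples.

```python
from collections import defaultdict


class PassiveEnroller:
    """Hypothetical sketch: enroll a user once enough command phrase samples exist."""

    def __init__(self, min_samples=3):
        # Assumed enrollment threshold; the references do not give this value.
        self.min_samples = min_samples
        self.samples = defaultdict(list)  # user id -> saved command phrase samples
        self.enrolled = set()

    def add_command_sample(self, user_id, sample):
        """Save the sample and report whether the user is now enrolled."""
        self.samples[user_id].append(sample)
        if len(self.samples[user_id]) >= self.min_samples:
            # Threshold satisfied: enroll, or re-enroll with the larger sample set.
            self.enrolled.add(user_id)
        return user_id in self.enrolled


enroller = PassiveEnroller(min_samples=3)
for phrase in ["drive home", "drive home", "drive home"]:
    status = enroller.add_command_sample("John", phrase)
print(status)
```

Samples keep accumulating after the threshold is met, which mirrors the point above that saved command phrase samples may be used later to re-enroll an already-registered user with a larger set.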
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of spoken-command-based speaker identification update with another because the prior art teaches the claimed invention except for the substitution of spoken-command-based speaker identification update which does not enroll speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity with spoken-command-based speaker identification update which does.  Johnson teaches that spoken-command-based speaker identification update which enrolls speaker identification data based on multiple utterances of the same command in response to determining that the multiple utterances of the same command are of sufficient quantity was known in the art.  One of 

As per Claim 16, Foerster, in view of Kim, do not, but Sharifi suggests wherein the instructions, when executed, cause the one or more processors to parse the first user speech sample into a keyword phrase sample and the first command phrase sample (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Col. 7, lines 3-14 describe identifying a keyword portion of the audio signal 122.  Col. 7, lines 15-27 and col. 8, lines 16-40 describe where the text-dependent analyzer analyzes a keyword portion of the audio signal 122 and where the text-independent analyzer module analyzes “only the portion of the audio signal 122 subsequent to the portion of the audio signal that corresponds to the keyword” and in other embodiments “may analyze the entire audio signal 122”.  Col. 10, line 63 – col. 11, line 10 describes where additional training of the text-independent model can be performed using “the remainder of the audio signal 122” [at least suggested to be the [QUERY] portion of the audio signal 122].  Col. 4, line 59 – col. 5, line 28 describes where a spoken utterance by a user can include a phrase [KEYWORD] and a multiple word [QUERY], where the query can be a command.  Figure 1 and Col. 6, lines 51-53 and col. 3, line 62 – col. 4, line 15 further, and more specifically, describe where the utterance including a keyword and a query is spoken by a registered user.
These portions at least suggest where the speaker recognition engine distinguishes the two [KEYWORD] and [QUERY] components from each other and has the two components as a keyword portion and a remaining/subsequent/query portion 
These portions thus suggest “wherein the instructions, when executed, cause the one or more processors to parse the first user speech sample into a keyword phrase sample and the first command phrase sample” [where the “first speaker identification device” further parses a previous [HOTWORD][QUERY] utterance spoken by a user into a multi-word [HOTWORD] portion and a multi-word command [QUERY] portion so that each portion can be processed with corresponding process[es], where the multi-word command [QUERY] portion is used to train the text-independent speaker identification model, where the parsing includes receiving a multi-word [HOTWORD] and multi-word command [QUERY] utterance from the registered user that spoke a registration phrase, identifying the keyword, and determining a multi-word keyword portion and a multi-word command portion])
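For illustration only, the parsing of a [KEYWORD][QUERY] utterance into a keyword portion and a command portion can be sketched as follows. This is a minimal sketch under stated assumptions: it operates on a text transcript, whereas the cited portions of Sharifi describe portions of an audio signal, and the function name `parse_utterance` and the default keyword list are hypothetical.

```python
def parse_utterance(transcript, keywords=("ok computer", "hey siri")):
    """Split a transcript into (keyword phrase, command phrase).

    Returns (None, transcript) when no known keyword starts the utterance.
    The keyword list is an assumption chosen for illustration.
    """
    normalized = transcript.strip()
    lowered = normalized.lower()
    for keyword in keywords:
        if not lowered.startswith(keyword):
            continue
        rest = normalized[len(keyword):]
        # Require a word boundary so "ok computers" does not match "ok computer".
        if rest and rest[0].isalnum():
            continue
        # Keep the speaker's original casing for both portions.
        return normalized[:len(keyword)], rest.lstrip(" ,")
    return None, normalized


keyword_phrase, command_phrase = parse_utterance("OK computer, drive home")
print(keyword_phrase, "|", command_phrase)
```

In this sketch the command portion is what would be passed on for text-independent processing, while the keyword portion would be handled separately.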
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Foerster, Kim, Sharifi, and Johnson suggest the limitations of claim 13 and Sharifi further suggests parsing a [KEYWORD][QUERY] utterance into a [KEYWORD] portion and a [QUERY] portion and then using the [QUERY] portion to train the text-independent speaker identification model).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding the parsing function 

	As per Claim 17, Foerster, in view of Kim, do not, but Sharifi suggests wherein the instructions, when executed, cause the one or more processors to identify the user as the speaker of the keyword phrase sample (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Col. 7, lines 15-27 describes where the text-dependent analyzer module analyzes “the portion of the audio signal 122 that corresponds to the keyword [i.e. the “keyword phrase sample” portion of the audio signal 122 in col. 8, lines 16-40 and suggested, in part, by col. 4, line 59 – col. 5, line 28] and determines confidence levels for one or more text-dependent models [similar to how the text-independent analyzer determines confidence levels for text-independent models].  Col. 7, line 50 – col. 8, line 15 describes where the confidence levels correspond to likelihoods that a particular speaker spoke the utterance and where a user account “John” has a high confidence level [similar to the high confidence level for “John” in col. 8, lines 41-57].  Col. 8, line 58 – col. 9, line 26 and col. 9, line 65 – col. 10, line 8 describe where confidence levels based on text-dependent models include a high score for “John” and are used to determine the identity of the speaker [at least suggested to be the speaker of the audio 
These portions suggest “wherein the instructions, when executed, cause the one or more processors to identify the user as the speaker of the keyword phrase sample” [where the “first speaker identification device” further determines a high confidence level for the-user-who-is-a-registered-user which indicates a high likelihood that the registered user spoke the keyword portion of the audio signal 122, where the confidence level is generated based on the registered user’s text-dependent model])
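For illustration only, identifying a speaker from per-user confidence levels can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `identify_speaker`, the `min_confidence` cutoff, and the numeric scores are hypothetical and are not values taken from Sharifi.

```python
def identify_speaker(confidences, min_confidence=0.5):
    """Return the user with the highest confidence level, or None.

    `confidences` maps each registered user to the likelihood that the user
    spoke the keyword portion; the cutoff value is an assumption.
    """
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] >= min_confidence else None


# Hypothetical scores for the three registered users named in the discussion.
scores = {"Matt": 0.10, "John": 0.80, "Dominik": 0.20}
print(identify_speaker(scores))
```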
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Foerster, Kim, Sharifi, and Johnson suggest the limitations of claims 13 and 16 and Sharifi further suggests identifying a user as the speaker of a keyword phrase sample).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding the keyword phrase speaker identification suggested by 

As per Claim 18, Foerster, in view of Kim, do not, but Sharifi suggests wherein the instructions, when executed, cause the one or more processors to associate the first command phrase sample with the user (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Col. 3, line 62 – col. 4, line 21 describes where the registered user speaks a query and where the system identifies a speaker and then uses the utterance of that speaker to further train the text-dependent and/or text-independent models.  Col. 7, lines 3-27 and col. 7, lines 50-63 describe analyzing a keyword portion of the received audio signal 122 to determine a confidence level associated with one or more text-dependent models, where the confidence level corresponds to likelihoods that a particular speaker spoke the utterance.  Col. 8, lines 16-40 describe determining a confidence level for one or more text-independent models based on analyzing the audio signal 122, including, in one embodiment, analyzing the entire audio signal 122.  Col. 8, line 58 – col. 9, line 26 describes combining the confidence levels from the text-
Sharifi suggests “wherein the instructions, when executed, cause the one or more processors to associate the first command phrase sample with the user” [as discussed in the rejection of claim 13, a previous multi-word command [QUERY] portion of a previous [HOTWORD][QUERY] utterance used to train the user’s text-independent speaker identification model in the “first speaker identification device’s” “speaker identification program” is interpreted as “the first command phrase sample”, and Sharifi in col. 10, line 63 – col. 11, line 10 suggests that the remainder/[QUERY] portion of the previous [HOTWORD][QUERY] utterance is used to train the text-independent model associated with “that speaker” “Once the speaker recognition engine has identified a particular speaker”, and col. 8, lines 16-40 describe where a text-independent analyzer outputs a confidence level for “the subsequent portion of the audio signal”, and so Sharifi suggests where the “first speaker identification device” further determines a high confidence level corresponding to the user for the multi-word command [QUERY] portion “first command phrase sample” which indicates that the user is the speaker of the “first command phrase sample”, which leads to the “first command phrase sample” being used to train the user’s text-independent model])


As per Claim 20, Foerster, in view of Kim, do not, but Sharifi suggests wherein the user is a first user and the instructions, when executed, cause the one or more processors to: determine if the command phrase sample threshold to enroll a second user in the speaker identification program is satisfied based on a second command phrase sample associated with the second user and one or more previously detected command phrase samples associated with the second user, and in response to determining that the command phrase sample threshold is satisfied, enroll the second user in the speaker identification program (Figure 1; col. 3, line 32 - col. 4, line 21; col. 4, line 59 - col. 5, line 28; col. 6, lines 51-53; col. 7, lines 3-27; col. 7, line 50 - col. 8, line 15; col. 8, line 16 - col. 9, line 27; col. 9, line 65 - col. 10, line 8; col. 10, line 63 - col. 11, line 10; col. 13, lines 5-31;
Foerster, Kim, Sharifi, and Johnson, suggest the limitations of claim 13 for the reasons discussed in the rejection of claim 13.
Claim 20 is directed to steps of claim 13 being performed for a second person.
Sharifi, in col. 5, lines 5-28, describes where any of multiple different users may issue a query or command, and col. 7, line 64 – col. 8, line 15 describes where confidence levels are obtained for all 3 of Matt, John, and Dominik.  Sharifi thus suggests where the speaker identification and enrollment process can be performed for “a second user” [i.e. Matt/Dominik instead of John].
Therefore, Sharifi suggests where the same steps of claim 13 suggested by Foerster, Kim, Sharifi, and Johnson are performed for a different/“second” user)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speaker identification processing with another because the prior art teaches the claimed invention except for the substitution of speaker identification processing which is not necessarily performed for each of a plurality of people with speaker identification processing which is.  Sharifi suggests that speaker identification processing which is .

Claims 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Foerster, in view of Kim, Sharifi, Johnson, as applied to claims 7 and 13 above, and further in view of Rockenbeck et al. (US 2002/0116190), hereafter Rockenbeck.

Claims 9 and 15 are similar to claim 3 and so are rejected under similar rationale.

Allowable Subject Matter
The following is a statement of reasons for the indication of allowable subject matter:  
	As per Claim(s) 2 (and similarly claims 8 and 14), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1-2, including (i.e. in combination with the remaining limitations in claim[s] 1-2) determining that the command phrase sample threshold is satisfied if the first command phrase sample includes at least one minute of speech and the one or more previously detected command phrase samples include at least one minute of speech.
	As discussed in the 112 rejections above, this subject matter is not supported by written description, and the closest subject matter that is supported can be rejected based on the prior art combination applied to reject claims 3, 9, and 15 (since Rockenbeck describes where “enough [speech] to warrant performing another adaptation of [a] model” is “five minutes of speech” which is greater than one minute of speech).

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

Claims 1, 3, 6, are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 6, 7, 8, 9, 12, of U.S. Patent No. 10,236,001, hereafter Parent Patent 2, in view of Alvarez Guevara (US 2017/0025125), hereafter A-G.

As per Claim 1, Claim 9 of Parent Patent 2 (interpreted as incorporating the limitations of claim 8 of Parent Patent 2) teaches A method comprising: parsing a first user speech sample associated with a user into a first… phrase sample and a first command phrase sample; (Claims 8-9 of Parent Patent 2 teach a speaker identification device with elements configured to perform various functions which suggests a corresponding method comprising the various functions, and the function of the “speech parsing circuit” in claim 8 suggests a corresponding method step of “parsing the received compound speech sample into a command phrase sample and a keyword phrase sample”)
determining if a command phrase sample threshold is satisfied based on the first command phrase sample and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enrolling the user in a speaker identification program; (the functions of the last limitation of claim 8 and claim 9 suggest corresponding method steps including determining if the command phrase sample and one or more earlier command phrase samples are sufficient, and then enrolling the user in the text-independent speaker ID circuit, where determining that the command phrase samples satisfy “sufficient command phrase sampling” is suggested to be based on a threshold that defines what is sufficient and what is not sufficient) 
identifying the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples, the second user speech sample obtained by a first speaker identification device or a second speaker identification device, (the functions in Claim 9 of Parent Patent 2 suggest corresponding method steps including identifying the user as a speaker of a second speech sample, where the identifying of the user as the speaker of the second speech sample occurs after the user is enrolled which suggests that the identifying the user as the speaker of the second speech sample is based on enrollment data, where the enrollment data is suggested to be based on the command phrase samples because claim 9 recites where the enrolling of the user uses the command phrase samples, where the second user speech sample is at least suggested to be obtained by the speaker identification device of claim 8 so that the text-independent speaker ID circuit in the speaker identification device can identify the speaker of the second speech sample)
determining a first proximity of the user to the first speaker identification device and a second proximity of the user to the second speaker identification device; comparing the first proximity and the second proximity; and causing a closest one of the first speaker identification device or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Claim 12 of Parent Patent 2 describes functions performed for a “query” which are, like the identifying of the user as the speaker of a second speech sample, performed “after the user is…enrolled”.  Claim 12 of Parent Patent 2 thus suggests where the functions of claim 12 [including responding to the query, where a query is suggested to be a speech sample] are additionally performed for the second speech sample in claim 9, where determining which of a plurality of speaker ID devices is closest logically involves determining proximities between the user and the speaker ID devices and comparing the proximities to determine which proximity is the closest)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because Claims 8, 9, and 12 of Parent Patent 2 included each element claimed, although not necessarily in a single claim, with the only difference between the claimed invention and Claims 8, 9, and 12 of Parent Patent 2 being the lack of actual combination of the elements in a single claim (Claims 8-9 include enrollment based on sufficient sampling and identifying a speaker of a second speech sample after enrollment, and Claim 12 of Parent Patent 2 includes, after enrollment, determining 
Claims 1-2 of Parent Patent 2 do not, but A-G suggests parsing a first user speech sample associated with a user into a first wake phrase sample and a first command phrase sample (paragraphs 3-4, 23-28;
The claims of Parent Patent 2 do not specifically teach where the keyword phrase is a wake phrase.
A-G [in paragraphs 3-4] teach where an utterance can include both a command [e.g. “DRIVE HOME”] and a hotword [OK COMPUTER], where a hotword “wakes a device up from a sleep state” [and is thus a “wake phrase”].
A-G thus suggests where the keyword phrase in the compound speech sample is, instead, a “wake phrase”)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of speech phrase with 

Claims 7, 9, 10, 12, 13, 15, 16, 17, 18, and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 6, 7, 8, 9, 12, and 13 of Parent Patent 2. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of this application are rendered obvious by the claims of Parent Patent 2.

As per Claim 7, Claim 9 of Parent Patent 2 (interpreted as incorporating the limitations of claim 8 of Parent Patent 2) teaches A speaker identification device (preamble of Claim 8 of Parent Patent 2) 
comprising: a text-independent enrollment circuit to: determine if a command phrase sample threshold to enroll a user in a text-independent speaker identification program is satisfied based on a first command phrase sample in a first user speech sample associated with the user and one or more previously detected command phrase samples associated with the user; in response to determining that the command phrase sample threshold is satisfied, enroll the user in the text-independent speaker identification program (2nd limitation in the body of Claim 8 of Parent Patent 2, last limitation of claim 8 of Parent Patent 2, and Claim 9 of Parent Patent 2)
…and a text-independent speaker identification circuit to: identify the user as a speaker of a second user speech sample based on the first command phrase sample and the one or more previously detected command phrase samples; (the functions in Claim 9 of Parent Patent 2 include identifying the user as a speaker of a second speech sample, where the identifying of the user as the speaker of the second speech sample occurs after the user is enrolled which suggests that the identifying the user as the speaker of the second speech sample is based on enrollment data, where the enrollment data is suggested to be based on the command phrase samples because claim 9 recites where the enrolling of the user uses the command phrase samples, where the second user speech sample is at least suggested to be obtained by the speaker identification device of claim 8 so that the text-independent speaker ID circuit in the speaker identification device can identify the speaker of the second speech sample)
Claims 8-9 of Parent Patent 2 do not, but Claim 12 of Parent Patent 2 suggests a delivery circuit…determine a first proximity of the user to the speaker identification device and a second proximity of the user to a second speaker identification device; perform a comparison of the first proximity and the second proximity; and instruct a closest one of the delivery circuit or the second speaker identification device to output a response to the second user speech sample based on the comparison of the first proximity and the second proximity (Claim 12 of Parent Patent 2 describes functions performed for a “query” which are, like the identifying of the user as the speaker of a second speech sample, performed “after the user is…enrolled”.  Claim 12 of Parent Patent 2 thus suggests where the functions of claim 12 [including responding to the query, where a query is suggested to be a speech sample] are additionally performed for the second speech sample in claim 9, where determining which of a plurality of speaker ID devices is closest logically involves determining proximities between the user and the speaker ID devices and comparing the proximities to determine which proximity is the closest)
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because Claims 8, 9, and 12 of Parent Patent 2 included each element claimed, although not necessarily in a single claim, with the only difference between the claimed invention and Claims 8, 9, and 12 of Parent Patent 2 being the lack of actual combination of the elements in a single claim (Claims 8-9 include enrollment based on sufficient sampling and identifying a speaker of a second speech sample after enrollment, and Claim 12 of Parent Patent 2 includes, after enrollment, determining which speaker ID device is closest to the user and responding to speech using the closest speaker ID device).  One of ordinary skill in the art could have combined the elements as claimed by known methods (by adding the functions of claim 12 of Parent Patent 2 to the functions performed on the second speech sample in claim 9 of Parent .
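For illustration only, the combined functions discussed for Claim 7 (a sample-threshold check before enrollment, followed by a comparison of two proximities to select the responding device) can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from the claims or from Parent Patent 2: the names SpeakerIdDevice, enrollment_threshold_satisfied, and closest_device are hypothetical, the threshold is assumed to be a total duration of command phrase audio in seconds, and proximity is assumed to be a distance in meters. Neither the claims nor the cited references specify these details.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpeakerIdDevice:
    """Hypothetical stand-in for a speaker identification device."""
    name: str
    distance_to_user_m: float  # assumed proximity measure (meters)


def enrollment_threshold_satisfied(
    new_sample_s: float,
    previous_samples_s: Sequence[float],
    threshold_s: float = 30.0,  # assumed threshold, not taken from the claims
) -> bool:
    """Return True when the new command phrase sample plus the previously
    detected samples together reach the assumed duration threshold."""
    return new_sample_s + sum(previous_samples_s) >= threshold_s


def closest_device(first: SpeakerIdDevice, second: SpeakerIdDevice) -> SpeakerIdDevice:
    """Compare the two proximities and return the device nearer to the user.
    Ties go to the first device (an arbitrary choice for this sketch)."""
    return first if first.distance_to_user_m <= second.distance_to_user_m else second


if __name__ == "__main__":
    kitchen = SpeakerIdDevice("kitchen", 4.0)
    den = SpeakerIdDevice("den", 1.5)
    if enrollment_threshold_satisfied(5.0, [10.0, 20.0]):
        print(f"respond from: {closest_device(kitchen, den).name}")
```

The sketch shows only that selecting a "closest" device reduces to determining two proximities and comparing them, which is the logical step relied upon in the discussion of Claim 12 of Parent Patent 2.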

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
2007/0106517 teaches “If the samples are acceptable, the voice biometric engine determines, at step 46F, whether or not more samples are required in order to create a voiceprint. If more samples are required, then system 10 resets the process and returns to step 46B. Here, instructions are sent back to the customer over the data channel for what should be included in the next embedded audio sample that is sent. Once the voice biometric engine has captured sufficient samples to create an enrollment record, the enrollment record is created at step 46G. Following the successful creation of the enrollment record, system 10 writes an appropriate status message to the database at step 46H. At that point, system processing terminates and any additional processing continues within the customer's system according to their own business rules” (paragraph 68).  Figure 1 also seems to indicate that the voice biometric engine is server-side.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249.  The examiner can normally be reached on M-F 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602.  The fax phone 
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EY 4/29/2021
/ERIC YEN/Primary Examiner, Art Unit 2658