DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Final Office Action mailed 5/27/2021, applicant has submitted an amendment and Request for Continued Examination filed 9/27/2021.
Claims 1, 7, 12, 33, 34, and 35 have been amended.  Claims 2-4, 6, 8, 11, 13-16, 18-19, 21-22, 24-26, 28, and 30 have been cancelled.  New claims 37-39 have been added.
Response to Arguments
New prior art rejections necessitated by amendment are presented below.

The following amendments are OPTIONAL:
In claims 5 and 7, “the speech rendering” (line 2 of claim 5, line 2 of claim 7, and lines 3-4 of claim 7) can be amended to --the text to speech rendering-- for language consistency, because “speech rendering” in claim 1 was amended to “text to speech rendering”.

Claim Objections
Claims 33 and 35 are objected to because of the following informalities:
In claim 33, “wherein the speech rendering of the word or phrase defining the second text based command is a text to speech rendering of the word or phrase 
Claim 35 recites “The method of claim 34, and wherein” where the “and” in this phrase appears to be unnecessary.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10, 27, and 37 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 10 recites “the second candidate directive invoking vocal utterance” (twice, once in line 4 and once in the 2nd to last line) which lacks antecedent basis (“a second candidate directive invoking vocal utterance” was previously deleted from claim 1 and “moved” to claim 33).
Amending “the second candidate directive invoking vocal utterance” in line 4 of claim 10 to recite --a second candidate directive invoking vocal utterance-- should suffice to resolve this issue.

As per Claim 27, “the computer system” in line 2 lacks antecedent basis.  
Amending “the computer system” at the start of line 2 to --a computer system-- and amending “a computer system” (starting at the end of line 2 and ending at the start of line 3) to recite --the computer system-- should suffice to resolve this issue.

Claim 37 recites “wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command includes information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command” which recites where an action includes information, and not where an action contains a step.  
Applicant appears to have intended to claim “wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command includes communicating information to a user indicating that the word or phrase of the candidate directive invoking vocal 
It is therefore not clear if Applicant meant to claim “includes information” or “includes communicating information”.

The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claim 32 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  

Claim 32 is not further limiting because “a/the speech rendering of a/the word or phrase defining a/the second text based command” in claim 1 was amended to “text to speech rendering of the word or phrase defining the second text based command”, so claim 1 already recites the limitation that claim 32 adds.

Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 12, 32, and 37 are rejected under 35 U.S.C. 103 as being unpatentable over Agapi et al. (US 2008/0046250), hereafter Agapi, in view of Ittycheriah et al. (US 6,185,530), hereafter Ittycheriah, Blandin et al. (US 2017/0169816), hereafter Blandin, and Patch (US 2015/0025885).

As per Claim 1, Agapi suggests A method comprising: receiving voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system; responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… a word or phrase defining a second text based command, communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… (Figures 1-2; paragraphs 8-9, 24, 27-29, 31-33, 35, 38-44, 46-51, 56-57 [all paragraphs and Figures are cited for each limitation with “key” paragraphs and Figures pertaining to each limitation identified 
“A method comprising: receiving voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system”: Figure 1; paragraphs 24, 27, 29, 31-32, 39-40, 42-43, 46-49, 51, 56-57; receiving, by a computer system, a spoken new user-defined voice phrase command that is, and thus defines, a candidate to be a voice command [i.e. a candidate sequence of sounds that form an utterance of a voice phrase command] that is to be used [when the sequence of sounds that form an utterance of the voice phrase command is, in the future, matched to a command phrase version of the new user-defined voice command which is to be included in a set of user-defined commands which are each suggested to be a word/phrase/sentence] to invoke a directive to execute [prompt the computer system to perform functions corresponding to] the command phrase version of the new user-defined voice phrase command which is to be included in the set of user-defined commands, thereby causing the computer system to perform “a first computer function”/programmatic-action[s] associated with the command phrase version of the new user-defined voice command which is to be included in the set of user-defined commands [i.e. the “first… command” is the command phrase version of the new user-defined voice phrase command which is to be included in the set of user-defined commands, where words/phrases/sentences are commonly/conventionally represented, in data form, using a sequence of “text” characters, such that the command phrase version of the new user-defined voice command is suggested to be “a first text based command” and the existing commands 
“responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… a word or phrase defining a second text based command, communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar 
Agapi does not, but Ittycheriah suggests communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…the word or phrase defining the second text based command (Figure 3; Col. 2, lines 45-59; Col. 6, lines 13-31; Col. 7, lines 11-26; Col. 7, line 36 – col. 8, line 4; Col. 8, lines 62-67; Col. 9, lines 11-67;
Agapi teaches where likelihood of confusion leads to a warning being presented that the new command may be confused with a preexisting command [paragraph 33] but does not specifically state that “a preexisting command” is specifically the preexisting command that is determined to be confusable with the new user-defined voice phrase command [as opposed to merely indicating that the new command may be confused with “some” command without specifically specifying which command may be confused with the new command]
In Ittycheriah:
Col. 2, lines 45-59 describes a user inputting at least one new word and computing acoustic similarity measures between the at least one word and at least a portion of existing words, and indicating results associated with the at least one measure and prompting the user to input an alternative word, and, if no measure is within a threshold range, adding the at least one new word to the vocabulary.  Col. 6, lines 13-31 describes where a new word [or command] is input [which suggests where words that may be confused with a new word/command may also be commands]  Col. 7, lines 11-26 describes comparing the baseform of the new word and baseforms of the existing words, and displaying a message to the user indicating that input word has previously been confused with certain existing words.  Col. 7, line 36 – col. 8, line 4 describes comparing a lefeme sequence of the new word and lefeme sequences of existing words, and, based on the comparison, a list/set of confusable words [words with pronunciation sequences too close to the new word] is identified and made 
Ittycheriah thus suggests where the warning provided by Agapi’s system more specifically identifies which existing command [the multi-word phrase “second text based command”, where, as discussed above, one type of command in Agapi’s system is suggested be a multi-word phrase and is suggested to be text since words/phrases/sentences, in data form, are commonly/conventionally a sequence of text characters] is determined, by Agapi’s system, to be confusingly similar to the new user-defined voice phrase command [i.e. “communicating, to the user, information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…the word or phrase defining the second text based command”].  To be clear, Ittycheriah suggests where the warning includes both the command phrase version of the new user-defined voice phrase command [the command phrase version of the new user-defined voice phrase command is suggested to be a “first text based command”] and the existing multi-word phrase “second text based command” [similar to how Ittycheriah suggests displaying both the new word and 
While Ittycheriah describes “it would be desirable to provide a method and apparatus for relieving the recognizer from performing acoustic confusability checks” [col. 2, lines 24-38] it is fairly clear, based on the other portions of the background [which describe where a speech recognizer resolves situations where there are confusable words, for example based on context] and from the fact that Ittycheriah describes performing acoustic similarity/confusion comparisons that Ittycheriah is not teaching away from performing acoustic similarity comparisons to determine acoustic confusability.)
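For illustration only, the combination discussed above (a confusability check as in Agapi whose warning names the specific confusable existing command as suggested by Ittycheriah) might be sketched as follows. This is a minimal sketch and not the implementation of either reference: the function names, the 0.8 threshold, and the use of a text similarity ratio from Python's difflib as a stand-in for an acoustic similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def find_confusable(new_command, existing_commands, threshold=0.8):
    """Return (command, score) pairs whose similarity meets the threshold.

    ASSUMPTION: a character-level similarity ratio stands in for an
    acoustic similarity measure; the 0.8 threshold is arbitrary.
    """
    hits = []
    for existing in existing_commands:
        score = SequenceMatcher(None, new_command.lower(), existing.lower()).ratio()
        if score >= threshold:
            hits.append((existing, score))
    # Most similar commands first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def warning_for(new_command, existing_commands, threshold=0.8):
    """Build a warning naming both the new command and each confusable one."""
    hits = find_confusable(new_command, existing_commands, threshold)
    if not hits:
        return None  # No conflict: the new command could be added.
    names = ", ".join(repr(name) for name, _ in hits)
    return f"New command {new_command!r} may be confused with: {names}"
```

In this sketch the warning identifies which existing command triggered the conflict, rather than only reporting that some conflict exists.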
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic confusability notification provided to a user with another because the prior art teaches the claimed invention except for the substitution of an acoustic confusability notification provided to a user which does not necessarily identify a new command and which command may be confused with the new entity with an acoustic confusability notification provided to a user which does.  Ittycheriah suggests that an acoustic confusability notification provided to a user which identifies a new command and which 
	Agapi, in view of Ittycheriah, do not, but Blandin suggests responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a text to speech rendering of a word or phrase defining a second text based command, communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command (Paragraphs 36 and 40;
	The combination [thus far] is as discussed above in the portion of this rejection of claim 1 based on Ittycheriah, including where the user is provided a warning that the command phrase version of the new user-defined voice phrase command [suggested to be a “first text based command”] may be confused with a preexisting multi-word phrase command [suggested to be a “second text based command”], where the user is notified of the command phrase version of the new user-defined voice command and which 
Agapi suggests comparing a new user-defined voice phrase command and existing voice commands and where commands can be text phrases, but does not specifically teach where the comparison is performed by comparing a text-to-speech rendering of a word to input audio.  
In Blandin, paragraph 36 describes where speech analysis receives audio from an event [e.g., in its full form, segmented form, individual utterances, etc.].  Paragraph 40 describes spotting keywords in the event audio, including by converting keywords into audio signals using TTS and correlating the keywords with event audio using acoustic similarity measures to spot the occurrence of keywords within the event audio.
Blandin thus suggests where Agapi’s system performs the comparison between the spoken new user-defined voice phrase command [an “individual utterance” of a new voice command] and the existing commands [which are one or more words] by performing text-to-speech on the existing commands [where the existing commands, including the multi-word phrase “second text based command” which is determined to be confusable with the new user-defined voice phrase command are suggested to be “text based commands”, as discussed above in the portion of this rejection of claim 1 based on Agapi] such that “a text to speech rendering of a word or phrase defining a second text based command” is generated by TTS to be compared with the spoken new user-defined voice phrase command, and the comparison between the spoken new user-defined voice phrase command and the existing commands is also performed by comparing the TTS-generated audio signals of the existing commands to the spoken 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic similarity comparison with another because the prior art teaches the claimed invention except for the substitution of an acoustic similarity comparison which does not necessarily 
Agapi, in view of Ittycheriah and Blandin, do not, but Patch suggests wherein the first text based command specifies a command operator and a first resource and wherein the second text based command specifies the command operator and a second resource (paragraphs 28, 109, 149;
	Paragraph 109 of Patch describes where a user can speak [e.g. “Add Location Kitchen”] to add a location to a list of locations [e.g. “Kitchen”] where the locations can be spoken as part of commands.  Paragraph 109 of Patch more specifically describes where the added location [“Kitchen”] can be referenced in commands [e.g. “From Kitchen” and “Go Kitchen”] and where the list includes other locations that are at least suggested to be able to be referenced in the same commands [e.g. bathroom and 
	Patch “wherein the first text based command specifies a command operator and a first resource and wherein the second text based command specifies the command operator and a second resource”: where the command phrase version of the new user-defined voice phrase command [suggested to be “first text based command”] and the existing multi-word phrase command [suggested to be a “second text based command”] are each two-part commands that include the same “command operator”, where the “first text based command” references a “first resource” [e.g. one file name/folder name/location] and the “second text based command” references a “second resource” [e.g. another file name/folder name/location])
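For illustration only, the two-part command structure discussed above (a shared command operator followed by a differing resource) might be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the first whitespace-separated word is assumed to be the command operator, and the example commands are patterned on the “Go Kitchen” example rather than taken from any reference's implementation.

```python
def split_command(command):
    """Split a two-part command into (operator, resource).

    ASSUMPTION: the first word is the command operator and the
    remainder names the resource (e.g. a location, file, or folder).
    """
    operator, _, resource = command.strip().partition(" ")
    return operator, resource


def same_operator_different_resource(first, second):
    """True when both commands share an operator but name different resources."""
    op1, res1 = split_command(first)
    op2, res2 = split_command(second)
    return (
        op1.lower() == op2.lower()
        and bool(res1)
        and bool(res2)
        and res1.lower() != res2.lower()
    )
```

Under these assumptions, “Go Kitchen” and “Go Bathroom” would be a pair of commands with the same command operator and different resources, while “Go Kitchen” and “From Kitchen” would not.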
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one pair of commands with another because the prior art teaches the claimed invention except for the substitution of a pair of commands which is not necessarily one command that includes a command operator and a first resource and another command that includes the command operator and a second resource with a pair of commands which is.  Patch suggests that a pair of commands which is one command that includes a command operator and a 

	As per Claim 5, Agapi, in view of Ittycheriah, do not, but Blandin suggests wherein the method includes for performing the determining electronically synthesizing the speech rendering of the word or phrase defining the second text based command (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands electronically synthesizes the TTS-generated audio signals of the existing commands [including electronically synthesizing the TTS-generated audio of the existing multi-word phrase “second text based command”])
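For illustration only, comparing a spoken candidate against synthesized renderings of existing commands might be sketched as follows. This is a minimal sketch under stated assumptions: toy_render is a hypothetical stand-in for a text-to-speech engine plus feature extraction (it merely maps letters to numbers), and dynamic time warping is a distance measure chosen for the example; neither is drawn from any cited reference.

```python
def toy_render(text):
    """HYPOTHETICAL stand-in for TTS plus feature extraction.

    A real system would synthesize audio and extract acoustic features;
    here each letter simply becomes one number.
    """
    return [float(ord(ch) - ord("a")) for ch in text.lower() if ch.isalpha()]


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def closest_existing_command(utterance_features, existing_commands):
    """Return (command, distance) for the rendering nearest the utterance."""
    scored = [
        (cmd, dtw_distance(utterance_features, toy_render(cmd)))
        for cmd in existing_commands
    ]
    return min(scored, key=lambda pair: pair[1])
```

In a usage such as closest_existing_command(toy_render("open files"), ["open file", "close window"]), the features derived from the candidate would be matched against a rendering of each existing command, and a small distance would flag the pair as potentially confusable.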


As per Claim 12, Agapi, in view of Ittycheriah and Blandin, do not, but Patch suggests wherein the first resource and the second resource are selected from the group consisting of file resources and directory resources (paragraphs 28, 109, 149;

	As discussed in the rejection of claim 1: Patch “wherein the first text based command specifies a command operator and a first resource and wherein the second text based command specifies the command operator and a second resource”: where the command phrase version of the new user-defined voice phrase command [suggested to be “first text based command”] and the existing multi-word phrase command [suggested to be a “second text based command”] are each two-part commands that include the same “command operator”, where the “first text based command” references a “first resource” [e.g. one file name/folder name/location] and the “second text based command” references a “second resource” [e.g. another file name/folder name/location]

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one pair of commands with another because the prior art teaches the claimed invention except for the substitution of a pair of commands which is not necessarily one command that includes a command operator and a first file/directory resource and another command that includes the command operator and a second file/directory resource with a pair of commands which is.  Patch suggests that a pair of commands which is one command that includes a command operator and a first file/directory resource and another command that includes the command operator and a second file/directory resource was known in the art.  One of ordinary skill in the art could have substituted one pair of commands with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the input speech of the new user-defined voice phrase command to existing voice commands, and provides the user a warning that the new user-defined voice phrase command may be confused 

As per Claim 32, Agapi, in view of Ittycheriah, do not, but Blandin suggests wherein the speech rendering of the word or phrase defining the second text based command is a text to speech rendering of the word or phrase defining the second text based command (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands generates “text to speech renderings” audio signals of the existing commands [including a “text to speech rendering” of the existing multi-word phrase “second text based command”])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic similarity comparison with another because the prior art teaches the claimed invention except for the substitution of an acoustic similarity comparison which does not necessarily compare TTS-generated audio with input audio with an acoustic similarity comparison which does.  Blandin teaches that an acoustic similarity comparison which does compare

As per Claim 37, Agapi suggests wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… includes information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… (Figures 1-2; paragraphs 8-9, 24, 27-29, 31-33, 35, 38-44, 46-51, 56-57 [all paragraphs and Figures are cited for each limitation with “key” paragraphs and Figures pertaining to each limitation identified below, i.e. all other paragraphs and Figures not specifically referenced for any particular limitation are eligible to provide context and additional support]
“responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… a word or phrase defining a 
Agapi does not, but Ittycheriah suggests wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…the word or phrase defining the second text based command includes information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…the word or phrase defining the second text based command (Figure 3; Col. 2, lines 45-59; Col. 6, lines 13-31; Col. 7, lines 11-26; Col. 7, line 36 – col. 8, line 4; Col. 8, lines 62-67; Col. 9, lines 11-67;
Agapi teaches where likelihood of confusion leads to a warning being presented that the new command may be confused with a preexisting command [paragraph 33] but does not specifically state that “a preexisting command” is specifically the preexisting command that is determined to be confusable with the new user-defined voice phrase command [as opposed to merely indicating that the new command may be confused with “some” command without specifically specifying which command may be confused with the new command]
In Ittycheriah:


While Ittycheriah describes “it would be desirable to provide a method and apparatus for relieving the recognizer from performing acoustic confusability checks” 
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic confusability notification provided to a user with another because the prior art teaches the claimed invention except for the substitution of an acoustic confusability notification provided to a user which does not necessarily identify a new command and which command may be confused with the new entity with an acoustic confusability notification provided to a user which does.  Ittycheriah suggests that an acoustic confusability notification provided to a user which identifies a new command and which entity may be confused with the new command was known in the art.  One of ordinary skill in the art could have substituted one type of acoustic confusability notification provided to a user with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the input speech of the new user-defined voice phrase command to existing voice commands, and provides the user a warning that the new user-defined voice phrase command may be confused with an existing phrase command (as per Agapi) where the warning identifies both the new user-defined voice phrase command and the 
	Agapi, in view of Ittycheriah, do not, but Blandin suggests wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command includes information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the text to speech rendering of the word or phrase defining the second text based command (Paragraphs 36 and 40;
	The combination [thus far] is as discussed above in the portion of this rejection of claim 1 based on Ittycheriah, including where the user is provided a warning that the command phrase version of the new user-defined voice phrase command [suggested to be a “first text based command”] may be confused with a preexisting multi-word phrase command [suggested to be a “second text based command”], where the user is notified of the command phrase version of the new user-defined voice command and which preexisting voice command is confusable with the command phrase version of the new user-defined voice command.  
Agapi suggests comparing a new user-defined voice phrase command and existing voice commands and where commands can be text phrases, but does not specifically teach where the comparison is performed by comparing a text-to-speech rendering of a word to input audio.  

Blandin thus suggests where Agapi’s system performs the comparison between the spoken new user-defined voice phrase command [an “individual utterance” of a new voice command] and the existing commands [which are one or more words] by performing text-to-speech on the existing commands [where the existing commands, including the multi-word phrase “second text based command” which is determined to be confusable with the new user-defined voice phrase command are suggested to be “text based commands”, as discussed above in the portion of this rejection of claim 1 based on Agapi] such that “a text to speech rendering of a word or phrase defining a second text based command” is generated by TTS to be compared with the spoken new user-defined voice phrase command, and the comparison between the spoken new user-defined voice phrase command and the existing commands is also performed by comparing the TTS-generated audio signals of the existing commands to the spoken new user-defined voice command, such that when the spoken new user-defined voice command [“candidate directive invoking vocal utterance”, as discussed above] is determined to be confusable with an existing multi-word phrase command [“confusingly similar to… a word or phrase defining a second text based command”], the spoken new user-defined voice phrase command audio is determined to be confusable-with/acoustically-similar-to the TTS-generated audio signal of the existing multi-word 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic similarity comparison with another because the prior art teaches the claimed invention except for the substitution of an acoustic similarity comparison which does not necessarily compare TTS-generated audio with input audio with an acoustic similarity comparison which does.  Blandin teaches that an acoustic similarity comparison which does compare TTS-generated audio with input audio was known in the art.  One of ordinary skill in the art could have substituted one type of acoustic similarity comparison with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the input speech of the new user-.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Agapi, in view of Ittycheriah, Blandin, and Patch, as applied to Claim 1, above, and further in view of Gong (US 2003/0167167).

As per Claim 7, Agapi, in view of Ittycheriah, do not, but Blandin suggests wherein the method includes for performing the determining electronically synthesizing the speech rendering of the word or phrase defining the second text based command, and wherein the method includes electronically synthesizing the speech rendering of the word or phrase defining the second text based command… (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands electronically synthesizes the TTS-generated audio signals of the existing commands [including electronically synthesizing the TTS-generated audio of the existing multi-word phrase “second text based command”])

Agapi, in view of Ittycheriah, Blandin, and Patch do not, but Gong suggests wherein the method includes electronically synthesizing the speech rendering of the word or phrase defining the second text based command in dependence on profile data of a user, the profile data specifying vocal tendencies of the user (Paragraphs 23, 50, 63-65, 69, 76;

In Gong, paragraph 23 describes a user interacting with an engine by speaking, and analyzing speech that provides a profile of the affective and physiological states of the user based on characteristics of the user’s speech, such as pitch range and breathiness.  Paragraph 50 describes vocal analysis data received to determine a user’s affective state, where examples of vocal analysis data include pitch range, volume, and breathiness in the speech of the user.  Paragraphs 63-65 describe text-to-speech used to generate speech of a verbal expression [at least suggested to be a verbal expression provided by the agent to the user].  Paragraph 69 describes modifying speech style to generate an appropriate affect for the verbal expression, where speech style may include speech rate, pitch, etc.  Paragraph 76 describes where a voice for an agent is selected and also where a processor may match the user’s speech style characteristics [e.g. speech rate, pitch average, pitch range, and articulation].
These portions suggest where an output voice can be TTS-generated and can also match the speech style characteristics derived from a user’s input speech [pitch range is one of the characteristics derived by analyzing the user’s input speech in paragraph 23, and is also a characteristic that is matched in paragraph 76, and a user’s input speech also directly and most accurately reflects “the user’s speech style characteristics” as the most recent representation of the user’s speech provided to the system].  Gong thus suggests where TTS generates speech based on speech 
Gong thus suggests where the TTS-generated audio of the existing commands [including the TTS-generated audio of the “second text based command”/existing-multi-word-phrase-command-that-may-be-confused-with-the-new-user-defined-voice-phrase-command] is generated/”electronically synthesized” “in dependence on profile data of the user, the profile data specifying vocal tendencies of the user” [as per claim 7, i.e., using speech characteristics that match speech characteristics derived by analyzing the spoken new user-defined voice phrase command, where the speech characteristics including pitch range that are derived from the spoken new user-defined voice phrase command can be interpreted as “profile data of the user” which specifies that a user’s speech has particular voice characteristics and thus logically specifies voice characteristics that a user “tends” to have since a user’s voice in a particular situation/condition commonly has roughly the same characteristics])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of voice generation with another because the prior art teaches the claimed invention except for the substitution of voice generation which is not necessarily based on speech characteristics of a user’s speech with voice generation which is.  Gong suggests that voice generation which is based on speech characteristics of a user’s speech was known in the art.  One of ordinary skill in the art could have substituted one type of voice generation with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the .

Claims 23 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Agapi, in view of Ittycheriah, Blandin, and Patch, as applied to Claim 1, above, and further in view of Gammel et al. (US 5,832,429), hereafter Gammel.

As per Claim 23, Agapi suggests wherein the method includes examining resources of the computer system to generate a… resource names, and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a resource name from a… of resource names (Figures 1-2; paragraphs 8-9, 24, 27-29, 31-33, 35, 38-44, 46-51, 56-57;

Paragraph 32 describes where “Before any new user-defined voice command… is accepted” the new command is compared against preexisting commands in the grammar data store, including by parsing the voice command and checking the entire command and each parsed piece for potential ambiguities with preexisting commands.  Paragraphs 27-29 similarly describe where voice commands are compared against a “set” of commands defined within grammar data store, where the grammar data store can include, among other things, a user-defined grammar, and where the user-defined grammar can include a set of user-defined commands.  Paragraphs 39-44 similarly describe parsing a voice command and analyzing “each parsed portion” to determine potential recognition ambiguities, and also more particularly describe where the portion being parsed and checked for ambiguities is a NAME.  Paragraphs 47-50 similarly describe parsing a voice command and determining likelihood of confusion for components, and more particularly describe where a “new-user-defined voice command” is received and where the components are “compared against preexisting voice commands”.  Paragraph 42 further describes where a NAME can be a word, phrase, or sentence, parsed portions of which can be similar to other commands, and where parsed portions are words.  Paragraphs 49 and 51 further describe where a new user-defined command can be accepted and associated with “one or more” programmatic actions.
These portions further suggest “wherein the method includes examining resources of the computer system to generate a… resource names, and wherein the determining includes determining whether the word or phrase of the candidate directive 
Agapi, in view of Ittycheriah do not, but Blandin suggests wherein the method includes examining resources of the computer system to generate a… resource names, and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a resource name from a… of resource names (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands [i.e. the existing command “resource NAMES”, including the multi-word phrase “second text based command”] and comparing those existing command “resource NAMES” to the spoken new user-defined voice phrase command determines whether the spoken new user-defined voice phrase command is confusingly similar to the TTS “speech rendering” of the “resource NAMES” in the set of existing command “resource NAMES” that are compared to the spoken new user-defined voice command.  “the determining” [that the “word or phrase of the candidate directive invoking vocal utterance” sounds confusingly similar to the “word or phrase defining second text based command”] includes “determining” that “the word or phrase of the candidate directive invoking vocal utterance” sounds confusingly similar to the TTS “speech rendering of” the “word or phrase defining second text based command” which is “a resource name from” the plurality/set “of resource names” because the “second text based command” is one of the set of “resource NAMES”)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic similarity comparison with another because the prior art teaches the claimed invention except for the substitution of an acoustic similarity comparison which does not necessarily compare a TTS-generated audio of a word with input audio with an acoustic similarity 
	Agapi, in view of Ittycheriah, Blandin, and Patch do not, but Gammel suggests wherein the method includes examining resources of the computer system to generate a list of resource names, and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a resource name from a list of resource names (“comparing the name to be enrolled to the names in the database to reject any name that is too similar”, col. 1, lines 36-41; “During similar name checking… match an existing name on the list… already on the list”, col. 5, lines 54-65; “If a third utterance is requested for enrollment, then that name is checked first to see if it is too similar to another name on the list”, col. 8, lines 49-53; “it is determine if that name is too similar… to a name already on the speed dial list”, Abstract; 

	Agapi teaches comparing a new user-defined voice command to a “set” [see Agapi, paragraph 27] but Gammel describes where the set of names compared to a name is more specifically a “list”.
Gammel thus suggests where the set of existing voice command “resource NAMES” which is compared to the spoken new user-defined voice command in Agapi is more specifically a “list” of existing voice command “resource NAMES” [as opposed to a “set” which is not necessarily a list])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of set which is compared to a natural language input with another because the prior art teaches the claimed invention except for the substitution of a set which is compared to a natural language input which is not necessarily a list with a set which is compared to a natural language input which is.  Gammel teaches that a list which is compared to a natural language input was known in the art.  One of ordinary skill in the art could have substituted one type of set which is compared to a natural language input with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the input speech of the new user-defined voice phrase command to a set of existing voice commands, and provides the user a warning that the new user-defined voice phrase command may be confused with an existing phrase command (as per Agapi) where the warning identifies both the new user-defined voice phrase command and the existing phrase command which may 

As per Claim 29, Agapi suggests …and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a resource name from a… of resource names (Figures 1-2; paragraphs 8-9, 24, 27-29, 31-33, 35, 38-44, 46-51, 56-57;
The combination [thus far] is as discussed in the rejection of claim 1, above.
Paragraph 32 describes where “Before any new user-defined voice command… is accepted” the new command is compared against preexisting commands in the grammar data store, including by parsing the voice command and checking the entire command and each parsed piece for potential ambiguities with preexisting commands.  Paragraphs 27-29 similarly describe where voice commands are compared against a “set” of commands defined within grammar data store, where the grammar data store can include, among other things, a user-defined grammar, and where the user-defined grammar can include a set of user-defined commands.  Paragraphs 39-44 similarly describe parsing a voice command and analyzing “each parsed portion” to determine potential recognition ambiguities, and also more particularly describe where the portion being parsed and checked for ambiguities is a NAME.  Paragraphs 47-50 similarly 
These portions further suggest “wherein the method includes examining resources of the computer system to generate a… resource names, and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a resource name from a… of resource names”: analyzing/”examining” previous new user-defined voice command NAMES [including a spoken version of the multi-word phrase “second text based command”] which are used as “resources of the computer system” to define new user-defined voice command “resource names” [i.e. command words/phrases/sentences to be stored in the user-defined portion of the grammar data store] for the system, and accepting, into the user-defined portion of the grammar data store, those “resource names” [including the multi-word phrase “second text based command”] that are not likely to be confused with preexisting commands, thereby contributing to the generation of a set of preexisting command “resource NAMES” [including the multi-word phrase “second text based command”] that are compared to the “current” new user-defined voice phrase command/”candidate directive invoking vocal utterance” to “determine 
Agapi, in view of Ittycheriah do not, but Blandin suggests …and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a resource name from a… of resource names (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands [i.e. the existing command “resource NAMES”, including the multi-word phrase “second text based command”] and comparing those existing command “resource NAMES” to the spoken new user-defined voice phrase command determines whether the spoken new user-defined voice phrase command is confusingly similar to the TTS “speech rendering” of the “resource NAMES” in the set of existing command “resource NAMES” that are compared to the spoken new user-defined voice command.  “the determining” [that the “word or phrase of the candidate directive invoking vocal utterance” sounds confusingly similar to the “word or phrase defining second text based command”] includes “determining” that “the word or phrase of the 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of acoustic similarity comparison with another because the prior art teaches the claimed invention except for the substitution of an acoustic similarity comparison which does not necessarily compare a TTS-generated audio of a word with input audio with an acoustic similarity comparison which does.  Blandin teaches that an acoustic similarity comparison which does compare a TTS-generated audio of a word with input audio was known in the art.  One of ordinary skill in the art could have substituted one type of acoustic similarity comparison with another to obtain the predictable results of a system which receives, from a user, input speech of a new user-defined voice phrase command, compares the input speech of the new user-defined voice phrase command to existing voice commands, and provides the user a warning that the new user-defined voice phrase command may be confused with an existing phrase command (as per Agapi) where the warning identifies both the new user-defined voice phrase command and the existing phrase command which may be confused with the new user-defined voice phrase command (as per Ittycheriah) where the comparing compares the input speech to a TTS-generated audio of the existing phrase command (as per Blandin).
	Agapi, in view of Ittycheriah, Blandin, and Patch, do not, but Gammel suggests …and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a resource name from a list of resource names (“comparing the name to be enrolled to the names in the database to reject any name that is too similar”, col. 1, lines 36-41; “During similar name checking… match an existing name on the list… already on the list”, col. 5, lines 54-65; “If a third utterance is requested for enrollment, then that name is checked first to see if it is too similar to another name on the list”, col. 8, lines 49-53; “it is determine if that name is too similar… to a name already on the speed dial list”, Abstract; 
	Gammel, like Agapi, teaches comparing names to existing names in a database to determine if any are too similar.  
	Agapi teaches comparing a new user-defined voice command to a “set” [see Agapi, paragraph 27] but Gammel describes where the set of names compared to a name is more specifically a “list”.
Gammel thus suggests where the set of existing voice command “resource NAMES” which is compared to the spoken new user-defined voice command in Agapi is more specifically a “list” of existing voice command “resource NAMES” [as opposed to a “set” which is not necessarily a list])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of set which is compared to a natural language input with another because the prior art teaches the claimed invention except for the substitution of a set which is compared to a natural language input which is not necessarily a list with a set which is compared to a natural language input which is.  Gammel teaches that a list which is compared to a natural 
Agapi, in view of Ittycheriah, Blandin, and Gammel, do not, but Patch suggests wherein the method includes examining resources of the computer system to generate a list of resource names that can be referenced in text based executable commands of the computer system (paragraphs 28, 109, 149;
	Paragraph 109 of Patch describes where a user can speak [e.g. “Add Location Kitchen”] to add a location to a list of locations [e.g. “Kitchen”] where the locations can be spoken as part of commands.  Paragraph 109 of Patch more specifically describes where the added location [“Kitchen”] can be referenced in commands [e.g. “From Kitchen” and “Go Kitchen”] and where the list includes other locations that are at least 
	Patch thus suggests where, in the combination applied to reject claim 1, Agapi’s system has an additional function [in addition to the new user-defined voice command safety analysis] of “examining resources of the computer system to generate a list of resource names that can be referenced in text based executable commands of the computer system” [“examining”, over time, multiple input user speech utterance “resources of the computer system” used by the system to define multiple locations/file-names/folder-names of a list, thereby “generating” “a list” of location/file/folder “resource names” that can be referenced by executable computer system commands like Go <Location> or From <Location>, where commands in computer systems are commonly/conventionally defined based on computer program code/language/”text”])
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to combine prior art elements according to known methods because the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (Agapi, in view of Ittycheriah, Blandin, and Gammel suggest the limitations of claims 1 and 29 except for “examining resources of the computer system to generate a list of resource names that can be referenced in text based executable commands of .

Allowable Subject Matter
The following is a statement of reasons for the indication of allowable subject matter:  

As per Claim 9 (and similarly claims 10 and 34, and consequently claims 35-36, 38-39 which depend on claim 34):
	The prior art of record does not teach or suggest the combination of all limitations in claims 1 and 9 together, including (i.e. in combination with the remaining limitations in claims 1 and 9) determining whether the word or phrase of the candidate directive invoking vocal utterance (which is “for invoking a directive to execute a first text based command to perform a first computer function of a computer system”) sounds confusingly similar to a speech rendering of a resource name of the plurality of resource names (where the plurality of resource names are resource names “that can be referenced in text based commands that the computer system is configured to execute”)
As per Claim 9, Agapi suggests wherein the method includes examining resources to generate a…that includes a plurality of…names…and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a…name of the plurality of…names (Figures 1-2; paragraphs 8-9, 24, 27-29, 31-33, 35, 38-44, 46-51, 56-57;
The combination [thus far] is as discussed in the rejection of claim 1, above.

These portions further suggest “wherein the method includes examining resources to generate a… that includes a plurality of… names… and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to… a… name of the plurality of… 
Agapi suggests wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a…name of the plurality of…names.  Agapi, in view of Ittycheriah do not, but Blandin suggests wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a…name of the plurality of…names (Paragraphs 36 and 40;
Same combination as discussed in the rejection of claim 1, where performing TTS on the existing commands [i.e. the existing command NAMES] and comparing those NAMES to the new user-defined voice command determines whether the new user-defined voice command’s word[s] are confusingly similar to the TTS “speech rendering” of the NAMES in the set of NAMES that are compared to the new user-defined voice command)

	Agapi suggests examining resources to generate a… that includes a plurality of… names. Agapi, in view of Ittycheriah and Blandin, do not, but Gammel suggests examining resources to generate a list that includes a plurality of… names (“comparing the name to be enrolled to the names in the database to reject any name that is too similar”, col. 1, lines 36-41; “During similar name checking… match an existing name on the list… already on the list”, col. 5, lines 54-65; “If a third utterance is requested for enrollment, then that name is checked first to see if it is too similar to 
	Gammel, like Agapi, teaches comparing names to existing names in a database to determine if any are too similar.  
	Agapi teaches comparing a new user-defined voice command to a “set” [see Agapi, paragraph 27] but Gammel describes where the set of names compared to a name is more specifically a “list”.
Gammel thus suggests where the set of existing voice command NAMES which is compared to the new user-defined voice command in Agapi is more specifically a “list” of existing voice command NAMES [as opposed to a “set” which is not necessarily a list])
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of set which is compared to a name with another because the prior art teaches the claimed invention except for the substitution of a set which is compared to a name which is not necessarily a list with a set which is compared to a name which is.  Gammel teaches that a list which is compared to a name was known in the art.  One of ordinary skill in the art could have substituted one type of set which is compared to a name with another to obtain the predictable results of a system which receives, from a user, a new user-defined voice NAME, compares input speech of a new user-defined voice NAME to a set of existing voice NAMEs, and provides the user a warning that the new user-defined voice NAME may be confused with an existing NAME (as per Agapi) where the warning identifies both the input and which word may be confused with the input (as per 
	Yaker suggests where a name entered by a user can be a spoken file name for a newly created file, and Patch suggests where a user can also speak to generate a list of file names and/or directory names.
	An additional reference, however, would be required to address where Agapi’s new user-defined command (which is for invoking the execution of a computer system function) is compared to a list of file names or directory names that can be referenced in text based commands that the computer system is configured to execute.
	Agapi does teach where a name can be parsed into different words (paragraph 42) and where a portion of one command can be confused with the entirety of another command (paragraphs 8-9), Bickley does teach where a command can include a function and a word (play message and save message in paragraph 7), and Gopinath (cited in the rejection of claims 4 and 15 in the Office Action mailed 10/21/2020) teaches where a command can have a function component and a reference to a name (e.g. Call Bob).  While these references may suggest where Agapi’s commands can be commands that include a function component and a person/object name component, where Agapi’s system parses the name component and compares the name component to existing commands, these references, at a minimum, do not reasonably suggest where an object/person name portion of a new user-defined voice command is compared to an object/person name in the existing voice commands.
	In contrast, the combination applied to reject claim 13 suggests claim 13 because, in the combination, an input is a file name which is compared to a list of file names (i.e. comparing two names of the same type, and not comparing a command-type input with a list of file-name-type names).
	Also in contrast, Claim 23 does not specify that the resource names are resource names that can be referenced in text based commands that the computer system is configured to execute (such that the commands in Agapi, themselves, can be interpreted as resource names).

As per Claim 33: 
The prior art of record does not teach or suggest the combination of all limitations in claims 1 and 33 together, including (i.e. in combination with the remaining limitations in claims 1 and 33) wherein the first text based command (where the candidate directive invoking vocal utterance is for invoking a directive to execute a first text based command to perform a first computer function of a computer system) specifies a command operator and a first resource and wherein the second text based command (which is defined by a word or phrase whose text to speech rendering is determined to be confusingly similar to the candidate directive invoking vocal utterance) specifies the command operator and a second resource, and receiving, from a user, a second candidate directive invoking vocal utterance for invoking the directive to execute the first text based command to perform the first computer function (i.e. where the second candidate utterance is for invoking a directive to execute the same first text based command that the first candidate utterance is for invoking a directive to execute)
wherein… first text based command is a management command that specifies a command operator and a first resource and wherein… second text based command specifies the command operator and a second resource (paragraphs 28, 109, 149;
	Paragraph 109 of Patch describes where a user can speak [e.g. “Add Location Kitchen”] to add a location to a list of locations [e.g. “Kitchen”] where the locations can be spoken as part of commands.  Paragraph 109 of Patch more specifically describes where the added location [“Kitchen”] can be referenced in commands [e.g. “From Kitchen” and “Go Kitchen”] and where the list includes other locations that are at least suggested to be able to be referenced in the same commands [e.g. bathroom and basement].  Paragraph 149 further describes where a user is allowed to add words for custom elements that may include “folder names” and “file names”, and paragraph 28 describes where a “custom list” that may be a list of locations can also be a list of “files” and/or “folders”.
Patch thus suggests where two different commands [at least suggested to be defined based on computer programming “text”] include two different command operators [e.g. Go and From] and reference the same “resource” [e.g. Kitchen])
In Agapi, however, in order to reject claim 33, the first text based command must be mapped to “one or more programmatic actions” and cannot be mapped to the new user-defined voice command, because if a second candidate directive invoking vocal utterance is received (i.e. a different new user-defined voice command), then it is not associated with the new user-defined voice command that was determined to be confusingly similar (since new user-defined voice commands are associated with programmatic actions and abstractions, and not associated with past new user-defined voice commands).  
As an example, if one candidate utterance is a spoken “Call John” and the first text based command is interpreted as the text phrase “Call John” (not the computer function of dialing John Smith’s phone number) and “Call John” is determined to be confusingly similar to an existing command “Call Jon”, a second candidate utterance “Call Smith” which is designed to invoke the same computer function of dialing John Smith’s phone number is not for invoking the first text based command which is the text phrase “Call John”.
“a word or phrase defining a second text based command” requires that the second text based command IS the word or phrase defining the second text based command, and not merely that the second text based command is invoked and executed based on the word or phrase being received and recognized.  Therefore, the text-to-speech of Blandin would be applied to the command words and not to the programmatic action(s) which the command words are associated with.
Additionally, Agapi does not specify the structure/form of the programmatic action(s), and therefore it is not clear, based on Agapi, whether the programmatic action(s) take the form of a command operator and a resource.

As per Claim 17 (and consequently its dependent claims 20, 27, and 31), the prior art of record does not teach or suggest the combination of all limitations in claim 17, including (i.e. in combination with the remaining limitations in claim 17) receiving a candidate audio data set with the candidate audio data set including: (i) a candidate text proposed for association with a candidate text based command, and (ii) audio data corresponding to a candidate text to speech rendering of the candidate text, determining, using the candidate text to speech rendering (i.e. the text to speech rendering that is included in the candidate audio data set, along with the candidate text proposed for association with a candidate text based command) that speech recognition software is likely to misidentify utterances of the candidate text as corresponding to a text based command other than the candidate text based command.
Bickley et al. (US 2003/0069729), while teaching the use of text-to-speech conversion (i.e. “rendering” text into “speech”) as part of the confusability prediction (paragraph 61), specifically teaches away from the use of TTS to compare spoken phrases (paragraphs 14 and 48).  Therefore, one of ordinary skill in the art, reading the passages in Bickley which teach away from TTS, would not find obvious a combination which includes the use of TTS.  Paragraph 12 also appears to describe where text to speech causes a system to be speaker/system speech recognition dependent and that a more reliable method to predict acoustic confusability is needed when using a combination of a text phrase and an audio file, and therefore it would not be obvious to one of ordinary skill in the art to combine Bickley with electronic synthesis based on “the voice data” received from a user or based on a user’s vocal tendencies which is designed to make the speech rendering user dependent (as per claims 6-7, whereas Bickley appears to be directed to user/system independence).
The prior art also teaches where enrollment data can include both a speech recording (i.e. not a TTS rendering) and text of the utterance(s) spoken by the speaker.  
2017/0169815 “The data used for adapting an acoustic model to a speaker is referred to herein as "enrollment data." Enrollment data may include speech data obtained from the speaker, for example, by recording the speaker speak one or more utterances in a text. Enrollment data may also include information indicating the content of the speech data such as, for example, the text of the utterance(s) spoken by the speaker and/or a sequence of hidden Markov model output states corresponding to the content of the spoken utterances”, paragraph 4; “After the trained neural network acoustic model is accessed at act 102, process 100 proceeds to act 104, where enrollment data to be used for adapting the trained neural network to a speaker is obtained. The enrollment data comprises speech data corresponding to one or more utterances spoken by the speaker. The speech data may be obtained in any suitable way. For example, the speaker may provide the speech data in response to being prompted to do so by a computing device that the speaker is using (e.g., the speaker's mobile device, such as a mobile smartphone or laptop). To this end, the computing device may prompt the user to utter a predetermined set of one or more utterances that constitute at least a portion of enrollment text. Additionally, the enrollment data may comprise information indicating the content of the utterance(s) spoken by the speaker 
2005/0071163 “For example, the user could input the text string "Welcome to the IBM text-to-speech system" in the text input field (42) and then click on the record button (43) to start recording as the user recites the same text string into the microphone in the manner in which the user wants the system to reproduce the synthesized speech. When the input utterance is complete, the user can click on the stop button (44) to stop the recording process”, paragraph 26; 
The prior art teaches “In some embodiments, details sub-node 414 includes text or recorded spoken words from the speech input, a digitized or text-to-speech version of a text input from the user, and/or the current location of user device 104 (FIG. 1) for inclusion in the automatic response” (which suggests where “details sub-node” includes text and a TTS version of a text input from a user)
2015/0045003 paragraphs 112-113
The prior art teaches determining similarity between activation words of voice recognition devices by converting words to phonetic symbol strings and then determining their edit distance, and if the activation words are the same or similar, then a warning is issued. (Comparing activation words of two different voice recognition devices)
2017/0053650 paragraphs 73-74;
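For illustration only, the comparison described above (phonetic symbol strings compared by edit distance, with a warning when two activation words are the same or similar) can be sketched as follows. The function names, the phonetic symbols, and the threshold of 1 are assumptions chosen for this sketch and are not taken from the reference.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonetic symbols."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def activation_words_conflict(phones_a, phones_b, max_distance=1):
    """Return True when two activation words should trigger a warning.

    max_distance is a hypothetical threshold, not a value from the reference.
    """
    return edit_distance(phones_a, phones_b) <= max_distance
```

Under these assumptions, two activation words whose phonetic strings differ by a single symbol would be flagged, while clearly different strings would not.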
David B. Roe, Michael D. Riley (“Prediction of Word Confusabilities for Speech Recognition”) teaches determining phonetic pronunciation of words from text by performing TTS and where words that have similar phonetic pronunciations are likely to be confused by speech recognizers (which seems to suggest that comparisons are not being made between speech renderings)
Roe et al. “basic idea behind predicting word confusability is simple.  Text-to-speech systems can determine the phonetic pronunciation of words from text.  Words that have similar phonetic pronunciations are likely to be confused by speech recognizers”, Section 1. Introduction; “Though there are several approaches for determining the acoustic similarity between words, we choose an approach based on phonetic pronunciation and a measure of confusability of the phonetic units rather than acoustic examples… of the words themselves.  Given two potentially similar words, we begin with their phonetic pronunciations from a text to speech synthesizer.  Then we estimate the probability that the phonetic pronunciation of the first word will be misrecognized as the second word, rather than the first… allows an estimate of confusability before recording speech utterances to find the actual pronunciations of the desired vocabulary… benefit of simplicity of calculation compared to estimates of similarity at the acoustic level… drawback that actual pronunciations may not be represented accurately by the phonetic pronunciations from a dictionary”, Section 2. Theory
The prior art describes performing speech rendering by converting an input text phrase to a synthesized speech rendering, performing text transcription and then comparing the text transcription with a list of test phrases (in order to identify acoustic similarity).  Paragraph 25 describes converting two text phrases into phoneme Roe describes phonetic distance between phonetic pronunciations determined by TTS).  Paragraph 18 has phonetic transcription corresponding to a symbolic representation of how a spoken rendering of the text should sound which seems to suggest that the phonetic transcription is not a spoken rendering, and all other instances seem to suggest the rendering is the actual audio sound, not the phonetic representation (which suggests that phonetic pronunciation is not a rendering)
Rao et al. (US 2019/0295531) “As one particular example, an input text phrase of "profit" can be input by a user. The input text phrase can be converted to an audio output corresponding to a synthesized speech rendering of the word "profit." A text transcription of the audio output can be determined. For instance, the text transcription can be a transcription that reads as "prophet," which is a homophone (e.g. phonetically similar) to the word "profit." The text transcription can be compared against a list of test phrases to identify a match between the text transcription and one or more of the test phrases. If the list of test phrases includes the word "prophet," a match can be found, and the input text phrase, "profit," can be identified as being phonetically similar to the word "prophet," as found in the list of test phrases”, paragraph 16 [supported by paragraph 16 of 62/410,564]; paragraphs 18-19
Another prior art reference also teaches receiving, from a user, an input nametag (e.g. a phrase) via microphone (obviously audio) or keyboard (obviously text) and applying TTS to text input in order to make confusability calculations.  In this reference, paragraph 58 describes calculating confusability using the TTS sequence of phonemes by comparing the “text entry sequence of phonemes” (at least suggested to be the TTS sequence of phonemes which, similar to what was discussed above pertaining to Roe and Rao, is not necessarily a speech rendering).  Paragraph 59 further describes an example of phoneme comparison which looks like it compares data representations of phonemes (e.g. JH/IH/M and T/IH/M have a 2/3 overlap), which suggests that “sequence of phonemes” is not a speech rendering.  This reference also teaches where ASR detects presence of not just nametags but also spoken commands and numbers (paragraph 2).  Paragraph 4 also teaches where a user tries to store a nametag that sounds like an already-stored nametag, number or command, and where confusability between similar sounding words is known as a “substitution error”.  Paragraph 6 also teaches where confusability scores are calculated by comparing an uttered nametag with all previously stored nametags and commands combined, and prompting the user to use a different nametag when a confusability calculation is too high (this paragraph only teaches away from using this technique for numbers).  This reference, however, does not specifically teach that TTS is also applied to the entity being compared to the input (paragraph 58 describes comparing phonemes generated by TTS to phonemes of entries already stored in at least one of the domains but does not specifically state that those phonemes of already-stored entries are also TTS electronically synthesized from the entries [i.e. “rendered” into “speech” from the entries]).
This reference is particularly towards storing nametags but may suggest (but does not specifically describe) where a user is trying to enter a new command (for claim 1).  
This reference also does not appear to teach where the text and the TTS speech rendering are part of the same entity.  Additionally, since the phonemes are not necessarily speech renderings (in the acoustic level audio signal sense), it is also not clear that a set including the text and a TTS speech rendering is received (as opposed to text and a phonetic text representation).
Additionally, in this reference, the TTS phoneme comparison is directed to the speaker-independent embodiment, which involves inputting a new nametag by typing text and performing TTS on the typed text, whereas the speaker-dependent embodiment, which involves entering a new nametag by microphone, makes confusability determinations by using SLMs and comparing confidence levels for a nametag/number/command domain to thresholds (i.e. the speaker-dependent embodiment appears to be based on speech recognition techniques and not TTS-based phoneme comparison, and thus this reference does not read on claim 1).
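For illustration only, the phoneme comparison discussed above (e.g. JH/IH/M and T/IH/M having a 2/3 overlap) can be sketched as a position-by-position comparison of symbolic phoneme sequences. The function name and the choice to divide by the longer sequence length are assumptions of this sketch; the reference does not state how the overlap is computed. The sketch operates on data representations of phonemes only, with no audio involved.

```python
def phoneme_overlap(new_entry, stored_entry):
    """Fraction of aligned positions at which two phoneme sequences agree.

    Sequences are lists of phoneme symbols (strings). The denominator is
    the longer sequence length, an assumption made for this sketch.
    """
    longest = max(len(new_entry), len(stored_entry))
    if longest == 0:
        return 0.0
    matches = sum(1 for a, b in zip(new_entry, stored_entry) if a == b)
    return matches / longest


# The example from the discussion above: two of three phonemes match.
overlap = phoneme_overlap(["JH", "IH", "M"], ["T", "IH", "M"])
```

Because the inputs are symbol lists, the sketch is consistent with the observation that a “sequence of phonemes” compared in this way is not necessarily a speech rendering.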
Chengalvarayan et al. (US 2011/0288867) “nametag input for the nametag is received from the user and processed… receive the nametag input via the microphone… other examples… alphabetical or alpha-numerical keyboard”, paragraph 50; “speaker-independent… input is a text entry from the user… TTS… converts the text entry to a sequence of phonemes”, paragraph 51; “speaker-dependent… nametag input is an utterance from the user”, paragraph 52; “nametag confusability”, paragraph 55; “confusability of the nametag input is calculated with previously stored nametags”, paragraph 56; “confusability calculation can be based on a comparison of the text entry sequence of phonemes to phonemes of entries already stored in at least one of the domains… TTS… convert the text entry into the sequence of phonemes… confusability 
The prior art teaches keyword spotting that performs text-to-speech on keywords and then correlates the text-to-speech audio signals of keywords to event audio using acoustic similarity measures to spot keyword occurrences.
Blandin (US 2017/0169816) paragraph 40;
The prior art teaches comparing a name and determining that a name is too similar to a name already on a speed dial list
	Gammel et al. (US 5832429) “request for a new template is received it is determined if the list of speed dial names is full (Step 201) and is not it is determined if that name is too similar (Step 205) to a name already on the speed dial list. If so, that name is rejected but if not it is determined if the speed dial name is too short (Step 302), and if not; too short or if the user wants to enter the short name the system asks the user to repeat the speed dial name and if a match it is entered. If not a match the system will swap the first and second utterance and compare to see if a match”
The prior art teaches receiving an utterance and comparing the utterance with pre-existing commands in at least one speech recognition grammar (paragraph 26), determining if the provided utterance is potentially ambiguous or acoustically similar to a pre-existing command and, if so, determining a substitute (paragraph 27), providing a substitute that is dissimilar to pre-existing commands, and presenting a notice that the utterance is potentially confusing along with the option to use the determined substitute.  This reference teaches away from receiving a subsequent different voice command after a user-defined command is similar to an existing command (paragraph 6) and therefore cannot be applied to reject claim 1.
2008/0133244 paragraphs 10, 21, 26, 27, 28, 29
Paragraphs 20-21 describe where a speaker provides, via a microphone, a spoken utterance meant to be associated as a user-defined command, and where the spoken utterance that is meant to be associated as a user-defined command is analyzed to determine if the spoken utterance is acoustically similar to any existing commands contained in the command store, which can include user defined commands and/or system defined commands, and where the commands can each be associated with a set of programmatic actions to be performed whenever a user issues a corresponding command.  Paragraph 5 further describes examples of acoustically similar speech commands [including one user-defined speech command] which are at least suggested to perform a corresponding computer function [i.e. mail check or spell check].  Paragraphs 38-39 describe a computer system embodiment which is controlled by loading and executing a computer program, where computer programs are any expression, in any language, code, or notation, of a set of instructions intended to cause a system to perform a particular function.

“responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…a word or phrase defining a second… command, communicating, to the user, information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to…the word or phrase defining the second… command”: Paragraphs 20-21 describe where a speaker provide, via a microphone, a spoken utterance meant to be associated as a user-defined command, and where the spoken utterance that is meant to be associated as a user-defined command is analyzed to determine if the spoken utterance is acoustically similar to any existing commands contained in the command store, which can include user defined commands and/or system defined commands, and where the commands can each be associated with a set of programmatic actions to be performed whenever a user issues a corresponding command.  Paragraph 5 further describes examples of acoustically similar speech commands [including one user-defined speech command] which are at least suggested to perform a corresponding computer function [i.e. mail check or spell check].  Paragraphs 38-39 describes a computer system 
The prior art teaches confusable commands, including where examples are “delete this voicemail” and “repeat this voicemail”, “read it” and “delete it” and “get rid of it”.
2011/0224972 “After selecting one of the remaining unprocessed menu elements in the FSM document 112, the build system 118 uses the language-neutral GRXML document 120 and the localized response document to identify responses for the selected menu element (608). After identifying the responses for the selected menu element, the build system 118 determines whether the responses for the selected menu element pass an acoustic confusability test (610). The acoustic confusability test determines whether two or more of the responses are acoustically confusable. Two responses are acoustically confusable when the IVR system 110 would perform different actions in response to the responses and there is a significant possibility that the IVR system 110 would confuse one of the responses for the other response. For example, the IVR system 110 could potentially confuse the words " delete" and "repeat" because these words have similar sounding endings. The responses for the selected menu element pass the acoustic confusability test if none of the responses for the selected menu element are acoustically confusable with any other one of the responses for the selected menu element”, paragraph 100; 
Doyle (US 2003/0125945) “For example, if "read it" is being confused with "delete it" because the two phrases are acoustically similar, then the system would, for example, remove " delete it" from the grammar's vocabulary and substitute it with "get rid of it". The phrase "get rid of it" is not acoustically similar to "read it" and therefore cannot be as easily confused by the system”, paragraph 99;
2008/0221896 “delete this voicemail… confused with ‘repeat this voicemail’”, paragraph 6
2019/0027138 teaches where the same word is already used to wake another command (“The predefined use can be determined by looking up existing commands. For example, if "Gort" has already been coded as a command for turning on a microwave, then using it as a wake-up utterance for the command hub 104 is likely to cause confusion”, paragraph 33;)
	10699706 appears to teach recognizing a list corresponding to input speech so that the system can disambiguate who the user is trying to refer to, and doesn’t seem to be generating speech renderings of the generated list.  This reference does include a command called Kitchen in Figure 1.  (“In the illustrative embodiment, individual 2 may seek to establish a communications session with a particular device associated with a user account for individual 2 (e.g., a device named "Kitchen") using voice activated electronic device 100a. In some embodiments, a user account associated with individual 2 may have a contact named "Kitchen" as well as a number of devices nicknamed "Kitchen." Thus, it may be necessary for computing system 300 to perform various processing methods to determine whether to establish a communications session between electronic device 100a and a contact named "Kitchen" or a device 
7313525 teaches comparing a user-identified bookmark name to existing bookmark names and grammars, and also comparing a list of suggested bookmark names to compare with existing bookmark names and grammars (“In a further effort to improve the accuracy of bookmark recognition, the system may also include functionality that compares an elected bookmark name with existing bookmark names and grammars to ensure there is no confusion. In one example, once the user identifies a bookmark name the system provisionally accepts the bookmark name. However, before entering the bookmark name into the user profile, the system compares the provisionally accepted bookmark name with the existing bookmark names and grammars. If there is no conflict, then the bookmark name is finally accepted and added to the user profile. If, however, the system identifies a conflict with existing bookmark names or grammars the system may then prompt the user to select another bookmark name. Another example may have the system checking for conflicts before it presents the list of suggested bookmark names to the user. Specifically, the system after retrieving the list of suggested bookmark names from the application may compare the suggested list against the existing bookmark names and grammars in the user profile. If the system identifies a conflict, the conflicting bookmark name from the proposed list will not be presented to the user. There are other examples that can function individually or in combination to minimize potential conflict between bookmark names and the associated potential inaccurate recognition”;)
6535848 teaches listing a set of file names from which a user can select (“Display screen 700 is displayed on the transcription computer monitor. Display screen 700 desirably lists a set of file names 702 from which the user can select. Display 
2007/0016420 (IBM reference) describes constructing a list of alternative letter sequences by replacing letters in a sequence with similar sounding letters. (“The probabilities of mistaking one letter for another are typically represented as a matrix, which is called a "confusion matrix." The probability of interchanging letters belonging to different letter classes is assumed to be small. When using letter classes, the post processor constructs the list of alternative letter sequences by replacing each letter of the best ranking sequence with similarly-sounding letters, according to the letter classes described above. The post processor typically ranks the list, for example by computing likelihood scores based on the confusion matrix”, paragraph 66;)
2005/0203741 teaches comparing a letter sequence with a list of allowable words (“system compares each letter sequence with a list of allowable words and identifies the spelled identifier as soon as the list is reduced to a single identifier”, paragraph 3;)
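For illustration only, the behavior quoted above (comparing a growing letter sequence with a list of allowable words and identifying the spelled identifier as soon as the list is reduced to a single entry) can be sketched as follows. The function name, the return shape, and the sample words are assumptions of this sketch and do not come from the reference.

```python
def identify_spelled_word(letters, allowable_words):
    """Narrow a word list one spelled letter at a time.

    Returns (word, letters_used) as soon as exactly one allowable word
    remains, or (None, letters_used) if no unique word is identified.
    Matching is case-sensitive; inputs are assumed to be lower case.
    """
    candidates = list(allowable_words)
    used = 0
    for used, letter in enumerate(letters, start=1):
        # Keep only the words whose letter at this position matches.
        candidates = [w for w in candidates
                      if len(w) >= used and w[used - 1] == letter]
        if len(candidates) == 1:
            return candidates[0], used
        if not candidates:
            return None, used
    return None, used
```

With the hypothetical list `["smith", "smythe", "jones"]`, spelling "s", "m", "y" would leave a single candidate after the third letter, while spelling only "s", "m" would leave the result undetermined.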
5710864 teaches callers vocalizing utterances that are similar to an employee name (“Conventional recognizers often have difficulty verifying the occurrence of keywords in an unknown speech utterance. Typical automated inbound telephone applications, for example, route incoming calls to particular employees by recognizing a caller's utterance of an employee's name and associating that utterance with the employee's extension. Callers often pronounce names that are not in the directory, or worse, vocalize utterances that are phonetically similar to a different employee's name, causing the caller to be routed to the wrong party”)
8380514 (Figures 1-2) also more specifically teaches where a user is notified that an utterance is “potentially confusing” and is provided a substitute, and receiving an input from a user that either refuses or accepts the substitute.  This reference, however, does not describe the manner in which ambiguity and acoustic similarity are determined.

Upon further search (in response to the amendment filed 9/27/2021):
6839670 teaches where a user/speaker can set up or edit a personal vocabulary in the form of name lists, function lists, etc., and suggests a user “enrolling” a spoken name as a means for dialing a particular phone number (col. 5, lines 11-32), where there are a plurality of user-specific name lists which are set up, including a list for storing telephone numbers under predetermined name/abbreviations and a list for storing function names for commands or command sequences (col. 20, lines 1-10), and where each user can set up his/her own name lists or abbreviation lists (col. 17, lines 32-34).  This reference appears to suggest where a user can have a personal list of function names (for executing commands) and a personal list of names, but this reference does not appear to specifically describe where a spoken name is compared to multiple lists to determine confusability.
7110948 teaches “A speech recognition system in a mobile telephone, the speech recognition system comprising: means for storing a word vocabulary in trellis tree structure, wherein words in the vocabulary are arranged in a plurality of different groups of words, word group selection means for enabling a user to speak via voice commands into the mobile telephone to select a first of said plurality of different groups of words, said first group of words being selected based upon at least a word spoken by 
6584439 teaches “The help function is context sensitive--whenever Help is requested, the voice controlled device responds with a description of the available options, given the current context of the voice controlled device. If Help is requested when the voice controlled device is listening for a command, the voice controlled device will respond with its state and the list the commands that it can respond to (e.g. "At Main menu. You can say . . . ") Further detail on any specific command can be obtained with the "Help <command>" syntax (e.g. "Help Dial", "Help Call", and even "Help Help"). If "Help" is requested while the voice controlled device is waiting for some type of non-command response (e.g. "Say the name"), then the voice controlled device will respond with a statement of the voice controlled device's current status, followed by a description of what it is waiting for (e.g. "Waiting for user response. Say the name of the person whose phonebook entry you wish to create, or say Nevermind to cancel.").”.  This reference describes where syntax for a command is a function followed by one of a plurality of command words that can be referenced by the function command.
5754977 teaches determining a list of words which are closest to a “current output” (Figure 3) which is the same word as a word that a user desires to add to the list of words available for subsequent recognition (col. 3, line 54 – col. 4, line 18) and at least suggests adding a word to a data base list if the word that the user desires to add is sufficiently “distant” from the words that are already in the list and where the system prompts for possible new input if the word that the user desires to add is insufficiently “distant” (Figure 4, col. 4, lines 19-59).  This reference also describes where words can be data representing pictures, phrases, graphics, charts, voice prints, etc. (col. 3, lines 48-53) which suggests where a newly added word is compared to different types of words.
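For illustration only, the flow suggested above (list the stored words closest to a candidate, add the candidate if it is sufficiently “distant” from the existing words, and otherwise prompt for new input) can be sketched as follows. The distance measure based on `difflib.SequenceMatcher`, the threshold value, and all function names are assumptions of this sketch; the reference does not specify them, and a character-level string measure is substituted here for whatever measure the reference actually uses.

```python
from difflib import SequenceMatcher


def string_distance(a, b):
    """Hypothetical distance in [0, 1]; 0 means identical strings."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def try_add_word(candidate, vocabulary, min_distance=0.3,
                 distance=string_distance, n_closest=3):
    """Add candidate to vocabulary only if it is far enough from all entries.

    Returns (added, closest), where closest lists the nearest existing
    words so that a caller can show them when prompting for new input.
    """
    closest = sorted(vocabulary, key=lambda w: distance(candidate, w))[:n_closest]
    if closest and distance(candidate, closest[0]) < min_distance:
        return False, closest  # too similar: caller should prompt again
    vocabulary.append(candidate)
    return True, closest
```

The `distance` parameter is left pluggable so that a phonetic or acoustic measure could be supplied in place of the string-based stand-in.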
2007/0005372 teaches “In a very large vocabulary, such as for example a list of names of all cities in Germany, there is the problem that the addition of other words, for which recognition is activated parallel to this list, leads to a higher probability of a mix-up. This means, that supplemental commands, which are active in parallel, are often confused with city names. The recognition of larger vocabularies is particularly difficult with large dynamic loaded lists; these lists could be either static lists such as city names or also dynamic lists such as text or voice enrollments. It is here difficult to define in advance what size of resources the speech recognition system must have allocated to it in order to be able to evaluate sufficient numbers of alternatives in the case of similar words” (paragraph 5).  This reference describes where commands can be confused with city names, but does not appear to describe where the system determines that a candidate new command is determined to be confusable with a city name.
2012/0192096 teaches “The active command line driven user interface provides forgiveness in that each command name has a number of aliases which may be alternate names or phrases. A command alias may be shared between two or more command names. If multiple commands share the same alias, the multiple commands will all appear in the command list 512. This obviates the need for users to know the command name for a desired action, or to navigate through a list of available commands in a user interface menu. The user need only type what they want naturally. If the input provided by the user matches any one of the command aliases, that command is displayed in the command list 512. For example, as shown in FIG. 5, if the user types the command alias "Dial" in the command line 504 the proper command name "Call" is displayed in the list in the command list 512. As shown in FIG. 6, if the user types the command alias "Show", the proper command name "Browse" and "Map" commands are displayed in the list in the command list 512. Displaying the proper command name in response to entry of a command alias may help users learn the proper command names overtime as a result of being repeatedly presented with the proper command names in response to entry of a command alias. This may reduce the amount of alias handling over time, thereby reducing processing demands on the portable electronic device” (paragraph 113).  This reference describes where a word command can have an alias (a different word that is associated with that command).
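For illustration only, the alias behavior quoted above (typing "Dial" displays the command name "Call", and typing "Show" displays both "Browse" and "Map" because the alias is shared) can be sketched as a lookup from aliases to command names. The table contents beyond those two quoted examples, the case-insensitive matching, and the function name are assumptions of this sketch.

```python
# Hypothetical alias table; only the "dial" and "show" aliases come from
# the quoted passage. Each command name is also treated as its own alias.
COMMAND_ALIASES = {
    "Call": {"call", "dial"},
    "Browse": {"browse", "show"},
    "Map": {"map", "show"},
}


def commands_for_input(typed, aliases=COMMAND_ALIASES):
    """Return every command name whose alias set contains the typed text."""
    key = typed.strip().lower()
    return sorted(name for name, names in aliases.items() if key in names)
```

A shared alias returns more than one command name, which mirrors the passage's statement that multiple commands sharing an alias all appear in the command list.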
2006/0064177 teaches “Sample Phonebook Situation: A particular phonebook can include the names "Bill Clinton," "George Bush," "Tony Blair" and "Jukka Hakkinen." In the event that the user wishes to add the new name "John Smith," it may not be confused with any of the existing words due to the very low degree of similarity with the 
8380758 teaches “In some environments, commands may be mapped to longer, more descriptive names to reduce the likelihood of confusion with other commands. For example, the UNIX command "ls-la", which instructs a command terminal to list the full details of all files (including hidden files) in a directory, could hypothetically be represented by the command string "list_directory_contents-all_files" instead. However, not only would such longer commands be more tedious and time-consuming for users to type, but their length might result in frequent mistakes”.  This reference describes associating a sequence of words with a UNIX command (a sequence of characters which can be interpreted as a “text-based command”).
5987411 teaches “FIG. 2 is a flow diagram showing an enrollment method implemented by VAD system 100 for testing the confusability and inconsistency of a candidate phrase. FIG. 2 shows that VAD system 100 first determines whether dictionary 150 is full (step 210). If dictionary 150 is full, interface unit 110 directs the user to delete an old phrase before adding a new one to dictionary 150, and then .

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 5, 7, 9, 10, 12, 23, 29, and 32-38 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 5, 7, 8, 10, and 11 of U.S. Patent No. 10,586,537, hereafter Parent Patent, in view of Kovales et al. (US 2002/0110248), hereafter Kovales.

As per Claim 1, Claim 2 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches A method comprising: receiving, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system; (Claim 1 of the Parent Patent, first limitation)
and responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a… speech rendering of a word or phrase defining a second text based command, communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command (Claim 1 of the Parent Patent, 2nd limitation)
wherein the first text based command specifies a command operator and a first resource and wherein the second text based command specifies the command operator and a second resource (Claim 2 of the Parent Patent)
Claims 1-2 of the Parent Patent do not teach, but Kovales teaches, text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)


As per Claim 5, Claim 3 of the Parent Patent teaches its limitations.
As per Claim 7, Claim 5 of the Parent Patent teaches its limitations.
As per Claim 9, Claim 7 of the Parent Patent teaches its limitations.
As per Claim 10, Claim 8 of the Parent Patent teaches its limitations.
As per Claim 12, Claim 10 of the Parent Patent teaches its limitations.

As per Claim 23, Claims 1-2 of the Parent Patent, in view of Kovales, do not, but Claim 7 of the Parent Patent teaches its limitations except for where the examined resources are resources “of the computer system”.
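Claim 7 suggests combining its limitations with the limitations of Claims 1-2 because Claim 7 teaches combining its limitations with the limitations of Claim 1 and Claim 2 (interpreted as incorporating the limitations of claim 1) include the limitations of claim 1.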
Claims 1-2 and 7 of the Parent Patent, in view of Kovales, do not, but Claim 11 teaches where resources are resources of the computer system (first limitation).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of resource with another because Claims 1, 2, and 7 of the Parent Patent, in view of Kovales, teach the claimed invention except for the substitution of a resource which is not necessarily a resource of the computer system with a resource which is.  Claim 11 of the Parent Patent teaches that resources of the computer system were known in the claims.  One of ordinary skill in the art could have substituted one type of resource with another to obtain the predictable results of a computer system that examines resources (Claim 7 of the Parent Patent) where the resources are resources of the computer system (Claim 11 of the Parent Patent).

As per Claim 29, Claims 1-2 of the Parent Patent, in view of Kovales, do not, but Claim 7 of the Parent Patent teaches its limitations except for where the examined resources are resources “of the computer system”.
Claim 7 suggests combining its limitations with the limitations of Claims 1-2 because Claim 7 teaches combining its limitations with the limitations of Claim 1, and Claim 2 (interpreted as incorporating the limitations of claim 1) includes the limitations of claim 1.

Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of resource with another because Claims 1, 2, and 7 of the Parent Patent, in view of Kovales, teach the claimed invention except for the substitution of a resource which is not necessarily a resource of the computer system with a resource which is.  Claim 11 of the Parent Patent teaches that resources of the computer system were known in the claims.  One of ordinary skill in the art could have substituted one type of resource with another to obtain the predictable results of a computer system that examines resources (Claim 7 of the Parent Patent) where the resources are resources of the computer system (Claim 11 of the Parent Patent).

As per Claim 32, Claims 1-2 of the Parent Patent do not, but Kovales suggests wherein the speech rendering of the word or phrase defining the second text based command is a text to speech rendering of the word or phrase defining the second text based command (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because the claims 1-2 of the Parent Patent teaches the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was 

	As per Claim 33, Claim 2 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches …and wherein the method includes receiving, from a user, a second candidate directive invoking vocal utterance for invoking the directive to execute the first text based command to perform the first computer function (Claim 1 of the Parent Patent, last limitation)
Claims 1-2 of the Parent Patent do not, but Kovales suggests wherein the speech rendering of the word or phrase defining the second text based command is a text to speech rendering of the word or phrase defining the second text based command (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because the claims 1-2 of the Parent Patent teaches the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was known in the art.  One of ordinary skill in the art could have substituted one type of rendering with another to obtain the predictable results of a system which determines 

As per Claim 37, Claim 2 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command includes information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command (2nd to last limitation of Claim 1 of the Parent Patent, the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the speech rendering of the word or phrase defining the second text based command communicates the information to the user [and thus the communicating step includes the information which is directed to the user])
Claims 1-2 of the Parent Patent do not, but Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because the claims 1-2 of the Parent Patent teaches the claimed invention 

As per Claim 34, Claim 7 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches A method comprising: receiving, voice data defining a candidate directive invoking vocal utterance for invoking a directive to execute a first text based command to perform a first computer function of a computer system; (Claim 1 of the Parent Patent, first limitation)
and responsive to determining that a word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a… speech rendering of a word or phrase defining a second text based command, communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command, (Claim 1 of the Parent Patent, 2nd limitation)
wherein the method includes examining resources to generate a list that includes a plurality of resource names that can be referenced in text based commands that the computer system is configured to execute and wherein the determining includes determining whether the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to a speech rendering of a resource name of the plurality of resource names (Claim 7 of the Parent Patent).
Claims 1 and 7 of the Parent Patent do not, but Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because claims 1 and 7 of the Parent Patent teach the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was known in the art.  One of ordinary skill in the art could have substituted one type of rendering with another to obtain the predictable results of a system which determines confusing similarity based on a speech rendering of a word or phrase defining a second text based command (as per Claims 1 and 7 of the Parent Patent) where the rendering results from text-to-speech transformation (as per Kovales).

As per Claim 35, Claim 7 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches and wherein the method includes receiving, from a user, a second candidate directive invoking vocal utterance for invoking the directive to execute the first text based command to perform the first computer function (Claim 1 of the Parent Patent, last limitation).

As per Claim 36, Claim 10 of the Parent Patent teaches its limitations.

	As per Claim 38, Claim 7 of the Parent Patent (interpreted as incorporating the limitations of Claim 1 of the Parent Patent) teaches wherein the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command includes communicating information to a user indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the… speech rendering of the word or phrase defining the second text based command (2nd to last limitation of Claim 1 of the Parent Patent, the communicating information indicating that the word or phrase of the candidate directive invoking vocal utterance sounds confusingly similar to the speech rendering of the word or phrase defining the second text based command communicates the information to the user)
Claims 1 and 7 of the Parent Patent do not, but Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because the claims 1 and 7 of the Parent Patent teaches the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was 

Claim 39 is rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 7 of the Parent Patent, in view of Kovales, as applied to claim 34, above, and further in view of Claim 10 of the Parent Patent, Dodrill et al. (US Patent 6,901,431), hereafter Dodrill, and King (US Patent 6,532,446). 

As per Claim 39, Claim 7 of the Parent Patent teaches wherein the examining resources includes examining… (examining resources includes examining resources)
Claims 1 and 7 of the Parent Patent, in view of Kovales, do not, but Claim 10 of the Parent Patent suggests wherein the examining resources includes examining files and directories… (Claim 10 of the Parent Patent describes where resources are selected from the group consisting of file resources and directory resources, which suggests where some of the examined resources of Claim 7 of the Parent Patent can be file resources [suggested to be files] and others of the examined resources can be directory resources [suggested to be directories])
	Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one set of resources with another and another set of resources with another because Claims 1 and 7 of the 
Claims 1, 7, and 10 of the Parent Patent, in view of Kovales, do not, but Dodrill suggests wherein the examining resources includes examining files and directories… (“personalized… user-specific] XML documents are stored in user-specific directories separate from the generic XML documents stored in the XML application and functions database”, col. 8, lines 21-43;
Dodrill describes user-specific XML documents [which are at least suggested to be files] and user-specific directories [col. 8, lines 21-43].
Dodrill suggests where the file resources [suggested to be files] and directory resources [suggested to be directories] in the combination of Claims 1, 7, and 10 of the 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of file with another and one type of directory with another because Claims 1, 7, and 10 of the Parent Patent, in view of Kovales, teach the claimed invention except for the substitution of files which are not necessarily user-specific with files which are, and the substitution of directories which are not necessarily user-specific with directories which are.  Dodrill teaches where user-specific files and user-specific directories were known in the art.  One of ordinary skill in the art could have substituted one type of file with another and one type of directory with another to obtain the predictable results of a system which determines confusing similarity based on a speech rendering of a word or phrase defining a second text based command and which examines resources (as per Claims 1 and 7 of the Parent Patent) where the rendering results from text-to-speech transformation (as per Kovales) where the examined resources include file resources and directory resources (as suggested by Claim 10 of the Parent Patent) where the file resources and the directory resources are user-specific data (as per Dodrill).
	Claims 1, 7, and 10 of the Parent Patent, in view of Kovales and Dodrill, do not, but King suggests wherein the examining resources includes examining files and directories accessible with use of a subscriber ID (“subscriber ID may be associated with, and utilized to, access the user specific files… associated with a particular user or device”, col. 9, lines 53-65;

	King suggests where the user-specific files and user-specific directories in the combination of Claims 1, 7, and 10 of the Parent Patent, in view of Kovales and Dodrill, are user-specific data that is “accessible with use of a subscriber ID”)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of user-specific data with another because Claims 1, 7, and 10 of the Parent Patent, in view of Kovales and Dodrill, teach the claimed invention except for the substitution of user-specific data which is not necessarily accessible with use of a subscriber ID with user-specific data which is.  King teaches that user-specific data which is accessible with use of a subscriber ID was known in the art.  One of ordinary skill in the art could have substituted one type of user-specific data with another to obtain the predictable results of a system which determines confusing similarity based on a speech rendering of a word or phrase defining a second text based command and which examines resources (as per Claims 1 and 7 of the Parent Patent) where the rendering results from text-to-speech transformation (as per Kovales) where the examined resources include file resources and directory resources (as suggested by Claim 10 of the Parent Patent) where the file resources and the directory resources are user-specific data (as per Dodrill) where the user-specific data is accessible with use of a subscriber ID (as per King).

Claims 17, 20, 27, and 31 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 7, 10, 11, 15, 20 of U.S. Patent No. 10,586,537, hereafter Parent Patent, in view of Kovales et al. (US 2002/0110248), hereafter Kovales.

As per Claim 17, it is broader than Claim 15 of the Parent Patent (limitations 1-3 read on limitations 1-3, respectively) except for where the speech renderings are, more particularly, text to speech renderings.
Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because claim 15 of the Parent Patent teaches the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was known in the art.  One of ordinary skill in the art could have substituted one type of rendering with another to obtain the predictable results of a system which determines confusing similarity based on a speech rendering (as per Claim 15 of the Parent Patent) where the rendering results from text-to-speech transformation (as per Kovales).

As per Claim 20, Claim 20 of the Parent Patent teaches its limitations (except for where the candidate speech rendering is a candidate text to speech rendering).
Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because claims 15 and 20 of the Parent Patent teach the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was known in the art.  One of ordinary skill in the art could have substituted one type of rendering with another to obtain the predictable results of a system which determines confusing similarity based on a speech rendering (as per Claim 15 of the Parent Patent) where the rendering results from text-to-speech transformation (as per Kovales).

As per Claim 27, Claim 20 (interpreted as incorporating the limitations of claim 15) of the Parent Patent teaches wherein the determining includes comparing the candidate… speech rendering to a plurality of speech renderings respectively corresponding to the already defined text based commands.
Claim 20 of the Parent Patent does not, but Claim 7 of the Parent Patent suggests examining resources… to identify already defined text based commands… (claim 20 describes where entities compared to the system input are “already defined text based commands”, and Claim 7 teaches examining resources to generate a list of entities that are compared to the system input, and so suggests where the “already defined text based commands” of claim 20 are identified by examining resources of the computer system)

Claims 20 and 7 of the Parent Patent do not, but Claim 1 of the Parent Patent teaches where the text based commands are commands of the computer system (first limitation).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of command with another because claims 15, 20, and 7 of the Parent Patent teaches the claimed invention except for the substitution of a command which is not necessarily a command of the computer system with a command which is.  Claim 1 of the Parent Patent teaches that a command of the computer system was known in the claims.  One of ordinary skill 
Claims 20, 7, and 1, do not, but Claim 11 of the Parent Patent teaches resources of a computer system (first limitation)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of resource with another because Claims 15, 20, 7, and 1, of the Parent Patent teach the claimed invention except for the substitution of a resource which is not necessarily a resource of the computer system with a resource which is.  Claim 11 of the Parent Patent teaches that resources of the computer system were known in the claims.  One of ordinary skill in the art could have substituted one type of resource with another to obtain the predictable results of a system which compares a candidate speech rendering to a plurality of speech renderings respectively corresponding to already defined text based commands (as per Claims 15 and 20 of the Parent Patent) where the already defined text based commands are identified by examining resources (as per Claim 7 of Parent Patent 1) where the text based commands are commands of the computer system (as per Claim 1 of Parent Patent 1) where the resources are resources of the computer system (as per Claim 11 of Parent Patent 1).
Kovales teaches text to speech rendering (“audio rendering resulting from a text-to-speech transformation”, paragraph 36)
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of rendering with another because claims 15, 20, 7, 1, and 11, of the Parent Patent teach the claimed invention except for the substitution of a rendering which is not necessarily a text to speech rendering with a rendering which is.  Kovales teaches that text to speech rendering was known in the art.  One of ordinary skill in the art could have substituted one type of rendering with another to obtain the predictable results of a system which compares a candidate speech rendering to a plurality of speech renderings respectively corresponding to already defined text based commands (as per Claims 15 and 20 of the Parent Patent) where the already defined text based commands are identified by examining resources (as per Claim 7 of Parent Patent 1) where the text based commands are commands of the computer system (as per Claim 1 of Parent Patent 1) where the resources are resources of the computer system (as per Claim 11 of Parent Patent 1) where the rendering results from text-to-speech transformation (as per Kovales).

As per Claim 31, Claim 15 of the Parent Patent teaches the text based command other than the candidate text based command and Claim 10 of the Parent Patent teaches where a text based command is a file or directory management command (which suggests where the text based command other than the candidate text based 
Therefore, it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform a simple substitution of one type of text based command with another because claim 15 of the Parent Patent teaches the claimed invention except for the substitution of a text based command which is not necessarily selected from the group consisting of a file management command and a directory management command with a text based command which is.  Claim 10 of the Parent Patent teaches that a text based command which is selected from the group consisting of a file management command and a directory management command was known in the claims.  One of ordinary skill in the art could have substituted one type of text based command with another to obtain the predictable results of a system which determines that speech recognition software is likely to misidentify utterances of the candidate text as corresponding to a text based command other than the candidate text based command (as per Claim 15 of the Parent Patent) where the text based command is a file or directory management command (as per Claim 10 of the Parent Patent).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00PM - 8:30PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL, can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





EY 11/11/2021
/ERIC YEN/Primary Examiner, Art Unit 2658