DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 28, 2019 is being considered by the examiner.

Claim Objections
Claim 9 is objected to because of the following informalities:
Claim 9, line 3 – the phrase “the plurality of user-defined invocations” should be “the plurality of user-defined invocation phrases”.  
Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.



As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. 
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
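A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.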

Claims 1-7, 9, 11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Koh (U.S. Pat. App. Pub. No. 2020/0152186, hereinafter Koh) in view of Aggarwal (U.S. Pat. App. Pub. No. 2020/0258508, hereinafter Aggarwal).

Regarding claim 1, Koh discloses a method for processing spoken requests, performed by an electronic device having one or more processors and memory, the method comprising (“method 300 performed by the communication device 105”; Koh, ¶ [0040]): at the electronic device (the method steps are performed by the communication device 105, thus at the electronic device; Koh, ¶ [0040]): receiving audio input containing a user utterance (“At block 305, the electronic processor 205 of the communication device 105 receives a first voice command via the microphone 220,” where the microphone provides audio input which contains the first voice command (a user utterance); Koh, ¶ [0041]); determining, from the audio input, a text representation of the utterance (“At block 310, the electronic processor 205 analyzes the first voice command using a first type of voice recognition. In some embodiments, the electronic processor 205 uses speech-to-text techniques to convert the first voice command into text.”; Koh, ¶ [0042]); in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases (though not expressly disclosed as “user-defined,” the method includes “determin[ing] that an action to be performed in accordance with the first voice command is unrecognizable based on the analysis using the first type of voice recognition,” where unrecognizable is defined as any “converted text or voice data of a voice command [that does not] exactly [match]... a supported voice command”; Koh, ¶¶ [0043]-[0044]), determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions (The method can include a “first type of voice …”; Koh, ¶¶ [0043]-[0045]).
However, Koh fails to expressly recite a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent.

Aggarwal teaches a digital assistant application for parsing audio input signals. (Aggarwal, ¶ [0015]). Regarding claim 1, Aggarwal teaches a user-defined invocation phrase… (the method includes “the NLP component 114 matches the text to words that are associated, for example via training across users or through manual specification, with actions…” where matching words which are associated by “manual specification” is a user-defined invocation …; Aggarwal, ¶ [0045]) in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, (The method includes “the regular expressions 128 can include rules about when the voice-based session between the client devices 104 and the data processing system 102 is to include the navigation application 110 and the navigator service 106,” where the rules about the voice-based session are the rule-based conditions and the “NLP component 114 can determine whether one or more keywords identified from the input audio signal (text representation) references at least one function of the navigation application 110 using the regular expressions 128 for the navigation application 110,” where the functions are associated with specified keywords from the regular expressions 128, thus the user-defined invocation phrase, and where referencing at least one function of the navigation application is satisfying the one or more rule-based conditions; Aggarwal, ¶¶ [0039], [0050]); processing the text representation and the user-defined invocation phrase using a machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114; Aggarwal, ¶¶ [0044]-[0045], [0047]) to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase (The system discloses determining the semantic distance (semantic equivalence), using the NLP component 114, “between the referential keywords (text representation) and the identifiers of the point location,” and determining an indexical measure (score representing a degree of semantic equivalence) using semantic distance “between the referential keywords (text representation)” and “node[s] corresponding to the identifier for the point location (keywords or phrases)” on a semantic knowledge graph, where each of the nodes “corresponds to a keyword or phrase,” therefore providing a score representing the degree of …; Aggarwal, ¶¶ [0070], [0072]); in accordance with a determination that the determined score satisfies a threshold condition, (“The NLP component 114 can apply the semantic analysis technique to calculate or determine an indexical measure between the corresponding identifier for the point location and the referential keywords” where “NLP component 114 can compare the indexical measures between each referential keyword (text representation) and the identifier for each point location (user-defined invocation phrase) to a threshold measure,” where the threshold measure is the threshold condition; Aggarwal, ¶¶ [0072], [0073]); performing a predefined task flow corresponding to the user-defined invocation phrase, (“Responsive to the determination that at least one indexical measure is less than or equal to the threshold measure, the NLP component 114 can determine at least one referential keyword refers to one of the point locations within the reference frame,” where the determination is the predefined task flow; Aggarwal, ¶¶ [0073], [0054]); wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows (“predefined keywords can include a function identifier” [or] “parameters for… carry[ing] out the request corresponding to the function” and “each function identifier...can be associated with one of the functions (a respective predefined task flow) of the navigation application 110,” where the predefined keywords include “keywords associated with points of interest (also referred to as the point locations, thus the plurality of user-defined invocation phrases)” which correspond to a function of the [plurality of] functions (a plurality of predefined task flows); Aggarwal, ¶ [0051]); and in accordance with a determination that the determined score does not satisfy the threshold condition: (“Responsive to the determination that all the indexical measures are greater than the threshold measure,” where greater than the threshold measure indicates that the indexical measure (the determined score) does not satisfy the threshold measure (threshold condition), “the NLP component 114 can determine that the referential keywords do not refer to any point locations within the reference …”; Aggarwal, ¶ [0073]); performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation (“In identifying the one or more point locations, the NLP component 114 can search for other keywords related to the referential keywords identified in the input audio signal... based on content or preferences the data processing system 102 received from the client device 104,” where the other keywords are the actionable intent, where the NLP component 114 performs natural language processing, and where the “one or more point locations [are] outside the reference frame.”; Aggarwal, ¶¶ [0047], [0077], [0093]); and performing a task flow corresponding to the actionable intent (“Responsive to identifying point locations outside the initial reference frame, the location finder component 140 can modify the reference frame to include the point location with the identifier matching the referential keywords,” where the modification of the reference frame is the task flow corresponding to identifying point locations outside the initial reference frame in light of “other keywords” (the actionable intent); Aggarwal, ¶ [0094]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).
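For illustration only, the order of determinations recited in claims 1 and 2, as mapped above, can be sketched as follows. The sketch is not taken from Koh, Aggarwal, or the application. The names `handle_utterance`, `shares_word`, and `token_overlap` are hypothetical, the word-overlap score is merely a stand-in for a machine-learned model, and the sketch assumes a similarity score where higher is better (Aggarwal instead describes a distance-type measure that satisfies the threshold when less than or equal to it).

```python
from typing import Callable, Dict

TaskFlow = Callable[[], str]


def shares_word(text: str, phrase: str) -> bool:
    """Hypothetical rule-based condition: the two strings share at least one word."""
    return bool(set(text.lower().split()) & set(phrase.lower().split()))


def token_overlap(text: str, phrase: str) -> float:
    """Hypothetical stand-in for the machine-learned model (Jaccard word overlap)."""
    a, b = set(text.lower().split()), set(phrase.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def handle_utterance(
    text: str,
    phrases: Dict[str, TaskFlow],
    nlp_fallback: Callable[[str], str],
    threshold: float = 0.5,  # assumed value, for illustration only
    rule_check: Callable[[str, str], bool] = shares_word,
    score_fn: Callable[[str, str], float] = token_overlap,
) -> str:
    # Exact match: perform the corresponding predefined task flow (claim 2).
    if text in phrases:
        return phrases[text]()
    for phrase, task_flow in phrases.items():
        # The score is computed only when the rule-based condition is satisfied;
        # `and` short-circuits, so score_fn is skipped otherwise.
        if rule_check(text, phrase) and score_fn(text, phrase) >= threshold:
            return task_flow()
    # Score does not satisfy the threshold: fall back to NLP-derived intent.
    return nlp_fallback(text)
```

In this sketch the fallback callable represents the natural language processing path that determines an actionable intent; how that path works is outside the sketch.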

Regarding claim 2, the rejection of claim 1 is incorporated. Koh further discloses further comprising: in accordance with a determination that the text representation exactly matches one of the plurality of user-defined invocation phrases (The system includes “exact … keyword matching such that... [when] the voice command exactly matches… a supported voice command that is locally stored in the memory 210 of the communication device 105 or of an IoT device 115... the electronic processor 205 recognizes the exact spoken voice command ...and executes a corresponding action in response”; Koh, ¶ [0043]), performing a second predefined task flow corresponding to the one of the plurality of user-defined invocation phrases (In response to recognizing the exact spoken voice command “the electronic processor 205 ...executes a corresponding action in response,” where the corresponding action is the second predefined task flow corresponding to the spoken voice command (one of a plurality of user-defined invocation phrases).; Koh, ¶ [0043]).

Regarding claim 3, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. However, Koh fails to expressly disclose further comprising: in accordance with a determination that a comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions, forgo processing the text representation through the machine-learned model.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 3, Aggarwal further discloses further comprising: in accordance with a determination that a comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions (“In determining whether the one or more keywords reference at least one function of the navigation application 110, the NLP component 114 can compare the one or more keywords against the regular expression 128,” if there is no match between the one or more keywords and “first set of predefined keywords specified by the regular expression 128,” the comparison does not satisfy the rules of the “regular expression 128,” i.e., the one or more rule-based conditions; Aggarwal, ¶ [0052]), forgo processing the text representation through the machine-learned model (“Responsive to determining no match” between the one or more keywords and the regular expression 128, “the NLP component 114 can determine that the input audio signal instead references one of the functions of the digital assistant application 108. The digital assistant application 108 can perform further processing with the keywords to carry out the request,” thus the method forgoes processing by NLP component 114, including the machine learned model, and processing is transferred to the digital assistant application 108; Aggarwal, ¶ [0052]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include further comprising: in accordance with a determination that a comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions, forgo processing the text representation through the machine-learned model. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).

Regarding claim 4, the rejection of claim 3 is incorporated. Koh and Aggarwal disclose all elements as described above. Koh further discloses further comprising: in accordance with a determination that the comparison between the text representation and each of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions (Using the rule-based conditions that the word is not a “near exact match” and that the word/phrase is not a “supported voice command locally-stored by... [another] IoT device 115… within communication range of the IoT device 115,” when the method determines that the first voice command is not a “near-exact match” and/or is not supported in “a list of voice commands supported by nearby IoT devices 115,”; Koh, ¶¶ [0043]-[0045]): performing natural language processing on the text representation (then the method forwards the first voice command to “be analyzed using a second type of voice recognition different from the first type of voice recognition,” where “the second type of voice recognition uses a natural language processing engine”; Koh, ¶¶ [0049], [0052]) to determine the actionable intent corresponding to the text representation (“the second type of voice recognition uses a natural language processing engine … configured to determine an intent and/or content of the first voice command,” where intent of the first voice command is the actionable intent corresponding to the text representation; Koh, ¶ [0052]); and performing the task flow corresponding to the actionable intent (“based on the analysis of the first voice command using the second type of voice recognition, the remote electronic computing device determines the action to be performed in accordance with the first voice command” where the action (task flow) corresponds to the intent (the actionable intent) of the first voice command; Koh, ¶¶ [0052], [0053]).


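Regarding claim 5, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. However, Koh fails to expressly disclose further comprising: in accordance with the determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions and in accordance with the determination that a comparison between the text representation and a second user-defined invocation phrase of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: processing the text representation and the user-defined invocation phrase through the machine-learned model without processing the text representation and the second user-defined invocation phrase through the machine-learned model.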

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 5, Aggarwal further discloses further comprising: in accordance with the determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions and (“The digital assistant application 108 can include a direct action handler component 120. The digital assistant application 108 can include a response selector component 124 to select responses to audio-based input signals,” thus including multiple inputs in an audio signal and multiple responses where the first input can include the “NLP component 114 can determine whether one or more keywords identified from the input audio signal (text representation) references at least one function of the navigation application 110 using the regular expressions 128 for the navigation application 110,” where the functions are associated with specified keywords from the regular expressions 128, thus the user-defined invocation phrase, and where referencing at least one function of the navigation application is satisfying the one or more rule-based conditions; Aggarwal, ¶¶ [0023], [0039], [0050], [0137]) in accordance with the determination that a comparison between the text representation and a second user-defined invocation phrase of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions (and the second input can include “determining whether the one or more keywords reference at least one function of the navigation application 110, the NLP component 114 can compare the one or …”; Aggarwal, ¶ [0052]): processing the text representation and the user-defined invocation phrase through the machine-learned model without processing the text representation and the second user-defined invocation phrase through the machine-learned model (The first input is processed by the NLP component, where “the NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114. However, “responsive to determining no match” between the second input and the regular expression 128, “the NLP component 114 can determine that the input audio signal instead references one of the functions of the digital assistant application 108. The digital assistant application 108 can perform further processing with the keywords to carry out the request,” thus the method forgoes processing by NLP component 114, including the machine learned model, and processing is transferred to the digital assistant application 108; Aggarwal, ¶¶ [0044]-[0045], [0047]).

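It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include further comprising: in accordance with the determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions and in accordance with the determination that a comparison between the text representation and a second user-defined invocation phrase of the plurality of user-defined invocation phrases does not satisfy the one or more rule-based conditions: processing the text representation and the user-defined invocation phrase through the machine-learned model without processing the text representation and the second user-defined invocation phrase through the machine-learned model. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).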
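For illustration only, the gating recited in claims 3 and 5 (scoring a phrase only when its comparison satisfies the rule-based conditions, and forgoing the model otherwise) can be sketched as follows. The function name `score_gated` and its parameters are hypothetical and do not appear in Koh, Aggarwal, or the application; the rule and the model are passed in as callables because the sketch makes no assumption about how either is implemented.

```python
from typing import Callable, Dict, Iterable


def score_gated(
    text: str,
    phrases: Iterable[str],
    rule_check: Callable[[str, str], bool],
    score_fn: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return scores only for phrases whose comparison passes the rule check.

    score_fn (the stand-in for the machine-learned model) is never called for
    a phrase that fails rule_check. If every phrase fails, the result is empty
    and the model is not invoked at all.
    """
    scores: Dict[str, float] = {}
    for phrase in phrases:
        if rule_check(text, phrase):
            scores[phrase] = score_fn(text, phrase)
    return scores
```

An empty result corresponds to the claim 3 situation, in which the caller would proceed directly to natural language processing without having run the model.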

Regarding claim 6, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. Koh further discloses wherein the one or more rule-based conditions include a first rule-based condition that the text representation contains a word that is also contained in the user-defined invocation phrase (Koh describes the use of exact matching as a rule-based condition where “exact keyword matching” can be described as “100% keyword match,” where 100% keyword match is a rule-based condition that the voice command (text representation) contains a word that is also contained in the keyword (user-defined invocation phrase); Koh, ¶ [0011]).

Regarding claim 7, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. However, Koh fails to expressly disclose wherein the one or more rule-based conditions include a second rule-based condition that the text representation contains: the user-defined invocation phrase; and additional text positioned before or after the user-defined invocation phrase. 

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 7, Aggarwal further discloses wherein the one or more rule-based conditions include a second rule-based condition that the text representation contains: the user-defined invocation phrase (“The regular expression 128 can define a pattern to match to determine …”; Aggarwal, ¶ [0050]); and additional text positioned before or after the user-defined invocation phrase (“The regular expression 128 can specify a sequence for the request and the referential keywords in the one or more keywords identified from the input audio signal,” where the referential keywords contain text both before (e.g., the request) and after (auxiliary keywords); Aggarwal, ¶ [0050]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include wherein the one or more rule-based conditions include a second rule-based condition that the text representation contains: the user-defined invocation phrase; and additional text positioned before or after the user-defined invocation phrase. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).
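For illustration only, the two rule-based conditions recited in claims 6 and 7 can be sketched as simple word-level checks. The function names are hypothetical, and whitespace tokenization with case folding is an assumption made for the sketch; neither reference nor the application is relied upon for these details.

```python
def shares_word(text: str, phrase: str) -> bool:
    """Claim 6 style condition: text contains a word also contained in the phrase."""
    return bool(set(text.lower().split()) & set(phrase.lower().split()))


def contains_phrase_with_extra_text(text: str, phrase: str) -> bool:
    """Claim 7 style condition: text contains the whole phrase plus additional
    text positioned before or after it."""
    t, p = text.lower().split(), phrase.lower().split()
    n = len(p)
    # The text must be strictly longer than the phrase, so a match leaves at
    # least one extra word before or after the matched span.
    if n == 0 or len(t) <= n:
        return False
    return any(t[i:i + n] == p for i in range(len(t) - n + 1))
```

Under these assumptions an utterance that is exactly the phrase fails the second check, since that case is handled by exact matching rather than by the rule-based conditions.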

Regarding claim 9, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. However, Koh fails to expressly disclose further comprising: in accordance with a determination that a comparison between the text representation and a third user-defined invocation phrase of the plurality of user-defined invocations satisfies the one or more rule-based conditions, processing the text representation and the third user-defined invocation phrase through the machine-learned model to determine a second score representing a degree of semantic equivalence between the text representation and the third user-defined invocation phrase; and performing the predefined task flow in accordance with the determination that the determined score satisfies the threshold condition and in accordance with the determined score being greater than the determined second score.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 9, Aggarwal further discloses further comprising: in accordance with a determination that a comparison between the text representation and a third user-defined invocation phrase of the plurality of user-defined invocations satisfies the one or more rule-based conditions (“The digital assistant application 108 can include a direct action handler component 120. The digital assistant application 108 can include a response selector component 124 to select responses to audio-based input signals,” thus including multiple inputs in an audio signal and multiple responses where the first input can include the “NLP component 114 can determine whether one or more keywords identified from the input audio signal (text representation) references at least one function of the navigation application 110 using the regular expressions 128 for the navigation application 110,” where the functions are associated with specified keywords from the regular expressions 128, thus the user-defined invocation phrase, and where referencing at least one function of the navigation application is satisfying the one or more rule-based conditions; Aggarwal, ¶¶ [0023], [0039], [0050], [0137]), processing the text representation and the third user-defined invocation phrase through the machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114; Aggarwal, ¶¶ [0044]-[0045], [0047]) to determine a second score representing a degree of semantic equivalence between the text representation and the third user-defined invocation phrase (“The NLP component 114 can apply the semantic analysis technique to calculate or determine an indexical measure between …”; Aggarwal, ¶¶ [0072], [0073]); and performing the predefined task flow in accordance with the determination that the determined score satisfies the threshold condition and (“Responsive to the determination that at least one indexical measure is less than or equal to the threshold measure (the determined score satisfies the threshold condition), the NLP component 114 can determine at least one referential keyword refers to one of the point locations within the reference frame (the predefined task flow)” as applied to the third user-defined invocation phrase; Aggarwal, ¶ [0073]) in accordance with the determined score being greater than the determined second score (“Having determined the indexical measures, the NLP component 114 can identify the point location with the greatest indexical measure with the one or more referential keywords. To identify multiple point locations, the NLP component 114 can identify the one or more point locations with the greatest n indexical measures in relation to the referential keywords,” thus in accordance with the determined score being greater than the determined second score; Aggarwal, ¶ [0072]).

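It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include further comprising: in accordance with a determination that a comparison between the text representation and a third user-defined invocation phrase of the plurality of user-defined invocations satisfies the one or more rule-based conditions, processing the text representation and the third user-defined invocation phrase through the machine-learned model to determine a second score representing a degree of semantic equivalence between the text representation and the third user-defined invocation phrase; and performing the predefined task flow in accordance with the determination that the determined score satisfies the threshold condition and in accordance with the determined score being greater than the determined second score. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).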
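For illustration only, the selection recited in claim 9 (performing the task flow of the phrase whose score both satisfies the threshold and is greater than the score of a competing phrase) can be sketched as follows. The function name `select_phrase` is hypothetical. The sketch assumes a similarity score where higher is better, and it assumes that a tie is resolved in favor of the first phrase encountered, which the claim language does not address.

```python
from typing import Dict, Optional


def select_phrase(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Pick the highest-scoring phrase, provided its score meets the threshold.

    `scores` maps each candidate invocation phrase to its model score.
    Returns None when there are no candidates or the best score is too low.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)  # first maximal key wins on ties (assumption)
    return best if scores[best] >= threshold else None
```

A return value of None would correspond to the branch in which the score does not satisfy the threshold condition and natural language processing is performed instead.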

Regarding claim 11, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all elements as described above. However, Koh fails to expressly disclose wherein the machine-learned model determines the score representing the degree of semantic equivalence between the text representation and the user-defined invocation phrase without determining an actionable intent corresponding to the text representation.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 11, Aggarwal further discloses wherein the machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114.; Aggarwal, ¶¶ [0044]-[0045], [0047]) determines the score representing the degree of semantic equivalence between the text representation and the user-defined invocation phrase (The system discloses determining the semantic distance (semantic equivalence), using the NLP component 114, “between the referential keywords (text representation) and the identifiers of the point location,” and determining an indexical measure (score representing a degree of semantic equivalence) using semantic distance “between the referential keywords (text representation)” and “node[s] corresponding to the identifier for the point location (keywords or phrases)” on a semantic knowledge graph, where each of nodes “corresponds to a keyword or phrase,” therefore providing a score representing the degree of semantic equivalence between the referential keyword (text representation) and the identifier for the point location (user-defined invocation phrase); Aggarwal, ¶¶ [0070], [0072]) without determining an actionable intent corresponding to the text representation (“The semantic distance (semantic equivalence)… [is] a semantic similarity or relatedness measure between the words or phrases of the nodes” thus not expressly including actionable intent, and other keywords based on content and preferences (actionable intent) are not determined if the method determines “that at least one indexical measure is less than or equal to the threshold measure”; Aggarwal, ¶¶ [0070], [0073]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include wherein the machine-learned model determines the score representing the degree of semantic equivalence between the text representation and the user-defined invocation phrase without determining an actionable intent corresponding to the text representation. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).

Regarding claim 17, Koh discloses An electronic device (“communication device 105”; Koh, ¶ [0018]), comprising: one or more processors; memory (“electronic processor 205… electrically coupled to a memory 210”; Koh, ¶ [0030]); and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors (“The electronic processor 205 is configured to receive instructions and data from the memory 210 and execute, among other things, the instructions.”; Koh, ¶ [0031]), the one or more programs including instructions for (“instructions stored in the memory 210… perform the methods described herein.”; Koh, ¶ [0031]): receiving audio input containing a user utterance (though not expressly disclosed as “user-defined,” the method includes “the electronic processor 205 of the communication device 105 receives a first voice command via the microphone 220,” where the microphone provides audio input which contains the first voice command (a user utterance).; Koh, ¶ [0041]); determining, from the audio input, a text representation of the utterance (“At block 310, the electronic processor 205 analyzes the first voice command using a first type of voice recognition. In some embodiments, the electronic processor 205 uses speech-to-text techniques to convert the first voice command into text.”; Koh, ¶ [0042]); in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases (though not expressly disclosed as “user-defined,” the method includes that “the electronic processor 205 determines that an action to be performed in accordance with the first voice command is unrecognizable based on the analysis using the first type of voice recognition,” where unrecognizable is defined as any “converted text or voice data of a voice command [that does not] exactly [match]... 
a supported voice command”; Koh, ¶¶ [0043]-[0044]), determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user- defined invocation phrases satisfies one or more rule-based conditions (The method can include a “first type of voice recognition [which] recognize converted text or voice data of a voice command… [when] the voice command … nearly exactly matches a supported voice command,” where the near exact match is a rule based condition (examples including suffixes on words (e.g., where the key phrase is “play song 1” and the voice command received is “playing song 1”) and additional words within a key phrase (e.g., where the key phrase is “play song 1” and the voice command received is “please play song 1”)) and the method can include “first voice command may be intended for an IoT device 115 located nearby the communication device 105… [based on] supported voice commands for [other nearby] IoT devices 115”; Koh, ¶¶ [0043]-[0045]). 
Koh fails to expressly recite a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 17, Aggarwal teaches a user-defined invocation phrase… (the method includes “the NLP component 114 matches the text to words that are associated, for example via training across users or through manual specification, with actions…” where matching words which are associated by “manual specification” is a user-defined invocation phrase; Aggarwal, ¶ [0045]) in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, (The method includes “the regular expressions 128 can include rules about when the voice-based session between the client devices 104 and the data processing system 102 is to include the navigation application 110 and the navigator service 106,” where the rules about the voice-based session are the rules-based conditions and the “NLP component 114 can determine whether one or more keywords identified from the input audio signal (text representation) references at least Aggarwal, ¶¶ [0039], [0050]); processing the text representation and the user-defined invocation phrase using a machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114.; Aggarwal, ¶ [0044]-[0045], [0047]) to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase (The system discloses determining the semantic distance (semantic equivalence), using the NLP component 114, “between the referential keywords (text representation) and the identifiers of the point 
location,” and determining an indexical measure (score representing a degree of semantic equivalence) using semantic distance “between the referential keywords (text representation)” and “node[s] corresponding to the identifier for the point location (keywords or phrases)” on a semantic knowledge graph, where each of nodes “corresponds to a keyword or phrase,” therefore providing a score representing the degree of semantic equivalence between the referential keyword (text representation) and the identifier for the point location (user-defined invocation phrase); Aggarwal, ¶ [0070], [0072]); in accordance with a determination that the determined score satisfies a threshold condition, (“The NLP component 114 can apply the semantic analysis technique to calculate or determine an indexical measure between the corresponding identifier for the point location and the referential keywords” where “NLP component 114 can compare the indexical measures between each referential keyword (text representation) and the identifier for each point location (user-defined invocation phrase) to a threshold measure,” where the threshold measure Aggarwal, ¶¶ [0072], [0073]); performing a predefined task flow corresponding to the user-defined invocation phrase, (“Responsive to the determination that at least one indexical measure is less than or equal to the threshold measure, the NLP component 114 can determine at least one referential keyword refers to one of the point locations within the reference frame,” where the determination is the predefined task flow.; Aggarwal, ¶ [0073], [0054]); wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows (“predefined keywords can include a function identifier” [or] “parameters for… carry[ing] out the request corresponding to the function” and “each function identifier...can be associated with one of the functions (a respective predefined task flow) of 
the navigation application 110,” where the predefined keywords includes “keywords associated with points of interest (also referred to as the point locations, thus the plurality of user-defined invocation phrases)” which correspond to a function of the [plurality of] functions (a plurality of predefined task flows).; Aggarwal, ¶ [0051]); and in accordance with a determination that the determined score does not satisfy the threshold condition: (“Responsive to the determination that all the indexical measures are greater than the threshold measure,” where greater than the threshold measure indicates that the indexical measure (the determined score) does not satisfy the threshold measure (threshold condition), “the NLP component 114 can determine that the referential keywords do not refer to any point locations within the reference frame”; Aggarwal, ¶ [0073]); performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation (“In identifying the one or more point locations, the NLP component 114 can search for other keywords related to the referential keywords identified in the input audio signal... 
based on content or preferences the data processing system 102 received from the client device 104,” where the other keywords are the actionable intent, where the NLP component 114 performs natural language processing, and where the “one or more point locations [are] outside the reference frame.”; Aggarwal, ¶¶ [0047], [0077], [0093]); and performing a task flow corresponding to the actionable intent (“Responsive to identifying point locations outside the initial reference frame, the location finder component 140 can modify the reference frame to include the point location with the identifier matching the referential keywords,” where the modification of the reference frame is the task flow corresponding to identifying point locations outside the initial reference frame in light of “other keywords” (the actionable intent); Aggarwal, ¶ [0094]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).

Regarding claim 18, Koh discloses A non-transitory computer-readable storage medium (“memory 210 may include read only memory (ROM), random access memory (RAM), other non-transitory computer-readable media”; Koh, ¶ [0031]) storing one or more programs configured to be executed by one or more processors of an electronic device, the one or more programs including instructions for (“The electronic processor 205 is configured to receive instructions and data from the memory 210 and execute, among other things, the instructions,” the instructions are described in the context of the method 300; Koh, ¶ [0031]): receiving audio input containing a user utterance (“At block 305, the electronic processor 205 of the communication device 105 receives a first voice command via the microphone 220,” where the microphone provides audio input which contains the first voice command (a user utterance).; Koh, ¶ [0041]); determining, from the audio input, a text representation of the utterance (“At block 310, the electronic processor 205 analyzes the first voice command using a first type of voice recognition. In some embodiments, the electronic processor 205 uses speech-to-text techniques to convert the first voice command into text.”; Koh, ¶ [0042]); in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases (though not expressly disclosed as “user-defined,” the method includes that “the electronic processor 205 determines that an action to be performed in accordance with the first voice command is unrecognizable based on the analysis using the first type of voice recognition,” where unrecognizable is defined as any “converted text or voice data of a voice command [that does not] exactly [match]... 
a supported voice command”; Koh, ¶¶ [0043]-[0044]), determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions (The method can include a “first type of voice recognition [which] recognize converted text or voice data of a voice command… [when] the voice command … nearly exactly matches a supported voice command,” where the near exact match is a rule based condition (examples including suffixes on words (e.g., where the key phrase is “play song 1” and the voice command received is “playing song 1”) and additional words within a key phrase (e.g., where the key phrase is “play song 1” and the voice command received is “please play song 1”)) and the method can include “first voice command may be intended for an IoT device 115 located nearby the communication device 105… [based on] supported voice commands for [other nearby] IoT devices 115”; Koh, ¶¶ [0043]-[0045]). However, Koh fails to expressly recite a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 18, Aggarwal teaches a user-defined invocation phrase… (the method includes “the NLP component 114 matches the text to words that are associated, for example via training across users or through manual specification, with actions…” where matching words which are associated by “manual specification” is a user-defined invocation phrase; Aggarwal, ¶ [0045]) in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, (The method includes “the regular expressions 128 can include rules about when the voice-based Aggarwal, ¶¶ [0039], [0050]); processing the text representation and the user-defined invocation phrase using a machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114.; Aggarwal, ¶ [0044]-[0045], [0047]) to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase (The system discloses determining the semantic distance (semantic equivalence), using the NLP component 114, “between the referential keywords (text representation) and the identifiers of the point location,” and determining an indexical measure (score representing a degree of semantic equivalence) using semantic distance “between the referential keywords (text representation)” and “node[s] corresponding to the identifier for the point location (keywords or phrases)” on a semantic knowledge graph, where each of nodes “corresponds to a keyword or phrase,” therefore 
providing a score representing the degree of semantic equivalence between the referential keyword (text representation) and the identifier for the point location (user-defined invocation phrase); Aggarwal, ¶ [0070], [0072]); in accordance with a determination that the determined score satisfies a threshold condition, (“The NLP component 114 can apply the semantic analysis technique to Aggarwal, ¶¶ [0072], [0073]); performing a predefined task flow corresponding to the user-defined invocation phrase, (“Responsive to the determination that at least one indexical measure is less than or equal to the threshold measure, the NLP component 114 can determine at least one referential keyword refers to one of the point locations within the reference frame,” where the determination is the predefined task flow.; Aggarwal, ¶ [0073], [0054]); wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows (“predefined keywords can include a function identifier” [or] “parameters for… carry[ing] out the request corresponding to the function” and “each function identifier...can be associated with one of the functions (a respective predefined task flow) of the navigation application 110,” where the predefined keywords includes “keywords associated with points of interest (also referred to as the point locations, thus the plurality of user-defined invocation phrases)” which correspond to a function of the [plurality of] functions (a plurality of predefined task flows).; Aggarwal, ¶ [0051]); and in accordance with a determination that the determined score does not satisfy the threshold condition: (“Responsive to the determination that all the indexical measures are greater than the threshold measure,” where greater than the threshold measure indicates that the indexical measure (the determined score) does not satisfy the threshold measure (threshold condition), “the NLP component 114 can 
determine that the referential keywords do not refer to any point locations within the reference frame”; Aggarwal, ¶ [0073]); performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation (“In identifying the one or more point locations, the NLP component 114 can search for other keywords related to the referential keywords identified in the input audio signal... based on content or preferences the data processing system 102 received from the client device 104,” where the other keywords are the actionable intent, where the NLP component 114 performs natural language processing, and where the “one or more point locations [are] outside the reference frame.”; Aggarwal, ¶¶ [0047], [0077], [0093]); and performing a task flow corresponding to the actionable intent (“Responsive to identifying point locations outside the initial reference frame, the location finder component 140 can modify the reference frame to include the point location with the identifier matching the referential keywords,” where the modification of the reference frame is the task flow corresponding to identifying point locations outside the initial reference frame in light of “other keywords” (the actionable intent); Aggarwal, ¶ [0094]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include a user-defined invocation phrase… in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent. The systems and methods described here “provide an improved guided interaction” for natural language processing while interacting with remote services more efficiently, as recognized by Aggarwal. (Aggarwal, ¶ [0021]).

Regarding claim 19, Koh discloses An electronic device, comprising (“method 300 performed by the communication device 105”; Koh, ¶ [0040]): means for receiving audio input containing a user utterance (“At block 305, the electronic processor 205 of the communication device 105 receives a first voice command via the microphone 220,” where the microphone provides audio input which contains the first voice command (a user utterance).; Koh, ¶ [0041]); determining, from the audio input, a text representation of the utterance (“At block 310, the electronic processor 205 analyzes the first voice command using a first type of voice recognition. In some embodiments, the electronic processor 205 uses speech-to-text techniques to convert the first voice command into text.”; Koh, ¶ [0042]); means for, in accordance with a determination that the text representation does not exactly match any of a plurality of user-defined invocation phrases (though not expressly disclosed as “user-defined,” the method includes that “the electronic processor 205 determines that an action to be performed in accordance with the first voice command is unrecognizable based on the analysis using the first type of voice recognition,” where unrecognizable is defined as any “converted text or voice data of a voice command [that does not] exactly [match]... 
a supported voice command”; Koh, ¶¶ [0043]-[0044]), determining whether a comparison between the text representation and a user-defined invocation phrase of the plurality of user-defined invocation phrases satisfies one or more rule-based conditions (The method can include a “first type of voice recognition [which] recognize converted text or voice data of a voice command… [when] the voice command … nearly exactly matches a supported voice command,” where the near exact match is a rule based condition (examples including suffixes on words (e.g., where the key phrase is “play song 1” and the voice command received is “playing song 1”) and additional words within a key phrase (e.g., where the key phrase is “play song 1” and the voice command received is “please play song 1”)) and the method can include “first voice command may be intended for an IoT device 115 located nearby the communication device 105… [based on] supported voice commands for [other nearby] IoT devices 115”; Koh, ¶¶ [0043]-[0045]). However, Koh fails to expressly recite a user-defined invocation phrase… means for, in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; means for, in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and means for, in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent.

The relevance of Aggarwal is disclosed above with reference to claim 1. Regarding claim 19, Aggarwal teaches a user-defined invocation phrase… (the method includes “the NLP component 114 matches the text to words that are associated, for example via training across users or through manual specification, with actions…” where matching words which are associated by “manual specification” is a user-defined invocation phrase; Aggarwal, ¶ [0045]) means for, in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, (The method includes “the regular expressions 128 can include rules about when the voice-based session between the client devices 104 and the data processing system Aggarwal, ¶¶ [0039], [0050]); processing the text representation and the user-defined invocation phrase using a machine-learned model (“The NLP component 114 can include or be configured with techniques based on machine learning, such as statistical machine learning,” thus using a machine learned model, where the input audio signal (text representation) and the keywords as stored in the digital assistant application 108 and the navigation application 110 (user-defined invocation phrase) are processed by the NLP component 114.; Aggarwal, ¶ [0044]-[0045], [0047]) to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase (The system discloses determining the semantic distance (semantic equivalence), using the NLP component 114, “between the referential keywords (text representation) and the identifiers of the point location,” and determining an indexical measure (score representing a degree of semantic equivalence) using semantic distance “between the referential keywords (text representation)” and “node[s] corresponding to the identifier for the point location (keywords or phrases)” on a semantic knowledge 
graph, where each of nodes “corresponds to a keyword or phrase,” therefore providing a score representing the degree of semantic equivalence between the referential keyword (text representation) and the identifier for the point location (user-defined invocation phrase); Aggarwal, ¶ [0070], [0072]); means for, in accordance with a determination that the determined score satisfies a threshold condition, (“The NLP component 114 can apply the semantic analysis technique to calculate or determine an indexical measure between Aggarwal, ¶¶ [0072], [0073]); performing a predefined task flow corresponding to the user-defined invocation phrase, (“Responsive to the determination that at least one indexical measure is less than or equal to the threshold measure, the NLP component 114 can determine at least one referential keyword refers to one of the point locations within the reference frame,” where the determination is the predefined task flow.; Aggarwal, ¶ [0073], [0054]); wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows (“predefined keywords can include a function identifier” [or] “parameters for… carry[ing] out the request corresponding to the function” and “each function identifier...can be associated with one of the functions (a respective predefined task flow) of the navigation application 110,” where the predefined keywords includes “keywords associated with points of interest (also referred to as the point locations, thus the plurality of user-defined invocation phrases)” which correspond to a function of the [plurality of] functions (a plurality of predefined task flows).; Aggarwal, ¶ [0051]); and means for, in accordance with a determination that the determined score does not satisfy the threshold condition: (“Responsive to the determination that all the indexical measures are greater than the threshold measure,” where greater than the threshold measure 
indicates that the indexical measure (the determined score) does not satisfy the threshold measure (threshold condition), “the NLP component 114 can determine that the referential keywords do not refer to any point locations within the reference frame”; Aggarwal, ¶ [0073]); performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation (“In identifying the one or more point locations, the NLP component 114 can search for other keywords related to the referential keywords identified in the input audio signal... based on content or preferences the data processing system 102 received from the client device 104,” where the other keywords are the actionable intent, where the NLP component 114 performs natural language processing, and where the “one or more point locations [are] outside the reference frame.”; Aggarwal, ¶¶ [0047], [0077], [0093]); and performing a task flow corresponding to the actionable intent (“Responsive to identifying point locations outside the initial reference frame, the location finder component 140 can modify the reference frame to include the point location with the identifier matching the referential keywords,” where the modification of the reference frame is the task flow corresponding to identifying point locations outside the initial reference frame in light of “other keywords” (the actionable intent); Aggarwal, ¶ [0094]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh to incorporate the teachings of Aggarwal to include a user-defined invocation phrase… means for, in accordance with a determination that the comparison between the text representation and the user-defined invocation phrase satisfies the one or more rule-based conditions, processing the text representation and the user-defined invocation phrase using a machine-learned model to determine a score representing a degree of semantic equivalence between the text representation and the user-defined invocation phrase; means for, in accordance with a determination that the determined score satisfies a threshold condition, performing a predefined task flow corresponding to the user-defined invocation phrase, wherein each of the plurality of user-defined invocation phrases corresponds to a respective predefined task flow of a plurality of predefined task flows; and means for, in accordance with a determination that the determined score does not satisfy the threshold condition: performing natural language processing on the text representation to determine an actionable intent corresponding to the text representation; and performing a task flow corresponding to the actionable intent. The systems and methods described here “provide Aggarwal. (Aggarwal, ¶ [0021]).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 1 above, and further in view of Sarkar (U.S. Pat. App. Pub. No. 2006/0074980, hereinafter Sarkar).

Regarding claim 8, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all of the elements of the current invention as stated above. However, Koh and Aggarwal fail to expressly recite wherein the one or more rule-based conditions include a third rule-based condition that a text normalization of the text representation contains a text normalization of the user-defined invocation phrase.

Sarkar teaches systems and methods for semantically disambiguating text information. (Sarkar, ¶ [0001]). Regarding claim 8, Sarkar teaches wherein the one or more rule-based conditions include a third rule-based condition that a text normalization of the text representation contains a text normalization of the user-defined invocation phrase (Describes a rule-based matching system where “Early uniform normalization (to Unicode Normal Form C) may be used to perform the matching... where words are converted to their root forms before matching against the index (text normalization). The input string may be analyzed for each of its constituent words, to generate a so-called “stem” (or “base”) form,” and where “The matching process can be based on complete... match of the entered text with the given keyword,” thus text normalized text representation containing a text normalization of the user-defined invocation phrase; Sarkar, ¶ [0169]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Sarkar to include wherein the one or more rule-based conditions include a third rule-based condition that a text normalization of the text representation contains a text normalization of the user-defined invocation phrase. The systems and methods described here can provide association of concepts for conveying information “without any ambiguity or without being hampered by the limitations of human languages,” as recognized by Sarkar. (Sarkar, ¶ [0001]).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 1 above, and further in view of Yavagal (U.S. Pat. App. Pub. No. 2020/0184966, hereinafter Yavagal).

Regarding claim 10, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all of the elements of the current invention as stated above. However, Koh and Aggarwal fail to expressly recite wherein the machine-learned model is configured to: receive, as input, a feature vector of the text representation and a feature vector of the user-defined invocation phrase; and generate, as output, the score.

Yavagal teaches systems and methods for two stage wakeword detection. (Yavagal, ¶ [0016]). Regarding claim 10, Yavagal teaches wherein the machine-learned model is configured to: receive, as input, a feature vector of the text representation and (“An acoustic model 408 creates phonetic probabilities 410 using the feature vectors,” thus receiving “feature vectors 406 that represent one or more features of the audio data 402” from the feature extractor 404 as an input, where “audio data (e.g., synthesized speech) [is generated] from text data,” and where the acoustic model 408 “may be trained by the server(s) 120 … using the audio data (and/or vectors),” thus a machine-learned model; Yavagal, ¶¶ [0036], [0054]-[0055], [0065]); a feature vector of the user-defined invocation phrase (“the various models 404, 408, 412, 416 may be trained… [using] the audio data (and/or vectors),” where vectors includes feature vectors, and said vectors can represent “user speech of entire user inputs (user-defined invocation phrases).”; Yavagal, ¶ [0065]); and generate, as output, the score (“The wakeword detector 412 may output a wakeword-detection hypothesis score 414 using the phonetic probabilities 410.”; Yavagal, ¶ [0062]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Yavagal to include wherein the machine-learned model is configured to: receive, as input, a feature vector of the text representation and a feature vector of the user-defined invocation phrase; and generate, as output, the score. The two stage detection model described herein provides speech processing with increased accuracy due to “additional input signals, more processing power, and/or a more sophisticated wakeword model,” as recognized by Yavagal. (Yavagal, ¶¶ [0016], [0018]).

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 1 above, and further in view of Yavagal and Schalkwyk (U.S. Pat. App. Pub. No. 2015/0340034, hereinafter Schalkwyk).

Regarding claim 12, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all of the elements of the current invention as stated above. However, Koh and Aggarwal fail to expressly recite wherein the machine-learned model is trained using a plurality of sets of one or more text representations.

Yavagal is disclosed above with reference to Claim 10. Regarding claim 12, Yavagal teaches wherein the machine-learned model is trained using a plurality of sets [of audio data] (“trained wakeword detection components 220 implemented by the device 110 may be trained and operated according to various machine learning techniques…[including], neural networks (such as deep neural networks and/or recurrent neural networks), inference engines, trained classifiers, etc.” where the method can include “the wakeword-detection models… [being] implemented for their corresponding locations via training... [by] using speech data corresponding to the wakeword (audio data),” given the disclosure of “the quiet and noisy-location wakeword detection models” which can be “implemented to perform specifically…” for their “corresponding locations via training,” the disclosure includes at least two sets, a set for noisy environments and a set for quiet environments (a plurality of sets of audio data); Yavagal, ¶¶ [0074]-[0075]) that correspond to a plurality of sets of one or more user utterances received prior to receiving the audio input (the training for the machine learning model “may require audio data representing numerous spoken wakewords in order for the wakeword detector to be trained,” thus indicating a plurality of sets of one or more user utterances, and where training occurs prior to receiving the audio input (the audio input is received by “trained wakeword detection components”); Yavagal, ¶¶ [0018], [0066]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Yavagal to include wherein the machine-learned model is trained using a plurality of sets of one or more text representations. The two stage detection model described herein provides speech processing with increased accuracy due to “additional input signals, more processing power, and/or a more sophisticated wakeword model,” as recognized by Yavagal. (Yavagal, ¶¶ [0016], [0018]). However, Yavagal fails to expressly describe wherein the audio data for the machine-learned model is a text representation.

Schalkwyk teaches systems and methods for speech recognition using neural networks. (Schalkwyk, ¶¶ [0013], [0016]). Regarding claim 12, Schalkwyk teaches wherein the audio data for the machine-learned model is a text representation (The method can include “The language model 130 is a neural network-based model (machine-learned model),” where “The system trains the language model on the text training data to adjust the values of the parameters from the initial values to the pre-trained values…” and “The text training data includes sequences of graphemes for which the text label that should be generated by the language model is known and can include text data from any of a variety of sources.”; Schalkwyk, ¶¶ [0022], [0035]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh, as modified by the digital assistant and navigation interface of Aggarwal and the systems and methods for two stage wakeword detection of Yavagal, to incorporate the teachings of Schalkwyk to include wherein the audio data for the machine-learned model is a text representation. “By training the language model and the inverse pronunciation model on text training data, limitations of the performance of the system due to a lack of audio training data can be avoided,” as recognized by Schalkwyk. (Schalkwyk, ¶ [0005]).

Regarding claim 13, the rejection of claim 12 is incorporated. Koh, Aggarwal, Yavagal, and Schalkwyk disclose all of the elements of the current invention as stated above. However, Koh, Aggarwal, and Schalkwyk fail to expressly recite wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is associated with a respective user-defined invocation phrase of the plurality of user-defined invocation phrases.

The relevance of Yavagal is disclosed above with reference to Claim 10. Regarding claim 13, Yavagal further teaches wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is associated with a respective user-defined invocation phrase of the plurality of user-defined invocation phrases (The method discloses “The wakeword-detection model may be implemented to perform specifically... to detect a different, work-related wakeword or to not detect wakewords at all...” where “the wakeword-detection models may be implemented for their corresponding locations via training using location-specific training data,” thus location-specific training data (each of the sets of one or more user utterances of the plurality of sets of one or more user utterances) is associated with the specific implementations of different wakewords (each of the respective user-defined invocation phrases of the plurality of user-defined invocation phrases).; Yavagal, ¶¶ [0074], [0075]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Yavagal to include wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is associated with a respective user-defined invocation phrase of the plurality of user-defined invocation phrases. The two stage detection model described herein provides speech processing with increased accuracy due to “additional input signals, more processing power, and/or a more sophisticated wakeword model,” as recognized by Yavagal. (Yavagal, ¶¶ [0016], [0018]).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 13 above, and further in view of Yavagal, Schalkwyk, and Senior (U.S. Pat. App. Pub. No. 2017/0103752, hereinafter Senior).

Regarding claim 14, the rejection of claim 13 is incorporated. Koh, Aggarwal, Yavagal, and Schalkwyk disclose all of the elements of the current invention as stated above. However, Koh, Aggarwal, Yavagal, and Schalkwyk fail to expressly recite wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is received within a respective predetermined time period prior to the respective user-defined invocation phrase being invoked.

Senior teaches systems and methods for training neural networks to include latency restraints. (Senior, ¶ [0027]). Regarding claim 14, Senior teaches wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is received within a respective predetermined time period (The method can include “processing of one training data example from the training data 110,” where “the training data 110 may include audio data 112 for utterances of many different speakers.” The method can further include “split[ting] the audio data 112a into a sequence of multiple frames that correspond to different portions or time periods of the audio data 112a. For example, each frame may describe a different 25-millisecond portion of the audio data 112a”; Senior, ¶¶ [0042], [0043]); prior to the respective user-defined invocation phrase being invoked (Operations (A) through (G) describe a training process for the neural network 130, which occurs prior to “the trained recurrent neural network… determin[ing] a transcription for the utterance,” thus the neural network is trained prior to determining a transcription (a respective user-defined invocation phrase being invoked); Senior, ¶¶ [0042], [0043], [0079]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh, as modified by the digital assistant and navigation interface of Aggarwal, the systems and methods for two stage wakeword detection of Yavagal, and the neural network training of Schalkwyk, to incorporate the teachings of Senior to include wherein each set of one or more user utterances of the plurality of sets of one or more user utterances is received within a respective predetermined time period prior to the respective user-defined invocation phrase being invoked. The systems and methods described here can control and reduce “additional latency from system self-alignment …[thus] achiev[ing] improved performance in terms of computation time,” as recognized by Senior. (Senior, ¶¶ [0015], [0016]).

Claim 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 1 above, and further in view of Mukherjee (U.S. Pat. App. Pub. No. 2018/0336885, hereinafter Mukherjee).

Regarding claim 15, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all of the elements of the current invention as stated above. However, Koh and Aggarwal fail to expressly recite wherein each of the plurality of user-defined invocation phrases is assigned to a respective predefined task flow of the plurality of predefined task flows in accordance with user input received at the electronic device or at one or more other electronic devices, and wherein the electronic device and the one or more other electronic devices are each registered to a same user.

Mukherjee teaches systems and methods related to a crowdsourced digital assistant. (Mukherjee, ¶ [0004]). Regarding claim 15, Mukherjee teaches wherein each of the plurality of user-defined invocation phrases is assigned to a respective predefined task flow of the plurality of predefined task flows (the method can include “training component 250 can further request that the user provide a set of commands (plurality of user-defined invocation phrases) that correspond to (is assigned to) the desired operation (respective predefined task flow of the Mukherjee, ¶ [0041]) in accordance with user input received at the electronic device or at one or more other electronic devices (“a user may select (user input) one or more terms included in the received command representations, and define them with a corresponding parameter type selected from a list of custom, predefined, or determined parameter times,” where the input is received at the training component of the digital assistant device; Mukherjee, ¶¶ [0041], [0037]), and wherein the electronic device and the one or more other electronic devices are each registered to a same user (the digital assistant device 110 can include “the digital assistant module 114 [which] provides an interface between a digital assistant device 110 and an associated user,” where the user is associated with the digital assistant device, thus the electronic device is registered to the user; Mukherjee, ¶ [0026]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Mukherjee to include wherein each of the plurality of user-defined invocation phrases is assigned to a respective predefined task flow of the plurality of predefined task flows in accordance with user input received at the electronic device or at one or more other electronic devices, and wherein the electronic device and the one or more other electronic devices are each registered to a same user. The systems and methods described here enable the digital assistant to “learn from its users” while avoiding “privacy concerns typically associated with conventional digital assistants,” as recognized by Mukherjee. (Mukherjee, ¶ [0005]).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Koh and Aggarwal as applied to claim 1 above, and further in view of Acero (U.S. Pat. App. Pub. No. 2018/0330723, hereinafter Acero).

Regarding claim 16, the rejection of claim 1 is incorporated. Koh and Aggarwal disclose all of the elements of the current invention as stated above. However, Koh and Aggarwal fail to expressly recite wherein performing natural language processing includes determining, from a plurality of domains of an ontology, a domain corresponding to the text representation, and wherein the method further comprises: resolving, based on the text representation, one or more parameters of the determined domain.

Acero teaches systems and methods for voice interaction with an intelligent automated assistant. (Acero, ¶ [0009]). Regarding claim 16, Acero teaches wherein performing natural language processing includes determining, from a plurality of domains of an ontology, a domain corresponding to the text representation (the method includes the “natural language processing module 732 select[ing]… the domain that has the most ‘triggered’ nodes,” where “if a word or phrase in the candidate text representation is found to be associated with one or more nodes in ontology 760 (via vocabulary index 744), the word or phrase ‘triggers’ or ‘activates’ those nodes”; Acero, ¶ [0235]), and wherein the method further comprises: resolving, based on the text representation, one or more parameters of the determined domain (the method further includes “inferring a user intent based on multiple candidate actionable intents determined from multiple candidate text representations of a speech input... [and the] task flow resolver 840…” attempts “to resolve one or more missing flow parameters” where flow parameters are parameters of the structured query generated “to represent the identified actionable intent… (or domain).”; Acero, ¶¶ [0239]-[0241], [0272]).

It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the corrected voice command system of Koh as modified by the digital assistant and navigation interface of Aggarwal to incorporate the teachings of Acero to include wherein performing natural language processing includes Acero. (Acero, ¶ [0309]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lyon (U.S. Pat. App. Pub. No. 2010/0257129) discloses systems and methods for audio classification using sparse features.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard, whose telephone number is (313) 446-6627. The examiner can normally be reached on 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn, can be reached on (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the 






/SES/Patent Examiner, Art Unit 2657
/Paras D Shah/Primary Examiner, Art Unit 2659

04/20/2021