DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on October 14, 2021. Claims 1-5, 9, 11-12, 14-15, 21-22, and 24-25 are pending and have been examined.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on August 23, 2021 is being considered by the examiner.

Response to Amendments 
Applicant’s amendment filed on October 14, 2021 has been entered. 
In view of the amendments to the specification, the amendment to paragraph [0049] is acknowledged and entered.
In view of the amendment to paragraph [0049] of the specification, the objection to the specification is withdrawn.
In view of the amendment to the claims, the amendment of claim(s) 1-5, 9, 11-12, 14-15, 21-22, and 24-25 has been acknowledged and entered.  
In light of the amendment to claims 1-3, 11-12, and 21-22, the rejection of claims 1-30 under 35 U.S.C. §112(b) is withdrawn.
In light of the amendment of claim(s) 1-5, 9, 11-12, 14-15, 21-22, and 24-25, the rejection of claim(s) 1-30 under 35 U.S.C. §103 is amended as described below.

Response to Arguments
The applicant's arguments regarding the prior art rejections under 35 U.S.C. §103, see pages 1-3 of the Remarks for the Response to Non-Final Office Action dated August 6, 2021, received on October 14, 2021 (hereinafter Response and Office Action, respectively), have been fully considered. 
As an initial matter, applicant asserts that only “[c]laims 1-6, 11-16, and 21-26 were rejected under 35 U.S.C. 103” and that “[p]resumably, claims 7-10, 17-20, and 27-30 are deemed to have subject matter allowable over the prior art of record.” (Response, pg. 1). Examiner respectfully disagrees. Claims 7-10, 17-20, and 27-30 are properly rejected, as explained in pages 17-23 of the Office Action. As such, Allowable Subject Matter was not indicated, explicitly or implicitly, in the Office Action.
With respect to the rejection(s) of claim(s) 1-6, 11-16, and 21-26 under 35 U.S.C. §103 as being unpatentable over Seo (U.S. Pat. App. Pub. No. 2020/0035228, hereinafter Seo) in view of Non-Patent Literature to Bender et al. (Oliver Bender, Franz Josef Och, and Hermann Ney. “Maximum entropy models for named entity recognition.” In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4 (CONLL '03). Association for Computational Linguistics, USA, 148–151. 2003, hereinafter Bender) and Tomar (U.S. Pat. App. Pub. No. 2018/0358005, hereinafter Tomar), applicant’s arguments in light of the amendments have been fully considered but are not persuasive. 
In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., the basis for the asserted confusion “between the claimed variable recognizer and the process of named entity recognition”) are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). To the point that applicant asserts that Instant Application, ¶¶ [0068] and [0072]). As such, this argument is not persuasive. Applicant is invited to amend the claims, in light of specification support, such that the claims recite the intended limitations.
In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Applicant’s assertion that the “Seo reference describes a conventional speech to text system that has nothing to do with API hits and no teaching of an unconventional use of neural networks” is moot, given that the Seo reference is not cited to teach said elements. Similarly, though applicant asserts that Bender discloses “no special relationship to audio signals or API hits,” these assertions merely attack the references individually, without considering the elements cited in the rejection. As such, these arguments are not persuasive.
Further, applicant's arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references. For Bender, in light of the asserted confusion. (Response, pg. 2). However, these assertions amount to little more than an assertion of patentability over the cited references. Said assertions do not address the specifically enumerated elements of the claim, in light of the corresponding segments of the cited references which are understood to read upon said claim elements. Therefore, these arguments are improper and are not persuasive.
Further, applicant indicates that “the limitation of outputting ‘the value of the plurality of enumerated variable values with the highest probability’ is not obvious from the cited prior art.” This argument is not persuasive. As explained in the rejection below, “the corresponding NE tag” of Bender is a value of the plurality of enumerated variable values. Therefore, the rejection is maintained, as modified in response to the amended claims.
Applicant has not provided any further arguments; therefore, the examiner directs applicant to the rationale below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 11-16, and 21-26 is/are rejected under 35 U.S.C. 103 as being unpatentable over Seo (U.S. Pat. App. Pub. No. 2020/0035228, hereinafter Seo) in view of Non-Patent Literature to Bender et al. (Oliver Bender, Franz Josef Och, and Hermann Ney, “Maximum entropy models for named entity recognition.” In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4 (CONLL '03). Association for Computational Linguistics, USA, 148–151. 2003, hereinafter Bender) and Tomar (U.S. Pat. App. Pub. No. 2018/0358005, hereinafter Tomar).

Regarding claim 1, Seo discloses a machine for recognizing an intent in speech audio, the machine comprising (The “apparatus for speech recognition” performing the embodiments described with reference to FIGS. 12-14; Seo, ¶¶ [0200], FIGS. 12-14): a variable recognizer that processes speech audio features (the system includes “separating the text into sentences as units (S1310) and identifying (tokenizing) each word in the sentence (S1320),” where the text is produced by “extract[ing] the uttered speech to convert the speech into a text (S1220) {processing speech audio}”; Seo, ¶¶ [0209], [0206]), computes a probability of the speech audio having any of a plurality of enumerated variable values (“and representing the word by a vector and vectorizing the word through part-of-speech tagging and a named entity class (S1330, S1340).” where a determination of any of a plurality of named entities includes a probability for said determination (e.g., confidence in the selection of the named entity).; Seo, ¶¶ [0209]); and an intent recognizer that processes speech audio features (“the utterance intent may be determined, expected accuracy between the determination result and the actual utterance intent may be measured, and the measured expected accuracy may be output through a display unit or an audio output unit of the apparatus for speech recognition” where the utterance intent is determined from the actual utterance {speech audio}; Seo, ¶¶ [0214]). However, Seo fail(s) to expressly recite computes a probability of the speech audio having any of a plurality of enumerated variable values, and outputs the value of the plurality of enumerated variable values with the highest probability; and an intent recognizer that processes speech audio features, computes a probability of the speech audio having the intent, and in response to the probability being above an intent threshold, produces a request for a virtual assistant action.
Bender teaches systems and methods of applying “maximum entropy (ME) models to the task of named entity recognition (NER)”. (Bender, pg. 148, Col. 1, lines 2-3). Regarding claim 1, Bender teaches computes a probability of the speech audio having any of a plurality of enumerated variable values (“we directly factorize the posterior probability and determine the corresponding NE tag {computes a probability of... having any of a plurality of enumerated variable values} for each word of an input sequence {speech audio}”; Bender, Pg. 148, Col. 2, lines 12-15), and outputs the value of the plurality of enumerated variable values with the highest probability (“Given a natural input sequence... We choose the named entity (NE) tag sequence {the value of the enumerated variable values}... with the highest probability among all possible tag sequence”; Bender, Pg. 148, Col. 1, lines 15-21). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring to incorporate the teachings of Bender to include computes a probability of the speech audio having any of a plurality of enumerated variable values, and outputs the value with the highest probability. The maximum entropy approach allows for the extraction of “the named entities and their context information from additional nonannotated data,” which can be used to “improve the recognition accuracy,” as recognized by Bender. (Bender, pg. 148, Col. 1, lines 7-11). However, Seo and Bender fail(s) to expressly recite an intent recognizer that processes speech audio features, computes a probability of the speech audio having the intent, and in response to the probability being above an intent threshold, produces a request for a virtual assistant action. 
Tomar teaches systems and methods for a “vocal user interface… combining a speech to text system and a speech to intent system.” (Tomar, ¶ [0001]). Regarding claim 1, Tomar teaches an intent recognizer that processes speech audio features (The decision fusion module processing the ASR system outputs.; Tomar, ¶¶ [0061]), computes a probability of the speech audio having the intent (“The decision fusion module 400 in this example receives the STI system outputs 401, which contains both the predicted action by the STI system 107 and a confidence score for the prediction.” where the confidence score is a probability “The STI system outputs 401 are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information. The confidence score of the improved outputs 404 is then compared to a threshold value in a comparator 405.”; Tomar, ¶¶ [0060]), and in response to the probability being above an intent threshold, produces a request for a virtual assistant action. (“If the predicted confidence {probability} in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101, and a semantic representation 407 of the same.”; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include an intent recognizer that processes speech audio features, computes a probability of the speech audio having the intent, and in response to the probability being above an intent threshold, produces a request for a virtual assistant action. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

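Regarding claim 2, the rejection of claim 1 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo and Bender fail(s) to expressly recite wherein: the variable recognizer indicates the probability of the speech audio having a value from the plurality of enumerated variable values; and the intent recognizer conditions its output of a request for an action on the probability of the speech audio having a value from the plurality of enumerated variable values.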
The relevance of Tomar is described above with relation to claim 1. Regarding claim 2, Tomar further teaches wherein: the variable recognizer indicates the probability of the speech audio having a value from the plurality of enumerated variable values (“The decision fusion module 400 in this example receives the STI system outputs 401, which contains both the predicted action by the STI system 107 and a confidence score for the prediction,” where the prediction is the enumerated variable value and the confidence score is the probability.; Tomar, ¶¶ [0060]); and the intent recognizer conditions its output of a request for an action on the probability of the speech audio having a value from the plurality of enumerated variable values  (“If the predicted confidence {probability} in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101, and a semantic representation 407 of the same,” thus the output of the semantic representation {the output of the request for an action} is conditioned on the predicted confidence {probability} in the predicted intent {enumerated variable values} for the speech input {speech audio} in light of the threshold.; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include wherein: the variable recognizer indicates the probability of the speech audio having an enumerated variable value; and the intent recognizer conditions its output of a request for an action on the probability of the speech audio having a value from the plurality of enumerated variable values. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

Regarding claim 3, the rejection of claim 2 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo and Bender fail(s) to expressly recite wherein the conditioning is based on a delayed indication of the probability of the speech audio having an enumerated variable value.
The relevance of Tomar is described above with relation to claim 1. Regarding claim 3, Tomar further teaches wherein the conditioning is based on a delayed indication of the probability of the speech audio having a value from the plurality of enumerated variable values (“The threshold may be a fixed pre-computed value or variable that can be determined at run-time and may adaptively change throughout system usage,” thus the conditioning is based on delay in the indication of the probability {as the threshold “may adaptively change throughout system usage”, delay in the indication can affect whether the predicted confidence is above the threshold}; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include wherein the conditioning is based on a delayed indication of the probability of the speech audio having an enumerated variable value. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

Regarding claim 4, the rejection of claim 1 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo and Tomar fail(s) to expressly recite wherein the intent recognizer conditions its output of a request for an action on which value of the plurality of enumerated variable values has the highest probability.
The relevance of Bender is described above with relation to claim 1. Regarding claim 4, Bender further teaches wherein the intent recognizer conditions its output of a request for an action on which value of the plurality of enumerated variable values has the highest probability (The output of the intent recognizer, as described in Tomar with reference to claim 1, is conditioned on the received input, where the received input is the “named entity (NE) tag sequence {enumerated variable value}... with the highest probability among all possible tag sequence”; Bender, Pg. 148, Col. 2, lines 12-14). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the “vocal user interface… combining a speech to text system and a speech to intent system” of Tomar, to incorporate the teachings of Bender to include wherein the intent recognizer conditions its output of a request for an action on which value of the variable has the highest probability. The maximum entropy approach allows for the extraction of “the named entities and their context information from additional nonannotated data,” which can be used to “improve the recognition accuracy,” as recognized by Bender. (Bender, pg. 148, Col. 1, lines 7-11). 

Regarding claim 5, the rejection of claim 4 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo and Bender fail(s) to expressly recite wherein the conditioning is based on a delayed indication of which value of the variable has the highest probability.
The relevance of Tomar is described above with relation to claim 1. Regarding claim 5, Tomar further teaches wherein the conditioning is based on a delayed indication of which value of the plurality of enumerated variable values has the highest probability (“The threshold may be a fixed pre-computed value or variable that can be determined at run-time and may adaptively change throughout system usage,” thus the conditioning is based on delay in the indication {as the threshold “may adaptively change throughout system usage”, delay in the indication can affect whether the predicted confidence is above the threshold}; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include wherein the conditioning is based on a delayed indication of which value of the variable has the highest probability. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

Regarding claim 6, the rejection of claim 1 is incorporated. Seo further discloses wherein one of the recognizers produces a score (“determining of the utterance intent may be performed” and “accuracy of the utterance intent is... compared to a preset target value.” The comparison to a preset target value indicates that the accuracy of the utterance intent is a value {score}.; Seo, ¶¶ [0018]) and the other recognizer is called in response to the score being above a score threshold (“when accuracy of the utterance intent is small compared to a preset target value,” the “method may further include restructuring the relationship of the named entity to reset the relationship information,” where restructuring of the relationship of the named entity is calling the variable recognizer. It is understood that accuracy can be described in terms of inaccuracy, where accuracy is the inverse of inaccuracy. Thus, the above quotation can be rewritten as an inverse as follows: “when inaccuracy of the utterance intent {score} is large compared to a preset target value {score threshold}”; Seo, ¶¶ [0018]). 

Regarding claim 11, Seo discloses a method of recognizing an intent from speech audio by a computer system, the method comprising (The “apparatus for speech recognition” performing the embodiments described with reference to FIGS. 12-14; Seo, ¶¶ [0200], FIGS. 12-14): obtaining speech audio (the system includes receiving “a user input (speech signal)… through the input module (e.g., a microphone).”; Seo, ¶¶ [0156]), processing features of the speech audio (the system includes “separating the text into sentences as units (S1310) and identifying (tokenizing) each word in the sentence (S1320),” where the text is produced by “extract[ing] the uttered speech to convert the speech into a text (S1220) {processing speech audio}”; Seo, ¶¶ [0209], [0206]), to compute a probability of the speech audio having any of a plurality of enumerated variable values (“and representing the word by a vector and vectorizing the word through part-of-speech tagging and a named entity class (S1330, S1340).” where a determination of any of a plurality of named entities includes a probability for said determination (e.g., confidence in the selection of the named entity).; Seo, ¶¶ [0209]); and processing the features of the speech audio to compute …the intent (“the utterance intent may be determined, expected accuracy between the determination result and the actual utterance intent may be measured, and the measured expected accuracy may be output through a display unit or an audio output unit of the apparatus for speech recognition” where the utterance intent is determined from the actual utterance {speech audio}; Seo, ¶¶ [0214]). 
However, Seo fail(s) to expressly recite to compute a probability of the speech audio having any of a plurality of enumerated variable values; outputting the value with the highest probability; processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action.
Bender is described above with relation to claim 1. Regarding claim 11, Bender teaches compute a probability of the speech audio having any of a plurality of enumerated variable values (“we directly factorize the posterior probability and determine the corresponding NE tag {computes a probability of... having any of a plurality of enumerated variable values} for each word of an input sequence {speech audio}”; Bender, Pg. 148, Col. 2, lines 12-15), outputting the value of the plurality of enumerated variable values with the highest probability (“Given a natural input sequence... We choose the named entity (NE) tag sequence {the value of the plurality of enumerated variable values}... with the highest probability among all possible tag sequence”; Bender, Pg. 148, Col. 1, lines 15-21). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring to incorporate the teachings of Bender to include compute a probability of the speech audio having any of a plurality of enumerated variable values; outputting the value with the highest probability. The maximum entropy approach allows for the extraction of “the named entities and their context information from additional nonannotated data,” which can be used to “improve the recognition accuracy,” as recognized by Bender. (Bender, pg. 148, Col. 1, lines 7-11). However, Seo and Bender fail(s) to expressly recite processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action. 
The relevance of Tomar is described above with relation to claim 1. Regarding claim 11, Tomar teaches processing the features of the speech audio (The decision fusion module processing the ASR system outputs.; Tomar, ¶¶ [0061]), to compute a probability of the speech audio having the intent (“The decision fusion module 400 in this example receives the STI system outputs 401, which contains both the predicted action by the STI system 107 and a confidence score for the prediction.” where the confidence score is a probability “The STI system outputs 401 are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information. The confidence score of the improved outputs 404 is then compared to a threshold value in a comparator 405.”; Tomar, ¶¶ [0060]), and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action (“If the predicted confidence {probability} in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101, and a semantic representation 407 of the same.”; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

Regarding claim 12, the rejection of claim 11 is incorporated. Claim 12 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 13, the rejection of claim 12 is incorporated. Claim 13 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 14, the rejection of claim 11 is incorporated. Claim 14 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 15, the rejection of claim 14 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 16, the rejection of claim 11 is incorporated. Claim 16 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Regarding claim 21, Seo discloses a non-transitory computer readable medium storing code (“the software module may be stored in non-transitory computer readable media that can be read through a computer”; Seo, ¶¶ [0133]) capable of causing one or more computer processors to recognize an intent from speech audio by (The “apparatus for speech recognition” performing the embodiments described with reference to FIGS. 12-14; Seo, ¶¶ [0200], FIGS. 12-14): obtaining speech audio (the system includes receiving “a user input (speech signal)… through the input module (e.g., a microphone).”; Seo, ¶¶ [0156]), processing features of the speech audio (the system includes “separating the text into sentences as units (S1310) and identifying (tokenizing) each word in the sentence (S1320),” where the text is produced by “extract[ing] the uttered speech to convert the speech into a text (S1220) {processing speech audio}”; Seo, ¶¶ [0209], [0206]), to compute a probability of the speech audio having any of a plurality of enumerated variable values (“and representing the word by a vector and vectorizing the word through part-of-speech tagging and a named entity class (S1330, S1340).” where a determination of any of a plurality of named entities includes a probability for said determination (e.g., confidence in the selection of the named entity).; Seo, ¶¶ [0209]); and processing the features of the speech audio to compute …the intent (“the utterance intent may be determined, expected accuracy between the determination result and the actual utterance intent may be measured, and the measured expected accuracy may be output through a display unit or an audio output unit of the apparatus for speech recognition” where the utterance intent is determined from the actual utterance {speech audio}; Seo, ¶¶ [0214]). However, Seo fail(s) to expressly recite to compute a probability of the speech audio having any of a plurality of enumerated variable values; outputting the value with the highest probability; processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action.
The relevance of Bender is described above with relation to claim 1. Regarding claim 21, Bender teaches compute a probability of the speech audio having any of a plurality of enumerated variable values (“we directly factorize the posterior probability and determine the corresponding NE tag {computes a probability of... having any of a plurality of enumerated variable values} for each word of an input sequence {speech audio}”; Bender, Pg. 148, Col. 2, lines 12-15), outputting the value of the plurality of enumerated variable values with the highest probability (“Given a natural input sequence... We choose the named entity (NE) tag sequence {the value of the plurality of enumerated variable values}... with the highest probability among all possible tag sequence”; Bender, Pg. 148, Col. 1, lines 15-21). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring to incorporate the teachings of Bender to include compute a probability of the speech audio having any of a plurality of enumerated variable values; outputting the value with the highest probability. The maximum entropy approach allows for the extraction of “the named entities and their context information from additional nonannotated data,” which can be used to “improve the recognition accuracy,” as recognized by Bender. (Bender, pg. 148, Col. 1, lines 7-11). However, Seo and Bender fail(s) to expressly recite processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action. 
Tomar is described above with relation to claim 1. Regarding claim 21, Tomar teaches processing the features of the speech audio (The decision fusion module processing the ASR system outputs.; Tomar, ¶¶ [0061]), to compute a probability of the speech audio having the intent (“The decision fusion module 400 in this example receives the STI system outputs 401, which contains both the predicted action by the STI system 107 and a confidence score for the prediction.” where the confidence score is a probability “The STI system outputs 401 are processed using a contextual learning component 403 to improve the predictions, by taking into account any available contextual information. The confidence score of the improved outputs 404 is then compared to a threshold value in a comparator 405.”; Tomar, ¶¶ [0060]), and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action (“If the predicted confidence {probability} in the outputs 404 is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 for the acoustic input 101, and a semantic representation 407 of the same.”; Tomar, ¶¶ [0060]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, to incorporate the teachings of Tomar to include processing the features of the speech audio to compute a probability of the speech audio having the intent; and in response to the probability being above an intent threshold, outputting a request for a virtual assistant action. The combination of “text-independent STI and a speech to text based ASR system” can “produce improved recognition accuracy for a VUI system,” as recognized by Tomar. (Tomar, ¶ [0040]).

Regarding claim 22, the rejection of claim 21 is incorporated. Claim 22 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.

Regarding claim 23, the rejection of claim 22 is incorporated. Claim 23 is substantially the same as claim 3 and is therefore rejected under the same rationale as above.

Regarding claim 24, the rejection of claim 21 is incorporated. Claim 24 is substantially the same as claim 4 and is therefore rejected under the same rationale as above.

Regarding claim 25, the rejection of claim 24 is incorporated. Claim 25 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.

Regarding claim 26, the rejection of claim 21 is incorporated. Claim 26 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.

Claims 7, 17, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Seo, Bender, and Tomar as applied to claims 1, 11, and 21 above, and further in view of Everman (U.S. Pat. App. Pub. No. 2019/0179890, hereinafter Everman).

Regarding claim 7, the rejection of claim 1 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo, Bender, and Tomar fail to expressly recite further comprising a domain recognizer that processes speech audio features and computes a probability of the speech audio referring to a specific domain, wherein the intent recognizer is associated with the domain.
Everman teaches systems and methods for “processing a speech input to infer user intent therefrom.” (Everman, ¶ [0002]). Regarding claim 7, Everman teaches further comprising a domain recognizer that processes speech audio features (“The domain... is determined” by the system “based on the particular words in the text string,” where the text string is a “text string corresponding to a speech input”; Everman, ¶¶ [0008]-[0009]) and computes a probability of the speech audio referring to a specific domain (“determines a confidence score representing how well or to what extent a user input matches a particular domain”; Everman, ¶ [0011]), wherein the intent recognizer is associated with the domain (“The confidence score can be used, for example, to help determine which of two candidate domains is most likely to accurately reflect or represent the intent of the input.”; Everman, ¶ [0011]) and called in response to the probability of the speech audio referring to a specific domain being above a domain threshold (the score for a recognizer's confidence that the particular item is in the correct domain {i.e., referring to a specific domain} is compared against “a predetermined confidence threshold,” where “if no candidate domain satisfies a predetermined confidence threshold, the digital assistant will not provide a response.”; Everman, ¶ [0011]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, and as modified by the “vocal user interface… combining a speech to text system and a speech to intent system” of Tomar, to incorporate the teachings of Everman to include a domain recognizer that processes speech audio features and computes a probability of the speech audio referring to a specific domain, wherein the intent recognizer is associated with the domain. The systems and methods taught by Everman can “infer user intent from a speech input so as to account for possible speech recognition errors,” which improves speech recognition quality and ease of use by a user. (Everman, ¶¶ [0005], [0007]).

Regarding claim 17, the rejection of claim 11 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Regarding claim 27, the rejection of claim 21 is incorporated. Claim 27 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.

Claims 8, 18, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Seo, Bender, and Tomar as applied to claims 1, 11, and 21 above, and further in view of Non-Patent Literature to Serdyuk et al. (D. Serdyuk, Y. Wang, C. Fuegen, A. Kumar, B. Liu and Y. Bengio, “Towards End-to-end Spoken Language Understanding,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5754-5758, 2018, hereinafter Serdyuk).

Regarding claim 8, the rejection of claim 1 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. However, Seo, Bender, and Tomar fail to expressly recite wherein no human-readable speech transcription is computed.
Serdyuk teaches “an end-to-end learning system for spoken language understanding.” (Serdyuk, Pg. 5754, Col. 1, lines 10-11). Regarding claim 8, Serdyuk teaches wherein no human-readable speech transcription is computed (Serdyuk discloses “end-to-end spoken language understanding” for use in “speech-to-domain and speech-to-intent,” where the end-to-end spoken language understanding system “capture[s] the semantic attention directly from the audio features,” thus without computing a human-readable speech transcription; Serdyuk, Pg. 5755, Col. 2, lines 21-22; Pg. 5754, Col. 1, lines 14-16). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, and as modified by the “vocal user interface… combining a speech to text system and a speech to intent system” of Tomar, to incorporate the teachings of Serdyuk to include wherein no human-readable speech transcription is computed. The use of feature sequences for spoken language Serdyuk. (Serdyuk, Pg. 5755, Col. 1, lines 10-17).

Regarding claim 18, the rejection of claim 11 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Regarding claim 28, the rejection of claim 21 is incorporated. Claim 28 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.

Claims 9-10, 19-20, and 29-30 are rejected under 35 U.S.C. 103 as being unpatentable over Seo, Bender, and Tomar as applied to claims 1, 11, and 21 above, and further in view of Gruber (U.S. Pat. App. Pub. No. 2017/0178626, hereinafter Gruber).

Regarding claim 9, the rejection of claim 1 is incorporated. Seo, Bender, and Tomar disclose all of the elements of the current invention as stated above. Tomar further discloses ...the intent recognizer producing a request for a virtual assistant action… (“If the predicted confidence in the outputs” as produced by the speech to intent (STI) system {the intent recognizer} “is above the threshold, the decision fusion module 400 outputs the predicted intent or action 406 {request for a virtual assistant action} for the acoustic input 101, and a semantic representation 407 of the same.”; Tomar, ¶ [0060]). However, Seo, Bender, and Tomar fail to expressly recite further comprising a network client with access to a web API, wherein, in response to the ...request for a virtual assistant action, the network client performs a request to the web API.
Gruber teaches an “intelligent automated assistant system” for engagement with “the user in an integrated, conversational manner using natural language dialog.” (Gruber, Abstract). Regarding claim 9, Gruber teaches further comprising a network client with access to a web API (“assistant 1002 can call external services 1360 that interface with functionality and applications on a device via APIs…”; Gruber, ¶¶ [0086], [0087]), wherein, in response to the...request for a virtual assistant action, the network client performs a request to the web API (“assistant 1002 can call external services 1360 that interface with functionality and applications on a device via APIs… to perform functions and operations that might otherwise be initiated using a conventional user interface on the device,” where “functions and operations may include, for example, setting an alarm, making a telephone call, sending a text message or email message, adding a calendar event, and the like. Such functions and operations may be performed as add-on functions in the context of a conversational dialog between a user and assistant 1002.”; Gruber, ¶ [0087]), the request having, as an argument, the value output by the variable recognizer (“active ontologies 1050 may be operable to perform and/or implement various types of functions, operations, actions,” thus an argument in the context of a web API, and “some nodes of an active ontology may correspond to domain concepts such as restaurant and its property restaurant name,” where the association of “restaurant and its property restaurant name” is a named entity {value output by the variable recognizer}; Gruber, ¶¶ [0197], [0206]). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, and as modified by the “vocal user interface… combining a speech to text system and a speech to intent system” of Tomar, to incorporate the teachings of Gruber to include further comprising a network client with access to a web API, wherein, in response to the ...request for a virtual assistant action, the network client performs a request to the web API. “The intelligent automated assistant systems of various embodiments of the present invention can unify, simplify, and improve the user's experience with respect to many different applications and functions of an electronic device, and with respect to services that may be available over the Internet,” as recognized by Gruber. (Gruber, ¶ [0010]).

Regarding claim 10, the rejection of claim 9 is incorporated. Seo, Bender, Tomar, and Gruber disclose all of the elements of claim 9 as stated above. However, Seo, Bender, and Tomar fail to expressly recite further comprising a speech synthesis engine, wherein, in response to receiving a response from the web API, the speech synthesis engine synthesizes speech audio containing information from the web API response and outputs the synthesized speech audio for a user of the virtual assistant.
The relevance of Gruber is described above with relation to claim 9. Regarding claim 10, Gruber teaches further comprising a speech synthesis engine (“Speech output, may include...Synthesized speech,” thus a speech synthesis engine; Gruber, ¶¶ [0152], [0153]), wherein, in response to receiving a response from the web API, the speech synthesis engine synthesizes speech audio containing information from the web API response (“assistant 1002 can call external services 1360 that interface with functionality and applications on a device via APIs… to perform functions and operations that might otherwise be initiated using a conventional user interface on the device,” where said functions and operations include “output data/information which may be generated by intelligent automated assistant 1002... [including] Speech output... [such as] Synthesized speech”; Gruber, ¶¶ [0087], [0148], [0152], [0153]) and outputs the synthesized speech audio for a user of the virtual assistant (The assistant can include “taking input from the user as voice spoken to the assistant and sending output from the assistant to the user, for example as synthesized speech, in reply.”; Gruber, ¶ [0097]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech recognition system and method of Seo for improving a sentence intent analysis rate through named entity structuring, as modified by the named entity recognition (NER) using maximum entropy (ME) models of Bender, and as modified by the “vocal user interface… combining a speech to text system and a speech to intent system” of Tomar, to incorporate the teachings of Gruber to include further comprising a speech synthesis engine, wherein, in response to receiving a response from the web API, the speech synthesis engine synthesizes speech audio containing information from the web API response and outputs the synthesized speech audio for a user of the virtual assistant. “The intelligent automated assistant systems of various embodiments of the present invention can unify, simplify, and improve the user's experience with respect to many different applications and functions of an electronic device, and with respect to services that may be available over the Internet,” as recognized by Gruber. (Gruber, ¶ [0010]).

Regarding claim 19, the rejection of claim 11 is incorporated. Claim 19 is substantially the same as claim 9 and is therefore rejected under the same rationale as above.

Regarding claim 20, the rejection of claim 19 is incorporated. Claim 20 is substantially the same as claim 10 and is therefore rejected under the same rationale as above.

Regarding claim 29, the rejection of claim 21 is incorporated. Claim 29 is substantially the same as claim 9 and is therefore rejected under the same rationale as above.

Regarding claim 30, the rejection of claim 29 is incorporated. Claim 30 is substantially the same as claim 10 and is therefore rejected under the same rationale as above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Lin et al. (U.S. Pat. App. Pub. No. 2019/0066668) discloses systems and methods for contextual spoken language understanding including both an intent and domain apparatus.
Kim et al. (U.S. Pat. App. Pub. No. 2021/0082406) discloses determining the meaning of speech using an artificial neural network.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Sean E Serraguard/Patent Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657