Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 19-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
The prior art of record, TSUNOO (US 2020/0410987 A1) teaches training an intent existence neural network and an intent extraction algorithm which does not have a neural network. (pars 54-56 and pars 61-71)
Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) teaches intent existence training text for machine learning module. (sec 3.1)
Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) teaches training an intent extraction long short-term memory neural network. (sec 3)
Ding et al. (Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network) teaches positive and negative text inputs and user-labeled training data. (Consumption Intention Classification, p. 2393)
Nguyen et al. (A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing) teaches a dependency parser for an intent extraction long short-term memory neural network. (secs 1 and 3.1-3.2)
Manning et al. (The Stanford CoreNLP Natural Language Processing Toolkit) teaches a dependency parser to get labeled verbs and objects. (secs 2 and 4)
Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION) teaches the intent training text not being classified to an intent category. (sec 3)
However, the claims in the application are deemed to be directed to a nonobvious improvement over the prior art of record.

Response to Arguments
Applicant’s arguments with respect to the independent claims 6, 14, 21 have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.

Examiner’s Note
Applicant is strongly requested to identify, in the Remarks, supporting paragraph(s) with a clear explanation for each amended or new claim, so that the claims can be interpreted clearly and definitely by Examiner.

Priority
Acknowledgment is made of applicant's claim for the present application filed on 12/11/2018.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 21-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 21
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: 
The limitations of 
“generating, utilizing a first set of layers of an intent existence long short-term memory neural network, a plurality of feature vectors from a plurality of words of a text input; 
generating, utilizing a second set of layers of the intent existence long short-term memory neural network, a binary intent existence classification from the plurality of feature vectors; and 
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an unclassified, open intent by extracting a verb object pair from the text input”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. That is, other than reciting “neural network”, nothing in the claim element precludes the step from practically being performed in the mind. 
For example, but for the “neural network” language, the limitations in the context of this claim encompass a user, mentally or with a physical aid (e.g., pencil and paper), generating a group of features from a sentence; checking whether an intent exists based on the group of features; and determining an open intent if an intent exists in the sentence.

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. 
In particular, the claim recites an additional element – using a “neural network”. The neural network in each step is recited at a high level of generality (i.e., as a generic computer component performing a generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. 
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 22
The claim is rejected under 35 U.S.C. 101 because it only modifies the abstract idea by applying a pooling layer to outputs of a bi-directional neural network, which also does not add significantly more or provide a specific application of the judicial exception.

Regarding claim 23
The claim is rejected under 35 U.S.C. 101 because it only modifies the abstract idea by extracting a verb object pair from a text input utilizing a neural network, which also does not add significantly more or provide a specific application of the judicial exception.

Regarding claim 24
The claim is rejected under 35 U.S.C. 101 because it only modifies the abstract idea by refraining from applying a neural network to a text input when the text input does not have an intent, which also does not add significantly more or provide a specific application of the judicial exception.

Regarding claim 25
The claim is rejected under 35 U.S.C. 101 because it only modifies the abstract idea by generating a digital response based on an open intent, which also does not add significantly more or provide a specific application of the judicial exception.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 9-11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces), and further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks).

(Note: Hereinafter, if a limitation has brackets (i.e., [·]) around claim language, the bracketed claim language has not yet been taught by the current prior art reference but will be taught by another prior art reference afterwards.)

Regarding claim 6, 
TSUNOO teaches
A non-transitory computer-readable storage medium storing instructions thereon that, when executed by at least one processor, cause the at least one processor to perform operations comprising:
(TSUNOO, [pars 41 and 52] “CPU … RAM”)

identifying a [text] input via a client [computing device]; 
(TSUNOO, [figs 1-3]; [pars 46-52] “The sensor unit 102 is, for example, a microphone (an example of an input unit) that detects a speech (voice) of a user. Of course, another sensor may be applied as the sensor unit 102.”;)

prior to determining an intent from the [text] input, determining a binary intent existence classification for the [text] input by:
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends. Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice” along with “the result of the voice recognition processing is "inquiry about weather"” may read on “prior to determining an intent from the [text] input, determining a binary intent existence classification for the [text] input”.)

applying an intent existence long short-term memory neural network to the [text] input, 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)

wherein the intent existence long short-term memory neural network is trained to determine binary intent existence classifications from intent existence training [text] and corresponding intent existence training markers; and 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”; e.g., “labeled data” may read on “intent existence training marker”.)

in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, extracting an [open] intent by extracting a [verb] object [pair] from the [text] input.
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends. Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice” may read on “in response to determining that one or more intents exist in the text input based on the binary intent existence classification, extracting an [open] intent”.)
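
For illustration only, the two-stage determination and gating described in the TSUNOO passages quoted above can be sketched as follows. The sketch accumulates per-frame features in the time direction using average and variance (one of the options the quoted passage names, used here in place of an LSTM), applies a logistic discriminator that outputs 1 or 0, and performs downstream recognition only when the output is 1. The function names, weights, bias, and feature values are hypothetical and are not taken from TSUNOO; in practice the parameters would be learned from labeled data.

```python
import math
from statistics import fmean, pvariance

def pool_over_time(frames):
    """Former stage (simplified): accumulate per-frame features in the time
    direction as the average and variance of each feature dimension."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims] + [pvariance(d) for d in dims]

def has_operation_intention(frames, weights, bias):
    """Latter stage (simplified): logistic discriminator returning 1 or 0."""
    pooled = pool_over_time(frames)
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    prob = 1.0 / (1.0 + math.exp(-score))
    return 1 if prob >= 0.5 else 0

def handle_input(frames, weights, bias, recognize):
    """Run the recognition step only when the binary determination is 1."""
    if has_operation_intention(frames, weights, bias) == 1:
        return recognize(frames)
    return None  # processing ends without recognition

# Hypothetical 2-dimensional features for three frames, with made-up weights
# ordered as [mean_0, mean_1, var_0, var_1].
frames = [(0.9, 0.1), (0.8, 0.3), (1.0, 0.2)]
weights = [2.0, -1.0, 0.0, 0.0]
result = handle_input(frames, weights, bias=-1.0,
                      recognize=lambda f: "inquiry about weather")
```

Here the recognition callback stands in for the voice recognition processing; it is invoked only after the binary determination, which is the ordering the mapping above relies on.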

However, TSUNOO does not teach
identifying a [text] input via a client [computing device]; 
prior to determining an intent from the [text] input, determining a binary intent existence classification for the [text] input by:
applying an intent existence long short-term memory neural network to the [text] input, 
wherein the intent existence long short-term memory neural network is trained to determine binary intent existence classifications from intent existence training [text] and corresponding intent existence training markers; and 
in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, extracting an [open] intent by extracting a [verb] object [pair] from the [text] input.

(Note: Hereinafter, if a limitation has one or more bold underlines, the underlined claim language is taught by the current prior art reference, while the non-underlined claim language has already been taught by one or more previous prior art references.)

Yasa teaches
identifying a text input via a client computing device; 
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 29-30] “the device 110b may receive a text input representing a text-based user input from the user 5. The device 110b may generate text data representing the text input and may send the text data to the system(s) 120, which the system(s) 120 receives as first data representing a first user input (132).”; Note that TSUNOO teaches “identifying a [text] input via a client [computing device]”.)

prior to determining an intent from the text input, determining a binary intent existence classification for the text input by:
(Yasa, [figs 2]; [pars 73-83] “Each recognizer 263 may also include an intent classification (IC) component 264. An IC component 264 parses text data to determine an intent(s) (associated with the skill 290 associated with the recognizer 263 implementing the IC component 264) that potentially represents the user input. An intent represents to an action a user desires be performed.”; e.g., “IC component 264 parses text data to determine an intent(s)” may read on “determining a … intent existence classification for the text input”. Note that TSUNOO teaches “prior to determining an intent from the [text] input, determining a binary intent existence classification for the [text] input”.)

applying an intent existence long short-term memory neural network to the text input, 
(Yasa, [figs 2]; [pars 73-83] “Each recognizer 263 may also include an intent classification (IC) component 264. An IC component 264 parses text data to determine an intent(s) (associated with the skill 290 associated with the recognizer 263 implementing the IC component 264) that potentially represents the user input. An intent represents to an action a user desires be performed.”; Note that TSUNOO teaches “applying an intent existence long short-term memory neural network to the [text] input”.)

wherein the intent existence long short-term memory neural network is trained to determine binary intent existence classifications from intent existence training text and corresponding intent existence training markers; and 
 (Yasa, [figs 2]; [pars 73-83] “Each recognizer 263 may also include an intent classification (IC) component 264. An IC component 264 parses text data to determine an intent(s) (associated with the skill 290 associated with the recognizer 263 implementing the IC component 264) that potentially represents the user input. An intent represents to an action a user desires be performed. … The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent.”; [sec 132] “In order to apply machine learning techniques, the machine learning processes themselves need to be trained. Training a machine learning component requires establishing a "ground truth" for the training examples. In machine learning, the term "ground truth" refers to the accuracy of a training set's classification for supervised learning techniques.”; Note that TSUNOO teaches “wherein the intent existence long short-term memory neural network is trained to determine binary intent existence classifications from intent existence training [text] and corresponding intent existence training markers”.)

in response to determining that one or more intents exist in the text input based on the binary intent existence classification, extracting an [open] intent by extracting a verb object pair from the text input.
 (Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent. … [0.95] Intent: <PlayMusic> ArtistName: Lady Gaga SongName: Poker Face, [0.95] Intent: <PlayVideo> ArtistName: Lady Gaga VideoName: Poker Face”; e.g., “{Verb}: "Play," {Object}: "mother's little helper,"” may read on “verb object pair”. Note that TSUNOO teaches “in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, extracting an [open] intent by extracting a [verb] object [pair] from the [text] input”.)
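
For illustration only, the tagging of the example utterance quoted from Yasa above can be sketched with a toy heuristic grammar rule. This is not Yasa's NER component: the function name, the verb lexicon, and the preposition list are hypothetical, and the rule handles only utterances that begin with a known verb.

```python
def tag_verb_object(text, known_verbs, prepositions=("by",)):
    """Tag an utterance as Verb / Object / Object Preposition / Object Modifier
    using a single heuristic rule: the first token must be a known verb."""
    tokens = text.split()
    if not tokens or tokens[0].lower() not in known_verbs:
        return None  # no verb found by this rule, so no pair is extracted
    tags = {"Verb": tokens[0].capitalize()}
    rest = tokens[1:]
    for i, tok in enumerate(rest):
        if tok.lower() in prepositions:
            # Split the remainder around the first preposition.
            tags["Object"] = " ".join(rest[:i])
            tags["Object Preposition"] = tok
            tags["Object Modifier"] = " ".join(rest[i + 1:])
            break
    else:
        # No preposition: the whole remainder is treated as the object.
        tags["Object"] = " ".join(rest)
    return tags

# Hypothetical one-word verb lexicon standing in for a skill's word database.
tags = tag_verb_object("play mother's little helper by the rolling stones", {"play"})
pair = (tags["Verb"], tags["Object"])
```

The resulting pair is the verb together with its object, without any mapping to a named intent such as <PlayMusic>.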

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO with the text input of Yasa. 
Doing so would lead to improving conversation between the user and the service providing system in order to effectively predict the user intents and provide proper services. (Yasa, [pars 18-28] “The present disclosure improves such conversation recovery by determining an alternative text for the user input (utterance or text input) that is semantically similar and related to the user input. … As discussed herein, an alternate utterance or alternative utterance may include alternative text that may correspond to an original user input (e.g., an initial spoken utterance), where the alternative text may correspond to a rephrasing of or other way of obtaining the result requested by the original user input.”)

In the alternative, Coucke can also be interpreted to teach the following limitation:
Coucke teaches
wherein the intent existence long short-term memory neural network is trained to determine binary intent existence classifications from intent existence training text and corresponding intent existence training markers; and 
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; [sec 3.2.1] “As explained earlier, the ASR engine is required to understand arbitrary formulations of a finite set of intents described in the dataset.”; [sec 4, p. 17] “For each sentence of the speech corpus, we apply the ASR engine followed by the NLU engine, and compare the predicted output to the ground true intent and slots in the dataset.”; Note that TSUNOO teaches “wherein the intent existence long short-term memory neural network is trained to determine existence of training intents from intent existence training [text] and corresponding intent existence training markers”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO and Yasa with the text input of Coucke.
Doing so would lead to efficiently specializing the natural language understanding module while running in real-time on small devices and obtaining sufficient, high-quality training data. (Coucke, [sec 6] “On the acoustic modeling side, we have shown how small-sized neural networks can be trained that enjoy near state-of-the-art accuracy while running in real-time on small devices. On the language modeling side, we have described how to train the language model of the ASR and the NLU in a consistent way, efficiently specializing them to a particular use case. We have also demonstrated the accuracy of the resulting SLU engine on real-world assistants. Finally, we have shown how sufficient, high-quality training data can be obtained without compromising user privacy through a combination of crowdsourcing and machine learning.”)

However, the combination of TSUNOO, Yasa, Coucke does not appear to distinctly disclose:
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, extracting an [open] intent by extracting a verb object pair from the text input.

Xia further teaches 
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, extracting an open intent by extracting a verb object pair from the text input.
 (Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism. For emerging intents, INTENTCAPSNET-ZSL builds zero-shot DetectionCaps that utilize the (1) outputs of SemanticCaps, (2) the routing information on existing intents from DetectionCaps, and (3) similarities of the emerging intent label to existing intent labels to discriminate emerging intents like AddToPlayist from RateABook.”; e.g., “emerging intents” may read on “extracting an open intent” since “emerging intents” are not classified during the training process. Note that the combination of TSUNOO, Yasa teaches “in response to determining that one or more intents exist in the text input based on the binary intent existence classification, extracting an [open] intent by extracting a verb object pair from the text input”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Yasa, Coucke with the open intent extraction of Xia. 
Doing so would lead to effectively extracting emerging intents as well as existing intents, instead of annotating utterances of emerging intents and retraining the whole intent detection model. (Xia, [sec 1] “Moreover, it’s labor-intensive and time-consuming to annotate utterances of emerging intents and retrain the whole intent detection model.” [sec 4] “To demonstrate the effectiveness of our proposed models, we apply INTENTCAPSNET to detect existing intents in an intent detection task, and use INTENTCAPSNET-ZSL to detect emerging intents in a zero-shot intent detection task.”)
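
For illustration only, the third ingredient in the Xia passage quoted above, namely using similarities of an emerging intent label to existing intent labels, can be sketched as a similarity-weighted transfer of existing-intent scores. This is a deliberately simplified stand-in and not Xia's capsule model or routing mechanism: the label embeddings, the existing-intent scores, and the softmax-over-distance weighting are all assumptions made for the example.

```python
import math

def similarity_weights(emerging_vec, existing_vecs):
    """Softmax over negative squared Euclidean distances between the emerging
    label embedding and each existing label embedding."""
    neg_dist = {
        name: -sum((a - b) ** 2 for a, b in zip(emerging_vec, vec))
        for name, vec in existing_vecs.items()
    }
    top = max(neg_dist.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - top) for name, v in neg_dist.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def emerging_score(emerging_vec, existing_vecs, existing_scores):
    """Score an emerging intent as a similarity-weighted sum of the scores
    already computed for existing intents."""
    w = similarity_weights(emerging_vec, existing_vecs)
    return sum(w[name] * existing_scores[name] for name in w)

# Hypothetical 2-dimensional label embeddings and existing-intent scores
# for a single utterance.
existing_vecs = {"GetWeather": (1.0, 0.0), "PlayMusic": (0.0, 1.0)}
existing_scores = {"GetWeather": 0.1, "PlayMusic": 0.9}
emerging_vecs = {"AddToPlaylist": (0.1, 0.9), "RateABook": (0.6, 0.4)}

scores = {
    name: emerging_score(vec, existing_vecs, existing_scores)
    for name, vec in emerging_vecs.items()
}
best = max(scores, key=scores.get)
```

In this toy setup no utterance labeled with an emerging intent is used; the emerging intents are discriminated only through their label similarity to intents that were seen during training.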

Regarding claim 9, 
TSUNOO, Yasa, Coucke, Xia teach claim 6. 

extracting the open intent comprises (see claim 6) 

Yasa further teaches 
extracting the verb object pair [without] classifying the verb object pair to a predefined category.
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent. … [0.95] Intent: <PlayMusic> ArtistName: Lady Gaga SongName: Poker Face, [0.95] Intent: <PlayVideo> ArtistName: Lady Gaga VideoName: Poker Face”; e.g., “{Verb}: "Play," {Object}: "mother's little helper,"” may read on “verb object pair”.)

The combination of TSUNOO, Yasa, Coucke, Xia is combinable with Yasa for the same rationale as set forth above with respect to claim 6.

Xia further teaches 
extracting the verb object pair without classifying the verb object pair to a predefined category.
(Xia, [figs 1-2] “Get Weather”, “Play Music”, “Emerging Intents with Unlabeled Utterances” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; 
e.g., “labeled utterances with existing intents like GetWeather and PlayMusic” may read on “without classifying the verb object pair to a predefined category” since “Emerging Intents with Unlabeled Utterances” is not classified to a predefined category, GetWeather or PlayMusic.
Examiner notes that the Instant Specification describes “the term "open intent" refers to an intent identified without reference to a predefined category or classification of intents. In particular, an open intent includes a verb object pair extracted from a text input without classifying the verb object pair to a specific category” in par(s) 33.)

The combination of TSUNOO, Yasa, Coucke, Xia is combinable with Xia for the same rationale as set forth above with respect to claim 6.

Regarding claim 10, 
TSUNOO, Yasa, Coucke, Xia teach claim 6. 

comprising instructions that, when executed by the at least one processor, cause the at least one processor to extract the verb object pair from the text input by: (see claim 6)

TSUNOO further teaches 
applying an intent extraction [long short-term memory neural network] to the [text] input, 
(TSUNOO, [figs 1-3]; [pars 61-71] “Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “voice recognition” along with “the result of the voice recognition processing is ‘inquiry about weather’” may read on “intent extraction [long short-term memory neural network]”.)

wherein the intent extraction [long short-term memory neural network] is [trained] to extract intent [tags] from [intent extraction training text and corresponding intent extraction training markers].
(TSUNOO, [figs 1-3]; [pars 61-71] “Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “inquiry about weather” may read on “extract intent”.)

Yasa further teaches 
applying an intent extraction [long short-term memory neural network] to the text input, 
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent.”)

wherein the intent extraction [long short-term memory neural network] is trained to extract intent tags from intent extraction training text and corresponding intent extraction training markers.
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “Each recognizer 263 may also include an intent classification (IC) component 264. … For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent.”; [par 132] “In order to apply machine learning techniques, the machine learning processes themselves need to be trained. Training a machine learning component requires establishing a "ground truth" for the training examples. In machine learning, the term "ground truth" refers to the accuracy of a training set's classification for supervised learning techniques.”; e.g., “{Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones."” may read on “intent tags”.)

TSUNOO, Yasa, Coucke, Xia are combinable with Yasa for the same rationale as set forth above with respect to claim 6.

Xia further teaches
applying an intent extraction long short-term memory neural network to the text input, 
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “intent detection classifier” may read on “applying an intent extraction long short-term memory neural network to the text input”. Note that TSUNOO and Yasa teach “applying an intent extraction [long short-term memory neural network] to the text input”.)

wherein the intent extraction long short-term memory neural network is trained to extract intent tags from intent extraction training text and corresponding intent extraction training markers.
 (Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “intent detection classifier” may read on “intent extraction long short-term memory neural network”. In addition, e.g., “labeled utterances with existing intents” may read on “intent extraction training text and corresponding intent extraction training markers”. Note that TSUNOO and Yasa teach “wherein the intent extraction [long short-term memory neural network] is trained to extract intent tags from intent extraction training text and corresponding intent extraction training markers”.)

TSUNOO, Yasa, Coucke, Xia are combinable with Xia for the same rationale as set forth above with respect to claim 6.

Regarding claim 11, 
TSUNOO, Yasa, Coucke, Xia teach claim 10. 

applying the intent extraction long short-term memory neural network to the text input comprises: (see claim 10)

Xia further teaches 
embedding the text input into at least one neural network input vector; and 
(Xia, [figs 1-2]; [sec 3, p. 3] “Given an input utterance x = (w1, w2, ..., wT) of T words, each word is represented by a vector of dimension DW that can be pre-trained using a skip-gram language model (Mikolov et al., 2013).”)

generating at least one vector representation by analyzing the at least one neural network input vector via a plurality of long short-term memory units of the intent extraction long short-term memory neural network.
(Xia, [fig 1], [fig 2] “During training, utterances with existing intents are fed into the SemanticCaps which output vectorized semantic features, i.e. semantic vectors. Then DetectionCaps combine these features into higher-level prediction vectors and output an activation vector for intent detection on each existing intent.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”)

TSUNOO, Yasa, Coucke, Xia are combinable with Xia for the same rationale as set forth above with respect to claim 6.

Regarding claim 13, 
TSUNOO, Yasa, Coucke, Xia teach claim 6. 

Yasa further teaches 
in response to extracting a verb object pair from the text input: 
querying a customer support database based on the verb object pair; 
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent.”; [pars 85-94] “If at least one NLU hypothesis (represented in the NLU results data 244) satisfies a condition (e.g., a threshold confidence), the orchestrator component 230 may send at least a portion of the NLU results data 244 to a skill 290, thereby invoking the skill 290 to perform an action responsive to the user input. … Types of skills include … music playback skills ( e.g., skills that enable a user to control various music playback applications such as Spotify, Pandora, Amazon Prime Music, etc.) video skills, knowledge base skill (e.g., skill that enables a user to request information on a topic)”)

generating a digital response to the text input based on query results; and 
(Yasa, [figs 1-2]; [pars 85-94] “If at least one NLU hypothesis (represented in the NLU results data 244) satisfies a condition (e.g., a threshold confidence), the orchestrator component 230 may send at least a portion of the NLU results data 244 to a skill 290, thereby invoking the skill 290 to perform an action responsive to the user input. … Types of skills include … music playback skills ( e.g., skills that enable a user to control various music playback applications such as Spotify, Pandora, Amazon Prime Music, etc.) video skills, knowledge base skill (e.g., skill that enables a user to request information on a topic)”)

providing the generated digital response to the client computing device.
(Yasa, [figs 1-2] “Device 110”; [fig 11] “110a” – “110j”; [pars 85-94] “If at least one NLU hypothesis (represented in the NLU results data 244) satisfies a condition (e.g., a threshold confidence), the orchestrator component 230 may send at least a portion of the NLU results data 244 to a skill 290, thereby invoking the skill 290 to perform an action responsive to the user input. … Types of skills include … music playback skills ( e.g., skills that enable a user to control various music playback applications such as Spotify, Pandora, Amazon Prime Music, etc.) video skills, knowledge base skill (e.g., skill that enables a user to request information on a topic)”)

TSUNOO, Yasa, Coucke, Xia are combinable with Yasa for the same rationale as set forth above with respect to claim 6.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ding et al. (Densely Connected Bidirectional LSTM with Applications to Sentence Classification, hereinafter Ding2018)

Regarding claim 7, 
TSUNOO, Yasa, Coucke, Xia teach claim 6. 

applying the intent existence long short-term memory neural network to the text input comprises: (See claim 6)

TSUNOO further teaches 
generating the binary intent existence classification by analyzing the one or more neural network input [vectors] via a plurality of long short-term memory units of the intent existence long short-term memory neural network.
(TSUNOO, [figs 1-3]; see also [pars 61-71]; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”)

However, TSUNOO, Yasa, Coucke, Xia do not teach
embedding the text input into one or more neural network input vectors; and 
generating the binary intent existence classification by analyzing the one or more neural network input [vectors] via a plurality of long short-term memory units of the intent existence long short-term memory neural network.

Ding2018 teaches
embedding the text input into one or more neural network input vectors; and 
(Ding2018, [figs 1-2]; [sec 3] “The input of our model is a variable-length sentence, which can be represented as S = {w1, w2, . . . , ws}. Like other deep learning models, each word is represented as a dense vector extracted from a word embedding matrix.”)

generating the binary intent existence classification by analyzing the one or more neural network input vectors via a plurality of long short-term memory units of the intent existence long short-term memory neural network.
(Ding2018, [figs 1-2]; [sec 3] “The input of our model is a variable-length sentence, which can be represented as S = {w1, w2, . . . , ws}. Like other deep learning models, each word is represented as a dense vector extracted from a word embedding matrix.”; Note that TSUNOO teaches “generating the binary intent existence classification by analyzing the one or more neural network input [vectors] via a plurality of long short-term memory units of the intent existence long short-term memory neural network”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Yasa, Coucke, Xia with the text input embedding of Ding2018. Doing so would lead to obtaining significant improvements over a traditional approach with the same or even less parameters.
(Ding2018, [sec Abs] “We evaluate our proposed model on five benchmark datasets of sentence classification. DC-Bi-LSTM with depth up to 20 can be successfully trained and obtain significant improvements over the traditional Bi-LSTM with the same or even less parameters. Moreover, our model has promising performance compared with the state-of-the-art approaches.”)

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ding et al. (Densely Connected Bidirectional LSTM with Applications to Sentence Classification, hereinafter Ding2018) further in view of Shu et al. (Investigating Lstm with k-Max Pooling for Text Classification)

Regarding claim 8, 
TSUNOO, Yasa, Coucke, Xia, Ding2018 teach claim 7. 

TSUNOO further teaches 
generating the binary intent existence classification further comprises applying a [max pooling layer] to outputs of the plurality of long short-term memory units; and 
(TSUNOO, [figs 1-3]; see also [pars 61-71]; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”)

the plurality of long short-term memory units are organized [bi-directionally in two layers].
(TSUNOO, [figs 1-3]; see also [pars 61-71]; [pars 54-56] as cited above)

Ding2018 further teaches 
generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the plurality of long short-term memory units; and 
(Ding2018, [figs 1-2] “Average Pooling”; [sec 3] “This module consists of multiple Bi-LSTM layers. For the first Bi-LSTM layer, the input is a word vector sequence {e(w1), e(w2), . . . , e(ws)}, and the output is [equation image: media_image1.png], in which [equation image: media_image2.png] as described in Section 3.2. For the second Bi-LSTM layer, the input is not the sequence {h11 , h12 , . . . , h1s} (the way stacked RNNs use)”; Note that TSUNOO teaches “generating the binary intent existence classification further comprises applying a [max pooling layer] to outputs of the plurality of long short-term memory units”.)

the plurality of long short-term memory units are organized bi-directionally in two layers.
(Ding2018, [figs 1-2]; [sec 3] “This module consists of multiple Bi-LSTM layers. For the first Bi-LSTM layer, the input is a word vector sequence {e(w1), e(w2), . . . , e(ws)}, and the output is [equation image: media_image1.png], in which [equation image: media_image2.png] as described in Section 3.2. For the second Bi-LSTM layer, the input is not the sequence {h11 , h12 , . . . , h1s} (the way stacked RNNs use)”; Note that TSUNOO teaches “the plurality of long short-term memory units are organized [bi-directionally in two layers]”.)

TSUNOO, Yasa, Coucke, Xia, Ding2018 are combinable with Ding2018 for the same rationale as set forth above with respect to claim 7.

However, TSUNOO, Yasa, Coucke, Xia, Ding2018 do not teach
generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the plurality of long short-term memory units.

Shu teaches
generating the binary intent existence classification further comprises applying a max pooling layer to outputs of the plurality of long short-term memory units; and 
(Shu, [fig 1]; [sec III] “To overcome the limitations of a regular RNN. we propose a bidirectional recurrent neural network (BRNN) … bidirectional Long Short-Term Memory. … we next describe a pooling operation that is a generalization of the max pooling over the time dimension used in the Max-TDNN sentence model … k-max pooling”; Note that TSUNOO and Ding2018 teach “generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the plurality of long short-term memory units”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Yasa, Coucke, Xia, Ding2018 with the max pooling layer of Shu. Doing so would lead to improving the accuracy of the text classification by dynamically selecting the pooling parameter.
(Shu, [sec IV] “In the experiments, we evaluate and compare our model with several strong baseline methods including: CNN-rand[3], DCNN[2], LSTM/Bi-LSTM[26], fasttext[27], very deep convolutional network (VD-CNN)[23], and character-level convolutional network (CL-CNN) [21].”; [sec 3] “But, as we see next, at intermediate convolutional layers the pooling parameter k is not fixed, but is dynamically selected in order to allow for a smooth extraction of higher order and longer-range features.”)
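
For illustration only, the pooling operations discussed with respect to claim 8 can be sketched as follows. The sketch is not code from Shu, Ding2018 or TSUNOO. It assumes the long short-term memory units have already produced one output vector per time step, and the function names are hypothetical. Max pooling over time keeps one maximum per feature; k-max pooling keeps the k largest values per feature in their original time order.

```python
from typing import List


def max_over_time(outputs: List[List[float]]) -> List[float]:
    """Max pooling over the time dimension: one maximum per feature."""
    # zip(*outputs) regroups the values by feature instead of by time step.
    return [max(column) for column in zip(*outputs)]


def k_max_over_time(outputs: List[List[float]], k: int) -> List[List[float]]:
    """k-max pooling: the k largest values per feature, in original time order."""
    pooled = []
    for column in zip(*outputs):
        # Time indices of the k largest values for this feature.
        largest = sorted(range(len(column)), key=lambda t: column[t], reverse=True)[:k]
        # Re-sort the indices so the kept values stay in time order.
        pooled.append([column[t] for t in sorted(largest)])
    return pooled


# Hypothetical outputs: 3 time steps, 2 features per step.
lstm_outputs = [[0.1, 0.9], [0.7, 0.2], [0.3, 0.5]]
pooled_max = max_over_time(lstm_outputs)
pooled_kmax = k_max_over_time(lstm_outputs, k=2)
```

With k equal to 1, k-max pooling reduces to ordinary max pooling over time, which is why it is described as a generalization of that operation.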

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ma et al. (End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF)

Regarding claim 12, 
TSUNOO, Yasa, Coucke and Xia teach claim 11. 

applying the intent extraction long short-term memory neural network to the text input further comprises: (see claim 10)

Yasa further teaches 
applying a conditional random field layer [to the at least one vector representation from the plurality of long short-term memory units of the intent extraction long short-term memory neural network] to identify at least one intent tag; and 
 (Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent.”; e.g., “{Object Preposition}: "by," and {Object Modifier}: "the rolling stones."” may read on “at least one intent tag”.)

determining the verb object pair based on the at least one intent tag.
(Yasa, [figs 2]; [pars 73-83] as cited above; e.g., “{Verb}: "Play," {Object}: "mother's little helper,"” along with “{Object Preposition}: "by," and {Object Modifier}: "the rolling stones."” may read on “determining the verb object pair based on the at least one intent tag”.)

TSUNOO, Yasa, Coucke and Xia are combinable with Yasa for the same rationale as set forth above with respect to claim 6.

Xia further teaches 
applying a conditional random field layer [to] the at least one vector representation from the plurality of long short-term memory units of the intent extraction long short-term memory neural network to identify at least one intent tag; and
 (Xia, [fig 1], [fig 2] “During training, utterances with existing intents are fed into the SemanticCaps which output vectorized semantic features, i.e. semantic vectors. Then DetectionCaps combine these features into higher-level prediction vectors and output an activation vector for intent detection on each existing intent.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; Note that Yasa teaches “applying a conditional random field layer [to the at least one vector representation from the plurality of long short-term memory units of the intent extraction long short-term memory neural network] to identify at least one intent tag”.)

TSUNOO, Yasa, Coucke and Xia are combinable with Xia for the same rationale as set forth above with respect to claim 6.

However, TSUNOO, Yasa, Coucke and Xia do not teach
applying a conditional random field layer [to] the at least one vector representation from the plurality of long short-term memory units of the intent extraction long short-term memory neural network to identify at least one intent tag; and

Ma teaches
applying a conditional random field layer to the at least one vector representation from the plurality of long short-term memory units of the intent extraction long short-term memory neural network to identify at least one intent tag; and
(Ma, [figs 1-3]; [sec 2.3-2.4] “BLSTM-CNNs-CRF”; [sec 4, p. 5] “As mentioned before, we evaluate our neural network model on two sequence labeling tasks: POS tagging and NER.”)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Yasa, Coucke and Xia with the CRF and LSTM of Ma. Doing so would lead to improving the accuracy of POS tagging over the previous state-of-the-art systems and easily applying to a wide range of sequence labeling tasks on different languages and domains.
(Ma, [sec 1] “Thus, our model can be easily applied to a wide range of sequence labeling tasks on different languages and domains. We first use convolutional neural networks (CNNs) (LeCun et al., 1989) to encode character-level information of a word into its character-level representation. … Our end-to-end model outperforms previous state-of-the-art systems, obtaining 97.55% accuracy for POS tagging and 91.21% F1 for NER.”)
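
For illustration only, the role of a conditional random field layer applied to per-token vector representations can be sketched as follows. The sketch is not the model of Ma or of any other cited reference. It shows only the decoding step of a linear-chain conditional random field (Viterbi decoding), and it assumes that per-token emission scores and tag-to-tag transition scores are already available. The tag names and all score values are hypothetical.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at token t (e.g. from LSTM outputs)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n = len(tags)
    score = list(emissions[0])      # best score of any path ending in each tag
    backpointers = []               # best previous tag for each step and tag
    for emit in emissions[1:]:
        step_ptr, step_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            step_ptr.append(best_i)
            step_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(step_ptr)
        score = step_score
    # Follow the back-pointers from the best final tag.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for step_ptr in reversed(backpointers):
        best = step_ptr[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical tags and scores for a three-token input.
tags = ["O", "VERB", "OBJ"]
emissions = [[0.1, 2.0, 0.2], [0.3, 0.1, 1.5], [1.0, 0.2, 0.9]]
transitions = [[0.0, 0.0, 0.0],   # from O
               [0.0, 0.0, 1.0],   # from VERB (VERB -> OBJ favored)
               [0.0, 0.0, 0.5]]   # from OBJ  (OBJ -> OBJ favored)
decoded = viterbi_decode(emissions, transitions, tags)
```

In this example the transition scores change the last tag: taken token by token, the third token scores highest as "O", but the sequence-level score favors continuing the object span.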

Claims 14, 15 are rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ding et al. (Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network) further in view of Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION)

Regarding claim 14, 
TSUNOO teaches
A system comprising: 
at least one memory device; and 
(TSUNOO, [pars 41 and 52] “CPU … RAM”)

at least one processing device coupled to the at least one memory device, the at least one processing device to perform operations comprising: 
(TSUNOO, [pars 41 and 52] “CPU … RAM”)

receiving intent existence training [text] comprising an intent existence training marker and an intent extraction training [text] comprising an [open intent extraction training marker], wherein the intent existence training [text] and the intent extraction training [text] are not classified to an intent category;
(TSUNOO, [figs 1-3]; [pars 61-71] “In step ST17, an acoustic feature amount of a voice input during the speech acceptance period is extracted. The processing then proceeds to step ST18. … in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends. Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.” [pars 54-56] “The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”; e.g., “labeled data” may read on “intent existence training marker”. 
In addition, e.g., “voice input during the speech acceptance period” and “supervised learning with a large amount of labeled data” may read on “intent existence training [text]” since it is appreciated by one of ordinary skill in the art that a supervised learning uses input data and desired output data for training. Furthermore, e.g., “the device operation intention determination unit 101c determines that the user has no operation intention for the agent” may read on “the intent existence training [text] … are not classified to an intent category”.)

training an intent existence long short-term memory neural network to determine whether [text] comprises an intent by: 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)

applying the intent existence long short-term memory neural network to the intent existence training [text] to generate a prediction of whether the intent existence training [text] comprises at least one training intent; 
(TSUNOO, [figs 1-3]; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)

comparing the prediction of whether the intent existence training [text] comprises the at least one training intent with the intent existence training marker to modify parameters of the intent existence long short-term memory neural network; 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”; e.g., “labeled data” may read on “intent existence training marker”. In addition, e.g., “determines that the user has the operation intention” and “supervised learning with a large amount of labeled data” may read on “comparing the prediction of whether the intent existence training [text] comprises the at least one training intent with the intent existence training marker”.)
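
For illustration, the supervised step mapped above (a prediction compared against a labeled marker in order to modify parameters) can be sketched as follows. This is a minimal sketch only: a bag-of-words logistic classifier stands in for the LSTM, and every function name, example sentence, and hyperparameter is an assumption that does not appear in TSUNOO or in the claims.

```python
import math

def featurize(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def predict(weights, bias, features):
    """Probability that the input contains an intent (logistic output)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, vocab, epochs=200, lr=0.5):
    """Compare each prediction with its 0/1 marker and update parameters."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, marker in examples:
            x = featurize(text, vocab)
            error = predict(weights, bias, x) - marker  # prediction vs. marker
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical training text; marker 1 = intent present, 0 = no intent.
EXAMPLES = [
    ("please play some music", 1),
    ("what is the weather today", 1),
    ("book a table for two", 1),
    ("the sky was grey yesterday", 0),
    ("he laughed at the joke", 0),
    ("it rained all morning", 0),
]
VOCAB = sorted({w for text, _ in EXAMPLES for w in text.split()})
WEIGHTS, BIAS = train(EXAMPLES, VOCAB)
```

After training, `predict(WEIGHTS, BIAS, featurize("please play some music", VOCAB))` returns a probability that can be thresholded at 0.5 to give the logical 1/0 output described in the cited paragraphs.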

[training] an intent extraction [long short-term memory neural network] to extract one or more intents from [text] input by: 
(TSUNOO, [figs 1-3]; [pars 61-71] “Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “inquiry about weather” may read on “intent extraction”.)

TSUNOO, Ding do not teach
receiving intent existence training [text] comprising an intent existence training marker and an intent extraction training [text] comprising an [open intent extraction training marker], wherein the intent existence training [text] and the intent extraction training [text] are [not] classified to an intent category;
training an intent existence long short-term memory neural network to determine whether [text] comprises an intent by:
applying the intent existence long short-term memory neural network to the intent existence training [text] to generate a prediction of whether the intent existence training [text] comprises at least one training intent; 
comparing the prediction of whether the intent existence training [text] comprises the at least one training intent with the intent existence training marker to modify parameters of the intent existence long short-term memory neural network; 
[training] an intent extraction [long short-term memory neural network] to extract one or more intents from [text] input by:
applying the intent extraction long short-term memory neural network to intent extraction training text comprising a training intent to generate an intent comprising a verb and an object; and 
comparing the intent comprising the verb and the object with an intent extraction training marker comprising a training verb and training object to modify parameters of the intent extraction long short-term memory neural network.

Coucke teaches
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an [open intent extraction training marker], wherein the intent existence training text and the intent extraction training text are [not] classified to an intent category;
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; Note that TSUNOO teaches “receiving intent existence training [text] comprising an intent existence training marker and an intent extraction training [text] comprising an [open intent extraction training marker], wherein the intent existence training [text] and the intent extraction training [text] are [not] classified to an intent category”.)

training an intent existence long short-term memory neural network to determine whether text comprises an intent by: 
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; e.g., fig 2 may read on “determine whether text comprises an intent”. Note that TSUNOO teaches “train an intent existence long short-term memory neural network to determine whether [text] comprises an intent”.)

applying the intent existence long short-term memory neural network to the intent existence training text to generate a prediction of whether the intent existence training text comprises at least one training intent; 
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; [sec 3.2.1] “As explained earlier, the ASR engine is required to understand arbitrary formulations of a finite set of intents described in the dataset.”; [sec 4, p. 17] “For each sentence of the speech corpus, we apply the ASR engine followed by the NLU engine, and compare the predicted output to the ground true intent and slots in the dataset.”; Note that TSUNOO teaches “train an intent existence long short-term memory neural network to determine whether [text] comprises an intent”.)

comparing the prediction of whether the intent existence training text comprises the at least one training intent with the intent existence training marker to modify parameters of the intent existence long short-term memory neural network; 
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; [sec 3.2.1] “As explained earlier, the ASR engine is required to understand arbitrary formulations of a finite set of intents described in the dataset.”; [sec 4, p. 17] “For each sentence of the speech corpus, we apply the ASR engine followed by the NLU engine, and compare the predicted output to the ground true intent and slots in the dataset.”; Note that TSUNOO teaches “comparing the prediction of whether the intent existence training [text] comprises the at least one training intent with an intent existence training marker to modify parameters of the intent existence long short-term memory neural network”.)

TSUNOO is combinable with Coucke for the same rationale as set forth above with respect to claim 6.

TSUNOO and Coucke do not teach
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an [open intent extraction training marker], wherein the intent existence training text and the intent extraction training text are [not] classified to an intent category;
[training] an intent extraction [long short-term memory neural network] to extract one or more intents from [text] input by:
applying the intent extraction long short-term memory neural network to the intent extraction training text comprising a training intent to generate an intent comprising a verb and an object; and 
comparing the intent comprising the verb and the object with the open intent extraction training marker comprising a training verb and training object to modify parameters of the intent extraction long short-term memory neural network.

Xia teaches
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an open intent extraction training marker, wherein the intent existence training text and the intent extraction training text are [not] classified to an intent category;
(Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “labeled utterances with existing intents like GetWeather and PlayMusic” may read on “open intent extraction training marker” since they are used as markers in the open intent extraction training.)

training an intent extraction long short-term memory neural network to extract one or more intents from text input by: 
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; Note that TSUNOO teaches “[train] an intent extraction [long short-term memory neural network] to extract one or more intents from [text] input by”.)

applying the intent extraction long short-term memory neural network to the intent extraction training text comprising a training intent to generate an intent comprising a verb and an object; and 
(Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., discriminating intents like “Get Weather” and “Play Music” may read on “generate an intent comprising a verb and an object” since an intent is generated based on the utterance.)

comparing the intent comprising the verb and the object with the open intent extraction training marker comprising a training verb and training object to modify parameters of the intent extraction long short-term memory neural network. 
(Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “loss” may read on “comparing”. In addition, e.g., “labeled utterances with existing intents like GetWeather and PlayMusic” may read on “open intent extraction training marker comprising a training verb and training object” since they are used as markers in the open intent extraction training.)

TSUNOO and Coucke are combinable with Xia for the same rationale as set forth above with respect to claim 6.

In the alternative, Ding can also be interpreted to teach the following limitation:
Ding teaches
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an open intent extraction training marker, wherein the intent existence training text and the intent extraction training text are not classified to an intent category;
(Ding, [sec Training, p. 2392] “The training algorithm repeats for several iterations over the training data, which is a set of sentences annotated with gold standard labels that indicate whether the sentence contains user consumption intention or not.”; [sec Consumption Intention Classification, p. 2393] “In this paper, we formulate consumption intention mining as a binary classification problem, i.e., whether the sentence contains user consumption intention or not.”; e.g., “whether the sentence contains user consumption intention or not” may read on “the intent existence training text … are not classified to an intent category”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Coucke, Xia with the training text of Ding. 
Doing so would lead to improving the accuracy of the binary classification of user intents. (Ding, [sec Introduction] “We propose a new method that exploits domain adaptive CIMM for user consumption intention classification. We report the results that significantly outperform two baseline systems.”; [sec Consumption Intention Classification] “In this paper, we formulate consumption intention mining as a binary classification problem, i.e., whether the sentence contains user consumption intention or not. We use accuracy as the evaluation metric.”)

TSUNOO, Coucke, Xia, Ding do not teach
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an open intent extraction training marker, wherein the intent existence training text and the intent extraction training text are [not] classified to an intent category;

Ravuri teaches
receiving intent existence training text comprising an intent existence training marker and an intent extraction training text comprising an open intent extraction training marker, wherein the intent existence training text and the intent extraction training text are not classified to an intent category;
(Ravuri, [figs 1-3]; [sec 3] “Our second corpus, by contrast is from an actual deployed speech understanding system with open-ended uses and orders of magnitude more data. In both cases, our aim was to define the task such that it was independent of class priors and as similar as possible in nature. … Utterances directed at the system need to be routed to different semantic subsystems based on the domain of discourse (such as communication, weather, etc.), and those for which no specialized handling is available are treated as web search queries. The binary classification task we chose for our study is the detection of web search queries versus all others domains.”; e.g., “those for which no specialized handling is available” may read on “intent extraction training text are not classified to an intent category” since they are not classified for semantic subsystems.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Coucke, Xia, Ding with the open intent of Ravuri. Doing so would enable an effective comparison among different algorithms for binary classification with the open intent extraction training set, and would show that gated recurrent networks outperform the state-of-the-art classifiers.
(Ravuri, [sec 3] “The binary classification task we chose for our study is the detection of web search queries versus all others domains.” [sec 5] “The relative ordering of models according to performance was quite consistent: gated recurrent networks (GRU and LSTM) were best, with roughly equivalent performance, followed by regular recurrent networks, followed by feedforward networks.”)

Regarding claim 15, 
TSUNOO, Coucke, Xia, Ding, Ravuri teach claim 14. 

TSUNOO further teaches 
applying the intent existence long short-term memory neural network to the intent existence training [text] to generate a prediction of whether the intent existence training [text] comprises at least one training intent comprises applying the intent existence long short-term memory neural network to: a plurality of positive [text] inputs comprising at least one training intent, and a plurality of [negative text] inputs [comprising no] training intent.
(TSUNOO, [figs 1-3]; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)

Coucke further teaches 
applying the intent existence long short-term memory neural network to the intent existence training text to generate a prediction of whether the intent existence training text comprises at least one training intent comprises applying the intent existence long short-term memory neural network to: a plurality of positive text inputs comprising at least one training intent, and a plurality of [negative] text inputs [comprising no] training intent.
(Coucke, [figs 1-2 and 6]; [tables 9 and 11]; [sec 2.1] “To train the acoustic model, we need several hundreds to thousands of hours of audio data with corresponding transcripts.”; [sec 3.1] “The dataset used to train both the LM and NLU contains written queries exemplifying intents that depend on entities.”; [sec 3.2.1] “As explained earlier, the ASR engine is required to understand arbitrary formulations of a finite set of intents described in the dataset.”; [sec 4, p. 17] “For each sentence of the speech corpus, we apply the ASR engine followed by the NLU engine, and compare the predicted output to the ground true intent and slots in the dataset.”;)

TSUNOO, Coucke, Xia, Ding, Ravuri are combinable with Coucke for the same rationale as set forth above with respect to claim 6.

Ding teaches
applying the intent existence long short-term memory neural network to the intent existence training text to generate a prediction of whether the intent existence training text comprises at least one training intent comprises applying the intent existence long short-term memory neural network to: a plurality of positive text inputs comprising at least one training intent, and a plurality of negative text inputs comprising no training intent.
(Ding, [sec Training, p. 2392] “The training algorithm repeats for several iterations over the training data, which is a set of sentences annotated with gold standard labels that indicate whether the sentence contains user consumption intention or not.”; [sec Consumption Intention Classification, p. 2393] “In this paper, we formulate consumption intention mining as a binary classification problem, i.e., whether the sentence contains user consumption intention or not.”)

TSUNOO, Coucke, Xia, Ding, Ravuri are combinable with Ding for the same rationale as set forth above with respect to claim 14.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ding et al. (Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network) further in view of Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION) further in view of Nguyen et al. (A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing)

Regarding claim 16, 
TSUNOO, Coucke, Xia, Ding, Ravuri teach claim 15.

Xia further teaches 
applying the intent extraction long short-term memory neural network to the intent extraction training text comprises (see claim 14)
applying the intent extraction long short-term memory neural network to [dependency parser] training data. 
(Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”)

TSUNOO, Coucke, Xia, Ding, Ravuri are combinable with Xia for the same rationale as set forth above with respect to claim 6.

However, TSUNOO, Coucke, Xia, Ding, Ravuri do not teach
applying the intent extraction long short-term memory neural network to [dependency parser] training data. 

Nguyen teaches
applying the intent extraction long short-term memory neural network to dependency parser training data. 
(Nguyen, [fig 1] “LSTM” and “Illustration of our jPTDP for joint POS tagging and graph-based dependency parsing”; [sec 1] “we propose a novel neural architecture for joint POS tagging and graph-based dependency parsing. Our model learns latent feature representations shared for both POS tagging and dependency parsing tasks by using BiLSTM— the bidirectional LSTM (Schuster and Paliwal, 1997; Hochreiter and Schmidhuber, 1997).”; [secs 3.1-3.2] “we conduct multilingual experiments on 19 languages from the Universal Dependencies (UD) treebanks1 v1.2 (Nivre et al., 2015), using the universal POS tagset (Petrov et al., 2012)”; [sec 4, p. 5] “Our team MQuni participated with jPTDP in the CoNLL 2017 shared task on multilingual parsing from raw text to universal dependencies (Zeman et al., 2017). Training data are 60+ universal dependency treebanks for 40+ languages from UD v2.0 (Nivre et al., 2017a).”)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Coucke, Xia, Ding, Ravuri with the training markers of Nguyen. Doing so would lead to improving the accuracy of POS tag extraction and achieving improved parsing performance.
(Nguyen, [sec 1] “joint learning both POS tagging and dependency parsing has gained more attention because: i) more accurate POS tags could lead to improved parsing performance and ii) the the syntactic context of a parse tree could help resolve POS ambiguities … our joint model performs better than strong baselines and especially outperforms the neural network-based Stack-propagation model for joint POS tagging and transition-based dependency parsing (Zhang and Weiss, 2016), achieving a new state of the art.”)

Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Coucke et al. (Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces) further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks) further in view of Ding et al. (Mining User Consumption Intention from Social Media Using Domain Adaptive Convolutional Neural Network) further in view of Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION) further in view of Nguyen et al. (A Novel Neural Network Model for Joint POS Tagging and Graph-based Dependency Parsing) further in view of Manning et al. (The Stanford CoreNLP Natural Language Processing Toolkit)

Regarding claim 17, 
TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen teach claim 16.
	
Nguyen further teaches 
the dependency parser training data comprises training sentences labeled for verbs and objects via a dependency parsing model from unlabeled sentences.
(Nguyen, [fig 1] “LSTM” and “Illustration of our jPTDP for joint POS tagging and graph-based dependency parsing”; [sec 1] “we propose a novel neural architecture for joint POS tagging and graph-based dependency parsing. Our model learns latent feature representations shared for both POS tagging and dependency parsing tasks by using BiLSTM— the bidirectional LSTM (Schuster and Paliwal, 1997; Hochreiter and Schmidhuber, 1997).”; [secs 3.1-3.2] “we conduct multilingual experiments on 19 languages from the Universal Dependencies (UD) treebanks1 v1.2 (Nivre et al., 2015), using the universal POS tagset (Petrov et al., 2012)”; [sec 4, p. 5] “Our team MQuni participated with jPTDP in the CoNLL 2017 shared task on multilingual parsing from raw text to universal dependencies (Zeman et al., 2017). Training data are 60+ universal dependency treebanks for 40+ languages from UD v2.0 (Nivre et al., 2017a). … For parsing from raw text to universal dependencies, we utilize CoNLL-U test files preprocessed by the baseline UDPipe 1.1 (Straka et al., 2016).”; Note that Straka et al. (UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing) teaches “UDPipe, a pipeline processing CoNLL-U-formatted files, performs tokenization, morphological analysis, part-of-speech tagging, lemmatization and dependency parsing for nearly all treebanks of Universal Dependencies 1.2” in its Abstract.)

TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen are combinable with Nguyen for the same rationale as set forth above with respect to claim 16.

In the alternative, Manning can also be interpreted to teach the following limitation:
Manning teaches
the dependency parser training data comprises training sentences labeled for verbs and objects via a dependency parsing model from unlabeled sentences.
(Manning, [figs 1-2] “Raw text” and “Annotated text”; [sec 4] “parse: Provides full syntactic analysis, including both constituent and dependency representation, based on a probabilistic parser”; Note that Nguyen teaches “the dependency parser training data comprises training sentences labeled for verbs and objects [via a dependency parsing model] from unlabeled sentences”.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the intent prediction system of TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen with the dependency parsing model of Manning. Doing so would make it possible to quickly and painlessly get linguistic annotations for a text with a lightweight framework.
(Manning, [sec 2] “The motivations were: • To be able to quickly and painlessly get linguistic annotations for a text. • To hide variations across components behind a common API. • To have a minimal conceptual footprint, so the system is easy to learn. • To provide a lightweight framework, using plain Java objects (rather than something of heavier weight, such as XML or UIMA’s Common Analysis System (CAS) objects).”)
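
As a rough illustration of what training sentences labeled for verbs and objects could look like, the sketch below pulls verb and object pairs out of a sentence that has already been dependency parsed. The token dictionaries (`form`, `upos`, `head`, `deprel`) are a hypothetical hand-written structure loosely modeled on Universal Dependencies fields. The code does not call CoreNLP, jPTDP, or UDPipe, and it is not taken from any reference of record.

```python
def verb_object_pairs(parsed):
    """Return (verb, object) pairs from a pre-parsed sentence.

    `parsed` is a list of token dicts; `head` is the 0-based index of the
    governing token, or None for the root (an assumed convention).
    """
    pairs = []
    for token in parsed:
        if token["deprel"] in ("obj", "dobj") and token["head"] is not None:
            head = parsed[token["head"]]
            if head["upos"] == "VERB":
                pairs.append((head["form"], token["form"]))
    return pairs

# Hand-written parse of "Book a flight" (assumed output of some parser).
PARSED = [
    {"form": "Book", "upos": "VERB", "head": None, "deprel": "root"},
    {"form": "a", "upos": "DET", "head": 2, "deprel": "det"},
    {"form": "flight", "upos": "NOUN", "head": 0, "deprel": "obj"},
]

LABELS = verb_object_pairs(PARSED)  # [("Book", "flight")]
```

Applied to otherwise unlabeled sentences, the resulting pairs could serve as verb and object labels of the kind the limitation recites.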

Regarding claim 18, 
TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen, Manning teach claim 17.

Xia further teaches 
applying the intent extraction long short-term memory neural network to the intent extraction training text further comprises (see claim 14)
applying the intent extraction long short-term memory neural network to [user-labeled] training data.
(Xia, [figs 1-2] “Get Weather”, “Play Music” and “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states”; [sec 4] “SNIPS Natural Language Understanding benchmark (SNIPS-NLU) and a Commercial Voice Assistant (CVA) dataset. The statistical information on two datasets are shown in Table 2. SNIPS-NLU1 is an English natural language corpus collected in a crowdsourced fashion to benchmark the performance of voice assistants.”)

TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen, Manning are combinable with Xia for the same rationale as set forth above with respect to claim 6.

Ding further teaches 
applying the intent extraction long short-term memory neural network to user-labeled training data.
(Ding, [figs ]; [sec Problem Statement] “we manually labelled 1,000 tweets from Sina Weibo (the most popular microblogging service in China) that contain user consumption intention, of which 625 tweets contain implicit consumption intention and 375 tweets contain explicit consumption intention.”; [sec Experiments - Data Description] “To our knowledge, there is no public corpus for evaluating the task of consumption intention mining. Hence, we constructed a manually annotated sub corpus.”; Note that Xia teaches “applying the intent extraction long short-term memory neural network to [user-labeled] training data”.)

TSUNOO, Coucke, Xia, Ding, Ravuri, Nguyen, Manning are combinable with Ding for the same rationale as set forth above with respect to claim 14.

Claims 21, 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks), further in view of Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION).

Regarding claim 21
TSUNOO teaches
A computer-implemented method comprising: 
generating, utilizing a first set of layers of an intent existence long short-term memory neural network, a plurality of feature vectors from a plurality of words of a [text] input;
 generating, utilizing a second set of layers of the intent existence long short-term memory neural network, a binary intent existence classification from the plurality of feature vectors; and 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. By this processing, vector information is calculated from each of a previously stored activation word and the current acoustic feature amount, and the vector information is input in parallel to a neural network of multiple layers at a latter stage. In the present example, two vectors are simply concatenated and input as one vector. In a final layer, a two-dimensional value indicating whether or not there is an operation intention for the agent 10 is calculated, and a discrimination result is output by a softmax function or the like. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)
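
For illustration only, the final-layer step described in the cited passage (a two-dimensional value passed through a softmax function and reported as a logical value of "1" or "0") can be sketched as follows. This is a minimal sketch and not code from TSUNOO: the function names are hypothetical, and the convention that index 1 means "operation intention present" is an assumption.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intent_exists(final_layer_output: List[float]) -> int:
    """Map a two-dimensional final-layer output to a logical 0 or 1.

    Assumption: index 0 scores "no intention", index 1 scores "intention".
    """
    p_no, p_yes = softmax(final_layer_output)
    return 1 if p_yes > p_no else 0


flag_present = intent_exists([0.2, 1.5])
flag_absent = intent_exists([2.0, -1.0])
```

The feature vectors that feed this layer are outside the scope of the sketch; only the binary decision is shown.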

in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a [verb] object [pair] from the [text] input.
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends. Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. Then, processing according to a result of the voice recognition processing is performed under control of the control unit 101. The processing according to the result of the voice recognition processing can be appropriately changed in accordance with a function of the agent 10. For example, in a case where the result of the voice recognition processing is "inquiry about weather", for example, the control unit 101 controls the communication unit 104 to acquire information regarding weather from an external device.”; e.g., “in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice” may read on “in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, determining an [unclassified, open] intent”.)

	However, TSUNOO does not appear to distinctly disclose:
generating, utilizing a first set of layers of an intent existence long short-term memory neural network, a plurality of feature vectors from a plurality of words of a [text] input;
in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a [verb] object [pair] from the [text] input.

Yasa teaches
generating, utilizing a first set of layers of an intent existence long short-term memory neural network, a plurality of feature vectors from a plurality of words of a text input;
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a verb object pair from the text input.
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent. … [0.95] Intent: <PlayMusic> ArtistName: Lady Gaga SongName: Poker Face, [0.95] Intent: <PlayVideo> ArtistName: Lady Gaga VideoName: Poker Face”; e.g., “{Verb}: "Play," {Object}: "mother's little helper,"” may read on “verb object pair”. Note that TSUNOO teaches “generating, utilizing a first set of layers of an intent existence long short-term memory neural network, a plurality of feature vectors from a plurality of words of a [text] input; in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a [verb] object [pair] from the [text] input”.)
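
For illustration only, the tagged output quoted from Yasa ({Verb}: "Play," {Object}: "mother's little helper,") can be reduced to a verb object pair with a few lines of code. This is a minimal sketch under stated assumptions: the list-of-tuples data structure and the function name extract_verb_object are hypothetical and do not appear in Yasa; only the tag names and the example utterance come from the cited passage.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of tagger output: (tag, text) pairs.
TaggedSpan = Tuple[str, str]


def extract_verb_object(spans: List[TaggedSpan]) -> Optional[Tuple[str, str]]:
    """Return the first verb and the first object that follows it, if any."""
    verb = None
    for tag, text in spans:
        if tag == "Verb" and verb is None:
            verb = text
        elif tag == "Object" and verb is not None:
            return (verb.lower(), text)
    return None  # no complete verb object pair was found


tagged = [
    ("Verb", "Play"),
    ("Object", "mother's little helper"),
    ("Object Preposition", "by"),
    ("Object Modifier", "the rolling stones"),
]
pair = extract_verb_object(tagged)
```

The sketch returns the pair without mapping it to a predefined intent category, which is the distinction at issue between an extracted verb object pair and a classified intent such as <PlayMusic>.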

	TSUNOO is combinable with Yasa for the same rationale as set forth above with respect to claim 6.

	However, the combination of TSUNOO, Yasa does not appear to distinctly disclose:
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a verb object pair from the text input.

	Xia teaches
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an unclassified, open intent by extracting a verb object pair from the text input.
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism. For emerging intents, INTENTCAPSNET-ZSL builds zero-shot DetectionCaps that utilize the (1) outputs of SemanticCaps, (2) the routing information on existing intents from DetectionCaps, and (3) similarities of the emerging intent label to existing intent labels to discriminate emerging intents like AddToPlayist from RateABook.”; e.g., “emerging intents” may read on “determining an unclassified, open intent” since “emerging intents” are not classified during the training process. Note that the combination of TSUNOO, Yasa teaches “in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an [unclassified, open] intent by extracting a verb object pair from the text input”.)

In the alternative, Ravuri can also be interpreted to teach the following limitation:
Ravuri teaches
in response to determining that one or more intents exist in the text input based on the binary intent existence classification, determining an unclassified, open intent by extracting a verb object pair from the text input
(Ravuri, [figs 1-3]; [sec 3] “Utterances directed at the system need to be routed to different semantic subsystems based on the domain of discourse (such as communication, weather, etc.), and those for which no specialized handling is available are treated as web search queries. The binary classification task we chose for our study is the detection of web search queries versus all others domains.”; e.g., “no specialized handling is available” may read on “determining an unclassified, open intent”.)

	The combination of TSUNOO, Yasa, Xia is combinable with Ravuri for the same rationale as set forth above with respect to claim 14.

Regarding claim 23
	The combination of TSUNOO, Yasa, Xia, Ravuri teaches claim 21.
determining the unclassified, open intent comprises (see claim 21)

Yasa further teaches 
extracting the verb object pair from the text input utilizing an intent extraction [long short-term memory] neural network.
(Yasa, [figs 2] “Text Data 213” and “Device 110”; [pars 73-83] “For example, an NER component 262 may parse text data to identify words as subject, object, verb, preposition, etc. based on grammar rules and/or models prior to recognizing named entities in the text data. … An NER component 262 may parse text data using heuristic grammar rules, or a model may be constructed using techniques such as Hidden Markov Models, maximum entropy models, log linear models, conditional random fields (CRF), and the like. For example, an NER component 262 implemented by a music skill recognizer may parse and tag text data corresponding to "play mother's little helper by the rolling stones" as {Verb}: "Play," {Object}: "mother's little helper," {Object Preposition}: "by," and {Object Modifier}: "the rolling stones." The NER component 262 identifies "Play" as a verb based on a word database associated with the music skill, which an IC component 264 (also implemented by the music skill recognizer) may determine corresponds to a <PlayMusic> intent. … [0.95] Intent: <PlayMusic> ArtistName: Lady Gaga SongName: Poker Face, [0.95] Intent: <PlayVideo> ArtistName: Lady Gaga VideoName: Poker Face” [par(s) 47] “the wakeword detection component 220 may be built on deep neural network (DNN)/recursive neural network (RNN) structures directly”; e.g., “{Verb}: "Play," {Object}: "mother's little helper,"” may read on “verb object pair”. Note that TSUNOO teaches “in response to determining that one or more intents exist in the [text] input based on the binary intent existence classification, extracting an [open] intent by extracting a [verb] object [pair] from the [text] input”.)

The combination of TSUNOO, Yasa, Xia, Ravuri is combinable with Yasa for the same rationale as set forth above with respect to claim 6.

Xia further teaches 
extracting the verb object pair from the text input utilizing an intent extraction long short-term memory neural network.
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “intent detection classifier” may read on “intent extraction long short-term memory neural network”. Note that TSUNOO and Yasa teach “applying an intent extraction [long short-term memory neural network] to the text input”.)

The combination of TSUNOO, Yasa, Xia, Ravuri is combinable with Xia for the same rationale as set forth above with respect to claim 6.

Regarding claim 24
	The combination of TSUNOO, Yasa, Xia, Ravuri teaches claim 23.

TSUNOO further teaches 
	generating an additional binary intent existence classification from an additional [text] input utilizing the intent existence long short-term memory neural network; and 
(TSUNOO, [figs 1-3]; [pars 61-71] “in a case where the device operation intention determination unit 101c determines that the user has the operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “1”, and in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends”; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”;)

in response to determining that no intent exists in the additional [text] input based on the additional binary intent existence classification, refraining from applying the intent extraction [long short-term memory neural network] to the additional [text] input. 
 (TSUNOO, [figs 1-3]; [pars 61-74] “in a case where the device operation intention determination unit 101c determines that the user has no operation intention for the agent 10, the device operation intention determination unit 101c outputs a logical value of “0”. Then, the processing ends. Note that, in a case where it is determined that the user has the operation intention for the agent 10, the voice recognition unit 101d performs voice recognition processing on an input voice although the processing is not illustrated in FIG. 3. … it is possible to determine the presence or absence of the operation intention for the agent without waiting for a result of voice recognition processing involving matching with a plurality of patterns. Furthermore, it is possible to prevent the agent from malfunctioning due to a speech without the operation intention for the agent. … when the presence or absence of the operation intention for the agent is determined, the voice recognition involving matching with a plurality of patterns is not directly used, and thus it is possible to a determination by simple processing.”; e.g., “prevent the agent from malfunctioning due to a speech without the operation intention for the agent” and “when the presence or absence of the operation intention for the agent is determined, the voice recognition involving matching with a plurality of patterns is not directly used” may read on “in response to determining that no intent exists in the additional [text] input based on the additional binary intent existence classification, refraining from applying the intent extraction [long short-term memory neural network] to the additional [text] input”.)
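
For illustration only, the control flow mapped above (a binary determination first, with later processing skipped when the determination is "0") can be sketched as a simple gate. This is a minimal sketch, not an implementation from TSUNOO or from the claims: gated_extraction, stub_has_intent, and stub_extract are hypothetical names, and the keyword test merely stands in for a trained classifier.

```python
from typing import Callable, List, Optional


def gated_extraction(
    text: str,
    has_intent: Callable[[str], int],
    extract: Callable[[str], str],
) -> Optional[str]:
    """Run the extractor only when the binary classifier outputs 1."""
    if has_intent(text) == 0:
        return None  # refrain from applying the extractor
    return extract(text)


extractor_calls: List[str] = []


def stub_has_intent(text: str) -> int:
    # Stand-in for a trained binary classifier (assumption).
    return 1 if "play" in text.lower() else 0


def stub_extract(text: str) -> str:
    extractor_calls.append(text)  # record that the extractor actually ran
    return text.lower()


first = gated_extraction("Play some music", stub_has_intent, stub_extract)
second = gated_extraction("Nice weather today", stub_has_intent, stub_extract)
```

After both calls, extractor_calls holds only the first input, which shows that the second input never reached the extractor.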

Xia further teaches 
generating an additional binary intent existence classification from an additional text input utilizing the intent existence long short-term memory neural network; and 
in response to determining that no intent exists in the additional text input based on the additional binary intent existence classification, refraining from applying the intent extraction long short-term memory neural network to the additional text input. 
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism.”; [sec 3] “A recurrent neural network such as a bidirectional LSTM (Hochreiter and Schmidhuber, 1997) is applied to sequentially encode the utterance into hidden states … In this paper, a capsule-based model, namely INTENTCAPSNET, is first introduced to harness the advantages of capsule models for text modeling in a hierarchical manner: semantic features are extracted from the utterances with self-attention, and aggregated via the dynamic routing-by-agreement mechanism to obtain utterance-level intent representations.”; e.g., “intent detection classifier” may read on “applying an intent extraction long short-term memory neural network to the text input”. Note that TSUNOO and Yasa teach “applying an intent extraction [long short-term memory neural network] to the text input”.)

The combination of TSUNOO, Yasa, Xia, Ravuri is combinable with Xia for the same rationale as set forth above with respect to claim 6.

Regarding claim 25
	The combination of TSUNOO, Yasa, Xia, Ravuri teaches claim 21.

	Yasa further teaches 
	generating a digital text response based on the [unclassified, open] intent.
	(Yasa, [par(s) 135] e.g., “the orchestrator 230 may generate output data based on the alternative utterance text 452, including, but not limited to, output audio/text data requesting a confirmation from the user to proceed with the alternative utterance text.”)

The combination of TSUNOO, Yasa, Xia, Ravuri is combinable with Yasa for the same rationale as set forth above with respect to claim 6.

	Xia further teaches 
	generating a digital text response based on the unclassified, open intent.
(Xia, [figs 1-2] “labeled utterances with existing intents like GetWeather and PlayMusic are used to train an intent detection classifier among existing intents, in which SemanticCaps extract intepretable semantic features and DetectionCaps dynamically aggregate semantic features for intent detection using a novel routing-by-agreement mechanism. For emerging intents, INTENTCAPSNET-ZSL builds zero-shot DetectionCaps that utilize the (1) outputs of SemanticCaps, (2) the routing information on existing intents from DetectionCaps, and (3) similarities of the emerging intent label to existing intent labels to discriminate emerging intents like AddToPlayist from RateABook.”; e.g., “emerging intents” may read on “determining an unclassified, open intent” since “emerging intents” are not classified during the training process. In addition, e.g., “AddToPlayist from RateABook” may read on “generating a digital text response”. Note that Yasa teaches “generating a digital text response based on the [unclassified, open] intent”.
Examiner notes that par(s) 51 of the Instant Specification describe(s) “For example, the open intent system 102 can: generate a digital text response (e.g., in a text chat), perform a requested function (e.g., perform a requested edit of a digital, generate a requested report or digital visualization), identify and report common issues reported to customer support or public forms; recognize and report missing functionality in applications with conversational or natural language interfaces; highlight calls to action in emails, documents, or recorded meetings and conversations; generate a digital summary; route inquiries to specialized services; assign customer support requests to experts; and/or recommend content to the user of the client computing device 112”.)

The combination of TSUNOO, Yasa, Xia, Ravuri is combinable with Xia for the same rationale as set forth above with respect to claim 6.

Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over TSUNOO (US 2020/0410987 A1) in view of Yasa et al. (US 2020/0184959 A1), further in view of Xia et al. (Zero-shot User Intent Detection via Capsule Neural Networks), further in view of Ravuri et al. (A COMPARATIVE STUDY OF NEURAL NETWORK MODELS FOR LEXICAL INTENT CLASSIFICATION), further in view of Ding et al. (Densely Connected Bidirectional LSTM with Applications to Sentence Classification, hereinafter Ding2018), further in view of Shu et al. (Investigating Lstm with k-Max Pooling for Text Classification).

Regarding claim 22
	The combination of TSUNOO, Yasa, Xia, Ravuri teaches claim 21.

TSUNOO further teaches 
generating the binary intent existence classification further comprises applying a [max pooling layer] to outputs of the intent existence long short-term memory neural network; and 
(TSUNOO, [figs 1-3]; see also [pars 61-71]; [pars 54-56] “In processing at a former stage, conversion processing is performed on the extracted acoustic feature amount by a neural network (NN) of multiple layers, and then processing of accumulating information in a time series direction is performed. For this processing, statistics such as average and variance may be calculated, or a time series processing module such as long short time memory (LSTM) may be used. … The device operation intention determination unit 101c described above learns parameters by performing supervised learning with a large amount of labeled data in advance. Learning the former and latter stages in an integrated manner enables more optimal learning of a discriminator.”)

the first set of layers and the second set of layers of the intent existence long short-term memory neural network are organized [bi-directionally]. 
(TSUNOO, [figs 1-3]; see also [pars 61-71]; [pars 54-56] as cited above)

However, the combination of TSUNOO, Yasa, Xia, Ravuri does not appear to distinctly disclose:
generating the binary intent existence classification further comprises applying a [max pooling layer] to outputs of the intent existence long short-term memory neural network; and 
the first set of layers and the second set of layers of the intent existence long short-term memory neural network are organized [bi-directionally]. 

Ding2018 further teaches 
generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the intent existence long short-term memory neural network; and
(Ding2018, [figs 1-2] “Average Pooling”; [sec 3] “This module consists of multiple Bi-LSTM layers. For the first Bi-LSTM layer, the input is a word vector sequence {e(w1), e(w2), . . . , e(ws)}, and the output is [equation image media_image1.png], in which [equation image media_image2.png] as described in Section 3.2. For the second Bi-LSTM layer, the input is not the sequence {h11 , h12 , . . . , h1s} (the way stacked RNNs use)”; Note that TSUNOO teaches “generating the binary intent existence classification further comprises applying a [max pooling layer] to outputs of the intent existence long short-term memory neural network”.)

the first set of layers and the second set of layers of the intent existence long short-term memory neural network are organized bi-directionally.
(Ding2018, [figs 1-2]; [sec 3] “This module consists of multiple Bi-LSTM layers. For the first Bi-LSTM layer, the input is a word vector sequence {e(w1), e(w2), . . . , e(ws)}, and the output is [equation image media_image1.png], in which [equation image media_image2.png] as described in Section 3.2. For the second Bi-LSTM layer, the input is not the sequence {h11 , h12 , . . . , h1s} (the way stacked RNNs use)”; Note that TSUNOO teaches “the first set of layers and the second set of layers of the intent existence long short-term memory neural network are organized [bi-directionally]”.)

TSUNOO, Yasa, Xia, Ravuri are combinable with Ding2018 for the same rationale as set forth above with respect to claim 7.

However, TSUNOO, Yasa, Xia, Ravuri, Ding2018 do not teach:
generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the intent existence long short-term memory neural network.

Shu teaches
generating the binary intent existence classification further comprises applying a max pooling layer to outputs of the intent existence long short-term memory neural network; and 
(Shu, [fig 1]; [sec III] “To overcome the limitations of a regular RNN. we propose a bidirectional recurrent neural network (BRNN) … bidirectional Long Short-Term Memory. … we next describe a pooling operation that is a generalization of the max pooling over the time dimension used in the Max-TDNN sentence model … k-max pooling”; Note that TSUNOO and Ding2018 teach “generating the binary intent existence classification further comprises applying a [max] pooling layer to outputs of the intent existence long short-term memory neural network”.)
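
For illustration only, the relationship between max pooling over the time dimension and k-max pooling noted in the Shu citation can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Shu: the function name k_max_pooling is hypothetical, the input is assumed to be a list of per-time-step output vectors, and keeping the selected values in their original temporal order is an assumption about the variant.

```python
from typing import List


def k_max_pooling(outputs: List[List[float]], k: int) -> List[List[float]]:
    """Keep the k largest values of each feature across time steps.

    outputs[t][j] is feature j at time step t. Selected values stay in
    their original temporal order. With k == 1 this is ordinary max
    pooling over the time dimension.
    """
    if not outputs or k < 1:
        raise ValueError("need at least one time step and k >= 1")
    pooled = []
    for j in range(len(outputs[0])):
        column = [step[j] for step in outputs]
        by_value = sorted(range(len(column)), key=lambda t: column[t], reverse=True)
        keep = sorted(by_value[:k])  # restore temporal order
        pooled.append([column[t] for t in keep])
    return pooled


lstm_outputs = [[0.1, 0.9], [0.7, 0.2], [0.4, 0.8]]  # 3 time steps, 2 features
max_pooled = k_max_pooling(lstm_outputs, 1)
two_max_pooled = k_max_pooling(lstm_outputs, 2)
```

With k set to 1 each feature keeps a single value, its maximum over time, which is the ordinary max pooling case.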

TSUNOO, Yasa, Xia, Ravuri, Ding2018 are combinable with Shu for the same rationale as set forth above with respect to claim 8.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kim et al. (INTENT DETECTION USING SEMANTICALLY ENRICHED WORD EMBEDDINGS) teaches verbs and nouns for detecting intents.
Zhao et al. (US 2020/0251091 A1) teaches zero-shot intent estimation.
Straka et al. (UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing) teaches POS tagging and parsing.
Ding et al. (Densely Connected Bidirectional LSTM with Applications to Sentence Classification, hereinafter Ding2018) teaches a bi-directional LSTM in two layers.
Abdel-Hamid et al. (Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition) teaches a soft max pooling layer.
Ma et al. (End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF) teaches a conditional random field layer.
Liu et al. (Using Artificial Intelligence (Watson for Oncology) for Treatment Recommendations Amongst Chinese Patients with Lung Cancer: Feasibility Study) teaches not recommending for a case.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.K./Examiner, Art Unit 2129
8/18/2022
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129