DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: Intent Prediction by Machine Learning with Word and Sentence Features for Routing User Requests.
The disclosure is objected to because it contains an embedded hyperlink and/or other form of browser-executable code.  Applicants are required to delete the embedded hyperlink and/or other form of browser-executable code.  See MPEP §608.01.
In ¶[0023], there is browser-executable code that must be deleted.
In ¶[0058], there is browser-executable code that must be deleted.
The disclosure is objected to because of the following informalities:
In ¶[0020], “tax from field names” should be “tax form field names”.
In ¶[0024], “reflect the important” should be “reflect the importance”.
In ¶[0039], “raking approach” should be “ranking approach”.
In ¶[0040], “may modified” should be “may be modified”.
In ¶[0041], “efficiently resole” should be “efficiently resolve”.
In ¶[0041], “handle request” should be “handle a request”.
In ¶[0047], “Table 1 below” should be “Table 2 below”.
In ¶[0051], one of the two references to Step 402 should instead refer to Step 404 of Figure 4.
In ¶[0052], “models the determine” should be “models to determine”.
In ¶[0054], “cutomer base” should be “customer base”.
In ¶[0057], “into as word embeddings” should be “into word embeddings” or “as word embeddings”.
In ¶[0057], “At step 506, The” should be “At step 506, the”.
In ¶[0059], “or retaining” should be “or retraining”.
In ¶[0060], “To retain the model” should be “To retrain the model”.
In ¶[0060], “by be retrained” should be “may be retrained”.
Appropriate correction is required.

Claim Objections
Claims 9 and 19 are objected to because of the following informalities:
Claims 9 and 19 set forth a limitation of “the plurality of classification layers”, which lacks express antecedent basis.  The claims depend upon independent claims 1 and 11, which set forth “a set of classification layers”, but not “a plurality of classification layers”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 2, 6 to 9, 11 to 12, and 16 to 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bilgory et al. (U.S. Patent Publication 2019/0130905) in view of Bhattacharya et al. (U.S. Patent Publication 2021/0174015).
Concerning independent claims 1 and 11, Bilgory et al. discloses a method and system for routing a verbal input to a selected handler using an intent feature, comprising:
“acquiring a user input including a string of text data describing a user request and session data describing a context of the string of text data” – a verbal input is received from a user (Abstract); verbal inputs are routed to verbal content handlers based on content and, optionally, context of the verbal inputs (“a context of the string of text data”) (¶[0001]); an entity is extracted from the verbal input using one or more verbal analysis tools, where an entity may be a target device or a target application; a selected handler is selected based on one or more context attributes, where the context attributes can include a geographic location of the user obtained from one or more location detection tools (¶[0024] - ¶[0029]); routing of verbal inputs, e.g., a textual input (“text data”), may be based on a target application (¶[0070]); routing is based on context attributes that may include geographical location (¶[0073]); Compare Specification, ¶[0013], ¶[0018], ¶[0021], ¶[0022], and ¶[0033], which describe session data as including context information such as geography or a UI screen; implicitly, verbal inputs are “a string of text”;
“determining an intent prediction for the user input based on providing the context features [and the sentence features to a set of classification layers included in the machine learning model], wherein the intent prediction describes information for facilitating a resolution of the user request” – routing the verbal inputs to their designated verbal content handler is done by estimating and/or predicting the intent (“an intent prediction”), intention, goal, and/or objective of the user as expressed in the received verbal input (¶[0053]); predefined verbal inputs may be identified and/or characterized by one or more features including the intent, intention, goal, and/or objective (¶[0077]: Figure 2); predefined verbal inputs may be identified and characterized by one or more predefined features, e.g., the intent, intention, goal, and/or objective (¶[0083]: Figure 2);
“routing the user request based on the intent prediction” – each of the verbal content identifiers is adapted to evaluate an association of the verbal input with a respective one of a plurality of handlers by computing a match confidence value for one or more features including the intent expressed by the user and routing the verbal input to a selected one of the handlers based on the matching confidence value (Abstract); a selected handler is selected based on detection of one or more mandatory entities predefined to appear in the verbal input in conjunction with the intent (¶[0030]); routing is based on analyzing verbal input and extracting one or more features for intent, intention, purpose, objective, goal, etc. (¶[0070]).
Concerning independent claims 1 and 11, Bilgory et al. generally discloses determining an intent from verbally inputted text and session data describing a context of a verbal input so that an intent prediction can be used to route a user request to an appropriate handler.  Here, Bilgory et al.’s “session data describing a context of the string of text data” is disclosed to include context attributes of a geographical location or a target application.  Compare Applicants’ Specification, which describes session data as including context information that may be geography or a UI screen.  However, Bilgory et al. does not expressly disclose “manipulating, by a preprocessing package, the string of text data to generate a plurality of text elements” or “manipulating, by the preprocessing package, the session data to generate context features”.  Bilgory et al. also omits “determining word features based on providing the plurality of text elements to a first set of feature extraction layers included in a machine learning model” and “determining sentence features based on providing the word features to a second set of feature extraction layers included in the machine learning model”.  Additionally, Bilgory et al. does not disclose performing intent prediction by “providing the context features and the sentence features to a set of classification layers included in the machine learning model”.  That is, Bilgory et al. is not directed to the details of using a machine learning model to determine an intent from word features and sentence features derived from preprocessed text.  Still, Bilgory et al. arguably discloses “preprocessing” of text data and session data simply because these are extracted from user input.
Concerning independent claims 1 and 11, Bhattacharya et al. teaches the limitations omitted by Bilgory et al.  Generally, Bhattacharya et al. teaches artificial intelligence for identifying content related to specific tasks using a machine learning model trained to rank sentences based on their relevance to a task.  The machine learning model may comprise:
- an embedding layer for generating an embedding of each word;
- a sentence aggregation layer for aggregating the embeddings for each word into a distinct embedding for each of the sentences;
- a contextual aggregation layer for aggregating each distinct embedding for each of the sentences into a contextual embedding; and
- a scoring layer for scoring and ranking each of the sentences based on their relevance to the task.  (Abstract)

The machine learning model may have a plurality of layers.  (¶[0004])  Natural language models are trained to identify and/or classify textual content into task types, e.g., a schedule meeting task type, an insert object task type, a summarize content task type, and an identify sentiment task type.  (¶[0020])  Artificial intelligence is used to identify content relevant to a schedule meeting task intent.  (¶[0028]: Figure 1)  Extraction layer 228 may identify and/or extract entities that are relevant to a schedule meeting intent.  (¶[0045]: Figure 2)  Bhattacharya et al., then, treats a task as equivalent to an intent.  Specifically, Bhattacharya et al. teaches:
“manipulating, by a preprocessing package, the string of text data to generate a plurality of text elements” – a tokenizer is applied to an email 304, and sentence fragments are tagged as sentences by a tokenizer (¶[0050]: Figure 3); sentences are tokenized from natural language input by a sentence tokenizer (¶[0069]: Figure 3); broadly, tokenizing is an operation of “preprocessing”;
“manipulating, by the preprocessing package, the session data to generate context features” – a contextual aggregation layer aggregates each distinct embedding for each of the sentences into a contextual embedding (¶[0035]); machine learning model 232 including contextual word embedding layer 214 and contextual sentence aggregation layer 218 (¶[0039]: Figure 2); contextual sentence aggregation layer 218 aggregates each distinct embedding for each of sentences 202 into a contextual embedding for each of sentences 202 (¶[0042]: Figure 2); here, an embedding of contextual information produces “context features”; 
“determining word features based on providing the plurality of text elements to a first set of feature extraction layers included in a machine learning model” – a machine learning model may have a plurality of layers; an embedding layer generates an embedding for each word in each of the sentences (¶[0004]); machine learning model 128 includes an embedding layer for generating an embedding for each word in each of the sentences (¶[0035]: Figure 1); here, a word embedding layer in machine learning is “a first set of feature extraction layers”; that is, an embedding layer extracts features; Compare Specification, ¶[0025] - ¶[0026];
“determining sentence features based on providing the word features to a second set of feature extraction layers included in the machine learning model” – a machine learning model may include a distinct sentence aggregation layer for aggregating the embeddings for each word into a distinct embedding for each of the sentences (¶[0004]); a sentence aggregation layer aggregates embeddings for each word into a distinct embedding for each of the sentences (¶[0035]: Figure 1); distinct sentence aggregation layer 216 aggregates the embeddings for each word in sentences 202 into a distinct embedding for each of sentences 202 (¶[0041]: Figure 2); here, a sentence embedding layer in machine learning is “a second set of feature extraction layers”; that is, an embedding layer extracts features; Compare Specification, ¶[0025] - ¶[0026];
“determining an intent prediction for the user input based on providing the context features and the sentence features to a set of classification layers included in the machine learning model” – a scoring layer scores and ranks each of the sentences based on their relevance to a schedule meeting task (Abstract); a machine learning model may comprise a scoring layer for scoring and ranking each of the sentences based on their relevance to a task; in scoring and ranking each sentence, the scoring layer may apply a classifier function to each contextual embedding for each of the plurality of sentences (“a set of classification layers”) (¶[0025]); response modules 130 may include a plurality of processing layers which process the sentences that are identified as being relevant to a schedule meeting task (¶[0037]: Figure 1); response matching layer 232 may identify one or more responses, actions, and/or operations to perform in relation to a task type (¶[0045]: Figure 2). 
Concerning independent claims 1 and 11, Bhattacharya et al., then, teaches “determining word features” and “determining sentence features” by a word embedding layer and a sentence embedding layer.  Bhattacharya et al. then uses a sentence scoring layer 220 and layers of a response module 224 for “determining an intent prediction” of an intent task from sentence embeddings and contextual embeddings with “a set of classification layers included in the machine learning model”.  Broadly, tokenizing sentences can be construed as “preprocessing” “to generate a plurality of text elements”, and analogous “preprocessing” can be applied to context attributes that are extracted as entities by Bilgory et al.  An objective is to identify relevant portions of large text documents related to specific task types to reduce false positives, reduce back-and-forth email messages, and reduce processing costs.  (¶[0027])  It would have been obvious to one having ordinary skill in the art to determine a task intent using machine learning with word embeddings and sentence embeddings as taught by Bhattacharya et al. to route a verbal input to a selected handler in Bilgory et al. for a purpose of reducing false positives and processing costs.

Concerning claims 2 and 12, Bilgory et al. discloses at least “routing the user request to a handling agent trained to resolve received user inputs including requests having an intent prediction that matches the intent prediction of the user input” – an intent expressed by the user is extracted from the verbal input, and the verbal input is routed to a selected one of the handlers based on the matching confidence value computed by a plurality of verbal content identifiers (Abstract); routing the verbal inputs to their designated verbal content handler is done by estimating and/or predicting the intent of the user as expressed in the received verbal input (¶[0053]).
Concerning claims 6 and 16, Bhattacharya et al. teaches “wherein the intent prediction generated by the set of classification layers classifies the user input into a class selected from a plurality of classes, wherein each class corresponds to a particular type of information required to resolve the user request expressed in the user input” – natural language models are trained to identify and/or classify textual content into task types, e.g., a schedule meeting task type, an insert object task type, a summarize content task type, and an identify sentiment task type (¶[0020]); in scoring and ranking each sentence, the scoring layer may apply a classifier function to each contextual embedding for each of the plurality of sentences (¶[0025]); response modules 130 may include a plurality of processing layers which process the sentences that are identified as being relevant to a schedule meeting task (¶[0037]: Figure 1); response matching layer 232 may identify one or more responses, actions, and/or operations to perform in relation to a task type (¶[0045]: Figure 2). 
Concerning claims 7 and 17, Bilgory et al. discloses that a handler may be selected based on operational attributes including routing information relating to one or more previous verbal inputs (¶[0031] - ¶[0035]); each of the verbal content identifiers may be trained to focus on identifying the predefined features of its respective verbal content handler 212; one or more of the verbal content identifiers may employ one or more classifiers trained using training sample datasets (“training . . . using a training dataset”) (¶[0083]: Figure 2); implicitly, training datasets include “a plurality of previously received user inputs and correct intent predictions for the plurality of previously received user inputs”; that is, training data is a previously labeled set of user inputs and correct intents.  Bhattacharya et al. teaches a machine learning model with “the set of classification layers”.
Concerning claims 8 and 18, Bhattacharya et al. teaches “wherein the set of classification layers determine the intent prediction for the user input by comparing a plurality of sets of features to the word features and context features determined for the user input, wherein the plurality of sets of features includes a set of features for each class of intent prediction included in the training dataset” – natural language models are trained to identify and/or classify textual content into task types, e.g., a schedule meeting task type, an insert object task type, a summarize content task type, and an identify sentiment task type (¶[0020]); machine learning model 128 includes a plurality of layers; an embedding layer generates an embedding for each word, and a contextual aggregation layer aggregates each distinct embedding for each of the sentences (¶[0035]: Figure 1); sentence scoring layer 220 may apply a classifier function to each contextual embedding for each of the plurality of sentences (¶[0043]: Figure 2); machine learning model 618 includes an embedding layer for generating an embedding for each word in each of the sentences, a distinct sentence aggregation layer for aggregating the embedding for each word into a distinct embedding for each of the sentences, and a contextual aggregation layer for aggregating each distinct embedding for each of the sentences into a contextual embedding; a scoring layer scores and ranks each of the sentences (¶[0056]: Figure 6); here, embeddings provide features, i.e., a word embedding generates word features, a sentence embedding generates sentence features, and contextual embedding generates context features.  
Concerning claims 9 and 19, Bhattacharya et al. teaches “generating, by one or more feature vector generation layers, a user input feature vector that combines the sentence features with the context features to provide a learned representation of the user input that is received by the plurality of classification layers” – a contextual aggregation layer aggregates each distinct embedding for each of the sentences (¶[0035]: Figure 1); machine learning model 618 includes a contextual aggregation layer for aggregating each distinct embedding for each of the sentences into a contextual embedding; a scoring layer scores and ranks each of the sentences (¶[0056]: Figure 6); here, embedding layers generate feature vectors, i.e., a contextual aggregation layer is an embedding layer that generates feature vectors for sentence features and context features.  

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Bilgory et al. (U.S. Patent Publication 2019/0130905) in view of Bhattacharya et al. (U.S. Patent Publication 2021/0174015) as applied to claims 1 and 11 above, and further in view of Huang et al. (U.S. Patent Publication 2021/0042372).
Bilgory et al. discloses intent prediction that is integrated into an information service, but omits the limitation of “applying a hotfix to the intent prediction, wherein the hotfix modifies the intent prediction determined by the set of classification layers to generate a modified intent prediction.”  However, Huang et al. teaches a question-and-answer service that provides an intent classification score by a machine classification system.  (Abstract)  A hotfix creates a list of queries that always result in a particular experience, where a hotfix serves as a work-around for popular queries that might, for whatever reason, produce an undesirable result.  (¶[0031])  The classifier includes a machine classification system, e.g., a neural network or support vector machine (SVM).  Once trained, a machine classification system can assign an intent classification score to an unlabeled query.  (¶[0060])  Hotfix component 248 acts as an override to system components.  Specific queries may be added to a hotfix list and linked to a desired output.  When a received query matches a query on the hotfix list, the linked output is executed without further analysis or regardless of a determination made by rule engine 244.  (¶[0073]: Figure 2)  Huang et al., then, teaches a hotfix to an intent classification.  An objective is to provide a work-around for popular queries that might produce an undesirable result.  (¶[0031])  It would have been obvious to one having ordinary skill in the art to apply a hotfix as taught by Huang et al. to the intent prediction of Bilgory et al. for a purpose of providing a work-around for popular queries that might produce an undesirable result.

Claims 4 to 5 and 14 to 15 are rejected under 35 U.S.C. 103 as being unpatentable over Bilgory et al. (U.S. Patent Publication 2019/0130905) in view of Bhattacharya et al. (U.S. Patent Publication 2021/0174015) as applied to claims 1 and 11 above, and further in view of Modi et al. (U.S. Patent Publication 2021/0357835).
Bhattacharya et al. does not expressly teach that word features include a learned representation that reflects a meaning of a particular word as the particular word is used within user input.  Bhattacharya et al. also omits training a first set of feature extraction layers with a large corpus of documents and tuning the first set of feature extraction layers using a specialized corpus of documents related to a particular subject matter, wherein the specialized corpus of documents is smaller than the large corpus of documents and has a greater number of documents related to the particular subject matter relative to the large corpus of documents.  However, it is known in machine learning that word embeddings represent the meanings of words.
Specifically, Modi et al. teaches predictions in machine learning, where machine learning models are often trained on a large amount of data.  One approach to address the need for large training datasets is transfer learning.  Using transfer learning, models are trained using a generic large corpus of data (“training . . . using a large corpus of documents”), and these trained models are further refined on specific domain and task data (“tuning . . . using a specialized corpus of documents that is smaller”).  Word based models generate semantic embeddings for individual words.  Word2vec and Glove rely on a distributional hypothesis, where words that occur in the same context tend to have similar meaning (“each of the word features includes a learned representation of text that reflects a meaning of a particular word as the particular word is used within the user input”).  (¶[0078] - ¶[0079])  Word embedding composition models trained on a large public corpus of data can be used in combination with a contextual embedding model.  The former model can provide a strong baseline which yields good results, and the latter can be fine-tuned to the enterprise domain (“tuning . . . using a specialized corpus of documents related to a particular subject matter”) once it is in the system/available.  (¶[0081])  Modi et al., then, teaches training using a large corpus, and then tuning using a specialized corpus of documents in the particular domain.  An objective is to create a strong baseline model which can help reduce the effort required for generating models.  (¶[0078])  It would have been obvious to one having ordinary skill in the art to train the first set of feature extraction layers of Bhattacharya et al. using a large corpus of documents and to tune the first set of feature extraction layers using a specialized corpus of documents as taught by Modi et al. to reduce the effort required for generating models.

Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bilgory et al. (U.S. Patent Publication 2019/0130905) in view of Bhattacharya et al. (U.S. Patent Publication 2021/0174015) as applied to claims 1 and 11 above, and further in view of Balduino et al. (U.S. Patent Publication 2019/0362021).
Bilgory et al. discloses integrating intent prediction into an information service for handling requests, but omits reducing a transfer rate.  Still, it would appear implicit that if intent prediction for handling a request is performed efficiently, then it would reduce a transfer rate, i.e., a rate of transferring calls between handling agents, in Bilgory et al.  Specifically, Balduino et al. teaches discovery of salient topics during a customer interaction, where improved understanding of these topics leads to improved customer experience and reduced interaction handling time and number of transfers, thereby reducing costs.  (¶[0002])  By understanding the intent and changes in the intent in interactions, systems can reduce handling time, reduce transfers, and improve customer experience.  (¶[0027])  It would have been obvious to one having ordinary skill in the art to reduce a transfer rate as taught by Balduino et al. with intent prediction of Bilgory et al. for a purpose of improving a customer experience and reducing costs.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Pham et al., Tagra et al., Roy et al., Lin, Gkikas et al., Suwandy et al., Zhu et al., and Krishnamoorthy disclose related prior art.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARTIN LERNER/Primary Examiner
Art Unit 2657
June 20, 2022