DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed May 26, 2022 has been entered.  Claims 1 – 20 remain pending in the application.  Applicant’s amendments to the Drawings, Specification, and Claims have overcome each and every objection, 35 U.S.C. 112(b) rejection, and 35 U.S.C. 101 rejection previously set forth in the Non-Final Office Action mailed March 11, 2022.
Response to Arguments
Applicant’s arguments filed May 26, 2022 with respect to claims 1 – 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation is: “a communication interface” in claim 1.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

Claims 6 and 15 are rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.
Claim 6 depends from claim 1, and fails to further limit the subject matter of claim 1 because the limitation “wherein the embedding for each word concatenates a word-level embedding and at least one partial word embedding” is covered by the claim 1 limitation “generating, via a pre-processing layer, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in a specific word”.
Claim 15 depends from claim 10, and fails to further limit the subject matter of claim 10 because the limitation “wherein the embedding for each word concatenates a word-level embedding and at least one partial word embedding” is covered by the claim 10 limitation “generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in a specific word”.
Applicant may cancel the claims, amend the claims to place the claims in proper dependent form, rewrite the claims in independent form, or present a sufficient showing that the dependent claims comply with the statutory requirements.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 2 and 6 – 7 are rejected under 35 U.S.C. 103 as being unpatentable over Varsha et al. ("Translating Natural Language Sentences into Database Query"), hereinafter Varsha, in view of Lan et al. (“Character-Based Neural Networks for Sentence Pair Modeling”), hereinafter Lan, Chang et al. (US Patent No. 10,459,928), hereinafter Chang, Kant (US Patent No. 9,760,792), Nagabhushan (US Patent Application Publication No. 2019/0236135), and Zhang et al. (US Patent Application Publication No. 2018/0107928), hereinafter Zhang '928.
Regarding claim 1, Varsha discloses a system for intent determination for an unstructured text input sequence, the system comprising:
a communication interface that receives the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x1, x2 , x3 , ...xn to a sequence of SQL statements s = y1 , y2 , y3 ....yq , where x1, x2 , x3 , ...xn are sequences of input and y1 , y2 , y3 ....yq are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.");
a memory storing a neural network and a plurality of processor-executed instructions; and one or more processors that read from the memory and execute the instructions (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.") to perform operations comprising:
Varsha does not specifically disclose: generating, via a pre-processing layer, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in a specific word; generating, via an encoder stack comprising a plurality of encoding layers, encodings for the embeddings; generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Lan teaches generating, via a pre-processing layer, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in a specific word (Section 3.4, lines 1-4, "In addition, we experimented with combining the pretrained word embeddings and subword models with various strategies: concatenation, weighted average, adaptive models").  Lan teaches concatenating word and sub-word embeddings in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha and Lan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Lan to concatenate word and sub-word embeddings.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
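For illustration only, the concatenation of word-level and partial word (sub-word) embeddings discussed above can be sketched as follows. The lookup tables, vector values, and the averaging used to pool character vectors are assumptions chosen for the example; they are not taken from Varsha, Lan, or the application.

```python
# Illustrative sketch only: toy lookup tables, not drawn from any cited reference.
WORD_EMB = {"delete": [0.1, 0.2], "row": [0.3, 0.4]}  # hypothetical word-level vectors
CHAR_EMB = {  # hypothetical one-dimensional character vectors
    "d": [1.0], "e": [2.0], "l": [3.0], "t": [4.0],
    "r": [5.0], "o": [6.0], "w": [7.0],
}


def subword_vector(word):
    """Average the character vectors of a word (one simple pooling choice)."""
    vecs = [CHAR_EMB[ch] for ch in word]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def embed(word):
    """Concatenate the word-level vector with the pooled sub-word vector."""
    return WORD_EMB[word] + subword_vector(word)


# One concatenated embedding per word of the input sequence.
embeddings = [embed(w) for w in ["delete", "row"]]
```

Each resulting embedding is longer than the word-level vector alone because the sub-word vector is appended to it rather than added or averaged in.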
Varsha in view of Lan does not specifically disclose: generating, via an encoder stack comprising a plurality of encoding layers, encodings for the embeddings; generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Chang teaches generating, via an encoder stack comprising a plurality of encoding layers, encodings for the embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Lan, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
Varsha in view of Lan and Chang does not specifically disclose: generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Kant teaches generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Lan, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
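For illustration only, a softmax layer of the kind discussed above (a normalized exponential over per-category scores) can be sketched as follows. The intent labels and score values are hypothetical and are not taken from Kant or the application.

```python
import math


def softmax(scores):
    """Normalized exponential: maps raw scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


INTENTS = ["update", "add", "delete"]  # hypothetical classification categories
scores = [2.0, 0.5, 0.1]               # hypothetical raw scores from earlier layers

probs = softmax(scores)
# The probable classification is the category with the highest probability.
probable = INTENTS[probs.index(max(probs))]
```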
Varsha in view of Lan, Chang, and Kant does not specifically disclose: providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Nagabhushan teaches providing, via a fully connected layer weights for determining the probable classification (Paragraph 0075, lines 1-5, "As shown by reference number 550, the convolutional neural network includes an inference layer to classify the text (e.g., associating classifications, categories, labels, or the like) using the features and weights provided by the fully connected layers.").  Nagabhushan teaches the use of a fully connected layer in a neural network used for classification to improve the computing efficiency of the classification (Paragraph 0013, lines 7-12, "The text classification performed by the text classification platform may improve the efficiency of text classification, e.g., by performing text classification using fewer computing resources, such as processing resources, memory resources, or the like, than other text classification techniques.").
Varsha, Lan, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, and Kant to incorporate the teachings of Nagabhushan to include, in the neural network, a fully connected layer configured to provide weights for the classification.  Doing so would improve the computing efficiency of the classification.
Varsha in view of Lan, Chang, Kant, and Nagabhushan does not disclose: wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Zhang '928 teaches:
wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification (Paragraph 0066, lines 1-2, "In some such embodiments, the deep learning model is configured as a deep residual network."; Paragraph 0066, lines 14-18, "A deep residual net may be created by taking a plain neural network structure that includes convolutional layers and inserting shortcut connections which thereby takes the plain neural network and turns it into its residual learning counterpart."; Paragraph 0066, lines 13-14, "Shortcut connections are connections that skip one or more layers.");
and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets (Paragraph 0066, lines 2-6, "For example, like some other networks described herein, a deep residual network may include convolutional layers followed by fully connected layers, which are, in combination, configured and trained for image classification."; Paragraph 0066, lines 14-18, "A deep residual net may be created by taking a plain neural network structure that includes convolutional layers and inserting shortcut connections which thereby takes the plain neural network and turns it into its residual learning counterpart.").
Zhang '928 teaches the use of a bypass layer in a neural network used for classification, where the bypass layer is used to bypass a fully connected layer during training to allow deeper layers of the neural network to learn functions related to the network inputs (Paragraph 0066, "In a deep residual network, the layers are configured to learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. In particular, instead of hoping each few stacked layers directly fit a desired underlying mapping, these layers are explicitly allowed to fit a residual mapping, which is realized by feedforward neural networks with shortcut connections.").
Varsha, Lan, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to incorporate the teachings of Zhang '928 to include, in the neural network, a bypass layer, where the bypass layer is used to bypass a fully connected layer during training.  Doing so would allow deeper layers of the neural network to learn functions related to the network inputs.
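For illustration only, the two notions discussed above, a shortcut connection that skips a layer and a path that bypasses a fully connected layer, can be sketched as follows. The function names, weights, and the `bypass` flag are assumptions made for the example; the sketch is one possible reading and is not taken from Zhang '928 or the application.

```python
def fully_connected(x, weights, bias):
    """Dense layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, weights, bias):
    """Shortcut connection: the input skips the layer and is added to its output."""
    fx = fully_connected(x, weights, bias)
    return [f + xi for f, xi in zip(fx, x)]


def forward(x, weights, bias, bypass=False):
    """Bypass path: when bypass is True the fully connected layer is skipped entirely."""
    return list(x) if bypass else fully_connected(x, weights, bias)


W = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical weights
B = [0.5, -0.5]               # hypothetical biases
x = [1.0, 2.0]

through_layer = forward(x, W, B)             # passes through the fully connected layer
bypassed = forward(x, W, B, bypass=True)     # layer is bypassed for this input
with_shortcut = residual_block(x, W, B)      # layer output plus the skipped input
```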
Regarding claim 2, the combination of Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.  Varsha further discloses:
wherein the unstructured text input sequence takes a form of natural language (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.").
Regarding claim 6, the combination of Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.  Lan further teaches:
wherein the embedding for each word concatenates a word-level embedding and at least one partial word embedding (Section 3.4, lines 1-4, "In addition, we experimented with combining the pretrained word embeddings and subword models with various strategies: concatenation, weighted average, adaptive models").
Lan teaches using word embeddings that concatenate word and sub-word embeddings in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha, Lan, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to further incorporate the teachings of Lan to use word embeddings that concatenate word and sub-word embeddings.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
Regarding claim 7, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.   Lan further teaches:
wherein the instructions further comprise generating a plurality of partial word embeddings for each word in the unstructured text input sequence, each of the plurality of partial word embeddings corresponding to two or more characters in the word, a number of characters corresponding to each one of the partial word embeddings being less than a total number of characters in the word (Section 2.2, lines 1-7, "Our subword models only involve modification of the input representation layer in the pairwise interaction model. Let c1, ..., ck be the subword (character unigram, bigram and trigram) sequence of a word w. The subword embedding matrix is C ∈ R^(d'×k), where each subword is encoded into the d'-dimension vector.").
Lan teaches using sub-word embeddings corresponding to two or more characters in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha, Lan, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to further incorporate the teachings of Lan to use sub-word embeddings corresponding to two or more characters.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
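For illustration only, the character unigram, bigram, and trigram sub-word sequence discussed above, and the subset of those units that span two or more characters but fewer than the whole word, can be sketched as follows. The function names are hypothetical and are not taken from Lan or the application.

```python
def char_ngrams(word, sizes=(1, 2, 3)):
    """All contiguous character n-grams of the given sizes, in order of size."""
    return [word[i:i + n] for n in sizes for i in range(len(word) - n + 1)]


def partial_word_units(word):
    """Keep only units with at least two characters and fewer than the whole word."""
    return [g for g in char_ngrams(word) if 2 <= len(g) < len(word)]


all_units = char_ngrams("row")          # unigrams, bigrams, and the single trigram
partial = partial_word_units("row")     # only the bigrams qualify for a 3-letter word
```

Each retained unit would then be mapped to its own embedding vector, so a word yields a plurality of partial word embeddings.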
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 and further in view of Bhatt et al. (US Patent No. 10,430,407), hereinafter Bhatt.
Regarding claim 3, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database.
Bhatt teaches: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database (Column 4, lines 51-57, "In another arrangement, structured query generator 110 may include one or more additional annotators that are configured to determine database management system operations from natural language text. As defined herein, the term “database management system operation” or “database operation” means a create, read, update, or delete (CRUD) operation for the database management system.").  Bhatt teaches interpreting natural language text input to access a database so that the database can be accessed with unstructured input (Column 2, lines 30-39, "This disclosure relates to generating queries and, more particularly, to generating structured queries from natural language text. In accordance with the inventive arrangements disclosed herein, natural language text may be received and operated upon to generate a structured query for a database management system. In one arrangement, the natural language text may be directed to a particular database management system to request information. The natural language text may be expressed as free form or unstructured text.").
Bhatt is considered to be analogous to the claimed invention because it is in the same field of interpreting a natural language input for database access.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Bhatt to provide for updating, modifying, adding, or deleting an item from the database.  Doing so would allow the database to be accessed and updated with unstructured natural language input.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 and further in view of Zhang et al. (US Patent No. 10,657,962), hereinafter Zhang '962.
Regarding claim 4, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein each encoding layer comprises a plurality of gated recurrent units, each gated recurrent unit configured to generate a vector related to at least one word in the unstructured text input sequence.
Zhang '962 teaches: wherein each encoding layer comprises a plurality of gated recurrent units, each gated recurrent unit configured to generate a vector related to at least one word in the unstructured text input sequence (Column 7, lines 41-42, "To encode an utterance u=(w.sub.1, w.sub.2, . . . , w.sub.N) of N words, we use a RNN with Gated Recurrent Units").  Zhang '962 teaches the use of gated recurrent units in a neural network encoding layer that generate encoding vectors for words in the input to improve the performance of the neural network processing (Column 4, lines 36-39, "SI-RNN redesigns the dialog encoder by updating speaker embeddings in a role-sensitive way. Speaker embeddings are updated in different GRU-based units depending on their roles (sender, addressee, observer)."; Column 10, lines 24-27, "As shown in Table 2 (FIG. 5), our discovery and development of SI-RNN significantly improves upon the previous state-of-the-art. In particular, addressee selection (ADR) benefits most,"; Column 10, lines 30-32, "Response selection (RES) is also improved, suggesting role-sensitive GRUs and joint selection are helpful for response selection as well.").
Zhang '962 is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Zhang '962 to include, in the neural network, gated recurrent units to generate encoding vectors for words in the input.  Doing so would improve the performance of the neural network processing.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928, and further in view of Jagannatha ("Bidirectional Recurrent Neural Networks for Medical Event Detection in Electronic Health Records").
Regarding claim 5, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose wherein each encoding layer comprises: a first row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a first direction to generate respective first vectors; a second row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a second direction to generate respective second vectors; and a concatenating layer configured to concatenate the first and second vectors.
Jagannatha teaches:
wherein each encoding layer comprises: a first row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a first direction to generate respective first vectors; a second row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a second direction to generate respective second vectors (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and a concatenating layer configured to concatenate the first and second vectors (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
Jagannatha teaches the use of encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, "All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.").
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Jagannatha to include, in the neural network, encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
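For illustration only, the bidirectional arrangement discussed above (one row of gated recurrent units reading the words in a first direction, a second row reading them in the opposite direction, and a concatenation of the two results per word) can be sketched as follows. This is a minimal sketch and not the implementation of any cited reference: the function names, the random untrained weights, the omission of bias terms, and the particular form of the update equation are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_gru(input_dim, hidden_dim, rng):
    """Hypothetical, untrained parameters for one row of GRUs (no biases)."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {"z": mat(), "r": mat(), "h": mat()}


def gru_step(params, x, h_prev):
    """One gated recurrent unit step: update gate z, reset gate r, candidate state."""
    xh = x + h_prev  # list concatenation of input and previous hidden state
    z = [sigmoid(a) for a in matvec(params["z"], xh)]
    r = [sigmoid(a) for a in matvec(params["r"], xh)]
    x_rh = x + [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in matvec(params["h"], x_rh)]
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, cand)]


def run_gru(params, inputs, hidden_dim):
    """Serially process the word vectors and return one hidden vector per word."""
    h = [0.0] * hidden_dim
    outputs = []
    for x in inputs:
        h = gru_step(params, x, h)
        outputs.append(h)
    return outputs


def bidirectional_encode(inputs, fwd, bwd, hidden_dim):
    """Concatenate first-direction and second-direction vectors for each word."""
    forward = run_gru(fwd, inputs, hidden_dim)
    backward = run_gru(bwd, inputs[::-1], hidden_dim)[::-1]  # realign to word order
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
fwd_params = make_gru(3, 2, rng)
bwd_params = make_gru(3, 2, rng)
words = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, -0.2]]  # toy word vectors
encodings = bidirectional_encode(words, fwd_params, bwd_params, 2)
```

In this sketch each word receives a vector of twice the hidden size, the first half summarizing the words before it and the second half the words after it.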
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928, and further in view of Brundage (US Patent No. 10,606,885), hereinafter Brundage.
Regarding claim 8, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations.
Brundage teaches: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations (Column 5, lines 62-66, "In some implementations, databases herein can store information from one or more tenants into tables of a common database image to form an on-demand database service (ODDS), which can be implemented in many ways, such as a multi-tenant database system (MTDS).").  Brundage teaches the use of a multi-tenant database in a neural network system used to access a database to improve the user access time and streaming media quality when there are multiple users (Column 4, lines 56-60, "The request-routing mechanism allocates servers in the content delivery infrastructure to the requesting client devices of users 118a-n in a way that, for web content delivery, minimizes a given client's response time and, for streaming media delivery, provides for the highest quality.").
Brundage is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to access a database.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Brundage to include the use of a multi-tenant database.  Doing so would improve the user access time and streaming media quality when there are multiple users.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, Zhang '928, and Brundage, and further in view of Millius (US Patent No. 10,642,830), hereinafter Millius.
Regarding claim 9, Varsha in view of Lan, Chang, Kant, Nagabhushan, Zhang '928, and Brundage discloses the system as claimed in claim 8, but does not specifically disclose: wherein training of the neural network is individually configured by at least one of the separate organizations.
Millius teaches: wherein training of the neural network is individually configured by at least one of the separate organizations (Column 13, lines 28-35, "Outputs can be user-customized by training a machine-learned context determination model and/or a machine-learned text extraction model using training data including labeled device data obtained from a mobile computing device associated with a particular user, thus providing tailored results that are targeted towards specific text message content and/or user contexts associated with a particular user.").  Millius teaches the use of a neural network capable of being individually trained for different users to improve the accuracy of determining user context (Column 13, lines 35-45, "More complex and customized nuances in text extraction determinations and/or user context determinations can thus be afforded using the disclosed machine learning techniques. When machine learned models include deep neural networks as described, such models can better model complex text extraction functions and/or user context determination functions as compared to polynomials. As such, the text extraction models and/or context determination models of the present disclosure can provide superior prediction accuracy if trained properly.").
Millius is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, Zhang '928, and Brundage to incorporate the teachings of Millius to include the use of a neural network capable of being individually trained for different users.  Doing so would improve the accuracy of determining user context.
Claims 10 – 11 and 15 – 16 are rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, and Nagabhushan.
Regarding claim 10, Varsha discloses a method for determining an intent associated with an unstructured text input sequence, the method performed by a processor executing instructions relating to a neural network (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.") and comprising:
receiving, by a pre-processing layer of the neural network, the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x1, x2 , x3 , ...xn to a sequence of SQL statements s = y1 , y2 , y3 ....yq , where x1, x2 , x3 , ...xn are sequences of input and y1 , y2 , y3 ....yq are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.").
Varsha does not specifically disclose: generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in a specific word; generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings; based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Lan teaches generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in a specific word (Section 3.4, lines 1-4, "In addition, we experimented with combining the pretrained word embeddings and subword models with various strategies: concatenation, weighted average, adaptive models").  Lan teaches concatenating word and sub-word embeddings in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha and Lan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Lan to concatenate word and sub-word embeddings.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
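For illustration only, the concatenation of a word-level embedding with sub-word embeddings discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the lookup tables, their values, the function names, and the choice to average the sub-word vectors before concatenating are hypothetical and are not taken from Lan or from the claims.

```python
def mean_pool(vectors, dim):
    """Average a list of equal-length vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_word(word, subwords, word_table, subword_table, word_dim, subword_dim):
    """Concatenate the word-level vector with a pooled sub-word vector."""
    word_vec = word_table.get(word, [0.0] * word_dim)  # unknown word -> zeros
    sub_vecs = [subword_table[s] for s in subwords if s in subword_table]
    return word_vec + mean_pool(sub_vecs, subword_dim)


# Hypothetical toy tables; a trained model would learn these values.
WORD_TABLE = {"add": [0.5, -0.5]}
SUBWORD_TABLE = {"ad": [1.0, 0.0, 2.0], "dd": [3.0, 2.0, 0.0]}

vec = embed_word("add", ["ad", "dd"], WORD_TABLE, SUBWORD_TABLE, 2, 3)
# vec == [0.5, -0.5, 2.0, 1.0, 1.0]
```

The resulting vector has the word-level dimensions followed by the sub-word dimensions, so a word missing from the word table can still be represented through its sub-words.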
Varsha in view of Lan does not specifically disclose: generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings; based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Chang teaches generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Lan, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
Varsha in view of Lan and Chang does not specifically disclose: based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Kant teaches generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Lan, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
Varsha in view of Lan, Chang, and Kant does not specifically disclose: providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Nagabhushan teaches providing, by a fully connected layer of the neural network, weights for determining the probable classification (Paragraph 0075, lines 1-5, "As shown by reference number 550, the convolutional neural network includes an inference layer to classify the text (e.g., associating classifications, categories, labels, or the like) using the features and weights provided by the fully connected layers.").  Nagabhushan teaches the use of a fully connected layer in a neural network used for classification to improve the computing efficiency of the classification (Paragraph 0013, lines 7-12, "The text classification performed by the text classification platform may improve the efficiency of text classification, e.g., by performing text classification using fewer computing resources, such as processing resources, memory resources, or the like, than other text classification techniques.").
Varsha, Lan, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, and Kant to incorporate the teachings of Nagabhushan to include, in the neural network, a fully connected layer configured to provide weights for the classification.  Doing so would improve the computing efficiency of the classification.
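For illustration only, the relationship between the fully connected layer (which supplies weights) and the softmax layer (which normalizes the resulting scores into a probable classification) can be sketched as follows. This is a minimal sketch under stated assumptions: the intent labels, weight values, and function names are hypothetical and do not come from Kant, Nagabhushan, or the claims.

```python
import math


def fully_connected(x, weights, biases):
    """One score per class: weighted sum of the encoding plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Normalized exponential; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


INTENTS = ["add", "update", "delete"]  # hypothetical intent labels


def classify(encoding, weights, biases):
    """Return the most probable intent and the full probability distribution."""
    probs = softmax(fully_connected(encoding, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENTS[best], probs


# Toy weights: one row per intent, one column per encoding dimension.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
biases = [0.0, 0.0, 0.0]
label, probs = classify([2.0, 0.5], weights, biases)
```

Here the scores are 2.0, 0.5, and 0.0, so the first intent receives the largest share of a distribution that sums to one.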
Regarding claim 11, the combination of Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.  Varsha further discloses: wherein the unstructured text input sequence takes a form of natural language (Varsha, Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.").
Regarding claim 15, the combination of Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.  Lan further teaches:
wherein the embedding for each word concatenates a word-level embedding and at least one partial word embedding (Section 3.4, lines 1-4, "In addition, we experimented with combining the pretrained word embeddings and subword models with various strategies: concatenation, weighted average, adaptive models").
Lan teaches using word embeddings that concatenate word and sub-word embeddings in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha, Lan, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to further incorporate the teachings of Lan to use word embeddings that concatenate word and sub-word embeddings.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
Regarding claim 16, Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.  Lan further teaches:
further comprising generating, a plurality of partial embeddings for each word in the unstructured text input sequence, each of the plurality of partial word embeddings corresponding to two or more characters in the word, a number of characters corresponding to each one of the partial word embeddings being less than a total number of characters in the word (Section 2.2, lines 1-7, "Our subword models only involve modification of the input representation layer in the pairwise interaction model. Let c1, ..., ck be the subword (character unigram, bigram and trigram) sequence of a word w. The subword embedding matrix is C ∈ R^(d'×k), where each subword is encoded into the d'-dimension vector.").
Lan teaches using sub-word embeddings corresponding to two or more characters in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha, Lan, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to further incorporate the teachings of Lan to use sub-word embeddings corresponding to two or more characters.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
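For illustration only, the partial-word limitation discussed above (each partial word spanning two or more characters but fewer characters than the whole word) can be sketched as a character n-gram extraction. This is a minimal sketch under stated assumptions: the function name and the default n-gram sizes are hypothetical, and Lan's quoted passage also includes character unigrams, which are omitted here to match the "two or more characters" language.

```python
def partial_words(word, sizes=(2, 3)):
    """Return character n-grams that are shorter than the word itself."""
    grams = []
    for n in sizes:
        if n >= len(word):
            continue  # an n-gram must cover fewer characters than the word
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


print(partial_words("task"))  # ['ta', 'as', 'sk', 'tas', 'ask']
print(partial_words("to"))    # []
```

Each string returned would then be looked up in a sub-word embedding table to obtain one partial word embedding.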
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, and Nagabhushan, and further in view of Bhatt.
Regarding claim 12, Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database.
Bhatt teaches: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database (Column 4, lines 51-57, "In another arrangement, structured query generator 110 may include one or more additional annotators that are configured to determine database management system operations from natural language text. As defined herein, the term “database management system operation” or “database operation” means a create, read, update, or delete (CRUD) operation for the database management system.").  Bhatt teaches interpreting natural language text input to access a database so that the database can be accessed with unstructured input (Column 2, lines 30-39, "This disclosure relates to generating queries and, more particularly, to generating structured queries from natural language text. In accordance with the inventive arrangements disclosed herein, natural language text may be received and operated upon to generate a structured query for a database management system. In one arrangement, the natural language text may be directed to a particular database management system to request information. The natural language text may be expressed as free form or unstructured text.").
Bhatt is considered to be analogous to the claimed invention because it is in the same field of interpreting a natural language input for database access.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to incorporate the teachings of Bhatt to provide for updating, modifying, adding, or deleting an item from the database.  Doing so would allow the database to be accessed and updated with unstructured natural language input.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, and Nagabhushan, and further in view of Zhang '962.
Regarding claim 13, Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein generating encodings for the plurality of embeddings comprises generating a vector related to at least one word in the unstructured text input sequence.
Zhang '962 teaches: wherein generating encodings for the plurality of embeddings comprises generating a vector related to at least one word in the unstructured text input sequence (Column 7, lines 41-42, "To encode an utterance u=(w_1, w_2, . . . , w_N) of N words, we use a RNN with Gated Recurrent Units").  Zhang '962 teaches the use of a neural network that generates encoding vectors for words in the input to improve the performance of the neural network processing (Column 4, lines 36-39, "SI-RNN redesigns the dialog encoder by updating speaker embeddings in a role-sensitive way. Speaker embeddings are updated in different GRU-based units depending on their roles (sender, addressee, observer)."; Column 10, lines 24-27, "As shown in Table 2 (FIG. 5), our discovery and development of SI-RNN significantly improves upon the previous state-of-the-art. In particular, addressee selection (ADR) benefits most,"; Column 10, lines 30-32, "Response selection (RES) is also improved, suggesting role-sensitive GRUs and joint selection are helpful for response selection as well.").
Zhang '962 is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to incorporate the teachings of Zhang '962 to include the use of a neural network that generates encoding vectors for words in the input.  Doing so would improve the performance of the neural network processing.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, and Nagabhushan, and further in view of Jagannatha.
Regarding claim 14, Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose wherein generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the embeddings comprises: serially processing the words in the unstructured text input sequence in a first direction to generate respective first vectors; serially processing the words in the unstructured text input sequence in a second direction to generate respective second vectors; and concatenating the first and second vectors.
Jagannatha teaches:
wherein generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the embeddings comprises: serially processing the words in the unstructured text input sequence in a first direction to generate respective first vectors; serially processing the words in the unstructured text input sequence in a second direction to generate respective second vectors (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and concatenating the first and second vectors (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
Jagannatha teaches generating encodings using two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, "All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.").
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to incorporate the teachings of Jagannatha to include generating encodings using two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, and Nagabhushan, and further in view of Brundage.
Regarding claim 17, Varsha in view of Lan, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations.
Brundage teaches: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations (Column 5, lines 62-66, "In some implementations, databases herein can store information from one or more tenants into tables of a common database image to form an on-demand database service (ODDS), which can be implemented in many ways, such as a multi-tenant database system (MTDS).").  Brundage teaches the use of a multi-tenant database in a neural network system used to access a database to improve the user access time and streaming media quality when there are multiple users (Column 4, lines 56-60, "The request-routing mechanism allocates servers in the content delivery infrastructure to the requesting client devices of users 118a-n in a way that, for web content delivery, minimizes a given client's response time and, for streaming media delivery, provides for the highest quality.").
Brundage is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to access a database.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, and Nagabhushan to incorporate the teachings of Brundage to include the use of a multi-tenant database.  Doing so would improve the user access time and streaming media quality when there are multiple users.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, Kant, Nagabhushan, and Brundage, and further in view of Millius.
Regarding claim 18, Varsha in view of Lan, Chang, Kant, Nagabhushan, and Brundage discloses the method as claimed in claim 17, but does not specifically disclose: wherein training of the neural network is individually configured by at least one of the separate organizations.
Millius teaches: wherein training of the neural network is individually configured by at least one of the separate organizations (Column 13, lines 28-35, "Outputs can be user-customized by training a machine-learned context determination model and/or a machine-learned text extraction model using training data including labeled device data obtained from a mobile computing device associated with a particular user, thus providing tailored results that are targeted towards specific text message content and/or user contexts associated with a particular user.").  Millius teaches the use of a neural network capable of being individually trained for different users to improve the accuracy of determining user context (Column 13, lines 35-45, "More complex and customized nuances in text extraction determinations and/or user context determinations can thus be afforded using the disclosed machine learning techniques. When machine learned models include deep neural networks as described, such models can better model complex text extraction functions and/or user context determination functions as compared to polynomials. As such, the text extraction models and/or context determination models of the present disclosure can provide superior prediction accuracy if trained properly.").
Millius is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, Kant, Nagabhushan, and Brundage to incorporate the teachings of Millius to include the use of a neural network capable of being individually trained for different users.  Doing so would improve the accuracy of determining user context.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, and Kant.
Regarding claim 19, Varsha discloses a method for determining an intent associated with an unstructured text input sequence (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages."), the method comprising:
receiving, by a pre-processing layer, the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x1, x2 , x3 , ...xn to a sequence of SQL statements s = y1 , y2 , y3 ....yq , where x1, x2 , x3 , ...xn are sequences of input and y1 , y2 , y3 ....yq are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.").
Varsha does not specifically disclose: generating, by the pre-processing layer, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in a specific word; generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Lan teaches generating, by the pre-processing layer, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in a specific word (Section 3.4, lines 1-4, "In addition, we experimented with combining the pretrained word embeddings and subword models with various strategies: concatenation, weighted average, adaptive models").  Lan teaches concatenating word and sub-word embeddings in order to perform sentence pair modeling without using pretrained word embeddings (Section 5, lines 1-4, "We presented a focused study on the effectiveness of subword models in sentence pair modeling and showed competitive results without using pretrained word embeddings.").
Varsha and Lan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Lan to concatenate word and sub-word embeddings.  Doing so would allow for performing sentence pair modeling without using pretrained word embeddings.
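For illustration only, the concatenation of a word-level embedding with a subword (character-level) embedding, as relied upon from Lan, can be sketched as follows. This sketch is not taken from Varsha, Lan, or the application. The randomly initialized lookup tables, the dimensions, and the averaging of character vectors into one subword vector are hypothetical choices assumed here.

```python
# Illustrative sketch only: toy embedding tables, not taken from any cited reference.
import random

random.seed(0)
WORD_DIM, CHAR_DIM = 4, 3  # hypothetical embedding sizes

word_table = {}  # word -> word-level vector
char_table = {}  # character -> character-level vector


def random_vector(dim):
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


def word_embedding(word):
    """Look up (or lazily create) the word-level vector for a word."""
    if word not in word_table:
        word_table[word] = random_vector(WORD_DIM)
    return word_table[word]


def char_embedding(word):
    """Average the vectors of the word's characters (an assumed pooling choice)."""
    vectors = []
    for ch in word:
        if ch not in char_table:
            char_table[ch] = random_vector(CHAR_DIM)
        vectors.append(char_table[ch])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(CHAR_DIM)]


def embed_sequence(text):
    """Concatenate the word-level and character-level vectors for each word."""
    return [word_embedding(w) + char_embedding(w) for w in text.split()]


embeddings = embed_sequence("update the account record")
```

Each resulting vector has WORD_DIM + CHAR_DIM entries, with the word-level part first and the character-level part second.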
Varsha in view of Lan does not specifically disclose: generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Chang teaches generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Lan, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
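For illustration only, the sequential encoding described in the quoted passage of Chang (each hidden vector is computed from the previous hidden vector and the next word embedding) can be sketched as a simple recurrent encoder. This sketch is not taken from Chang or from the application. The tanh recurrence, the random weights, and the dimensions are hypothetical choices assumed here.

```python
# Illustrative sketch only: a plain recurrent encoder with random weights.
import math
import random

random.seed(1)


def make_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(embeddings, hidden_dim):
    """Return one hidden vector per input embedding, computed left to right."""
    input_dim = len(embeddings[0])
    w_x = make_matrix(hidden_dim, input_dim)   # input-to-hidden weights (assumed)
    w_h = make_matrix(hidden_dim, hidden_dim)  # hidden-to-hidden weights (assumed)
    h = [0.0] * hidden_dim                     # initial hidden vector
    states = []
    for x in embeddings:
        # Combine the previous hidden vector with the next word embedding.
        h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
        states.append(h)
    return states


hidden_states = encode([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], hidden_dim=3)
```

The three returned vectors play the role of the hidden vectors h1, h2, and h3 in the quoted passage.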
Varsha in view of Lan and Chang does not specifically disclose: generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Kant teaches generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Lan, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
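For illustration only, the normalization performed by a softmax layer (the "normalized exponential" function named in the quoted passage of Kant) can be sketched as follows. This sketch is not taken from Kant or from the application. The intent labels and the raw scores are hypothetical values assumed here.

```python
# Illustrative sketch only: softmax over hypothetical intent scores.
import math


def softmax(scores):
    """Normalize raw scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["create_record", "update_record", "delete_record"]  # hypothetical intents
scores = [1.2, 3.4, 0.3]                                      # hypothetical raw scores
probs = softmax(scores)
best = labels[probs.index(max(probs))]
```

The label with the largest probability is taken as the probable classification. Softmax preserves the ordering of the raw scores, so the highest-scoring label is selected.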
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Lan, Chang, and Kant, and further in view of Jagannatha.
Regarding claim 20, Varsha in view of Lan, Chang, and Kant discloses the method as claimed in claim 19, but does not specifically disclose: wherein the generating, by the encoder stack comprising the plurality of encoding layers, encodings for the plurality of embeddings comprises: generating, by a first row of gate recurrent units, a first hidden state vector by processing the plurality of embeddings in a first order; generating, by a second row of gate recurrent units, a second hidden state vector by processing the plurality of embeddings in a second order; and obtaining the encodings by concatenating the first hidden state vector and the second hidden state vector.
Jagannatha teaches:
wherein the generating, by the encoder stack comprising the plurality of encoding layers, encodings for the plurality of embeddings comprises: generating, by a first row of gate recurrent units, a first hidden state vector by processing the plurality of embeddings in a first order (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
generating, by a second row of gate recurrent units, a second hidden state vector by processing the plurality of embeddings in a second order (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and obtaining the encodings by concatenating the first hidden state vector and the second hidden state vector (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
Jagannatha teaches the use of encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, "All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.").
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Lan, Chang, and Kant to incorporate the teachings of Jagannatha to include, in the neural network, encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
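For illustration only, the bidirectional arrangement relied upon from Jagannatha (one chain of gated recurrent units run in the forward direction, a second chain run in the backward direction, and the two outputs concatenated) can be sketched as follows. This sketch is not taken from Jagannatha or from the application. The random weights, the omission of bias terms, the dimensions, and the particular form of the update-gate interpolation are hypothetical choices assumed here.

```python
# Illustrative sketch only: a minimal bidirectional GRU with random weights.
import math
import random

random.seed(2)


def matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    def __init__(self, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One input matrix and one recurrent matrix per gate:
        # z (update), r (reset), h (candidate). Biases omitted for brevity.
        self.w = {g: matrix(hidden_dim, input_dim) for g in "zrh"}
        self.u = {g: matrix(hidden_dim, hidden_dim) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.w["z"], x), matvec(self.u["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.w["r"], x), matvec(self.u["r"], h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.w["h"], x), matvec(self.u["h"], gated))]
        # Interpolate between the old state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_chain(cell, inputs):
    h = [0.0] * cell.hidden_dim
    states = []
    for x in inputs:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional_encode(inputs, hidden_dim):
    input_dim = len(inputs[0])
    forward = run_chain(GRUCell(input_dim, hidden_dim), inputs)
    # Process the reversed sequence, then restore the original positions.
    backward = run_chain(GRUCell(input_dim, hidden_dim), inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


encodings = bidirectional_encode([[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], hidden_dim=3)
```

Each encoding has twice the hidden dimension because the forward and backward hidden vectors for the same position are joined end to end.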
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/Examiner, Art Unit 2657

/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657