DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed November 8, 2022 with respect to the 35 U.S.C. 112(f) interpretation of claim 1 have been fully considered but they are not persuasive.
On page 7 of Applicant’s response, Applicant argues “The Office Action asserts claim 1 triggers 35 U.S.C. §112(f), six paragraph. Specifically, the Office Action asserts the claimed "a communication interface" is to be interpreted as a generic placeholder under 35 U.S.C. §112(f), six paragraph. Applicant respectfully traverses, and submits at least paragraphs [0021]-[0058] and FIGs. 1-3 provide detailed description of the claimed "a communication interface." Accordingly, Applicant respectfully requests reconsideration and withdrawal of claim interpretation of the noted claim elements under 35 U.S.C. §112(f), six paragraph.”  However, the claim 1 limitation “a communication interface that receives the unstructured text input sequence” is being interpreted under 35 U.S.C. 112(f) because “interface” is a generic placeholder that is coupled with the functional language “receives the unstructured text input sequence” without reciting sufficient structure to perform the function, and the modifier “communication” does not provide structure for the term “interface”.  Because this claim 1 limitation is interpreted under 35 U.S.C. 112(f), it is construed to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  The detailed description of the interface in the specification does not remove the limitation from 35 U.S.C. 112(f); rather, it identifies the corresponding structure to which the limitation is construed.
Applicant’s remaining arguments filed November 8, 2022 with respect to claims 1 – 5, 7 – 14 and 16 – 22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Objections
Claim 1 is objected to because of the following informalities:
In claim 1, line 17, “into a plurality of embedding” should read “into a plurality of embeddings”.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation is: “a communication interface” in claim 1.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10 – 14, 16 – 18 and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 10 recites the limitation “generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in each word, each of the concatenated partial word embeddings is a n-gram embedding” in lines 8-12.  This limitation is indefinite because it is unclear if “concatenated partial word embeddings” means that the partial word embeddings are concatenated before they are concatenated with the word-level embeddings, or if the “concatenated partial word embeddings” are the partial word embeddings that are concatenated with the word embeddings.  Amending the claim limitation to “generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in each word, each of the partial word embeddings is a n-gram embedding” would resolve the indefiniteness.  For examination purposes, “concatenated partial word embeddings” will be interpreted as partial word embeddings that are concatenated with word embeddings.
Claims 11 – 14, 16 – 18 and 22 depend from claim 10, and thus recite the limitations of claim 10, and do not resolve the indefinite language from claim 10.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 2, 7 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Varsha et al. ("Translating Natural Language Sentences into Database Query"), hereinafter Varsha, in view of Z. Zhang et al. (“Subword-augmented Embedding for Cloze Reading Comprehension”), hereinafter Z. Zhang, Chang et al. (US Patent No. 10,459,928), hereinafter Chang, Kant (US Patent No. 9,760,792), Nagabhushan (US Patent Application Publication No. 2019/0236135), and Zhang et al. (US Patent Application Publication No. 2018/0107928), hereinafter Zhang '928.
Regarding claim 1, Varsha discloses a system for intent determination for an unstructured text input sequence, the system comprising:
a communication interface that receives the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x_1, x_2, x_3, ..., x_n to a sequence of SQL statements s = y_1, y_2, y_3, ..., y_q, where x_1, x_2, x_3, ..., x_n are sequences of input and y_1, y_2, y_3, ..., y_q are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.");
a memory storing a neural network and a plurality of processor-executed instructions; and one or more processors that read from the memory and execute the instructions to perform operations (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.").
Varsha does not specifically disclose: generating, via a pre-processing layer, a plurality of word-level embeddings for each word in the unstructured text input sequence; generating, via the pre-processing layer, a plurality of partial word embeddings for one or more characters in each word from the plurality of words, wherein each of the partial word embeddings is a n-gram embedding; concatenating the plurality of word-level embeddings and the plurality of partial word embeddings into a plurality of embeddings; generating, via an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Z. Zhang teaches:
generating, via a pre-processing layer, a plurality of word-level embeddings for each word in the unstructured text input sequence (Section 1, lines 35-41, “In this paper, we present various simple yet accurate subword-augmented embedding (SAW) strategies and propose SAW Reader as an instance. Specifically, we adopt subword information to enrich word embedding and survey different SAW operations to integrate word-level and subword-level embedding for a fine-grained representation. To ensure adequate training of OOV and low-frequency words, we employ a short list mechanism. Our evaluation will be performed on three public Chinese reading comprehension datasets and one English benchmark dataset for showing our method is also effective in multi-lingual case.”; Section 2.2, lines 15-19, “To alleviate the OOV issues, we keep a short list H for specific words. H = {w_1, w_2, ..., w_n}. If w is in H, the immediate word embedding WE(w) is indexed from word lookup table M_w ∈ R^(d×s), where s denotes the size (recorded words) of lookup table. Otherwise, it will be represented as the randomly initialized default word (denoted by a specific mark UNK).”; Evaluating the subword-augmented embedding method on an English benchmark dataset demonstrates that the word embeddings are generated for each word in an unstructured text input sequence.  The method for determining the word embedding WE(w) reads on a pre-processing layer.);
generating, via the pre-processing layer, a plurality of partial word embeddings for one or more characters in each word from the plurality of words (Section 1, lines 35-41, “In this paper, we present various simple yet accurate subword-augmented embedding (SAW) strategies and propose SAW Reader as an instance. Specifically, we adopt subword information to enrich word embedding and survey different SAW operations to integrate word-level and subword-level embedding for a fine-grained representation. To ensure adequate training of OOV and low-frequency words, we employ a short list mechanism. Our evaluation will be performed on three public Chinese reading comprehension datasets and one English benchmark dataset for showing our method is also effective in multi-lingual case.”; Section 2.2, lines 30-31, “The subword embedding SE(w) is generated by taking the final outputs of a bidirectional gated recurrent unit (GRU) (Cho et al., 2014) applied to the embeddings from a lookup table of subwords.”; Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”; Evaluating the subword-augmented embedding method on an English benchmark dataset demonstrates that the subword embeddings are generated for each word in an unstructured text input sequence.  The bidirectional gated recurrent unit for generating the subword embedding SE(w) reads on a pre-processing layer, and character n-gram subwords read on partial word embeddings for one or more characters.),
wherein each of the partial word embeddings is a n-gram embedding (Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”);
concatenating the plurality of word-level embeddings and the plurality of partial word embeddings into a plurality of embeddings (Section 2.2, lines 1-9, “After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w. AE(w) = WE(w) ♢ SE(w), where ♢ denotes the detailed integration operation. In this work, we investigate concatenation (concat), element-wise summation (sum) and element-wise multiplication (mul). Thus, each document D and query Q is represented as R^(d×k) matrix where d denotes the dimension of word embedding and k is the number of words in the input.”; The word embedding reads on the word-level embeddings, the subword embedding reads on the partial word embeddings, and the augmented embedding reads on the concatenated embeddings.).
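For illustration only, the three integration operations quoted from Z. Zhang (concatenation, element-wise summation, and element-wise multiplication) can be sketched in Python as follows. The function name and vector values are hypothetical, and the sketch does not model how Z. Zhang produces SE(w) (a bidirectional GRU over subword embeddings); it shows only how AE(w) is formed once WE(w) and SE(w) exist.

```python
from typing import List

Vector = List[float]


def augment(word_emb: Vector, subword_emb: Vector, op: str = "concat") -> Vector:
    """Combine a word embedding WE(w) and a subword embedding SE(w) into AE(w)."""
    if op == "concat":
        # Concatenation: the output dimension is len(WE) + len(SE).
        return word_emb + subword_emb
    if len(word_emb) != len(subword_emb):
        raise ValueError("sum and mul require equal-length vectors")
    if op == "sum":
        # Element-wise summation keeps the original dimension.
        return [a + b for a, b in zip(word_emb, subword_emb)]
    if op == "mul":
        # Element-wise multiplication keeps the original dimension.
        return [a * b for a, b in zip(word_emb, subword_emb)]
    raise ValueError(f"unknown operation: {op}")


we = [0.5, -1.0]  # hypothetical WE(w)
se = [2.0, 3.0]   # hypothetical SE(w)
print(augment(we, se, "concat"))  # [0.5, -1.0, 2.0, 3.0]
print(augment(we, se, "sum"))     # [2.5, 2.0]
print(augment(we, se, "mul"))     # [1.0, -3.0]
```

Only the "concat" branch corresponds to the concatenation recited in the claims; the other two branches are included because Z. Zhang investigates all three.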
Z. Zhang teaches generating word embeddings and sub-word embeddings, and concatenating the word embeddings and sub-word embeddings, in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha and Z. Zhang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Z. Zhang to generate word embeddings and sub-word embeddings, and concatenate the word embeddings and sub-word embeddings.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Varsha in view of Z. Zhang does not specifically disclose: generating, via an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Chang teaches generating, via an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Z. Zhang, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
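Chang's description of hidden vectors that grow by sequentially combining the previous hidden vector with the next word from the embedding layer can be illustrated with a minimal stacked recurrent encoder. The tanh recurrence h_t = tanh(W_in x_t + W_rec h_(t-1)), the weight values, and the use of two stacked layers are assumptions for illustration only; Chang's encoding layers are not necessarily a plain recurrent network of this form.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (one row per output unit) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_layer(inputs: List[Vector], w_in: Matrix, w_rec: Matrix) -> List[Vector]:
    """One encoding layer: h_t = tanh(W_in x_t + W_rec h_(t-1)), starting from h_0 = 0."""
    h = [0.0] * len(w_rec)
    outputs = []
    for x in inputs:
        # The new hidden state mixes the current input with the previous hidden state.
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        outputs.append(h)
    return outputs


def encoder_stack(embeddings: List[Vector], layers: List[Tuple[Matrix, Matrix]]) -> List[Vector]:
    """Feed the hidden states of each encoding layer into the next layer."""
    seq = embeddings
    for w_in, w_rec in layers:
        seq = rnn_layer(seq, w_in, w_rec)
    return seq


embeddings = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # hypothetical word embeddings
identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
layers = [(identity, half), (identity, half)]        # two stacked encoding layers
encodings = encoder_stack(embeddings, layers)
print(len(encodings))  # 3
```

The stack returns one encoding per input embedding, and each encoding depends on all earlier words in the sequence.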
Varsha in view of Z. Zhang and Chang does not specifically disclose: generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Kant teaches generating, via a softmax layer, based at least in part on the encodings, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Z. Zhang, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
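Kant describes the softmax layer as a normalized exponential function that normalizes the inferences of each predicted classification category. As a sketch only, a numerically stable softmax over raw classification scores can be written as follows; the intent labels and score values are hypothetical and are not taken from Kant or from the application.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalized exponential: map raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical intent labels and raw scores (logits) for one input sequence.
labels = ["insert_record", "update_record", "delete_record"]
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
best = labels[probs.index(max(probs))]  # the most probable classification
print(best)  # insert_record
```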
Varsha in view of Z. Zhang, Chang, and Kant does not specifically disclose: providing, via a fully connected layer weights for determining the probable classification; wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Nagabhushan teaches providing, via a fully connected layer weights for determining the probable classification (Paragraph 0075, lines 1-5, "As shown by reference number 550, the convolutional neural network includes an inference layer to classify the text (e.g., associating classifications, categories, labels, or the like) using the features and weights provided by the fully connected layers.").  Nagabhushan teaches the use of a fully connected layer in a neural network used for classification to improve the computing efficiency of the classification (Paragraph 0013, lines 7-12, "The text classification performed by the text classification platform may improve the efficiency of text classification, e.g., by performing text classification using fewer computing resources, such as processing resources, memory resources, or the like, than other text classification techniques.").
Varsha, Z. Zhang, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, and Kant to incorporate the teachings of Nagabhushan to include, in the neural network, a fully connected layer configured to provide weights for the classification.  Doing so would improve the computing efficiency of the classification.
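A fully connected layer computes each of its outputs as a weighted sum of all of its inputs plus a bias, so its weight matrix determines how strongly each feature contributes to each classification category. The following minimal sketch, using hypothetical feature values and weights, shows such a layer producing one score per category; in Nagabhushan, the features and weights provided by the fully connected layers are consumed by a separate inference layer that classifies the text.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fully_connected(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense layer: output[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    if any(len(row) != len(x) for row in weights) or len(weights) != len(bias):
        raise ValueError("weight and bias shapes do not match the input")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


features = [1.0, 2.0, 3.0]   # hypothetical encoded features
weights = [[0.5, 0.0, 0.0],  # one row of weights per classification category
           [0.0, 1.0, 1.0]]
bias = [0.0, -1.0]
print(fully_connected(features, weights, bias))  # [0.5, 4.0]
```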
Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan does not disclose: wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification, and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets.
Zhang '928 teaches:
wherein the neural network is operable to be trained on a plurality of training data sets and the fully connected layer is configured to determine features in a given training data set that correlates to a particular classification (Paragraph 0066, lines 1-2, "In some such embodiments, the deep learning model is configured as a deep residual network."; Paragraph 0066, lines 14-18, "A deep residual net may be created by taking a plain neural network structure that includes convolutional layers and inserting shortcut connections which thereby takes the plain neural network and turns it into its residual learning counterpart."; Paragraph 0066, lines 13-14, "Shortcut connections are connections that skip one or more layers.");
and a bypass path is configured to bypass the fully connected layer for one or more of the training data sets (Paragraph 0066, lines 2-6, "For example, like some other networks described herein, a deep residual network may include convolutional layers followed by fully connected layers, which are, in combination, configured and trained for image classification."; Paragraph 0066, lines 14-18, "A deep residual net may be created by taking a plain neural network structure that includes convolutional layers and inserting shortcut connections which thereby takes the plain neural network and turns it into its residual learning counterpart.").
Zhang '928 teaches the use of a bypass path (shortcut connection) in a neural network used for classification, where the bypass path is used to bypass a fully connected layer during training to allow deeper layers of the neural network to learn functions related to the network inputs (Paragraph 0066, "In a deep residual network, the layers are configured to learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. In particular, instead of hoping each few stacked layers directly fit a desired underlying mapping, these layers are explicitly allowed to fit a residual mapping, which is realized by feedforward neural networks with shortcut connections.").
Varsha, Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to incorporate the teachings of Zhang '928 to include, in the neural network, a bypass path, where the bypass path is used to bypass a fully connected layer during training.  Doing so would allow deeper layers of the neural network to learn functions related to the network inputs.
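The shortcut connection described in Zhang '928 can be illustrated with a minimal residual block in which a fully connected layer F is skipped by an identity path, so that the block output is y = F(x) + x. The choice of a single fully connected layer with a ReLU as the skipped block, and of a square weight matrix so the two paths have matching dimensions, are assumptions made for illustration; they are not drawn from Zhang '928 or from the application.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: weighted sum of all inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def residual_block(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """y = F(x) + x: the shortcut carries x around the fully connected layer F unchanged."""
    fx = relu(dense(x, weights, bias))
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires matching dimensions")
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
print(residual_block(x, w, b))  # [2.0, -2.0]

# If F contributes nothing (all-zero weights), the input passes through the shortcut unchanged.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, b))  # [1.0, -2.0]
```

Because the layer only has to learn the residual F(x) = y - x, an all-zero layer already yields the identity mapping, which is the property the quoted passage of Zhang '928 relies on.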
Regarding claim 2, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.  Varsha further discloses:
wherein the unstructured text input sequence takes a form of natural language (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.").
Regarding claim 7, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.
Z. Zhang further teaches:
wherein each of the plurality of partial word embeddings corresponding to two or more characters in the word, a number of characters corresponding to each one of the partial word embeddings being less than a total number of characters in the word (Abstract, lines 4-5, “In this paper, we propose to use subword rather than character for word embedding enhancement.”; Section 2.1, lines 1-2, “Word in most languages usually can be split into meaningful subword units despite of the writing form. For example, “indispensable” could be split into the following subwords: < in, disp, ens, able >.”; Section 2.2, lines 1-4, “Our subwords are also formed as character n-grams, do not cross word boundaries. After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w.”; The n-gram subword embeddings read on partial word embeddings corresponding to two or more characters in the word, and splitting each word into subwords reads on the number of characters corresponding to each partial word embedding being less than the total number of characters in the word.).
Z. Zhang teaches using sub-word embeddings corresponding to two or more characters and less than a total number of characters in the word in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha, Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to further incorporate the teachings of Z. Zhang to use sub-word embeddings corresponding to two or more characters and less than a total number of characters in the word.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Regarding claim 21, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1.
Z. Zhang further teaches:
wherein n is a fixed integer value no greater than a word length for the plurality of partial word embeddings (Section 2.2, lines 1-4, “Our subwords are also formed as character n-grams, do not cross word boundaries. After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w.”; Splitting each word into subwords reads on the partial word embeddings being less than a total number of characters in the word.).
The specification states, in paragraph 0072, “The partial word embeddings may be referred to as "n-gram" embeddings, where n is the maximum number of characters in the word that are considered. Thus, using the word "where" as an example, and n=3, it will be represented by the character n-grams: <wh, whe, her, ere, and re>.”  Under this definition of an “n-gram”, the limitation that n is a fixed integer value no greater than a word length is interpreted to mean that the partial words are shorter than the word, not that all of the partial words have the same number of characters.
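For illustration only, the paragraph 0072 example can be reproduced with the short Python sketch below. It assumes, consistent with the quoted n-grams, that "<" and ">" are word-boundary markers added before slicing (the convention used by fastText-style subword models); the function name char_ngrams is hypothetical. With a fixed n=3, each n-gram of "where" has three characters, fewer than the five characters of the word.

```python
from typing import List


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a word padded with assumed '<' and '>' boundary markers."""
    padded = f"<{word}>"
    if len(padded) < n:
        # Word (plus markers) is shorter than n: keep it as a single unit.
        return [padded]
    # Slide a window of fixed width n across the padded word.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(char_ngrams("where", 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```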
Z. Zhang teaches using n-gram sub-word embeddings where n is a fixed integer value no greater than a word length in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha, Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to further incorporate the teachings of Z. Zhang to use n-gram sub-word embeddings where n is a fixed integer value no greater than a word length.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 and further in view of Bhatt et al. (US Patent No. 10,430,407), hereinafter Bhatt.
Regarding claim 3, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database.
Bhatt teaches: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database (Column 4, lines 51-57, "In another arrangement, structured query generator 110 may include one or more additional annotators that are configured to determine database management system operations from natural language text. As defined herein, the term “database management system operation” or “database operation” means a create, read, update, or delete (CRUD) operation for the database management system.").  Bhatt teaches interpreting natural language text input to access a database so that the database can be accessed with unstructured input (Column 2, lines 30-39, "This disclosure relates to generating queries and, more particularly, to generating structured queries from natural language text. In accordance with the inventive arrangements disclosed herein, natural language text may be received and operated upon to generate a structured query for a database management system. In one arrangement, the natural language text may be directed to a particular database management system to request information. The natural language text may be expressed as free form or unstructured text.").
Bhatt is considered to be analogous to the claimed invention because it is in the same field of interpreting a natural language input for database access.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Bhatt to provide for updating, modifying, adding, or deleting an item from the database.  Doing so would allow the database to be accessed and updated with unstructured natural language input.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 and further in view of Zhang et al. (US Patent No. 10,657,962), hereinafter Zhang '962.
Regarding claim 4, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein each encoding layer comprises a plurality of gated recurrent units, each gated recurrent unit configured to generate a vector related to at least one word in the unstructured text input sequence.
Zhang '962 teaches: wherein each encoding layer comprises a plurality of gated recurrent units, each gated recurrent unit configured to generate a vector related to at least one word in the unstructured text input sequence (Column 7, lines 41-42, "To encode an utterance u=(w1, w2, . . . , wN) of N words, we use a RNN with Gated Recurrent Units").  Zhang '962 teaches the use of gated recurrent units in a neural network encoding layer that generate encoding vectors for words in the input to improve the performance of the neural network processing (Column 4, lines 36-39, "SI-RNN redesigns the dialog encoder by updating speaker embeddings in a role-sensitive way. Speaker embeddings are updated in different GRU-based units depending on their roles (sender, addressee, observer)."; Column 10, lines 24-27, "As shown in Table 2 (FIG. 5), our discovery and development of SI-RNN significantly improves upon the previous state-of-the-art. In particular, addressee selection (ADR) benefits most,"; Column 10, lines 30-32, "Response selection (RES) is also improved, suggesting role-sensitive GRUs and joint selection are helpful for response selection as well.").
Zhang '962 is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Zhang '962 to include, in the neural network, gated recurrent units to generate encoding vectors for words in the input.  Doing so would improve the performance of the neural network processing.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 and further in view of Jagannatha ("Bidirectional Recurrent Neural Networks for Medical Event Detection in Electronic Health Records").
Regarding claim 5, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose wherein each encoding layer comprises: a first row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a first direction to generate respective first vectors; a second row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a second direction to generate respective second vectors; and a concatenating layer configured to concatenate the first and second vectors.
Jagannatha teaches:
wherein each encoding layer comprises: a first row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a first direction to generate respective first vectors; a second row of gated recurrent units configured to serially process the words in the unstructured text input sequence in a second direction to generate respective second vectors (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and a concatenating layer configured to concatenate the first and second vectors (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
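For illustration only, the bidirectional arrangement quoted from Jagannatha (a forward chain, a backward chain, and concatenation of the two outputs for each word), with GRU nodes in place of LSTM nodes as described in Section 4.2, can be sketched in plain Python as follows. The words are assumed to be already mapped to vectors, biases are omitted, the weight values are arbitrary, and the class and function names are hypothetical; the GRU update follows the Cho et al. (2014) gating convention.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell without biases (illustrative only)."""

    def __init__(self, wz: Matrix, uz: Matrix, wr: Matrix, ur: Matrix, wh: Matrix, uh: Matrix):
        self.wz, self.uz, self.wr, self.ur, self.wh, self.uh = wz, uz, wr, ur, wh, uh

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.wz, x), matvec(self.uz, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.wr, x), matvec(self.ur, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(self.wh, x), matvec(self.uh, rh))]
        # Cho et al. (2014) form: h_t = z * h_{t-1} + (1 - z) * h_tilde
        return [zi * hi + (1.0 - zi) * ht for zi, hi, ht in zip(z, h, h_tilde)]


def run_chain(cell: GRUCell, inputs: List[Vector], hidden_size: int) -> List[Vector]:
    """Serially process the word vectors, returning one hidden state per word."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional_encode(fwd: GRUCell, bwd: GRUCell, inputs: List[Vector], hidden_size: int) -> List[Vector]:
    forward = run_chain(fwd, inputs, hidden_size)                # first direction
    backward = run_chain(bwd, inputs[::-1], hidden_size)[::-1]   # second direction, realigned to word order
    return [f + b for f, b in zip(forward, backward)]            # list '+' concatenates the two vectors
```

Because the backward chain runs over the reversed sequence and its outputs are reversed back, the i-th concatenated vector pairs the forward and backward states for the same word, and each has twice the hidden size.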
Jagannatha teaches the use of encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, “All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.”).
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Jagannatha to include, in the neural network, encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928, and further in view of Brundage (US Patent No. 10,606,885), hereinafter Brundage.
Regarding claim 8, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 discloses the system as claimed in claim 1, but does not specifically disclose: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations.
Brundage teaches: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations (Column 5, lines 62-66, "In some implementations, databases herein can store information from one or more tenants into tables of a common database image to form an on-demand database service (ODDS), which can be implemented in many ways, such as a multi-tenant database system (MTDS).").  Brundage teaches the use of a multi-tenant database in a neural network system used to access a database to improve the user access time and streaming media quality when there are multiple users (Column 4, lines 56-60, "The request-routing mechanism allocates servers in the content delivery infrastructure to the requesting client devices of users 118a-n in a way that, for web content delivery, minimizes a given client's response time and, for streaming media delivery, provides for the highest quality.").
Brundage is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to access a database.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Zhang '928 to incorporate the teachings of Brundage to include the use of a multi-tenant database.  Doing so would improve the user access time and streaming media quality when there are multiple users.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, Zhang '928, and Brundage, and further in view of Millius (US Patent No. 10,642,830), hereinafter Millius.
Regarding claim 9, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, Zhang '928, and Brundage discloses the system as claimed in claim 8, but does not specifically disclose: wherein training of the neural network is individually configured by at least one of the separate organizations.
Millius teaches: wherein training of the neural network is individually configured by at least one of the separate organizations (Column 13, lines 28-35, "Outputs can be user-customized by training a machine-learned context determination model and/or a machine-learned text extraction model using training data including labeled device data obtained from a mobile computing device associated with a particular user, thus providing tailored results that are targeted towards specific text message content and/or user contexts associated with a particular user.").  Millius teaches the use of a neural network capable of being individually trained for different users to improve the accuracy of determining user context (Column 13, lines 35-45, "More complex and customized nuances in text extraction determinations and/or user context determinations can thus be afforded using the disclosed machine learning techniques. When machine learned models include deep neural networks as described, such models can better model complex text extraction functions and/or user context determination functions as compared to polynomials. As such, the text extraction models and/or context determination models of the present disclosure can provide superior prediction accuracy if trained properly.").
Millius is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, Zhang '928, and Brundage to incorporate the teachings of Millius to include the use of a neural network capable of being individually trained for different users.  Doing so would improve the accuracy of determining user context.
Claims 10 – 11, 16 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan.
Regarding claim 10, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha discloses a method for determining an intent associated with an unstructured text input sequence, the method performed by a processor executing instructions relating to a neural network (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.") and comprising:
receiving, by a pre-processing layer of the neural network, the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x1, x2 , x3 , ...xn to a sequence of SQL statements s = y1 , y2 , y3 ....yq , where x1, x2 , x3 , ...xn are sequences of input and y1 , y2 , y3 ....yq are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.").
Varsha does not specifically disclose: generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in each word, each of the concatenated partial word embeddings is a n-gram embedding; generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings; based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Z. Zhang teaches:
generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and concatenated partial word embeddings for one or more characters in each word (Section 1, lines 35-41, “In this paper, we present various simple yet accurate subword-augmented embedding (SAW) strategies and propose SAW Reader as an instance. Specifically, we adopt subword information to enrich word embedding and survey different SAW operations to integrate word-level and subword-level embedding for a fine-grained representation. To ensure adequate training of OOV and low-frequency words, we employ a short list mechanism. Our evaluation will be performed on three public Chinese reading comprehension datasets and one English benchmark dataset for showing our method is also effective in multi-lingual case.”; Section 2.2, lines 15-19, “To alleviate the OOV issues, we keep a short list H for specific words. H = {w1, w2, . . . , wn} If w is in H, the immediate word embedding WE(w) is indexed from word lookup table Mw ∈ R^(d×s) where s denotes the size (recorded words) of lookup table. Otherwise, it will be represented as the randomly initialized default word (denoted by a specific mark UNK).”; Section 2.2, lines 30-31, “The subword embedding SE(w) is generated by taking the final outputs of a bidirectional gated recurrent unit (GRU) (Cho et al., 2014) applied to the embeddings from a lookup table of subwords.”; Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”; Section 2.2, lines 1-9, “After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w. AE(w) = WE(w) ♢ SE(w) where ♢ denotes the detailed integration operation. In this work, we investigate concatenation (concat), element-wise summation (sum) and element-wise multiplication (mul). Thus, each document D and query Q is represented as R^(d×k) matrix where d denotes the dimension of word embedding and k is the number of words in the input.”; Evaluating the subword-augmented embedding method on an English benchmark dataset demonstrates that the word embeddings and subword embeddings are generated for each word in an unstructured text input sequence.  The method for determining the word embedding WE(w) and the bidirectional gated recurrent unit for generating the subword embedding SE(w) each read on a pre-processing layer, and character n-gram subwords read on partial word embeddings for one or more characters.),
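For illustration only, the integration quoted from Z. Zhang, AE(w) = WE(w) ♢ SE(w) with ♢ being concatenation, element-wise summation, or element-wise multiplication, together with the short-list fallback to UNK, can be sketched as follows. As a labeled simplification, SE(w) here is the mean of the subword vectors rather than the final outputs of a bidirectional GRU as in the reference; all vector values and function names are hypothetical. The subwords of "indispensable" are taken from the Section 2.1 example.

```python
from typing import Dict, List, Set

Vector = List[float]


def word_embedding(word: str, short_list: Set[str], table: Dict[str, Vector], unk: Vector) -> Vector:
    """WE(w): look the word up only if it is in the short list H; otherwise use UNK."""
    return table[word] if word in short_list else unk


def subword_embedding(subwords: List[str], sub_table: Dict[str, Vector]) -> Vector:
    """SE(w), simplified: element-wise mean of the subword vectors (the reference uses a BiGRU)."""
    vecs = [sub_table[s] for s in subwords]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def augment(we: Vector, se: Vector, op: str = "concat") -> Vector:
    """AE(w) = WE(w) <op> SE(w) for the three integration operations."""
    if op == "concat":
        return we + se                               # dimensions add up
    if op == "sum":
        return [a + b for a, b in zip(we, se)]       # same dimension required
    if op == "mul":
        return [a * b for a, b in zip(we, se)]       # same dimension required
    raise ValueError(f"unknown integration operation: {op}")


sub_table = {"in": [1.0, 0.0], "disp": [0.0, 1.0], "ens": [1.0, 1.0], "able": [2.0, 2.0]}
se = subword_embedding(["in", "disp", "ens", "able"], sub_table)
we = word_embedding("indispensable", {"indispensable"}, {"indispensable": [0.5, -0.5]}, [0.0, 0.0])
print(augment(we, se, "concat"))  # [0.5, -0.5, 1.0, 1.0]
```

Only concatenation changes the embedding size; summation and multiplication require WE(w) and SE(w) to share a dimension.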
each of the concatenated partial word embeddings is a n-gram embedding (Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”).
Z. Zhang teaches generating word embeddings and sub-word embeddings, and concatenating the word embeddings and sub-word embeddings, in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha and Z. Zhang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Z. Zhang to generate word embeddings and sub-word embeddings, and concatenate the word embeddings and sub-word embeddings.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Varsha in view of Z. Zhang does not specifically disclose: generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings; based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Chang teaches generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the plurality of embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Z. Zhang, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
Varsha in view of Z. Zhang and Chang does not specifically disclose: based at least in part on the encodings, generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database; and providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Kant teaches generating, by a softmax layer of the neural network, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Z. Zhang, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
Varsha in view of Z. Zhang, Chang, and Kant does not specifically disclose: providing, by a fully connected layer of the neural network, weights for determining the probable classification.
Nagabhushan teaches providing, by a fully connected layer of the neural network, weights for determining the probable classification (Paragraph 0075, lines 1-5, "As shown by reference number 550, the convolutional neural network includes an inference layer to classify the text (e.g., associating classifications, categories, labels, or the like) using the features and weights provided by the fully connected layers.").  Nagabhushan teaches the use of a fully connected layer in a neural network used for classification to improve the computing efficiency of the classification (Paragraph 0013, lines 7-12, "The text classification performed by the text classification platform may improve the efficiency of text classification, e.g., by performing text classification using fewer computing resources, such as processing resources, memory resources, or the like, than other text classification techniques.").
Varsha, Z. Zhang, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, and Kant to incorporate the teachings of Nagabhushan to include, in the neural network, a fully connected layer configured to provide weights for the classification.  Doing so would improve the computing efficiency of the classification.
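For illustration only, the combination relied upon above, a fully connected layer that supplies weights for the classification (Nagabhushan) followed by a softmax layer that normalizes the class scores into probabilities (Kant), can be sketched as follows. The intent labels (CRUD-style database operations), the weight values, and the function names are hypothetical assumptions for this sketch.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one logit per class, logits = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_intent(encoding: Vector, weights: Matrix, bias: Vector,
                    labels: List[str]) -> Tuple[str, Vector]:
    """Return the most probable intent label and the full probability vector."""
    probs = softmax(dense(encoding, weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


labels = ["create", "read", "update", "delete"]  # hypothetical intent classes
weights = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
label, probs = classify_intent([1.0, 0.0], weights, [0.0] * 4, labels)
print(label)  # update
```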
Regarding claim 11, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.  Varsha further discloses: wherein the unstructured text input sequence takes a form of natural language (Varsha, Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages.").
Regarding claim 16, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.
Z. Zhang further teaches:
wherein each of the plurality of partial word embeddings corresponding to two or more characters in the word, a number of characters corresponding to each one of the partial word embeddings being less than a total number of characters in the word (Abstract, lines 4-5, “In this paper, we propose to use subword rather than character for word embedding enhancement.”; Section 2.1, lines 1-2, “Word in most languages usually can be split into meaningful subword units despite of the writing form. For example, “indispensable” could be split into the following subwords: < in, disp, ens, able >.”; Section 2.2, lines 1-4, “Our subwords are also formed as character n-grams, do not cross word boundaries. After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w.”; The n-gram subword embeddings read on partial word embeddings corresponding to two or more characters in the word, and splitting each word into subwords (e.g., “indispensable” into < in, disp, ens, able >) reads on the number of characters corresponding to each partial word embedding being less than the total number of characters in the word.).
Z. Zhang teaches using sub-word embeddings corresponding to two or more characters and less than a total number of characters in the word in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha, Z. Zhang, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to further incorporate the teachings of Z. Zhang to use sub-word embeddings corresponding to two or more characters and less than a total number of characters in the word.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Regarding claim 22, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10.
Z. Zhang further teaches:
wherein n is a fixed integer value no greater than a word length for the plurality of partial word embeddings (Section 2.2, lines 1-4, “Our subwords are also formed as character n-grams, do not cross word boundaries. After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w.”; Splitting each word into subwords reads on the partial word embeddings being less than a total number of characters in the word.).
Paragraph 0072 of the specification states: “The partial word embeddings may be referred to as "n-gram" embeddings, where n is the maximum number of characters in the word that are considered. Thus, using the word "where" as an example, and n=3, it will be represented by the character n-grams: <wh, whe, her, ere, and re>.”  Under this definition of an “n-gram”, n being a fixed integer value no greater than a word length is interpreted to mean that the partial words are shorter than the word, not that all of the partial words contain the same number of characters.
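The n-gram definition quoted from paragraph 0072 can be illustrated with a short sketch. It assumes, as the quoted "where" example suggests, that "<" and ">" are word-boundary markers added to the word before slicing (a fastText-style convention); the function name char_ngrams is hypothetical and is not drawn from the specification or from Z. Zhang. The output shows that, with a single fixed n, the edge n-grams cover fewer letters of the word than the interior ones, consistent with the interpretation above.

```python
from typing import List


def char_ngrams(word: str, n: int) -> List[str]:
    """Return character n-grams of `word`, assuming '<' and '>' boundary markers."""
    padded = f"<{word}>"  # assumed boundary markers, per the quoted "where" example
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(char_ngrams("where", 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```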
Z. Zhang teaches using n-gram sub-word embeddings where n is a fixed integer value no greater than a word length in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha, Z. Zhang, Chang, Kant, and Nagabhushan are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to further incorporate the teachings of Z. Zhang to use n-gram sub-word embeddings where n is a fixed integer value no greater than a word length.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan, and further in view of Bhatt.
Regarding claim 12, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database.
Bhatt teaches: wherein the action item comprises one of updating, modifying, adding, or deleting an item of the database (Column 4, lines 51-57, "In another arrangement, structured query generator 110 may include one or more additional annotators that are configured to determine database management system operations from natural language text. As defined herein, the term “database management system operation” or “database operation” means a create, read, update, or delete (CRUD) operation for the database management system.").  Bhatt teaches interpreting natural language text input to access a database so that the database can be accessed with unstructured input (Column 2, lines 30-39, "This disclosure relates to generating queries and, more particularly, to generating structured queries from natural language text. In accordance with the inventive arrangements disclosed herein, natural language text may be received and operated upon to generate a structured query for a database management system. In one arrangement, the natural language text may be directed to a particular database management system to request information. The natural language text may be expressed as free form or unstructured text.").
Bhatt is considered to be analogous to the claimed invention because it is in the same field of interpreting a natural language input for database access.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to incorporate the teachings of Bhatt to provide for updating, modifying, adding, or deleting an item from the database.  Doing so would allow the database to be accessed and updated with unstructured natural language input.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan, and further in view of Zhang '962.
Regarding claim 13, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein generating encodings for the plurality of embeddings comprises generating a vector related to at least one word in the unstructured text input sequence.
Zhang '962 teaches: wherein generating encodings for the plurality of embeddings comprises generating a vector related to at least one word in the unstructured text input sequence (Column 7, lines 41-42, "To encode an utterance u=(w_1, w_2, . . . , w_N) of N words, we use a RNN with Gated Recurrent Units").  Zhang '962 teaches the use of a neural network that generates encoding vectors for words in the input to improve the performance of the neural network processing (Column 4, lines 36-39, "SI-RNN redesigns the dialog encoder by updating speaker embeddings in a role-sensitive way. Speaker embeddings are updated in different GRU-based units depending on their roles (sender, addressee, observer)."; Column 10, lines 24-27, "As shown in Table 2 (FIG. 5), our discovery and development of SI-RNN significantly improves upon the previous state-of-the-art. In particular, addressee selection (ADR) benefits most,"; Column 10, lines 30-32, "Response selection (RES) is also improved, suggesting role-sensitive GRUs and joint selection are helpful for response selection as well.").
Zhang '962 is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to incorporate the teachings of Zhang '962 to include the use of a neural network that generates encoding vectors for words in the input.  Doing so would improve the performance of the neural network processing.
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan, and further in view of Jagannatha.
Regarding claim 14, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the embeddings comprises: serially processing the words in the unstructured text input sequence in a first direction to generate respective first vectors; serially processing the words in the unstructured text input sequence in a second direction to generate respective second vectors; and concatenating the first and second vectors.
Jagannatha teaches:
wherein generating, by an encoder stack of the neural network that comprises a plurality of encoding layers, encodings for the embeddings comprises: serially processing the words in the unstructured text input sequence in a first direction to generate respective first vectors; serially processing the words in the unstructured text input sequence in a second direction to generate respective second vectors (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and concatenating the first and second vectors (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
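The bidirectional processing and concatenation quoted from Jagannatha can be sketched as follows. This is an illustrative sketch only, not Jagannatha's implementation: toy_cell is a hypothetical element-wise recurrence standing in for an LSTM or GRU cell, and the hidden size is assumed equal to the embedding size so the toy cell can operate element-wise. The point shown is the structure: one chain reads the words in a first direction, a second chain reads them in the reverse direction, and the two hidden states for each word are concatenated.

```python
import math
from typing import Callable, List

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]


def toy_cell(h: Vector, x: Vector) -> Vector:
    """Hypothetical stand-in for an LSTM/GRU cell: h' = tanh(0.5 * h + x), element-wise."""
    return [math.tanh(0.5 * hi + xi) for hi, xi in zip(h, x)]


def run_chain(inputs: List[Vector], cell: Cell, hidden_size: int) -> List[Vector]:
    """Serially process the inputs, returning the hidden state after each step."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = cell(h, x)
        states.append(h)
    return states


def bidirectional_encode(embeddings: List[Vector], cell: Cell = toy_cell) -> List[Vector]:
    """Concatenate the forward and backward hidden states for each word position."""
    size = len(embeddings[0])
    forward = run_chain(embeddings, cell, size)               # first direction
    backward = run_chain(embeddings[::-1], cell, size)[::-1]  # second direction, re-aligned to word order
    return [f + b for f, b in zip(forward, backward)]         # list + list concatenates


word_vectors = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # hypothetical 2-d embeddings for 3 words
encodings = bidirectional_encode(word_vectors)
print([len(v) for v in encodings])  # [4, 4, 4]
```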
Jagannatha teaches generating encodings using two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, "All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.”).
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to incorporate the teachings of Jagannatha to include generating encodings using two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan, and further in view of Brundage.
Regarding claim 17, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan discloses the method as claimed in claim 10, but does not specifically disclose: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations.
Brundage teaches: wherein the database comprises a multi-tenant database accessible by a plurality of separate organizations (Column 5, lines 62-66, "In some implementations, databases herein can store information from one or more tenants into tables of a common database image to form an on-demand database service (ODDS), which can be implemented in many ways, such as a multi-tenant database system (MTDS).").  Brundage teaches the use of a multi-tenant database in a neural network system used to access a database to improve the user access time and streaming media quality when there are multiple users (Column 4, lines 56-60, "The request-routing mechanism allocates servers in the content delivery infrastructure to the requesting client devices of users 118a-n in a way that, for web content delivery, minimizes a given client's response time and, for streaming media delivery, provides for the highest quality.").
Brundage is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to access a database.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, and Nagabhushan to incorporate the teachings of Brundage to include the use of a multi-tenant database.  Doing so would improve the user access time and streaming media quality when there are multiple users.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Brundage, and further in view of Millius.
Regarding claim 18, as best understood based on the 35 U.S.C. 112(b) issues identified above, Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Brundage discloses the method as claimed in claim 17, but does not specifically disclose: wherein training of the neural network is individually configured by at least one of the separate organizations.
Millius teaches: wherein training of the neural network is individually configured by at least one of the separate organizations (Column 13, lines 28-35, "Outputs can be user-customized by training a machine-learned context determination model and/or a machine-learned text extraction model using training data including labeled device data obtained from a mobile computing device associated with a particular user, thus providing tailored results that are targeted towards specific text message content and/or user contexts associated with a particular user.").  Millius teaches the use of a neural network capable of being individually trained for different users to improve the accuracy of determining user context (Column 13, lines 35-45, "More complex and customized nuances in text extraction determinations and/or user context determinations can thus be afforded using the disclosed machine learning techniques. When machine learned models include deep neural networks as described, such models can better model complex text extraction functions and/or user context determination functions as compared to polynomials. As such, the text extraction models and/or context determination models of the present disclosure can provide superior prediction accuracy if trained properly.").
Millius is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, Kant, Nagabhushan, and Brundage to incorporate the teachings of Millius to include the use of a neural network capable of being individually trained for different users.  Doing so would improve the accuracy of determining user context.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, and Kant.
Regarding claim 19, Varsha discloses a method for determining an intent associated with an unstructured text input sequence (Abstract, lines 1-7, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database. In this work, for sequence translation, an RNN auto-encoder is used which has been the foundation for several online translation between human languages."), the method comprising:
receiving, by a pre-processing layer, the unstructured text input sequence (Section IIIA, lines 9-11, "The following diagram represents the system architecture of the framework for conversion of natural language text into database query."; Section IIIB, lines 3-4, "The input data is initially pre-processed."),
wherein the unstructured text input sequence comprises a plurality of words (Section IIIA, lines 1-6, "The aim of the project is to translate natural language query into SQL statements i.e., mapping a sequence of input natural language sentences q = x1, x2 , x3 , ...xn to a sequence of SQL statements s = y1 , y2 , y3 ....yq , where x1, x2 , x3 , ...xn are sequences of input and y1 , y2 , y3 ....yq are sequences of output."),
wherein at least a portion of the unstructured text input sequence relates to an action item to be taken with respect to modifying a database (Abstract, lines 1-4, "The aim of this work is to transcribe natural language statements into logical forms, specifically SQL statements. The purpose of such conversion is to efficiently interact with the database.").
Varsha does not specifically disclose: generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in each word, each of the partial word embeddings is a n-gram embedding; generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Z. Zhang teaches:
generating, by the pre-processing layer of the neural network, a plurality of embeddings that concatenate word-level embeddings for each word in the unstructured text input sequence and partial word embeddings for one or more characters in each word (Section 1, lines 35-41, “In this paper, we present various simple yet accurate subword-augmented embedding (SAW) strategies and propose SAW Reader as an instance. Specifically, we adopt subword information to enrich word embedding and survey different SAW operations to integrate word-level and subword-level embedding for a fine-grained representation. To ensure adequate training of OOV and low-frequency words, we employ a short list mechanism. Our evaluation will be performed on three public Chinese reading comprehension datasets and one English benchmark dataset for showing our method is also effective in multi-lingual case.”; Section 2.2, lines 15-19, “To alleviate the OOV issues, we keep a short list H for specific words. H = {w1, w2, . . . , wn} If w is in H, the immediate word embedding WE(w) is indexed from word lookup table M_w ∈ R^(d×s) where s denotes the size (recorded words) of lookup table. Otherwise, it will be represented as the randomly initialized default word (denoted by a specific mark UNK).”; Section 2.2, lines 30-31, “The subword embedding SE(w) is generated by taking the final outputs of a bidirectional gated recurrent unit (GRU) (Cho et al., 2014) applied to the embeddings from a lookup table of subwords.”; Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”; Section 2.2, lines 1-9, “After using unsupervised segmentation methods to split each word into a subword sequence, an augmented embedding (AE) is to straightforwardly integrate word embedding WE(w) and subword embedding SE(w) for a given word w. AE(w) = WE(w) ♢ SE(w) where ♢ denotes the detailed integration operation. In this work, we investigate concatenation (concat), element-wise summation (sum) and element-wise multiplication (mul). Thus, each document D and query Q is represented as R^(d×k) matrix where d denotes the dimension of word embedding and k is the number of words in the input.”; Evaluating the subword-augmented embedding method on an English benchmark dataset demonstrates that the word embeddings and subword embeddings are generated for each word in an unstructured text input sequence.  The method for determining the word embedding WE(w) and the bidirectional gated recurrent unit for generating the subword embedding SE(w) each read on a pre-processing layer, the character n-gram subwords read on partial word embeddings for one or more characters, and the concatenation (concat) integration operation reads on concatenating the word-level embeddings and the partial word embeddings.),
each of the partial word embeddings is a n-gram embedding (Section 2.2, line 1, “Our subwords are also formed as character n-grams, do not cross word boundaries.”).
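The three integration operations quoted from Z. Zhang, Section 2.2 (concat, sum, mul), can be sketched as follows. This is a hypothetical illustration, not Z. Zhang's code: SE(w) is taken as a given input rather than computed by a bidirectional GRU, and the short list, UNK vector, and dimensions are placeholder values chosen only to show how the concatenation option combines a word-level vector with a subword-level vector for each word.

```python
from typing import Dict, List

Vector = List[float]


def augmented_embedding(word: str, short_list: Dict[str, Vector], unk: Vector,
                        subword_embedding: Vector, op: str = "concat") -> Vector:
    """Compute AE(w) = WE(w) <op> SE(w) for one word."""
    # WE(w): look the word up in the short list H; words outside H share the UNK vector.
    we = short_list.get(word, unk)
    if op == "concat":
        return we + subword_embedding  # dimension grows to len(WE) + len(SE)
    if len(we) != len(subword_embedding):
        raise ValueError("sum and mul require WE(w) and SE(w) to have equal dimensions")
    if op == "sum":
        return [a + b for a, b in zip(we, subword_embedding)]
    if op == "mul":
        return [a * b for a, b in zip(we, subword_embedding)]
    raise ValueError(f"unknown integration operation: {op}")


short_list = {"where": [1.0, 2.0]}  # hypothetical short list H
unk = [0.0, 0.0]                    # hypothetical UNK vector
se = [0.5, 0.5]                     # SE(w), assumed already produced by a subword encoder
print(augmented_embedding("where", short_list, unk, se))     # [1.0, 2.0, 0.5, 0.5]
print(augmented_embedding("wherever", short_list, unk, se))  # [0.0, 0.0, 0.5, 0.5]
```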
Z. Zhang teaches generating word embeddings and sub-word embeddings, and concatenating the word embeddings and sub-word embeddings, in order to improve the performance of neural network models for a reading comprehension task (Section 6, lines 1-8, “This paper presents an effective neural architecture, called subword-augmented word embedding to enhance the model performance for the cloze-style reading comprehension task. The proposed SAW Reader uses subword embedding to enhance the word representation and limit the word frequency spectrum to train rare words efficiently. With the help of the short list, the model size will also be reduced together with training speedup. Unlike most existing works, which introduce either complex attentive architectures or many manual features, our model is much more simple yet effective. Giving state-of-the-art performance on multiple benchmarks, the proposed reader has been proved effective for learning joint representation at both word and subword level and alleviating OOV difficulties.”).
Varsha and Z. Zhang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha to incorporate the teachings of Z. Zhang to generate word embeddings and sub-word embeddings, and concatenate the word embeddings and sub-word embeddings.  Doing so would allow for improving the performance of neural network models for a reading comprehension task.
Varsha in view of Z. Zhang does not specifically disclose: generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings; generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Chang teaches generating, by an encoder stack comprising a plurality of encoding layers, encodings for the plurality of embeddings (Column 6, lines 6-8, "Each word in the document is fed into embedding layer 310, embedding the words into hidden states, h1, h2, and h3 through the encoding layers 320.").  Chang teaches a plurality of encoding layers that generate encodings for the embeddings in order to increase the amount of information included in the neural network encodings (Column 7, line 64 - Column 8, line 5, "The encoding layers 320 then generate hidden vectors h1, h2, and h3 which are fed into decoder 307. The encoding layers 320 generate hidden vectors h1, h2, and h3 by sequentially taking previous hidden vectors as an input and also inputting the next word from the embedding layer 320. At each stage in the encoder 305, the hidden vector grows as all of the previous information is combined with the new information for the new document word, until the model finally ends up with the hidden vectors h1, h2, and h3.").
Varsha, Z. Zhang, and Chang are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang to incorporate the teachings of Chang to include, in the neural network, a plurality of encoding layers that generate encodings for the embeddings.  Doing so would increase the amount of information included in the neural network encodings.
Varsha in view of Z. Zhang and Chang does not specifically disclose: generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings.
Kant teaches generating, by a softmax layer, a probable classification for the intent associated with the unstructured text input sequence regarding an action item to be taken with respect to modifying the database based on the encodings (Column 9, lines 52-56, "the data processing system 120 or the DNN can include a softmax layer, (e.g., a normalized exponential or other logistic function) that normalizes the inferences of each of the predicted classification categories"; Column 10, lines 3-6, "This information can be provided to the database 220 where it can be accessed by the data processing system 120 to correlate this particular object with another object 110.").  Kant teaches the use of a softmax layer in a neural network used for classification and database access to reduce the complexity of the classification categories, resulting in lower latency and bandwidth requirements when accessing a database (Column 11, lines 54-67, "Relative to a multi-level (or higher level such as second level or beyond) classification categories, the data processing system 120 that identifies the correlation between objects 110 can conserve processing power or bandwidth by limiting evaluation to a single or lower or coarser (e.g., first) level classification category as fewer search, analysis, or database 220 retrieval operations are performed. This can improve operation of the system 100 including the data processing system 120 by reducing latency and bandwidth for communications between the data processing system 120 or its components and the database 220 (or with the end user computing device 225, and minimizes processing operations of the data processing system 120, which reduces power consumption.").
Varsha, Z. Zhang, Chang, and Kant are considered to be analogous to the claimed invention because they are in the same field of using neural network processing to interpret a natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang and Chang to incorporate the teachings of Kant to include, in the neural network, a softmax layer that normalizes the classifications.  Doing so would result in lower latency and bandwidth requirements when accessing a database.
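A softmax layer of the kind Kant describes (a normalized exponential over the predicted classification categories) can be sketched as follows. The intent labels and scores are hypothetical placeholders, not taken from Kant or from the application, and the max-subtraction is a standard numerical-stability step that does not change the resulting probabilities. The most probable category is selected as the probable classification.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Normalized exponential; subtracting the max avoids overflow without changing the result."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_intent(logits: List[float], labels: List[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical intent labels and scores, for illustration only.
labels = ["update_record", "add_record", "delete_record"]
print(classify_intent([2.0, 0.5, -1.0], labels))
```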
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Varsha in view of Z. Zhang, Chang, and Kant, and further in view of Jagannatha.
Regarding claim 20, Varsha in view of Z. Zhang, Chang, and Kant discloses the method as claimed in claim 19, but does not specifically disclose: wherein the generating, by the encoder stack comprising the plurality of encoding layers, encodings for the plurality of embeddings comprises: generating, by a first row of gate recurrent units, a first hidden state vector by processing the plurality of embeddings in a first order; generating, by a second row of gate recurrent units, a second hidden state vector by processing the plurality of embeddings in a second order; and obtaining the encodings by concatenating the first hidden state vector and the second hidden state vector.
Jagannatha teaches:
wherein the generating, by the encoder stack comprising the plurality of encoding layers, encodings for the plurality of embeddings comprises: generating, by a first row of gate recurrent units, a first hidden state vector by processing the plurality of embeddings in a first order (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
generating, by a second row of gate recurrent units, a second hidden state vector by processing the plurality of embeddings in a second order (Section 4.1, page 476, column 2, lines 7-11, "The words are mapped into their corresponding vector representations and fed into the LSTM layer. The LSTM layer consists of two LSTM chains, one propagating in the forward direction and other in the backward direction."; Section 4.2, page 477, column 1, lines 23-25, "We use GRU with the same Neural Network structure as shown in Figure 1 by replacing the LSTM nodes with GRU.");
and obtaining the encodings by concatenating the first hidden state vector and the second hidden state vector (Section 4.1, page 476, column 2, lines 11-13, "We concatenate the output from the two chains to form a combined representation of the word and its context.").
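For illustration only, the arrangement mapped above can be sketched in Python. In that arrangement, one row of gated recurrent units reads the embeddings in a first (forward) order and a second row reads them in a second (reverse) order. The two rows' hidden states are then concatenated per token, as in Jagannatha's "combined representation of the word and its context." This is a hypothetical sketch and is not code from Jagannatha, Varsha, or the application. The names GRUCell, run_row, and bidirectional_encode, the random weight initialization, and the particular GRU gating equations (one common formulation) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product using plain lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


class GRUCell:
    """Hypothetical GRU cell (update gate z, reset gate r, candidate n)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wn, self.Un = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x, h):
        z = [sigmoid(a) for a in vadd(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a) for a in vadd(matvec(self.Wr, x), matvec(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to prior state
        n = [math.tanh(a) for a in vadd(matvec(self.Wn, x), matvec(self.Un, rh))]
        # New state interpolates between the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_row(cell, embeddings):
    """Run one row of GRUs over the embeddings, returning the state at each step."""
    h = [0.0] * cell.hidden_size
    states = []
    for x in embeddings:
        h = cell.step(x, h)
        states.append(h)
    return states


def bidirectional_encode(embeddings, fwd_cell, bwd_cell):
    first = run_row(fwd_cell, embeddings)                 # first order (forward)
    second = run_row(bwd_cell, embeddings[::-1])[::-1]    # second order, realigned
    # Concatenate the two hidden state vectors for each token.
    return [f + b for f, b in zip(first, second)]


# Usage: five 4-dimensional embeddings, 3 hidden units per direction.
data_rng = random.Random(1)
embeddings = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
fwd_cell = GRUCell(4, 3, seed=10)
bwd_cell = GRUCell(4, 3, seed=20)
encodings = bidirectional_encode(embeddings, fwd_cell, bwd_cell)
```

Each resulting encoding has twice the hidden size, because it joins the forward-row and reverse-row states computed for the same token position.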
Jagannatha teaches the use of encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results to improve the performance of the neural network processing (Section 6, page 479, column 1, lines 3-8, "All RNN models significantly outperform the baseline (CRF-context). Compared to the baseline system, our best system (GRU-document) improved the recall (0.8126), precision (0.7938) and F-score (0.8031) by 19%, 2% and 11% respectively. Clearly the improvement in recall contributes more to the overall increase in system performance.").
Jagannatha is considered to be analogous to the claimed invention because it is in the same field of using neural network processing to interpret natural language input.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Varsha in view of Z. Zhang, Chang, and Kant to incorporate the teachings of Jagannatha to include, in the neural network, encoding layers that contain two rows of gated recurrent units to process the input in two different directions and concatenate the results.  Doing so would improve the performance of the neural network processing.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES BOGGS/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657