DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 7/1/2021 and 9/13/2021 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 9 and 17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 9 of U.S. Patent No. US 10,839,284 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1 and 9 of the ‘284 patent are narrower than, and anticipate, claims 1 and 9 of the instant application. Claims 1 and 9 of the ‘284 patent recite a POS label embedding layer, POS label embedding vectors, and POS state vectors, which correspond to the first embedding layer, first label embeddings, and first state vectors of claims 1 and 9 of the instant application; a chunk embedding layer, which corresponds to the second embedding layer, second label embeddings, and second state vectors of claims 1 and 9 of the instant application; and a dependency parent identification and dependency relationship label embedding layer, which corresponds to the third embedding layer, third label embeddings, and third state vectors of claims 1 and 9. In addition, claim 17 of the instant application recites a computer readable medium. The difference would have been obvious because a computer readable medium is needed to store the instructions in order to execute the method of claim 9 of the instant application.
Claims 1, 9, and 17 are rejected as being unpatentable over claims 1, 9, and 9 respectively, of the ‘284 patent.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-6, 9, 13-14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Fancellu et al. (“Neural networks for negation scope detection”) in view of Sun et al. (“Voice conversion using deep bidirectional long short-term memory based recurrent neural networks”) and Chua et al. (US-20170358293-A1).
Regarding Claim 1,
Fancellu teaches a multi-layer neural network system running on hardware that processes a sequence of tokens in an input sequence, the system comprising: 
a stacked long-short-term-memory (LSTM) token sequence processor, running on hardware, stacked in layers according to an analytical hierarchy, the stacked layers including: 
a first embedding layer implemented as a first bi-directional LSTM (figure 1; pg. 497, section 4; We then turned to a BiLSTM because a better fit for the task. BiLSTM are sequential models that operate both in forward and backwards fashion;) and a first label classifier (pg. 496, section 2; the classifier must correctly classify each word as being inside or outside the scope and assign each word to the correct scope; And pg. 498, section 4; where l is the length of the sentence n ∈ I, x(wi) the probability for the word wi to belong to either the I or O class and y (wi) its gold label. ), the first embedding layer: 
receiving token embeddings representing tokens in the sequence of tokens (pg. 498, section 4; The input to the network for each word wi are the word-embedding vector xwi and the cue-embedding vector cwi , where wi constitutes a time step.), and generating first label embeddings (pg. 498, section 4; 
[equation image: media_image1.png] σ the sigmoid activation function and g the softmax operation ($g(z_m) = e^{z_m} / \sum_k e^{z_k}$) to assign a probability to the input of belonging to either the inside (I) or outside (O) of the scope classes. A softmax function is used to determine a class (i.e., label) for the embedding, similar to the ones shown in figures 4A and 5A of Applicant’s drawings in the instant application.) and first state vectors of the tokens from the token embeddings (pg. 498, section 4; 
[equation image: media_image2.png] where the Ws are the weight matrices, h_{t-1} the hidden layer state at time t-1, i_t, f_t, o_t the input, forget and the output gate at the time t and h_back; h_forw the concatenation of the backward and forward hidden layers. H denotes the BLSTM state (i.e., first state vectors).); 
While Fancellu discloses a bidirectional LSTM with a classifier that receives token input and outputs label embeddings and state vectors, Fancellu does not explicitly disclose
receiving, at a second embedding layer overlying the first embedding layer and implemented as a second bi-directional LSTM and a second label classifier, the token embeddings representing the input sequence, the first label embeddings, and the first state vectors generated by the first embedding layer; 
generating, using the second embedding layer, second label embeddings and second state vectors from the token embeddings, the first label embeddings, and the first state vectors; 
receiving, at a third embedding layer overlying the second embedding layer and implemented as a third bi-directional LSTM and a third label classifier, the token embeddings, the first label embeddings, the second label embeddings and the second state vectors; 
generating, using the third embedding layer, third label embeddings and third state vectors from the token embeddings, the first label embeddings, the second label embeddings and the second state vectors; and 
providing, using an output processor, results reflecting the third label embeddings for the tokens in the input sequence.
However, Sun (“VOICE CONVERSION USING DEEP BIDIRECTIONAL LONG SHORT-TERM MEMORY BASED RECURRENT NEURAL NETWORKS”) teaches
a second embedding layer overlying the first embedding layer and implemented as a second bi-directional LSTM (fig. 4; figure 4 shows a first BLSTM layer and a second BLSTM layer. pg. 4870, section 2; Further, motivated the success of deep network architectures, Deep BLSTM-RNNs are considered to build up high level representation of input features. Similar to the structure of DNNs and Deep RNNs [23], Deep BLSTM can be created by stacking multiple BLSTM hidden layers.) and a second label classifier, the second embedding layer: 
receiving the token embeddings representing the input sequence (pg. 4870; 
[equation image: media_image3.png] x denotes the input embeddings which are also taught by Fancellu), the first label embeddings, and the first state vectors generated by the first embedding layer (pg. 4870; 
[equation image: media_image4.png] Ht denotes the state vector output by the first layer.); and 
generating second label embeddings (pg. 4870; 
[equation image: media_image5.png] Yt denotes the second label embedding which is also taught by Fancellu.) and second state vectors (fig. 4 & 5; pg. 4870; 
[equation image: media_image4.png] state vector from first BILSTM forwarded to second layer BILSTM as shown in figure 4 & 5.) from the token embeddings, the first label embeddings, and the first state vectors (pg. 4870; 
[equation image: media_image6.png] X denotes token embeddings, h denotes the state vector, and little y denotes previous (first label embeddings) used to calculate Yt (i.e., second label embedding).); 
Fancellu and Sun are analogous because they are both directed towards bi-directional LSTM neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM of Fancellu with the stacked BILSTM of Sun.
Doing so would allow for making use of context information in forward and backward directions. This has been shown to outperform standard RNNs in numerous tasks such as sequence modelling, context-sensitive language learning, and speech recognition (pg. 4869, introduction).
Chua (US 20170358293 A1) teaches
a third embedding layer overlying the second embedding layer and implemented as a third bi-directional LSTM (para [0076] Each of the bidirectional LSTMs 237a-d may contain one or more pairs of forward and backward LSTM layers. 237c (i.e. third LSTM). ) and a third label classifier (para [0106] The pronunciation system 106 may use a connectionist temporal classification (CTC) objective function for the third recurrent neural network 200c. ), the third embedding layer: 
receiving the token embeddings (para [0073] The input spelling layer 232 may have a number of cells selected based on the language for the input data 230. For instance, when the language is American English, the input spelling layer 232 may have eighty-six cells, e.g., where each cell corresponds to either a lowercase letter, an uppercase letter, a digit, a padding value, or a symbol such as “\”, “ø”, “CE”, or “ū” when some of the symbols may be found in technical terms, names, Internet identifiers, or foreign or loan words.), the first label embeddings, the second label embeddings (fig. 2C; para [0022] The one-hot vectors may include one label for each letter in a particular alphabet for a language, one value for each phoneme for the language, filler labels, e.g., for padding or stress locations, a value for syllable boundaries, or a combination of two or more of these. And para [0074] For instance, the first bidirectional LSTM 237a may receive one vector from the forward LSTM 234 and one vector from the backward LSTM 236 and concatenate the two vectors to create a single input vector. Vector from LSTM 234 (i.e. first label embedding). Vector from 236 (i.e. second label embedding).) and the second state vectors (para [0036] Both the forward LSTM 206 and the backward LSTM 208 may be unrolled so that there is one state of each layer for each item in the input data 202. For instance, each of the LSTMs may include one state for each one-hot vector in the input sequence of one-hot vectors that form the input data 202.); and 
generating third label embeddings and third state vectors from the token embeddings, the first label embeddings, the second label embeddings and the second state vectors (fig. 2C; para [0076] The forward LSTM 234, the backward LSTM 236, and all of the bidirectional LSTMs 237a-d may be unrolled so that there is one state of each layer for each item in the input data 230. For instance, each of the LSTMs may include one state for each one-hot vector in the input sequence of one-hot vectors that form the input data 230. Each of the bidirectional LSTMs 237a-d may contain one or more pairs of forward and backward LSTM layers. Fancellu and Sun teach that a BILSTM generates a label and embedding and state vector as shown above. Chua further teaches that 237c (i.e. third BILSTM) can receive output from 237a (i.e. first BILSTM) and 237b (i.e. second BILSTM).); and 
an output processor that outputs at least results reflecting the third label embeddings for the tokens in the input sequence (para [0082] The output pronunciation and stress pattern layer 238 uses both sets of data from the third forward LSTM 234c and the third backward LSTM 236c as input to generate a single set of output data 240. For instance, the output pronunciation and stress pattern layer 238 concatenates the data from the third forward LSTM 234c with the data from the third backward LSTM 236c and uses the concatenated data as input. The output pronunciation and stress pattern layer 238 uses the concatenated data to generate the output data 240.).
Fancellu, Sun, and Chua are analogous because they are all directed towards bi-directional LSTM neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu and Sun with the recurrent neural network of Chua.
Doing so would allow for training the neural network using a dropout technique to improve robustness. The dropout technique is used to reduce overfitting so that the neural network can accurately classify unseen data (para [0109]).
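For illustration only, and not taken from any cited reference, the data flow recited in claim 1 can be sketched as follows. The sketch assumes a hypothetical `toy_layer` stand-in in place of a real bi-directional LSTM and label classifier, and the dimensions are arbitrary, so it shows only which inputs each stacked layer receives, not how the states or labels are actually computed.

```python
from typing import List, Tuple

Vector = List[float]
TokenSeq = List[Vector]  # one vector per token


def toy_layer(inputs: TokenSeq, state_dim: int, num_labels: int) -> Tuple[TokenSeq, TokenSeq]:
    """Hypothetical stand-in for one bi-directional LSTM plus label classifier.

    Returns (state_vectors, label_embeddings), one of each per token.
    The values are placeholders; only shapes and wiring are meaningful here.
    """
    states = [[sum(x) / len(x)] * state_dim for x in inputs]
    labels = [[1.0 / num_labels] * num_labels for _ in inputs]
    return states, labels


def concat(*streams: TokenSeq) -> TokenSeq:
    """Concatenate the per-token vectors of several streams."""
    return [sum(parts, []) for parts in zip(*streams)]


def stacked_forward(tokens: TokenSeq):
    # First layer: receives the token embeddings only.
    s1, y1 = toy_layer(tokens, state_dim=4, num_labels=3)
    # Second layer: token embeddings + first label embeddings + first state vectors.
    in2 = concat(tokens, y1, s1)
    s2, y2 = toy_layer(in2, state_dim=4, num_labels=5)
    # Third layer: token embeddings + first and second label embeddings
    # + second state vectors.
    in3 = concat(tokens, y1, y2, s2)
    _s3, y3 = toy_layer(in3, state_dim=4, num_labels=2)
    widths = (len(tokens[0]), len(in2[0]), len(in3[0]))
    return y3, widths
```

With 6-dimensional token embeddings, the per-token input widths of the three layers in this sketch are 6, 13 (6 + 3 + 4), and 18 (6 + 3 + 5 + 4), which makes the cumulative wiring of the claimed layers explicit.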
Regarding Claim 5,
Fancellu, Sun, and Chua teach the multi-layer neural network system of claim 1. Chua further teaches wherein the third embedding layer generates the third state vectors by passing the token embeddings, the first label embeddings, the second label embeddings and the second state vectors through the third bi-directional LSTM (fig. 2C; para [0076] The forward LSTM 234, the backward LSTM 236, and all of the bidirectional LSTMs 237a-d may be unrolled so that there is one state of each layer for each item in the input data 230. For instance, each of the LSTMs may include one state for each one-hot vector in the input sequence of one-hot vectors that form the input data 230. Each of the bidirectional LSTMs 237a-d may contain one or more pairs of forward and backward LSTM layers. Fancellu and Sun teach that a BILSTM generates a label and embedding and state vector as shown above. Chua further teaches that 237c (i.e. third BILSTM) can receive output from 237a (i.e. first BILSTM) and 237b (i.e. second BILSTM).).
Fancellu, Sun, and Chua are analogous because they are all directed towards bi-directional LSTM neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu and Sun with the recurrent neural network of Chua.
Doing so would allow for training the neural network using a dropout technique to improve robustness. The dropout technique is used to reduce overfitting so that the neural network can accurately classify unseen data (para [0109]).
Regarding Claim 6,
Fancellu, Sun, and Chua teach the multi-layer neural network system of claim 5. Fancellu further teaches wherein the third state vectors are forward and backward state vectors (pg. 498, section 4; 
[equation image: media_image2.png] where the Ws are the weight matrices, h_{t-1} the hidden layer state at time t-1, i_t, f_t, o_t the input, forget and the output gate at the time t and h_back; h_forw the concatenation of the backward and forward hidden layers. H denotes the BLSTM forward and backward vectors.).
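As a minimal sketch of what "forward and backward state vectors" means in the passage above, the following runs one recurrence left to right and one right to left and pairs the two states per token. It assumes a plain tanh recurrence with scalar weights (hypothetical `w_in` and `w_rec`) in place of the gated LSTM cell of the cited reference, so it illustrates only the bidirectional pairing.

```python
import math
from typing import List, Tuple


def run_rnn(xs: List[float], w_in: float, w_rec: float) -> List[float]:
    """Simple tanh recurrence (a stand-in for an LSTM cell)."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional_states(xs: List[float], w_in: float = 0.5,
                         w_rec: float = 0.1) -> List[Tuple[float, float]]:
    """Return (forward_state, backward_state) for every position."""
    forward = run_rnn(xs, w_in, w_rec)
    # Process the reversed sequence, then flip back so index t lines up.
    backward = run_rnn(xs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))
```

At the first position the forward state has seen only the first input, and at the last position the backward state has seen only the last input; every other position combines left and right context.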
Regarding Claim 9,
Claim 9 is the method claim corresponding to the system of claim 1. Claim 9 is substantially similar to claim 1 and is rejected on the same grounds.
Regarding Claim 13,
Claim 13 is the method claim corresponding to the system of claim 5. Claim 13 is substantially similar to claim 5 and is rejected on the same grounds.

Regarding Claim 14,
Claim 14 is the method claim corresponding to the system of claim 6. Claim 14 is substantially similar to claim 6 and is rejected on the same grounds.
Regarding Claim 17,
Claim 17 is the computer readable medium claim corresponding to the system of claim 1. Claim 17 is substantially similar to claim 1 and is rejected on the same grounds.

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Fancellu/Sun/Chua, as applied above, and further in view of Willson et al. (US-20180114112-A1).
Regarding Claim 2,
Fancellu, Sun, and Chua teach the multi-layer neural network system of claim 1.
	Fancellu, Sun, and Chua do not explicitly disclose 
further comprising: a joint embedder implemented as a word embedder and a character embedder, the joint embedder generating the token embeddings, each token embedding comprising word embeddings generated by the word embedder and character embeddings generated by the character embedder.
However, Willson (US 20180114112 A1) teaches
a joint embedder implemented as a word embedder and a character embedder, the joint embedder generating the token embeddings, each token embedding comprising word embeddings generated by the word embedder and character embeddings generated by the character embedder (para [0050] The retrieved character embedding vectors for a given word are fed into a set of one-dimensional convolution filters. The maximum output of each one-dimensional convolution over the length of the word is then obtained. These maximum outputs are fed through additional dense feedforward layers (such as a highway layer) of the neural network to yield a word embedding for the word in question. This is one example of an architecture based on convolutions and others are possible.).
Fancellu, Sun, Chua, and Willson are analogous because they are all directed towards LSTM neural networks.
	It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu, Sun, and Chua with the character and word embeddings of Willson.
	Doing so would allow for compressing data in a lossless compression protocol. Neural network embeddings and weights may be quantized before sending the data to the client. For example, 4 byte embeddings may be quantized and reduced to 1 byte (para [0054]).
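For illustration only, the character-level path quoted from Willson para [0050] (character embedding vectors, one-dimensional convolution filters, maximum over the length of the word) and its combination with a word-level vector can be sketched as below. The function names, the filter shapes, and the plain concatenation used to join the two parts are assumptions made for this sketch, and the dense feedforward layers mentioned in the quoted paragraph are omitted.

```python
from typing import List

Vector = List[float]


def char_features(char_vecs: List[Vector], filters: List[List[Vector]]) -> Vector:
    """Apply each 1-D convolution filter over the characters of one word
    and keep the maximum response over all positions."""
    features = []
    for filt in filters:
        width = len(filt)  # number of consecutive characters the filter spans
        scores = []
        for start in range(len(char_vecs) - width + 1):
            window = char_vecs[start:start + width]
            scores.append(sum(w * c
                              for w_row, c_row in zip(filt, window)
                              for w, c in zip(w_row, c_row)))
        # Words shorter than the filter yield no window; fall back to 0.0.
        features.append(max(scores, default=0.0))
    return features


def joint_embedding(word_vec: Vector, char_vecs: List[Vector],
                    filters: List[List[Vector]]) -> Vector:
    """Token embedding = word embedding followed by character features."""
    return word_vec + char_features(char_vecs, filters)
```

For example, with three 2-dimensional character vectors and a single width-2 filter, the token embedding is the word vector extended by one character-derived feature.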
Regarding Claim 10,
Claim 10 is the method claim corresponding to the system of claim 2. Claim 10 is substantially similar to claim 2 and is rejected on the same grounds.
Regarding Claim 18,
Claim 18 is the computer readable medium claim corresponding to the system of claim 2. Claim 18 is substantially similar to claim 2 and is rejected on the same grounds.

Claims 3-4, 8, 11-12, 16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Fancellu/Sun/Chua, as applied above, and further in view of Le et al. (US-20170316775-A1).
Regarding Claim 3,
Fancellu, Sun, and Chua teach the multi-layer neural network system of claim 1. Fancellu further teaches
	wherein the first embedding layer generates the first state vectors (pg. 498, section 4; 
[equation image: media_image2.png] where the Ws are the weight matrices, h_{t-1} the hidden layer state at time t-1, i_t, f_t, o_t the input, forget and the output gate at the time t and h_back; h_forw the concatenation of the backward and forward hidden layers. H denotes the BLSTM state (i.e., first state vectors).) of the tokens by passing the tokens through the first bi-directional LSTM (pg. 498, section 4; The input to the network for each word wi are the word-embedding vector xwi and the cue-embedding vector cwi , where wi constitutes a time step.).
	Fancellu, Sun, and Chua do not explicitly disclose
and generates the first label embeddings from the first state vectors using exponential normalization and the first label classifier.
However, Le (US 20170316775 A1) teaches
and generates the first label embeddings from the first state vectors using exponential normalization and the first label classifier (para [0024] The distributions output by the plurality of LMs 42 are normalized by normalization functions 46, e.g. softmax functions 46 in the illustrative example of FIG. 1, or sigmoid functions in embodiments with only two LMs, in order to generate corresponding normalized distributions over the words of the vocabulary. A recurrent neural network (RNN) 50 is applied to the normalized distributions to generate a mixture distribution over the words of the vocabulary.).
Fancellu, Sun, Chua, and Le are analogous because they are all directed towards LSTM recurrent neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu, Sun, and Chua with the normalization of Le.
Doing so would allow for normalizing word distributions to ensure the vocabulary distribution is within a certain range. Normalization removes outliers and improves the accuracy of the network (para [0024]).
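As a brief illustration of the "exponential normalization" limitation discussed above, the following is a minimal sketch of a softmax over per-label scores for a single token. The input scores are hypothetical, and the subtraction of the maximum is a standard numerical-stability step that does not change the result; nothing here is taken from Le or from the application.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Exponentially normalize raw scores into probabilities that sum to 1."""
    shift = max(scores)  # guards against overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores: List[float]) -> int:
    """Index of the most probable label under the softmax distribution."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

The output is a probability distribution over the labels, so a classifier can select the label with the highest probability or pass the whole distribution on as a label embedding.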
Regarding Claim 4,
Fancellu, Sun, and Chua teach the multi-layer neural network system of claim 1. 
Sun teaches wherein the second embedding layer generates the second state vectors by passing the tokens (fig. 4 & 5; pg. 4870; 
[equation image: media_image4.png] state vector from first BILSTM forwarded to second layer BILSTM as shown in figure 4 & 5.), the first label embeddings and the first state vectors through the second bi-directional LSTM (pg. 4870; 
[equation image: media_image6.png] X denotes token embeddings, h denotes the state vector, and little y denotes previous (first label embeddings) used to calculate Yt (i.e., second label embedding).)
	Fancellu, Sun, and Chua do not explicitly disclose
and generates the second label embeddings from the second state vectors using exponential normalization and the second label classifier.
However, Le (US 20170316775 A1) teaches
wherein the second embedding layer generates the second state vectors by passing the tokens, the first label embeddings and the first state vectors through the second bi-directional LSTM and generates the second label embeddings from the second state vectors using exponential normalization and the second label classifier (para [0024] The distributions output by the plurality of LMs 42 are normalized by normalization functions 46, e.g. softmax functions 46 in the illustrative example of FIG. 1, or sigmoid functions in embodiments with only two LMs, in order to generate corresponding normalized distributions over the words of the vocabulary. A recurrent neural network (RNN) 50 is applied to the normalized distributions to generate a mixture distribution over the words of the vocabulary.).
Fancellu, Sun, Chua, and Le are analogous because they are all directed towards LSTM recurrent neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu, Sun, and Chua with the normalization of Le.
Doing so would allow for normalizing word distributions to ensure the vocabulary distribution is within a certain range. Normalization removes outliers and improves the accuracy of the network (para [0024]).
Regarding Claim 8,
Fancellu, Sun, and Chua teach the multi-layer neural network of claim 5. 
	Fancellu, Sun, and Chua do not explicitly disclose
wherein the third embedding layer generates the third label embeddings using the third label classifier and exponential normalization on the third state vectors.
However, Le (US 20170316775 A1) teaches
wherein the third embedding layer generates the third label embeddings using the third label classifier and exponential normalization on the third state vectors (para [0024] The distributions output by the plurality of LMs 42 are normalized by normalization functions 46, e.g. softmax functions 46 in the illustrative example of FIG. 1, or sigmoid functions in embodiments with only two LMs, in order to generate corresponding normalized distributions over the words of the vocabulary. A recurrent neural network (RNN) 50 is applied to the normalized distributions to generate a mixture distribution over the words of the vocabulary.).
Fancellu, Sun, Chua, and Le are analogous because they are all directed towards LSTM recurrent neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM neural network of Fancellu, Sun, and Chua with the normalization of Le.
Doing so would allow for normalizing word distributions to ensure the vocabulary distribution is within a certain range. Normalization removes outliers and improves the accuracy of the network (para [0024]).
Regarding Claim 11,
Claim 11 is the method claim corresponding to the system of claim 3. Claim 11 is substantially similar to claim 3 and is rejected on the same grounds.
Regarding Claim 12,
Claim 12 is the method claim corresponding to the system of claim 4. Claim 12 is substantially similar to claim 4 and is rejected on the same grounds.
Regarding Claim 16,
Claim 16 is the method claim corresponding to the system of claim 8. Claim 16 is substantially similar to claim 8 and is rejected on the same grounds.
Regarding Claim 19,
Claim 19 is the computer readable medium claim corresponding to the system of claim 3. Claim 19 is substantially similar to claim 3 and is rejected on the same grounds.
Regarding Claim 20,
Claim 20 is the computer readable medium claim corresponding to the system of claim 4. Claim 20 is substantially similar to claim 4 and is rejected on the same grounds.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Fancellu/Sun/Chua, as applied above, and further in view of Dai et al. (US-10528866-B1).
Regarding Claim 7,
Fancellu, Sun, and Chua teach the multi-layer neural network of claim 5. 
	Fancellu, Sun, and Chua do not explicitly disclose
wherein the third embedding layer generates the third label embeddings using the third state vectors and an autoencoder.
However, Dai (US 10528866 B1) teaches
wherein the third embedding layer generates the third label embeddings using the third state vectors and an autoencoder (Col. 4 lines 1-5; In particular, for a given input text sequence, the autoencoder neural network 150 is configured to, as described above with reference to the document classification neural network 110, process each word in the input text sequence in order through the embedding input layer 112 and the LSTM neural network layers 120 to generate an updated hidden state of the LSTM neural network layers 120 after the last word in the sequence has been processed.).
Fancellu, Sun, Chua, and Dai are analogous because they are all directed towards LSTM recurrent neural networks.
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the BILSTM of Fancellu, Sun, and Chua with the autoencoder of Dai.
Doing so would allow for reducing the training time and computational resources required to train the classification neural network. Training on the sequence autoencoding task does not require labeled training data, and the availability of a large amount of unsupervised training data can be used to improve the performance of the neural network (Col. 2 lines 10-20).
Regarding Claim 15,
Claim 15 is the method claim corresponding to the system of claim 7. Claim 15 is substantially similar to claim 7 and is rejected on the same grounds.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Graves et al. (“Hybrid speech recognition with deep bidirectional LSTM.”) discloses bidirectional LSTM layers stacked on top of each other.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217. The examiner can normally be reached Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/H.N./Examiner, Art Unit 2121                                                                                                                                                                                                        
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145