DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/07/2018, 10/18/2018, 09/19/2019 and 03/25/2020 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6-7, 24 and 27 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim 6 recites the limitation “positive sentiment class, negative sentiment class, very positive sentiment class, very negative sentiment class, somewhat positive sentiment class, somewhat negative sentiment class, or neutral sentiment class,” which renders the claim indefinite. It is unclear how these sentiment classes are distinguished as positive, very positive, somewhat positive, etc., because they are subjective terms. The specification fails to provide a clear explanation. Claims 8 and 24 recite the same terms.
Claim 7 is also rejected due to its dependency on a rejected claim.
Claim 27 recites the limitation “computer executable instructions that implement the neural network-based NLP system” rendering the claim indefinite, because instructions cannot implement a system.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4-6, 13, 18-19 and 25-26 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis ("Machine Learning Sentiment Prediction based on Hybrid Document Representation") in view of Stuner ("Cascading BLSTM Networks for Handwritten Word Recognition").
In regard to claims 1 and 25, Stalidis teaches: A computer-implemented method of efficiently performing a machine classification task on a dataset comprising a plurality of inputs, the method including: (Stld. p. 2, Section I. INTRODUCTION, "Sentiment Analysis (often referred to as Opinion Mining) involves the application of Natural Language Processing, text analytics, and computational linguistics methods and tools in order to extract subjective information from text... Document-level Sentiment Analysis [a machine classification task] aims at classifying a multi-sentence document [e.g. a dataset (document) comprising inputs (sentence)] in terms of the polarity of the opinion expressed within it.")
processing the inputs through a first non-recurrent neural network which performs the machine classification task on the inputs, (Stld. p. 3, Section I, "This paper introduces a hybrid method that combines both lexicon-based features and machine learning for sentiment classification. We propose a novel document representation approach that combines the Bag-of-Words [e.g. a first non-recurrent neural network] representation with a TF-IDF weighted Word2Vec representation together with a vector of lexicon-based sentiment values... In Section II we describe the sentence vector representation methodology..."; Based on spec. [0036], the claimed invention uses a combination of a bag of words model and an LSTM model)
wherein the first non-recurrent neural network generates a mean vector representation of an input by averaging vector embeddings of the input; and (Stld. p. 6 section II. VECTOR REPRESENTATION OF SENTENCES "Probably, the most straightforward approach to derive document-level representations is to simply take the average of the vectors of all words contained in the document, according to the Word2Vec embedding (i.e. mean Word2Vec). [a mean vector representation]")
Stalidis fails to teach, but Stuner teaches: further processing a selected subset of inputs through a second recurrent neural network (abbreviated RNN) to perform the machine classification task on the inputs in the subset, (Stuner p. 3418 "In the following experiments we propose a three see Rejects in yellow in Figure 2 [a selected subset of inputs]; p. 3418 "Fig. 2. The proposed BLSTM [recurrent neural network] Cascade. Each classifier process the rejects from the previous layer.") wherein the selection for further processing is conditioned on the first non-recurrent neural network's confidence in performing the machine classification task on the inputs in the dataset. (Stuner p. 3417 "Cascade of classifiers is a particular combination method, combining classifiers decisions sequentially by exploiting the complementary behavior of the classifiers, in order to progressively refine recognition decisions along the cascade. The core of classifiers cascade is the decision stage allowing rejection. Often the rejection criterion consists in applying a threshold on the classifier’s confidence score [conditioned on the confidence] at the cascade’s current stage.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis to incorporate the teachings of Stuner by including LSTM and a cascade framework. Doing so would provide promising results with low error rate by conceding rejects. (Stuner Abstract "This lexicon control method presents some interesting properties and enables us to efficiently combine LSTM networks in a cascade framework... Our approach presents promising results with low error rate by conceding rejects. Those rejects can finally be processed by a standard lexical decoding, enabling us to reach state of the art performance, while being much faster than existing methods for decoding."; p. 3418 "This rejection mechanism is essential for the cascade and enable to significantly speed up the process, when many classifiers are involved.")
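For illustration only, the combination mapped above for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Stalidis, Stuner, or the application: the names `mean_embedding`, `first_stage`, and `cascade`, the linear-plus-softmax first stage, the toy embedding table, and the `second_stage` callable standing in for a recurrent model are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mean_embedding(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of the known tokens (zero vector if none are known)."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax, so the class scores sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_stage(tokens: Sequence[str], table: Dict[str, Vector],
                weights: List[Vector], dim: int) -> Tuple[int, float]:
    """Hypothetical non-recurrent stage: mean embedding -> linear layer -> softmax."""
    x = mean_embedding(tokens, table, dim)
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]  # predicted class and its confidence


def cascade(inputs: Sequence[Sequence[str]], table: Dict[str, Vector],
            weights: List[Vector], dim: int, threshold: float,
            second_stage: Callable[[Sequence[str]], int]) -> List[Tuple[int, str]]:
    """Send only low-confidence inputs (the "rejects") to the second stage."""
    results = []
    for tokens in inputs:
        label, confidence = first_stage(tokens, table, weights, dim)
        if confidence < threshold:
            # Assumption: second_stage stands in for the recurrent (e.g. BLSTM) model.
            results.append((second_stage(tokens), "second"))
        else:
            results.append((label, "first"))
    return results
```

Under these assumptions, only the inputs on which the first stage is not confident incur the cost of the second stage, which is the efficiency rationale quoted from Stuner.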
Claim 25 recites substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claim 25. In addition, Stalidis teaches: A system with processors operating in parallel and coupled to memory, the memory loaded with computer instructions to efficiently perform a machine classification task on a dataset comprising a plurality of inputs, the instructions, when executed on the processors, implement actions comprising... (Stld. p.12 Section IV. EXPERIMENTS "… we used Python’s scikit-learn machine learning implementation[36]"; Stalidis indicates that they implement their method on a computer, where a processor and memory are inherent)
In regard to claim 4, Stalidis and Stuner teach: The computer-implemented method of claim 1, wherein the machine classification task is to classify an input to a first class or a second class. (Stld. p. 2, Section I. INTRODUCTION, "Sentiment Analysis aims mainly at identifying the sentiment content of the textual resource under inspection, and subsequently estimating its polarity (positive/negative) [first class/second class]... Document-level Sentiment Analysis [the machine classification task] aims at classifying a multi-sentence document in terms of the polarity of the opinion expressed within it.")
In regard to claim 5, Stalidis and Stuner teach: The computer-implemented method of claim 4, wherein the first non-recurrent neural network's confidence represents a probability score assigned to either the first class or the second class. (Stld. P.7 "... its positivity, negativity and objectivity, respectively, with the sum of these scores being always 1"; P. 8-9 "Thus, the vector representation of the text is Positive / Objective / Negative / Unknown [e.g. positive and negative as being first class or second class (if only two scores are selected)] 0.036 / 0.671 / 0.121 / 0.171 [e.g. a probability score]")
In regard to claim 6, Stalidis and Stuner teach: The computer-implemented method of claim 5, wherein the input is a sentence and the machine classification task is to classify the sentence to at least one of positive sentiment class, negative sentiment class, very positive sentiment class, very negative sentiment class, somewhat positive sentiment class, somewhat negative sentiment class, or neutral sentiment class. (Stld. p. 2, Section I. INTRODUCTION, "Sentiment Analysis aims mainly at identifying the sentiment content of the textual resource under inspection, and subsequently estimating its polarity (positive/negative) [positive sentiment class/negative sentiment class]... Document-level Sentiment Analysis [the machine classification task] aims at classifying a multi-sentence [input]
In regard to claim 13, Stalidis and Stuner teach: The computer-implemented method of claim 1, wherein the second RNN is a long short-term memory (abbreviated LSTM) network. (Stuner p. 3418 "Fig. 2. The proposed BLSTM [the second RNN / LSTM] Cascade. Each classifier process the rejects from the previous layer.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis to incorporate the teachings of Stuner by including LSTM and a cascade framework. Doing so would provide promising results with low error rate by conceding rejects.
In regard to claim 18, Stalidis and Stuner teach: The computer-implemented method of claim 1, wherein the machine classification task is at least one of: part-of-speech (abbreviated POS) tagging, chunking, dependency parsing, semantic relatedness, and textual entailment. (Stld. p.3 "Since one of the steps in lexicon-based approaches is to annotate text with part-of-speech tags, the aspects that the sentiment is targeted upon can be easily detected.")
In regard to claim 19, Stalidis and Stuner teach: The computer-implemented method of claim 1, wherein the machine classification task is at least one of: speech recognition, machine translation, text summarization, question answering, image captioning, and text-to-speech (abbreviated TTS) synthesis. (Stld. p. 2 "Related tasks are question answering (recognizing opinion oriented questions) and summarization (accounting for multiple viewpoints).")
In regard to claim 26, Stalidis and Stuner teach: A non-transitory, computer-readable medium having computer executable instructions for performing the method of claim 1. (Stld. p.12 Section IV. EXPERIMENTS "… we used Python’s scikit-learn machine learning implementation[36]"; Stalidis indicates that they implement their method on a computer, where a non-transitory, computer-readable medium is inherent)
Claims 2 and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in further view of Almahairi ("Dynamic Capacity Networks").
In regard to claim 2, Stalidis and Stuner fail to teach, but Almahairi teaches: The computer-implemented method of claim 1, wherein the second RNN is at least three percent more accurate and four times computationally more expensive than the first non-recurrent neural network. (Almh. p. 2 Section 2. Dynamic Capacity Networks "... We assume that the fine model [e.g. the second RNN] can achieve very good performance, but is computationally expensive... Conceptually, the coarse model [e.g. the first non-recurrent neural network] can be much more computationally efficient, but is expected to have worse performance than the fine model."; p. 7 Section 4.2.4. EMPIRICAL EVALUATION "Table 2 shows results of our experiment on SVHN. The coarse model has an error rate of 40.6%... The fine model, on the other hand, achieves a better error rate of 25.2% [three percent more accurate], but is more computationally expensive."; "Figure 5. Number of multiplications in the Coarse, Fine and DCN models given different image input sizes"; Given image size 448x448, the number of multiplications is approximately 35 billion and 3 billion for the fine and coarse models, respectively. [four times computationally more expensive]; Additionally, based on spec. [0119] [0155], and also see MPEP §2144.05 II. A, differences in frequent feature values will not support the patentability of subject matter encompassed by the prior art unless there is evidence indicating such value is critical. See In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955). Accordingly, 'three percent' and 'four times' do not have patentable weight.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Almahairi by including Dynamic Capacity Network (DCN). Doing so would reduce the number of computations while maintaining similar or even better performance. (Almh., Abstract "We introduce the Dynamic Capacity Network (DCN)... This is achieved by combining modules of two types: low-capacity subnetworks and high-capacity sub-networks... Our findings indicate that DCNs are able to drastically reduce the number 
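For illustration only, the figures cited from Almahairi in the claim 2 mapping can be checked against the claimed 'three percent' and 'four times' values with the short calculation below. It is a sketch under stated assumptions: `compare_models` is a hypothetical helper, "three percent more accurate" is read here as a difference of at least three percentage points in error rate, and the multiplication counts are the approximate values stated in this action.

```python
from typing import Dict


def compare_models(coarse_error_pct: float, fine_error_pct: float,
                   fine_cost: float, coarse_cost: float) -> Dict[str, object]:
    """Compare a coarse and a fine model on error rate and computational cost."""
    if coarse_cost <= 0:
        raise ValueError("coarse_cost must be positive")
    # Assumption: accuracy gain is measured in percentage points of error rate.
    gain = coarse_error_pct - fine_error_pct
    ratio = fine_cost / coarse_cost
    return {
        "accuracy_gain_points": gain,
        "cost_ratio": ratio,
        "meets_three_points": gain >= 3.0,
        "meets_four_times": ratio >= 4.0,
    }


# Figures as cited in this action: error rates of 40.6% and 25.2%, and roughly
# 35 billion versus 3 billion multiplications at an image size of 448x448.
result = compare_models(40.6, 25.2, 35e9, 3e9)
print(result)
```

On these inputs the gain is about 15.4 percentage points and the cost ratio is about 11.7, so both cited values exceed the claimed minimums under the stated reading.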
In regard to claim 9, Stalidis, Stuner and Almahairi teach: The computer-implemented method of claim 2, wherein the first non-recurrent neural network is at least one bag of words (abbreviated BoW) network. (Stld. p. 3, Section I, "This paper introduces a hybrid method that combines both lexicon-based features and machine learning for sentiment classification. We propose a novel document representation approach that combines the Bag-of-Words [e.g. a first non-recurrent neural network] representation with a TF-IDF weighted Word2Vec representation together with a vector of lexicon-based sentiment values...")
In regard to claim 10, Stalidis, Stuner and Almahairi teach: The computer-implemented method of claim 2, wherein the first non-recurrent neural network is at least one continuous bag of words (abbreviated CBoW) network. (Stld. p.6 "(a) Continuous Bag-of-Words model (CBOW): predicts a word when the surrounding words are given. It is much faster than the Skip-gram model and slightly more accurate for frequent words")
In regard to claim 11, Stalidis, Stuner and Almahairi teach: The computer-implemented method of claim 2, wherein the first non-recurrent neural network is at least one skip-gram network. (Stld. p.6 "(b) Skip-gram model (SG): predicts a window of words when a single word is known. It operates well with a small amount of training data representing accurately even rare words and phrases.")
Claims 3, 16, 20 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in further view of Abdelazeem ("A Greedy Approach for Building Classification Cascades").
In regard to claim 3, Stalidis and Stuner fail to teach, but Abdelazeem teaches: The computer-implemented method of claim 1, further including selecting the second RNN (Stuner p. 3418 "Fig. 2. The proposed BLSTM [RNN] Cascade. Each classifier process the rejects from the previous layer.") when the first non-recurrent neural network's confidence is below a set threshold. (Abdl., p. 116 "Figure 1. Typical classification cascade system"; p. 115 "In such a system, all the patterns to be classified first go through a first stage... The patterns that are classified with confidence scores lower than the threshold [a set threshold] are rejected to the second stage [selecting the second model] ... Figure 1 illustrates this idea.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Abdelazeem by including cascade systems. Doing so would allow the patterns to pass through different stages until they reach the powerful last stage that does not reject any pattern. (Abdl., p. 115 " This observation led to the development of cascade systems [1] which is the main concern of this paper... In the same manner, the patterns pass through different stages until they reach the powerful last stage that does not reject any patterns.")
In regard to claim 16, Stalidis, Stuner and Abdelazeem teach: The computer-implemented method of claim 3, wherein the threshold is a single value. (Abdl. p. 116 "Figure 1. Typical classification cascade system"; p. 115 "the patterns that are classified with confidence scores lower than the threshold [a single value] are rejected to the second stage ... Figure 1 illustrates this idea.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Abdelazeem by including cascade systems. Doing so would allow the patterns to pass through different stages until they reach the powerful last stage that does not reject any pattern.
In regard to claim 20, Stalidis teaches: A neural network-based natural language processing (abbreviated NLP) system (Stld. p. 2, Section I. INTRODUCTION, "Sentiment Analysis (often referred to as Opinion Mining) involves the application of Natural Language Processing [NLP], text analytics, and computational linguistics methods and tools in order to extract subjective information from text...") with processors operating in parallel to efficiently perform (Stld. p.12 Section IV. EXPERIMENTS "… we used Python’s scikit-learn machine learning implementation[36]"; Stalidis indicates that they implement their method using Python on a computer, where a processor and memory are inherent) a sentiment classification task on a sentence, the system comprising: (Stld. p. 2, Section I. INTRODUCTION, "Document-level Sentiment Analysis aims at classifying a multi-sentence document [a sentiment classification task on a sentence] in terms of the polarity of the opinion expressed within it.")
a first non-recurrent neural network that evaluates the sentence and (Stld. p. 3, Section I, "This paper introduces a hybrid method that combines both lexicon-based features and machine learning for sentiment classification. We propose a novel document representation approach that combines the Bag-of-Words [e.g. a first non-recurrent neural network] representation with a TF-IDF weighted Word2Vec representation together with a vector of lexicon-based sentiment values... In Section II we describe the sentence vector representation methodology..."; Based on spec. [0036], the claimed invention uses a combination of a bag of words model and an LSTM model) produces a confidence score which specifies a likelihood of the sentence's sentiment, (Stld. P. 8-9 "Thus, the vector representation of the text is Positive / Objective / Negative / Unknown [e.g. the sentence's sentiment] 0.036 / 0.671 / 0.121 / 0.171 [e.g. a likelihood]") wherein the first non-recurrent neural network generates a mean vector representation of an input by averaging vector embeddings of the input; and (Stld. p. 6 section II. VECTOR REPRESENTATION OF SENTENCES "Probably, the most straightforward approach to derive document-level representations is to simply take the average of the vectors of all words contained in the document, according to the Word2Vec embedding (i.e. mean Word2Vec). [a mean vector representation]")
Stalidis fails to teach, but Stuner teaches: a guider that compares the confidence score produced by the first non-recurrent neural network against a set threshold and, (Stuner p. 3417 "Cascade of classifiers is a particular combination method, combining classifiers decisions sequentially... the decision stage [e.g. a guider] allowing rejection. Often the rejection criterion consists in applying a threshold [a set threshold] on the classifier’s confidence score at the cascade’s current stage.") based on the comparison, determines whether the sentence requires supplemental evaluation by a second recurrent neural network (abbreviated RNN), including: (Stuner p. 3418 "Fig. 2. The proposed BLSTM [recurrent neural network] Cascade. Each classifier process the rejects from the previous layer.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis to incorporate the teachings of Stuner by including LSTM and a cascade framework. Doing so would provide promising results with low error rate by conceding rejects.
Stalidis and Stuner fail to teach, but Abdelazeem teaches: when the confidence score is below the threshold, using the second RNN to classify the sentence's sentiment; and (Abdl. p. 116 "Figure 1. Typical classification cascade system"; p. 115 "In such a system, all the patterns to be classified first go through a first stage... The patterns that are classified with confidence scores lower than the threshold are rejected to the second stage... Figure 1 illustrates this idea.")
when the confidence score is above the threshold, relying on the confidence score produced by the first non-recurrent neural network for the sentiment classification task, without requiring the supplemental evaluation by the second RNN. (Abdl. p. 115 "… those patterns that are classified with confidence score higher than a certain threshold leave the system with the labels given to them by the first stage. ")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Abdelazeem 
In regard to claim 27, Stalidis, Stuner and Abdelazeem teach: A non-transitory, computer-readable medium having computer executable instructions that implement the neural network-based NLP system of claim 20. (Stld. p.12 Section IV. EXPERIMENTS "… we used Python’s scikit-learn machine learning implementation[36]"; Stalidis indicates that they implement their method on a computer, where a non-transitory, computer-readable medium is inherent)
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in further view of Boyer (US 20170052971 A1).
In regard to claim 7, Stalidis and Stuner fail to teach, but Boyer teaches: The computer-implemented method of claim 6, wherein a probability score produced by the first non-recurrent neural network for a linguistically complex sentence is at least twenty percent lower than a class probability produced by the first non-recurrent neural network for a linguistically simple sentence. (Boyer [0021] "The challenge involves how to break a tie or near-tie... These cases of no clear majority seem to create weak signals between the sentiment score states defined... The problem is that this conflicts with the roadmap, particularly when the system is upgraded to reflect linguistic convictions, such as weak positive and strong positive. The in-between states become really weak positive, somewhat strong positive, [e.g. complex sentence] etc."; [0097] "FIGS. 7A-7C illustrate non-linear sentiment confidence functions in accordance with an illustrative embodiment. More particularly, FIG. 7A illustrates the non-linear confidence formula shown above. The confidence values [e.g. a probability score, a class probability] associated with raw aggregate sentiment scores degrade slowly for raw aggregate sentiment scores trending toward the midpoints according to a concave down curve that bottoms out at the midpoints between the sentiment values. Each confidence curve begins at a sentiment value with a 100% confidence and meets at the midpoint another confidence curve that then a simple sentence can be fully positive with 100% confidence, and a complex sentence can be weak negative / positive with 0% confidence [twenty percent lower]; Additionally, as mentioned for claim 2, based on spec. [0120] [0127], and also see MPEP §2144.05 II. 'twenty percent' does not have patentable weight.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Boyer by including a confidence scoring component. Doing so would approximate the rate at which the aggregate sentiment is increasing or decreasing, as an indicator of whether the sentiment is trending positively or negatively. (Boyer [0090] "The sentiment confidence scoring component can use the slope of a line of best fit on the cached raw scores, optionally attenuated for recency or a date range, as well as the approximate rate at which the aggregate sentiment is increasing or decreasing, as an indicator of whether the sentiment is trending positively or negatively.")
In regard to claim 8, Stalidis, Stuner and Boyer teach: The computer-implemented method of claim 7, further including using the second RNN (Stuner p. 3418 "Fig. 2. The proposed BLSTM [the second RNN] Cascade. Each classifier process the rejects from the previous layer.") to classify the linguistically complex sentence to the at least one of positive sentiment class, negative sentiment class, very positive sentiment class, very negative sentiment class, somewhat positive sentiment class, somewhat negative sentiment class, or neutral sentiment class. (Stld., p.2 "Sentence-level or phrase-level Sentiment Analysis seeks to classify the sentiment expressed in a single sentence [sentence], and characterize the sentence as positive, negative, or neutral.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis to incorporate the teachings of Stuner by including .
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in view of Almahairi in further view of Bradbury ("Quasi-recurrent neural networks").
In regard to claim 12, Stalidis, Stuner and Almahairi fail to teach, but Bradbury teaches: The computer-implemented method of claim 2, wherein the first non-recurrent neural network is at least one convolutional neural network (abbreviated CNN). (Brd. p. 1 "Convolutional neural networks (CNNs) (Krizhevsky et al., 2012), though more popular on tasks involving image data, have also been applied to sequence encoding tasks (Zhang et al., 2015). Such models apply time-invariant filter functions in parallel to windows along the input sequence. CNNs possess several advantages over recurrent models, including increased parallelism and better scaling to long sequences such as those often seen with character-level language data.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis, Stuner and Almahairi to incorporate the teachings of Bradbury by including convolutional models. Doing so would allow sequence processing to be more successful in a hybrid architecture. (Brd. p. 1 "Convolutional models for sequence processing have been more successful when combined with RNN layers in a hybrid architecture (Lee et al., 2016), because traditional max- and average-pooling approaches to combining convolutional features across timesteps assume time invariance and hence cannot make full use of large-scale sequence order information.")
Claims 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in further view of Bradbury.
In regard to claim 14, Stalidis and Stuner fail to teach, but Bradbury teaches: The computer-implemented method of claim 1, wherein the second RNN is a gated recurrent unit (abbreviated GRU) network. (Brd. p.8 "A similar approach was taken by Lee et al. (2016) for character-level machine a bidirectional GRU.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Bradbury by including GRU. Doing so would allow training speed comparable to subword-level models without hard-coded text segmentation. (Brd. p. 8 "The parallelism of the convolutional, pooling, and highway layers allows training speed comparable to subword-level models without hard-coded text segmentation.")
In regard to claim 15, Stalidis, Stuner and Bradbury teach: The computer-implemented method of claim 1, wherein the second RNN is a quasi-recurrent neural network (abbreviated QRNN). (Brd. Abstract "We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis and Stuner to incorporate the teachings of Bradbury by including QRNN. Doing so would allow training and testing to be up to 16 times faster. (Brd. Abstract "Due to their increased parallelism, they are up to 16 times faster at train and test time.")
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in view of Abdelazeem in further view of Cudak (US 20180032526 A1).
In regard to claim 17, Stalidis, Stuner and Abdelazeem fail to teach, but Cudak teaches: The computer-implemented method of claim 3, wherein the threshold is a range between two values. (Cudak, [0089] "For example, the confidence indication may designate the caller as 'trusted' in response to the confidence score exceeding a predetermined threshold (or alternatively falling within a predetermined range). Similarly, the confidence indication may designate the caller as 'untrustworthy' in response to the confidence score being below a threshold value (or alternatively falling within a predetermined range)."; "(e.g., within an upper range of confidence score values, such as 85% to 100%)...")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis, Stuner and Abdelazeem to incorporate the teachings of Cudak by including a confidence score range. Doing so would provide a way to present the confidence score by using specific colors. (Cudak, [0105] "Presenting the confidence score may include using specific colors to communicate the confidence score... the presentation module 320 may use the color green to indicate a relatively high confidence score (e.g., within an upper range of confidence score values, such as 85% to 100%)...")
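For illustration only, the difference between a threshold that is a single value (claim 16) and a threshold that is a range between two values (claim 17) can be sketched as below. This is a minimal sketch under stated assumptions, not code from Abdelazeem, Cudak, or the application: `needs_second_stage` is a hypothetical helper, and treating a confidence inside the range as the condition for second-stage processing is an assumed reading of the range limitation.

```python
from typing import Tuple, Union

# A threshold is either a single value or a (low, high) pair.
Threshold = Union[float, Tuple[float, float]]


def needs_second_stage(confidence: float, threshold: Threshold) -> bool:
    """Decide whether an input is rejected to the second stage."""
    if isinstance(threshold, tuple):
        low, high = threshold
        if low > high:
            raise ValueError("range must satisfy low <= high")
        # Assumption: a confidence inside [low, high) triggers the second stage.
        return low <= confidence < high
    # Single-value case: confidence below the threshold triggers the second stage.
    return confidence < threshold
```

For example, `needs_second_stage(0.5, 0.9)` applies a single cutoff, while `needs_second_stage(0.6, (0.4, 0.8))` tests membership in a band of confidence values.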
Claims 21 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in view of Abdelazeem in further view of Boyer (US 20170052971 A1).
In regard to claim 21, Stalidis, Stuner and Abdelazeem fail to teach, but Boyer teaches: The system of claim 20, wherein the first non-recurrent neural network produces a twenty percent lower confidence score when the sentence is a linguistically complex sentence compared to when the sentence is a linguistically simple sentence. (Boyer [0021] "The challenge involves how to break a tie or near-tie... These cases of no clear majority seem to create weak signals between the sentiment score states defined... The problem is that this conflicts with the roadmap, particularly when the system is upgraded to reflect linguistic convictions, such as weak positive and strong positive. The in-between states become really weak positive, somewhat strong positive, [e.g. complex sentence] etc."; [0097] "FIGS. 7A-7C illustrate non-linear sentiment confidence functions in accordance with an illustrative embodiment. More particularly, FIG. 7A illustrates the non-linear confidence formula shown above. The confidence values [e.g. confidence score] associated with raw aggregate sentiment scores degrade slowly for raw aggregate sentiment scores trending toward the midpoints according to a concave down curve that bottoms out at the midpoints between the sentiment values. Each confidence curve begins at a simple sentence can be fully positive with 100% confidence, and a complex sentence can be weak negative / positive with 0% confidence [twenty percent lower]; Additionally, as mentioned in regard to claim 2, based on spec. [0120] [0127], and also see MPEP §2144.05 II. 'twenty percent' does not have patentable weight.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis, Stuner and Abdelazeem to incorporate the teachings of Boyer by including a confidence scoring component. Doing so would approximate the rate at which the aggregate sentiment is increasing or decreasing, as an indicator of whether the sentiment is trending positively or negatively.
In regard to claim 24, Stalidis, Stuner, Abdelazeem and Boyer teach: The system of claim 21, further including using the second RNN (Stuner p. 3418 "Fig. 2. The proposed BLSTM [the second RNN] Cascade. Each classifier process the rejects from the previous layer.") to classify the linguistically complex sentence's sentiment as positive, negative, very positive, very negative, somewhat positive, somewhat negative, or neutral. (Stld., p.2 "Sentence-level or phrase-level Sentiment Analysis seeks to classify the sentiment expressed in a single sentence [sentence], and characterize the sentence as positive, negative, or neutral.") (see also Boyer [0092] "the sentiments [strong negative, negative, weak negative, neutral, weak positive, positive, strong positive]")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis to incorporate the teachings of Stuner by including LSTM and a cascade framework. Doing so would provide promising results with low error rate by conceding rejects.
Claims 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Stalidis in view of Stuner in view of Abdelazeem in view of Boyer in further view of Socher ("Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank").
In regard to claim 22, Stalidis, Stuner, Abdelazeem and Boyer fail to teach, but Socher teaches: The system of claim 21, wherein the linguistically complex sentence has co-occurrences of negative and positive constituents. (Socher, p. 1638 "Set 1: Negating Positive Sentences. The first set contains positive sentences and their negation [co-occurrences of negative and positive constituents]. In this set, the negation changes the overall sentiment of a sentence from positive to negative. Hence, we compute accuracy in terms of correct sentiment reversal from positive to negative.")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis, Stuner, Abdelazeem and Boyer to incorporate the teachings of Socher by including a Recursive Neural Tensor Network. Doing so would accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases. (Socher Abstract "we introduce the Recursive Neural Tensor Network… it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.")
In regard to claim 23, Stalidis, Stuner, Abdelazeem, Boyer and Socher teach: The system of claim 21, wherein the linguistically complex sentence has multiple instances of negative words and contrastive conjunctions. (Socher, p. 1638 "5.3 Model Analysis: Contrastive Conjunction In this section, ‘X but Y’ structure: A phrase X being followed by but which is followed by a phrase Y. The conjunction is interpreted as an argument for the second conjunct... Fig. 7 contains an example. We analyze a strict setting, where X and Y are phrases of different sentiment (including neutral)."; in Figure 7, 'slow' and 'repetitive' are examples of negative words)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Stalidis, Stuner, Abdelazeem and Boyer to incorporate the teachings of Socher by including a Recursive Neural Tensor Network. Doing so would accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.
Conclusion
The art made of record and not relied upon is considered pertinent to applicant's disclosure.  Wu ("Aspect-based Opinion Summarization with Convolutional Neural Networks") teaches multitask CNNs with pre-trained word embedding.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained 
/S.C./Examiner, Art Unit 2122

/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126