Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Abstract Objection
The abstract of the disclosure is objected to because of undue length.  The abstract is 161 words.  37 CFR 1.72(b) provides that the abstract may not exceed 150 words.  Correction is required.  See MPEP § 608.01(b).
Specification Objection
The disclosure is objected to because of the following informalities: 
Para [0066] “architecture 400” label is not present in FIG. 4.
Para [0067] “architecture 500” label is not present in FIG. 5.
Para [0067] FIG. 5 is associated with LSTM model 244 but is labeled 144.
Para [0093] “architecture 1500” label is not present in FIG. 15.
Para [0094] “FIG. 902” is not present in the drawings.  
Appropriate correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 7, 8, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni et al. (hereafter Mahasseni) “Budget-aware Deep Semantic”, in view of Baluja et al. (hereafter Baluja) “Adversarial Transformation Networks: Learning to Generate Adversarial Examples”, in view of Windasari et al. (hereafter Windasari) “Sentiment Analysis on Twitter Posts: An analysis of Positive or Negative Opinion on GoJek”, and in further view of Arlot et al. (hereafter Arlot) “A survey of cross-validation procedures for model selection”.
Regarding claim 1
Mahasseni teaches a system with numerous processors operating in parallel, comprising: ([pg. 7, col. 2, lines 3-4] “Experiments are performed on an Intel quad core-i7 CPU and 16GB RAM on a single Tesla k80”) 
a decision neural network-based classifier that selects, ([pg. 8, col. 2, line 4] “frame selection as a Markov Decision Process, and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames”) 
on an input-by-input basis ([pg. 1, col. 1, lines 30-33] “the goal is to assign a correct class label to every pixel in a video”).   The examiner notes “every pixel” teaches “an input-by-input” basis, as each pixel requires a correct class label similar to each “input” that is labeled.
between non-recurrent (neural network) ([pg. 2, col. 1, lines 13-14] “represented by a one-layer CNN”.  The examiner notes that a CNN is a feed-forward neural network, and therefore teaches a “non-recurrent” neural network), 
and recurrent neural network-based classifiers ([pg. 8, Col 2, lines 3-5] “used an LSTM as the policy model”.  The examiner notes that “LSTM”, or Long Short-Term Memory, is a type of recurrent neural network, and teaches “recurrent neural network-based classifiers”), 
trained to perform a machine classification task ([pg. 5, Col 2, lines 1-20] “Algorithm 1 Training procedure of our Budget-Aware semantic segmentation model”.  The examiner notes that “Training procedure of our Budget-Aware semantic segmentation model” teaches “trained to perform”), and ([pg. 1, Col 2, lines 15-16] “deep semantic video segmentation”.  The examiner notes that “deep semantic video segmentation” teaches a “machine classification task”),
with the selection governed by the decision neural network-based classifier’s training ([pg. 7, Col 2, line 4] “frame selection as a Markov Decision Process, and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames”.  The examiner notes that “Long Short-Term Memory (LSTM) network to model a policy for selecting the frames” teaches “selection governed by the decision neural network”), and ([pg. 5, Col 2, lines 1-20] “Algorithm 1 Training procedure of our Budget-Aware semantic segmentation model”.  The examiner notes that “Training procedure of our Budget-Aware semantic segmentation model” teaches “classifier’s training”).
Mahasseni does not teach on a decision training set annotated with model class labels that distinguish between inputs accurately classified only by the trained recurrent neural network-based classifier and remaining inputs in the decision training set. 
Baluja teaches on a decision training set annotated with model class labels ([pg. 1, Col 2, lines 4-12] “generating a targeted adversarial attack on a classifier can be expressed as [formula]=yt, where yt is a member of Y, [where Y] is some target label chosen by the attacker.”  The examiner notes that “target label chosen by the attacker” teaches “model class label” because Baluja uses a targeted classifier’s outputs “y” as training examples, and, as the adversarial network is targeting a specific classifier, these examples are labeled by association), and ([pg. 1, Col 2, lines 27-30] “All that is required is the Target network’s outputs y and [adversarial network generated input classified by the targeted network] y’.  It is therefore possible to train ATNs [Adversarial Transformation Network] in a self-supervised manner, where they use unlabeled data as the input and make argmax [formula]= t.”  The examiner notes that the adversarial networks are able to use “unlabeled data” because the target is one classifier which is performing a “one versus rest” classification on inputs of digits “0” to “9”, which further teaches “model class label”).
Mahasseni and Baluja are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mahasseni to incorporate the teaching of Baluja [pg. 1, Col 2, lines 27-30] to optimize the training of a classifier which is trained explicitly to understand the classifying capabilities of another classifying network.
Windasari teaches that distinguish between inputs accurately classified only by the trained recurrent neural network-based classifier and remaining inputs in the decision training set. ([pg. 3, Table III] “Confusion Matrix”, in which the True Positive, True Negative, False Positive, and False Negative results for a classification model are presented.  The examiner notes that a person having ordinary skill in the art would be able to generate a confusion matrix with the recurrent and non-recurrent classifier results and arrange the data into training data sets.)
Mahasseni, Baluja and Windasari are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Baluja to incorporate the teaching of Windasari to organize the classification results of various networks using confusion matrices in order to develop training data sets as per Windasari [pg. 3, Table III] to improve the organization and understanding of classifier results.
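For illustration only, the four confusion-matrix conditions cited from Windasari (TP, TN, FP, FN) can be tallied as in the following minimal Python sketch.  The function name `confusion_counts`, the choice of "positive" as the positive class, and the sample labels are hypothetical and are not taken from any cited reference.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels, positive="positive"):
    """Tally TP, TN, FP, FN for a binary classifier (hypothetical helper)."""
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            # Predicted positive: correct -> TP, incorrect -> FP.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: incorrect -> FN, correct -> TN.
            counts["FN" if truth == positive else "TN"] += 1
    return dict(counts)

# Illustrative data only; not drawn from Windasari.
truth = ["positive", "negative", "positive", "negative"]
preds = ["positive", "positive", "negative", "negative"]
result = confusion_counts(truth, preds)
```

With the sample data above, each of the four conditions occurs exactly once.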
Arlot teaches that distinguish between inputs accurately classified only by the trained recurrent neural network-based classifier and remaining inputs in the decision training set. ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”  The examiner notes that “to split data, once or several times” teaches the data subdivision “inputs accurately classified only by the trained recurrent neural network-based classifier and remaining inputs in the decision training set.”)
Mahasseni, Baluja, Windasari and Arlot are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, and Windasari to incorporate the teaching of Arlot to generate a training data set using cross validation data splitting methods as per Arlot [pg. 3, lines 10-15] to have a method of evaluating the training of a particular model.
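For illustration only, the data splitting described in the passage quoted from Arlot (a training sample and a remaining validation sample, split once or several times) can be sketched as a simple k-fold partition.  The function name `k_fold_splits` and the interleaved fold assignment are assumptions made for this sketch; Arlot surveys many splitting schemes and this is only one of them.

```python
def k_fold_splits(items, k):
    """Yield (training_sample, validation_sample) pairs, one per fold."""
    # Interleaved assignment: item i goes to fold i % k (an arbitrary choice).
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

# Six placeholder data points split into three folds.
splits = list(k_fold_splits(list(range(6)), 3))
```

Each data point appears in exactly one validation sample across the folds, and in the training sample of every other fold.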
Regarding claim 7
The combination of Mahasseni, Baluja, Windasari, and Arlot, teaches claim 1.
Mahasseni teaches wherein the trained recurrent neural network-based classifier is at least three percent more accurate and four times computationally more expensive than the trained non-recurrent neural network-based classifier.  ([pg. 8, Col 1, lines 8-13] “Considering the fact that we are ultimately upper-bounded by the accuracy of the full run of f [LSTM network] on the entire video (…) For a 4-fold speed up we have an accuracy reduction of only 11.5%”.  The examiner notes that the less computationally expensive system is 4 times faster and 11.5% less accurate, which teaches “four times computationally more expensive” and “at least three percent more accurate”.  Additionally, see MPEP § 2144.05(II)(A): differences in frequent feature values will not support the patentability of subject matter encompassed by the prior art unless there is evidence indicating such value is critical.  See In re Aller, 220 F.2d 454, 456, 105 USPQ 233, 235 (CCPA 1955).  Accordingly, 'three percent' and 'four times' do not have patentable weight.)
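For illustration only, the comparison relied on above can be written out as simple arithmetic.  The sketch assumes the reading applied in this rejection: that the quoted 4-fold speed up is treated as the computational cost ratio, and that the quoted 11.5% accuracy reduction is treated as the accuracy margin.  The variable names are hypothetical.

```python
# Figures quoted from Mahasseni in the rejection above.
speed_up_factor = 4.0           # "4-fold speed up"
accuracy_reduction_pct = 11.5   # "accuracy reduction of only 11.5%"

# Thresholds recited in claim 7.
min_accuracy_margin_pct = 3.0   # "at least three percent more accurate"
min_cost_ratio = 4.0            # "four times computationally more expensive"

# Under the stated assumptions, both recited thresholds are met.
meets_accuracy = accuracy_reduction_pct >= min_accuracy_margin_pct
meets_cost = speed_up_factor >= min_cost_ratio
```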
Regarding claim 8
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 7.
Mahasseni teaches wherein the trained recurrent neural network-based classifier is at least one recurrent neural network (abbreviated RNN). ([pg. 8, Col 2, lines 3-5] “used an LSTM as the policy model”.  The examiner notes that “LSTM”, or Long Short-Term Memory, is a type of recurrent neural network, and teaches “recurrent neural network-based classifiers”).
Regarding claim 23
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Arlot teaches wherein the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier selected by the trained decision neural network-based classifier during inference are trained on a combination of the training set and the validation set.  ([pg. 53, lines 8-10] “In brief, CV consists in averaging several hold-out estimators of the risk corresponding to different data splits.”  The examiner notes that “different data splits” teaches “a combination of the training set and the validation set” because the full dataset was originally split into a training set and a validation set.  Splitting the data into a different combination is essentially cross-validation.)
Mahasseni, Baluja, Windasari and Arlot are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, and Windasari to incorporate the teaching of Arlot to generate a training data set using cross validation data methods as per Arlot [pg. 53, lines 8-10] to improve the training and efficacy evaluation of a machine learning model.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Weber (US Pat No. 9224386).
 Regarding claim 2
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Arlot teaches inputs accurately classified by both the trained non-recurrent and recurrent neural network-based classifiers. ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”   The examiner notes that “to split data, once or several times” teaches the data subdivision “inputs accurately classified by both the trained non-recurrent and recurrent neural network-based classifiers.”)
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the remaining inputs include inputs inaccurately classified by the trained recurrent neural network-based classifier.
Weber teaches wherein the remaining inputs include inputs inaccurately classified by the trained recurrent neural network-based classifier. ([Abstract] “The probabilities can be used to generate erroneous transcriptions in language model training corpora”.  The examiner notes that “erroneous transcriptions” are training data examples of incorrect machine learning-based audio transcriptions and teach “inputs inaccurately classified by the trained recurrent neural network-based classifier”.)
Mahasseni, Baluja, Windasari, Arlot and Weber are analogous art because they are from the same field of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Weber to use incorrectly classified data examples within training data sets as per Weber [Abstract] to have a training data set that is reflective of the range of possible input data the classifier will be expected to correctly label.
Claims 3 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in view of Vo et al. (hereafter Vo), “Multi-channel LSTM-CNN model for Vietnamese sentiment analysis”, and in further view of Ganguly et al. (hereafter Ganguly) (US 2014/0156568).
Regarding claim 3
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Windasari teaches generate a confusion matrix based on the trained non-recurrent and recurrent neural network-based classifiers' performance ([pg. 3, Table III] “Confusion Matrix”, in which the True Positive, True Negative, False Positive, and False Negative results for a classification model are presented.  The examiner notes that a person having ordinary skill in the art would be able to generate a confusion matrix given the recurrent and non-recurrent classifier results), 
label the validation inputs in the first subset with a first model class label identifying the trained recurrent neural network-based classifier, label the validation inputs in the second subset with a second model class label identifying the trained non-recurrent neural network-based classifier, ([pg. 3, Col 2, lines 25-29] “Accuracy testing in this research is using confusion matrix method.  The confusion matrix method forms a matrix containing four conditions: “True Positive (TP)”, “True Negative (TN)”, “False Positive (FP)”, and “False Negative (FN)” as shown in Table 3.”, the examiner notes at the time of classification a person having ordinary skill in the art would be able to log / track for each output the classifier used to process the associated input, the result (accurate / inaccurate), and by using a confusion matrix and cross validation scheme, generate one or more training and validation sets and subsets for training of a decision neural network.  Logging this data teaches labeling the validation inputs in the first and second subsets). 
Mahasseni, Baluja, and Windasari are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Baluja to incorporate the teaching of Windasari to organize the classification results of various networks using confusion matrices in order to develop training data sets as per Windasari [pg. 3, Col 2, lines 25-29] to improve the efficiency of generating classifier output training data sets for a decision neural network to select a recurrent or a non-recurrent neural network to process input data.
Arlot teaches use the trained non-recurrent and recurrent neural network-based classifiers to perform the machine classification task on a validation set, the validation set comprising validation inputs annotated with the task class labels; ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”  The examiner notes that “the remaining part (the validation sample) is used for estimating the risk of the algorithm” teaches “use the trained non-recurrent and recurrent neural network-based classifiers to perform the machine classification task on a validation set” because “estimating the risk of the algorithm” describes the tasks of tabulating the results of classification and analyzing the results of one or more algorithms, usually with confusion matrices, and “the validation sample” teaches “the validation set comprising validation inputs annotated with the task class labels” because the validation set is a portion of an entire labeled data set, as taught by Vo, that may be segmented in one or many different ways into training and validation sets for cross validation purposes.)
and use the confusion matrix to identify a first subset of validation inputs accurately inferred only by the trained recurrent neural network-based classifier and a second subset of validation inputs comprising validation inputs not in the first subset; ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”, the examiner notes “to split data, once or several times” teaches “use the confusion matrix to identify a first subset of validation inputs accurately inferred only by the trained recurrent neural network-based classifier and a second subset of validation inputs comprising validation inputs not in the first subset;” because a person having ordinary skill in the art would understand how to use the classification results presented in a confusion matrix to split a data set into any number of training and validation data sets using a cross validation scheme.)
and store the model-class-labeled validation inputs in the first and second subsets as the decision training set; ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”  The examiner notes that the task of splitting the data “once or several times” into training and validation sections includes the act of storing the various permutations for further use, and so teaches “store the model-class-labeled validation inputs in the first and second subsets as the decision training set”).
Mahasseni, Baluja, Windasari, and Arlot are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, and Windasari to incorporate the teaching of Arlot to generate a training data set using cross validation data methods as per Arlot [pg. 3, lines 10-15] to improve the training and efficacy evaluation of a machine learning model.
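For illustration only, the claim 3 steps discussed above (identifying a first subset of validation inputs accurately classified only by the recurrent classifier, a second subset of the remaining inputs, and storing both with model class labels) can be sketched as follows.  The function name, the label strings, and the sample data are hypothetical; the sketch does not reproduce any procedure disclosed in the cited references.

```python
RNN_LABEL = "recurrent"          # hypothetical first model class label
NON_RNN_LABEL = "non_recurrent"  # hypothetical second model class label

def build_decision_training_set(validation_inputs, task_labels,
                                rnn_preds, non_rnn_preds):
    """Attach a model class label to each validation input."""
    decision_set = []
    for x, y, rnn_y, non_rnn_y in zip(validation_inputs, task_labels,
                                      rnn_preds, non_rnn_preds):
        # First subset: only the recurrent classifier was accurate.
        only_rnn_correct = (rnn_y == y) and (non_rnn_y != y)
        label = RNN_LABEL if only_rnn_correct else NON_RNN_LABEL
        decision_set.append((x, label))
    return decision_set

# Illustrative data only.
inputs = ["a", "b", "c", "d"]
truth = [1, 0, 1, 0]
rnn = [1, 0, 0, 1]      # correct on "a" and "b"
non_rnn = [0, 0, 1, 1]  # correct on "b" and "c"
decision_training_set = build_decision_training_set(inputs, truth, rnn, non_rnn)
```

In the sample data, only input "a" falls in the first subset; inputs classified accurately by both, by neither, or only by the non-recurrent classifier fall in the second subset.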
Mahasseni, Baluja, Windasari, and Arlot do not teach train the non-recurrent and recurrent neural network-based classifiers to perform the machine classification task using a training set, the training set comprising training inputs annotated with task class labels defined for the machine classification task; and train the decision neural network-based classifier using the decision training set to output probabilities for the first and second model class labels on an input-by-input basis, the output probabilities specifying respective likelihoods of selecting the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier.
Vo teaches train ([pg. 4, Col 1, lines 33-36] “In the training phase, the algorithm for first-order gradient-based optimization AdaMax is used to learn model parameters.”, the examiner notes “in the training phase” teaches “train”),
the non-recurrent and recurrent neural network-based classifiers to perform the machine classification task ([pg. 1, Col 2, lines 7-8] “We propose a multi-channel LSTM-CNN model for Vietnamese sentiment analysis.”, the examiner notes “LSTM-CNN model” teaches “non-recurrent and recurrent neural network-based classifiers”, and “Vietnamese sentiment analysis” teaches “machine classification task”).
using a training set, the training set comprising training inputs annotated with task class labels defined for the machine classification task; ([pg. 1, Col 2, lines 1-3] “In VS dataset, we collected 17,500 reviews/comments from Vietnamese e-commercial sites (i.e. TihnTe.vn, Tiki.vn, etc.) and labeled for positive/negative/neutral by three annotators.”  The examiner notes that “VS dataset” teaches “training set”, and “17,500 reviews/comments (…) labeled for positive/negative/neutral by three annotators” teaches “training inputs annotated with task class labels defined for the machine classification task” because positive/negative/neutral labels are the labels the machine learning classifier was trained to apply to input data.)
Mahasseni, Baluja, Windasari, Arlot and Vo are analogous art because they are in similar fields of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Vo to use labeled data, as per Vo, to optimize the training of a CNN and an LSTM neural network to process input data for sentiment analysis.
Ganguly teaches train ([Abstract] “the best model can change based on most recent training performance results”, the examiner notes “training performance results” teaches “training”),
the decision neural network-based classifier using the decision training set to output probabilities for the first and second model class labels on an input-by-input basis, ([para 0018] “the decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model or a set of predictive algorithms.  The predictive model, which can employ any number of statistical or machine learning techniques (…) neural networks”.  The examiner notes that “The predictive model, which can employ any number of statistical or machine learning techniques (…) neural networks” teaches “decision neural network-based classifier”, and “the decision component 110 is configured to predict or infer the likelihood, or probability” teaches “output probabilities”, and “given some input” teaches “an input-by-input basis”).
the output probabilities specifying respective likelihoods of selecting the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier.  ([para 0018] “the decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model or a set of predictive algorithms.  The predictive model, which can employ any number of statistical or machine learning techniques (…) neural networks”, and [para 0022] “the model selection component 240 can be configured to identify a set of candidate predictive models for a type of input and analyze the performance of the set of candidate models.  The model that outperforms other models can subsequently be selected”.  The examiner notes that “The model that outperforms other models can subsequently be selected” teaches “likelihood of selecting the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier” because the “decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model”, which shows a separate neural network predicting the “outcome” of “a predictive model” evaluating data.)
Mahasseni, Baluja, Windasari, Arlot, Vo and Ganguly are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, and Vo to incorporate the teaching of Ganguly to use predictive analytics to evaluate the performance metrics of models for predictive data processing as per Ganguly [para 0018]  to improve the model evaluation process for a decision component in selecting between predictive models.
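For illustration only, output probabilities for two model class labels can be sketched as a two-way softmax over a pair of scores.  The use of softmax, the function name `selection_probabilities`, and the label strings are assumptions made for this sketch; neither the claim language quoted above nor the cited passages of Ganguly specify how the probabilities are computed.

```python
import math

def selection_probabilities(score_rnn, score_non_rnn):
    """Convert two raw scores into probabilities that sum to one."""
    # Subtract the maximum score for numerical stability.
    m = max(score_rnn, score_non_rnn)
    e_rnn = math.exp(score_rnn - m)
    e_non_rnn = math.exp(score_non_rnn - m)
    total = e_rnn + e_non_rnn
    return {"recurrent": e_rnn / total, "non_recurrent": e_non_rnn / total}

# Equal scores give equal likelihood of selecting either classifier.
probs = selection_probabilities(0.0, 0.0)
```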
Regarding claim 20
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Windasari teaches wherein the confusion matrix identifies at least one of: ([pg. 3, Table III] “Confusion Matrix”, in which the True Positive, True Negative, False Positive, and False Negative results for a classification model are presented.  The examiner notes that a person having ordinary skill in the art would be able to generate a confusion matrix given the recurrent and non-recurrent classifier results). 
validation inputs accurately classified by both the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier; 
validation inputs inaccurately classified by both the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier; 
validation inputs accurately classified by the trained non-recurrent neural network-based classifier but inaccurately classified by the trained recurrent neural network-based classifier; 
and validation inputs accurately classified by the trained recurrent neural network-based classifier but inaccurately classified by the trained non-recurrent neural network-based classifier. ([pg. 3, Col 2, lines 25-29] “Accuracy testing in this research is using confusion matrix method.  The confusion matrix method forms a matrix containing four conditions: “True Positive (TP)”, “True Negative (TN)”, “False Positive (FP)”, and “False Negative (FN)” as shown in Table 3.”  The examiner notes that, at the time of classification, a person having ordinary skill in the art would be able to log / track for each output the classifier used to process the associated input and the result (accurate / inaccurate), and, by using a confusion matrix and cross validation scheme, generate one or more training and validation sets and subsets for training of a decision neural network.  Logging this data teaches labeling the validation inputs in the first and second subsets). 
Mahasseni, Baluja, and Windasari are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Baluja to incorporate the teaching of Windasari to organize the classification results of various networks using confusion matrices in order to develop training data sets as per Windasari to improve the efficiency of generating classifier output training data sets for a decision neural network to select a recurrent or a non-recurrent neural network to process input data.
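For illustration only, the four categories of validation inputs recited in claim 20 can be tallied by comparing each classifier's output against the task class label.  The function name `agreement_counts`, the key strings, and the sample data are hypothetical and are not drawn from Windasari.

```python
from collections import Counter

def agreement_counts(task_labels, rnn_preds, non_rnn_preds):
    """Count validation inputs by which classifier(s) labeled them accurately."""
    counts = Counter()
    for y, rnn_y, non_rnn_y in zip(task_labels, rnn_preds, non_rnn_preds):
        rnn_ok = rnn_y == y
        non_rnn_ok = non_rnn_y == y
        if rnn_ok and non_rnn_ok:
            counts["both_accurate"] += 1
        elif not rnn_ok and not non_rnn_ok:
            counts["both_inaccurate"] += 1
        elif non_rnn_ok:
            counts["only_non_recurrent_accurate"] += 1
        else:
            counts["only_recurrent_accurate"] += 1
    return counts

# Illustrative data only.
truth = [1, 0, 1, 0, 1]
rnn = [1, 0, 0, 1, 1]
non_rnn = [0, 0, 1, 1, 1]
cells = agreement_counts(truth, rnn, non_rnn)
```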
Regarding claim 21
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Windasari teaches wherein the first subset includes validation inputs accurately classified by the trained recurrent neural network-based classifier but inaccurately classified by the trained non-recurrent neural network-based classifier. ([pg. 3, Col 2, lines 25-29] “Accuracy testing in this research is using confusion matrix method.  The confusion matrix method forms a matrix containing four conditions: “True Positive (TP)”, “True Negative (TN)”, “False Positive (FP)”, and “False Negative (FN)” as shown in Table 3.”, the examiner notes at the time of classification a person having ordinary skill in the art would be able to log / track for each output the classifier used to process the associated input, the result (accurate / inaccurate), and by using a confusion matrix and cross validation scheme, generate one or more training and validation sets and subsets for training of a decision neural network.  Logging this data teaches labeling the validation inputs in the first and second subsets). 
Mahasseni, Baluja, and Windasari are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Baluja to incorporate the teaching of Windasari to organize the classification results of various networks using confusion matrices in order to develop training data sets as per Windasari [pg. 3, Col 2, lines 25-29] to improve the efficiency of generating classifier output training data sets for a decision neural network to select a recurrent or a non-recurrent neural network to process input data.
Regarding claim 22
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Windasari teaches wherein the second subset includes at least one of: validation inputs accurately classified by both the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier; validation inputs inaccurately classified by both the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier; and validation inputs accurately classified by the trained non-recurrent neural network-based classifier but inaccurately classified by the trained recurrent neural network-based classifier.  ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”  The examiner notes that “the remaining part (the validation sample) is used for estimating the risk of the algorithm” teaches “use the trained non-recurrent and recurrent neural network-based classifiers to perform the machine classification task on a validation set” because “estimating the risk of the algorithm” describes the tasks of tabulating the results of classification and analyzing the results of one or more algorithms, usually with confusion matrices, and “the validation sample” teaches “the validation set comprising validation inputs annotated with the task class labels” because the validation set is a portion of an entire labeled data set, as taught by Vo, that may be segmented in one or many different ways into training and validation sets for cross validation purposes.)
Mahasseni, Baluja, and Windasari are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Baluja to incorporate the teaching of Windasari to organize the classification results of various networks using confusion matrices in order to develop training data sets as per Windasari [pg. 3, Col 2, lines 25-29]  to improve the efficiency of generating classifier output training data sets for a decision neural network to select a recurrent or a non-recurrent neural network to process input data.
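For illustration only, the categories of validation inputs recited in claim 22 can be sketched as a comparison of each classifier's prediction against the annotated label. The function name, the group names, and the sample data below are assumptions made for this sketch and do not appear in the application or in any cited reference.

```python
from typing import Dict, List, Sequence


def partition_validation(labels: Sequence[int],
                         rnn_preds: Sequence[int],
                         non_rnn_preds: Sequence[int]) -> Dict[str, List[int]]:
    """Group validation-input indices by which classifier labeled them correctly.

    Hypothetical helper: the group names are illustrative only.
    """
    groups: Dict[str, List[int]] = {
        "both_correct": [],
        "both_wrong": [],
        "only_non_rnn_correct": [],
        "only_rnn_correct": [],
    }
    for i, (y, r, n) in enumerate(zip(labels, rnn_preds, non_rnn_preds)):
        if r == y and n == y:
            groups["both_correct"].append(i)
        elif r != y and n != y:
            groups["both_wrong"].append(i)
        elif n == y:
            groups["only_non_rnn_correct"].append(i)
        else:
            groups["only_rnn_correct"].append(i)
    return groups


# Made-up labels and predictions for four validation inputs.
labels = [1, 0, 1, 1]
rnn_preds = [1, 1, 0, 1]
non_rnn_preds = [1, 1, 1, 0]
groups = partition_validation(labels, rnn_preds, non_rnn_preds)
```

In this sketch the first three groups correspond to the three categories listed in claim 22; the fourth group is the remaining case, in which only the recurrent classifier is correct.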
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Ganguly.
Regarding claim 4
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni teaches for a given input, ([Abstract] “our new budget-aware framework that learns to optimally select a small subset of frames for pixelwise labeling by a CNN”, the examiner notes “select a small subset of frames” teaches “for a given input”)
during inference, ([pg. 4, Col 1, Lines 17-19] “Note that the local observation input to the policy at each step only captures part of the global state of the inference process”, the examiner notes “inference process” teaches “during inference”)
the trained decision neural network-based classifier, perform the machine classification task on the given input using either the trained recurrent neural network-based classifier or the trained non-recurrent neural network-based classifier.  ([Abstract] “our new budget-aware framework that learns to optimally select a small subset of frames for pixelwise labeling by a CNN, and then efficiently interpolates the obtained segmentations to yet unprocessed frames. (…) and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames.  For training the LSTM, we develop a policy gradient reinforcement-learning approach”, and ([pg. 4, Col 2, lines 32-33] “The goal is to jointly learn the parameters of (pi) [LSTM] and g [CNN]”, the examiner notes “for training the LSTM” teaches “trained decision neural network-based classifier”, and “learns to optimally select a small subset of frames for pixel wise labeling” teaches “perform the machine classification task”, and “jointly learn the parameters (…) g” and “CNN” teaches “trained non-recurrent neural network-based classifier”, and “for training the LSTM” teaches trained recurrent neural network-based classifier”).
Mahasseni, Baluja, Windasari, and Arlot do not teach “based on output probabilities”.
Ganguly teaches based on output probabilities  ([para 0018] “the decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model or a set of predictive algorithms.”  The examiner notes “the decision component 110 is configured to predict or infer the likelihood, or probability” teaches “output probabilities”).
Mahasseni, Baluja, Windasari, Arlot and Ganguly are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Ganguly to use predictive analytics to evaluate the performance metrics of models for predictive data processing as per Ganguly to improve the model evaluation process for a decision component in selecting between predictive models.
Claims 5 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, Windasari, Arlot, Vo, Ganguly, and in further view of Achin et al. (hereafter Achin) (US Pat. No. 9489630).
Regarding claim 5
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Mahasseni teaches further configured to select the trained recurrent neural network-based classifier ([Abstract] “our new budget-aware framework that learns to optimally select a small subset of frames for pixelwise labeling by a CNN”, the examiner notes “by a CNN” teaches “select the trained non-recurrent neural network-based classifier”)
Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly do not teach when the output probability of the first model class label is higher than that of the second model class label.
Achin teaches when the output probability of the first model class label is higher than that of the second model class label. ([Col 5, lines 35-43] “determining the suitability of the plurality of predictive modeling procedures comprises assigning suitability scores to the respective modeling procedures,(…) and where in selecting at least a subset of the predictive modeling procedures comprises selecting approximately a specific fraction of the predictive modeling procedures having highest suitability scores.” The examiner notes that “selecting approximately a specific fraction of the predictive modeling procedures having highest suitability scores” teaches “select (…) when the output probability of the first model class label is higher than that of the second model class label”).
Mahasseni, Baluja, Windasari, Arlot, Vo, Ganguly, and Achin are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly to incorporate the teaching of Achin to select a model for processing specific data as per Achin [Col 5, lines 35-43] to improve the quality of data processing by selecting the model with the highest suitability as determined by a suitability scoring method.
Regarding claim 6
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Mahasseni teaches further configured to select the trained non-recurrent neural network-based classifier ([Abstract] “our new budget-aware framework that learns to optimally select a small subset of frames for pixelwise labeling by a CNN”, the examiner notes “by a CNN” teaches “select the trained non-recurrent neural network-based classifier”)
Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly do not teach when the output probability of the second model class label is higher than that of the first model class label.
Achin teaches when the output probability of the second model class label is higher than that of the first model class label. ([Col 5, lines 35-43] “determining the suitability of the plurality of predictive modeling procedures comprises assigning suitability scores to the respective modeling procedures,(…) and where in selecting at least a subset of the predictive modeling procedures comprises selecting approximately a specific fraction of the predictive modeling procedures having highest suitability scores.” The examiner notes that “selecting approximately a specific fraction of the predictive modeling procedures having highest suitability scores” teaches “select (…) the output probability of the second model class label is higher than that of the first model class label”).
Mahasseni, Baluja, Windasari, Arlot, Vo, Ganguly, and Achin are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly to incorporate the teaching of Achin to select a model for processing specific data as per Achin [Col 5, lines 35-43] to improve the quality of data processing by selecting the model with the highest suitability as determined by a suitability scoring method.
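For illustration only, the selection conditions recited in claims 5 and 6 can be sketched as a comparison of the two output probabilities produced for the first and second model class labels. The function name, the returned strings, and the handling of a tie are assumptions made for this sketch; neither the claims as quoted above nor Achin specifies them.

```python
def select_classifier(p_first: float, p_second: float) -> str:
    """Pick a classifier from two model-class output probabilities.

    Hypothetical helper. Per the claim language quoted above, the recurrent
    classifier is selected when the first model class label has the higher
    probability (claim 5) and the non-recurrent classifier when the second
    does (claim 6). Sending a tie to the non-recurrent classifier is an
    assumption of this sketch.
    """
    return "recurrent" if p_first > p_second else "non_recurrent"


# Made-up probabilities for two inputs.
choice_a = select_classifier(0.8, 0.2)
choice_b = select_classifier(0.3, 0.7)
```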
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Sheikh et al. (hereafter Sheikh) “Learning Word Importance with the Neural-Bag-of-Words Model”.
Regarding claim 9
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 7.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the trained non-recurrent neural network-based classifier is at least one bag of words (abbreviated BoW) network.
Sheikh teaches wherein the trained non-recurrent neural network-based classifier is at least one bag of words (abbreviated BoW) network. ([pg. 1, Col 1, line 1] “The Neural Bag-of-Words (NBOW) model performs classification”).
Mahasseni, Baluja, Windasari, Arlot, and Sheikh are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Sheikh to incorporate a Neural Bag-of-Words model as per Sheikh [pg. 1, Col 1, line 1] to improve computational speed and achieve useful classification results in Natural Language Processing Machine Learning tasks.
Claims 10-13 are rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Dos Santos et al. (hereafter Dos Santos) “Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts”.
Regarding claim 10
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 7.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the trained non-recurrent neural network-based classifier is at least one continuous bag of words (abbreviated CBoW) network.
Dos Santos teaches wherein the trained non-recurrent neural network-based classifier is at least one continuous bag of words (abbreviated CBoW) network. ([pg. 6, lines 5-6] “we perform unsupervised learning of word-level embeddings using the word2vec tool, which implements the continuous bag-of-words and skip-gram”).
Mahasseni, Baluja, Windasari, Arlot, and Dos Santos are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Dos Santos to incorporate a continuous bag-of-words model as per Dos Santos [pg. 6, lines 5-6] to improve classification results in Natural Language Processing Machine Learning tasks by using the continuous bag-of-words CNN which 
Regarding claim 11
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 7.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the trained non-recurrent neural network-based classifier is at least one skip-gram network.
Dos Santos teaches wherein the trained non-recurrent neural network-based classifier is at least one skip-gram network. ([pg. 6, lines 5-6] “we perform unsupervised learning of word-level embeddings using the word2vec tool, which implements the continuous bag-of-words and skip-gram”.)
Mahasseni, Baluja, Windasari, Arlot, and Dos Santos are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Dos Santos to incorporate a skip-gram as per Dos Santos [pg. 6, lines 5-6] to improve classification results by integrating the skip-gram 
Regarding claim 12
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 7.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the trained non-recurrent neural network-based classifier is at least one convolutional neural network (abbreviated CNN).
Dos Santos teaches wherein the trained non-recurrent neural network-based classifier is at least one convolutional neural network (abbreviated CNN). ([Title] “Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts”).
Mahasseni, Baluja, Windasari, Arlot, and Dos Santos are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Dos Santos to incorporate a skip-gram as per Dos Santos [pg. 6, lines 5-6] to improve classification results by integrating the skip-gram 
Regarding claim 13
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 8.
Mahasseni teaches wherein the RNN is a long short-term memory (abbreviated LSTM) network. ([pg. 8, Col 2, line 4] “frame selection as a Markov Decision Process, and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames”).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Wang et al. (hereafter Wang) “Combination of Convolutional and Recurrent Neural Network Sentiment Analysis of Short Texts”.
Regarding claim 14
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 8.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the RNN is a gated recurrent unit (abbreviated GRU) network.
Wang teaches wherein the RNN is a gated recurrent unit (abbreviated GRU) network. ([pg. 7, lines 9-10] “CNN-GRU-word2vec: A model with pre-trained vectors from word2vec, max pooling and GRU recurrent unit.”)
Mahasseni, Baluja, Windasari, Arlot, and Wang are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Wang to use a GRU in a classifier network as per Wang [pg. 7, lines 9-10] to improve the classifier network’s ability to incorporate long term dependencies of different time scales in a classification decision.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Bradbury et al. (hereafter Bradbury) “Quasi Recurrent Neural Networks”.
Regarding claim 15
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 8.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the RNN is a quasi-recurrent neural network (abbreviated QRNN).
Bradbury teaches wherein the RNN is a quasi-recurrent neural network (abbreviated QRNN). ([Abstract, line 3-4] “We introduce quasi-recurrent neural networks (QRNNs)”.)
Mahasseni, Baluja, Windasari, Arlot, and Bradbury are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Bradbury to use a QRNN in a classifier network as per Bradbury [Abstract, line 3-4] to increase the network’s parallelism, leading to decreased training and testing time.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Vo, in view of Ganguly, and in further view of Kohavi et al. (hereafter Kohavi) “A study of cross-validation and bootstrap for accuracy estimation and model selection”.
Regarding claim 16
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly do not teach wherein the training set and the validation set are part of a single data set that is subjected to held-out splitting to create the training set and the validation set.
Kohavi teaches wherein the training set and the validation set are part of a single data set that is subjected to held-out splitting to create the training set and the validation set.  ([pg. 2, col 1, lines 31-37] “2.1 Holdout: The holdout method, sometimes called test sample estimation, partitions the data into two mutually exclusive subsets called a training set and a test set, or holdout set.”  The examiner notes “the holdout method” teaches “held-out splitting”, and “partitions the data into two” teaches “single data set”, and “two mutually exclusive subsets called a training set and a test set, or holdout set” teaches “the training set and the validation set”).
Mahasseni, Baluja, Windasari, Arlot, Vo, Ganguly, and Kohavi are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly to incorporate the teaching of Kohavi to generate a training set and a validation set from a single data set as per Kohavi [pg. 2, col 1, lines 31-37] to improve the training and evaluation of models in machine learning.
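For illustration only, the holdout method quoted from Kohavi (one data set partitioned into two mutually exclusive subsets) can be sketched as follows. The function name, the holdout fraction, the shuffling step, and the seed are assumptions made for this sketch and are not taken from Kohavi or from the application.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(data: Iterable[T],
                  holdout_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Partition one data set into mutually exclusive training and holdout sets.

    Hypothetical helper: items are shuffled with a fixed seed so the split is
    repeatable, then the first `holdout_fraction` of them are held out.
    """
    items = list(data)
    random.Random(seed).shuffle(items)
    n_holdout = int(len(items) * holdout_fraction)
    return items[n_holdout:], items[:n_holdout]


# Eight made-up examples: six for training, two held out for validation.
train_set, validation_set = holdout_split(range(8), holdout_fraction=0.25)
```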
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Vo, in view of Ganguly, and in further view of Hockey et al. (hereafter Hockey) “Comparison of Grammar-Based and Statistical Language Models trained on the Same Data”.
Regarding claim 17
The combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly teaches claim 3.
Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly do not teach wherein the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier are trained separately on the training set.
Hockey teaches wherein the trained recurrent neural network-based classifier and the trained non-recurrent neural network-based classifier are trained separately on the training set. ([pg. 1, Col 1, lines 9-11] “We construct L-PCFG-based and n-gram language models from the same corpus for comparison”).
Mahasseni, Baluja, Windasari, Arlot, Vo, Ganguly, and Hockey are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Vo, and Ganguly to incorporate the teaching of Hockey to train different models on the same training set as per Hockey [pg. 1, Col 1, lines 9-11] to improve the one-to-one comparison of different models by training each on the same data.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Wang.
Regarding claim 18
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the machine classification task is sentiment classification and the inputs are sentences.
Wang teaches wherein the machine classification task is sentiment classification and the inputs are sentences.  ([Title] “Sentiment Analysis” and [pg. 2, lines 4-5] “We develop an end-to-end and bottom up algorithm to effectively model sentence representation.”  The examiner notes “Sentiment Analysis” teaches “sentiment classification” and “sentence representation” teaches “the inputs are sentences”)
Mahasseni, Baluja, Windasari, Arlot, and Wang are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Wang to perform sentiment analysis on sentences as per Wang [Title] and [pg. 2, lines 4-5] to expand the machine learning tasks and input parameters to perform sentiment analysis at the sentence level.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Wang, in view of Vo et al. (hereafter Vo) “Multi-channel LSTM-CNN model for Vietnamese sentiment analysis”, and in further view of Kucuktunc et al. (hereafter Kucuktunc) “A large-Scale Sentiment Analysis for Yahoo! Answers”.
Regarding claim 19
The combination of Mahasseni, Baluja, Windasari, Arlot and Wang teaches claim 18.
Mahasseni, Baluja, Windasari, Arlot and Wang do not teach wherein the task class labels are at least one of positive sentiment, negative sentiment, very positive sentiment, very negative sentiment, somewhat positive sentiment, somewhat negative sentiment, or neutral sentiment.
Vo teaches annotated with at least one of positive sentiment, negative sentiment, or neutral sentiment labels ([pg. 1, Col 2, lines 1-3] “In VS dataset, we collected 17,500 reviews/comments from Vietnamese e-commercial sites (i.e. TihnTe.vn, Tiki.vn, etc.) and labeled for positive/negative/neutral by three annotators.”).
Mahasseni, Baluja, Windasari, Arlot, Wang and Vo are analogous art because they are in similar fields of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, and Wang to incorporate the teaching of Vo to label training data with positive, negative, and neutral sentiment as per Vo to optimize the labeling and organization of sentiment analysis training data.
Kucuktunc teaches very positive sentiment, very negative sentiment, somewhat positive sentiment, somewhat negative sentiment, or neutral sentiment. ([pg. 3, Col 2, lines 9-11] “Positive sentiment strength scores range from +1 (not positive) to +5 (extremely positive).  Similarly, negative sentiment strength scores range from -1 to -5.”  The examiner notes “Positive sentiment strength scores (…) negative sentiment strength scores” teach “very positive sentiment, very negative sentiment, somewhat positive sentiment, somewhat negative sentiment”.)
Mahasseni, Baluja, Windasari, Arlot, Wang, Vo, and Kucuktunc are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Wang, and Vo to incorporate the teaching of Kucuktunc to label the training data with a sentiment rating as per Kucuktunc [pg. 3, Col 2, lines 9-11] to improve the quality of the training data by having users or experts label according to the content sentiment on a scale of positivity or negativity.
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in view of Nam et al.  (hereafter Nam) “Learning Multi-Domain Convolutional Neural Networks for Visual Tracking”, and in further view of Ganguly.
Regarding claim 24
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni teaches wherein the decision neural network-based classifier ([pg. 8, Col 2, line 4] “frame selection as a Markov Decision Process, and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames”). 
Mahasseni, Baluja, Windasari, and Arlot do not teach comprises the trained non-recurrent neural network-based classifier trained on the training set, with an original classification layer ablated, one or more new fully-connected layers, and a new classification layer.
Nam teaches comprises the trained non-recurrent neural network-based classifier trained on the training set, with an original classification layer ablated, one or more new fully-connected layers, and a new classification layer, ([Abstract] “Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation.  (…) We train each domain in the network iteratively to obtain generic target representations in the shared layers.  When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer.”, the examiner notes CNN is a non-recurrent neural network, and “pretrained CNN” teaches “trained non-recurrent neural network”, and [pg. 1, Col 2, lines 40-42]-[pg. 2, Col 1, lines 1-10] “When a test sequence is given, all the existing branches of binary classification layers, which were used in the training phase, are removed and a new single branch is constructed to compute target scores in the test sequence.  The new classification layer and the fully connected layers”, the examiner notes “training phase” teaches “trained”, and “all the existing branches of binary classification layers (…) are removed” teaches “an original classification layer ablated”, and “the fully connected layers” teaches “one or more new fully-connected layers”, and “The new classification layer” teaches “a new classification layer”).
Mahasseni, Baluja, Windasari, Arlot and Nam are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Nam to reorganize a neural network classifier as per Nam [Abstract] to improve the architecture of a neural network to align with a new task.
Ganguly teaches and a new classification layer that produces output probabilities for the first and second model class labels. ([para 0018] “the decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model or a set of predictive algorithms.  The predictive model, which can employ any number of statistical or machine learning techniques (…) neural networks”, and ([para 0022] “the model selection component 240 can be configured to identify a set of candidate predictive models for a type of input and analyze the performance of the set of candidate models.  The model that outperforms other models can subsequently be selected”,  the examiner notes “the decision component 110 is configured to predict” teaches “a new classification layer that produces output probabilities” because it is a component that may be added to an already existing structure, and “probability, of an outcome given some input as a function of a predictive model” teaches “output probabilities”, and “the model that outperforms other models can subsequently be selected” teaches “first and second model class labels”).
Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, and Nam to incorporate the teaching of Ganguly to use predictive analytics to evaluate the performance metrics of models for predictive data processing as per Ganguly to improve the model evaluation process for a decision component in selecting between predictive models.
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Nam, in view of Ganguly, and in further view of Nogueira et al. (hereafter Nogueira) “Towards better exploiting convolutional neural networks for remote sensing scene classification”.
Regarding claim 25
The combination of Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly teaches claim 24.
Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly do not teach further configured to, during training, back propagate gradients only for the fully-connected layers and the new classification layer and keeping weights of the trained non-recurrent neural network-based classifier fixed.
Nogueira teaches further configured to, during training, back propagate gradients only for the fully-connected layers and the new classification layer and keeping weights of the trained non-recurrent neural network-based classifier fixed. ([pg. 6, Figure 2.] “in another option (highlighted in green), weights of initial layers can be frozen and only final layers are tuned.”  The examiner notes that the convolution layers in the figure below teach “non-recurrent neural network-based classifier”, and the “final layers”, which are designated in the figure as fully connected and classification layers, teach “fully-connected layers and new classification layer”.  The examiner further notes that, in the figure below from Nogueira, the initial layers (the convolution layers, shown in red) are indicated for freezing because the weights are transferred, which teaches “keeping weights of the trained non-recurrent neural network-based classifier fixed”, while the fully connected layers and the classification layer weights are not frozen, which teaches “back propagate gradients only for the fully-connected layers and the new classification layer”.)

[Figure: media_image1.png, the figure from Nogueira (pg. 6, Figure 2) referenced above]

Mahasseni, Baluja, Windasari, Arlot, Nam, Ganguly, and Nogueira are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly to incorporate the teaching of Nogueira to restrict the training of specific layers while training others as per Nogueira [pg. 6, Figure 2.]  to allow for the integration of pretrained models while reorganizing network architecture for new tasks.
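For illustration only, the freezing arrangement described above (pretrained layers held fixed while the fully connected and classification layers are tuned) can be sketched as a parameter update restricted to a chosen set of layers. The layer names, scalar stand-in weights, gradients, and learning rate are assumptions made for this sketch; it is not Nogueira's implementation. In a real framework the gradients for frozen layers would typically not be computed at all, whereas this sketch simply skips their update.

```python
from typing import Dict, Set


def sgd_step(params: Dict[str, float],
             grads: Dict[str, float],
             trainable: Set[str],
             lr: float = 0.1) -> Dict[str, float]:
    """Apply one gradient-descent update to trainable layers only.

    Hypothetical helper: each layer is represented by a single scalar weight.
    Layers not listed in `trainable` keep their weights unchanged (frozen).
    """
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }


# Made-up weights: two pretrained convolution layers plus two new layers.
params = {"conv1": 1.0, "conv2": 2.0, "fc": 3.0, "classifier": 4.0}
grads = {name: 1.0 for name in params}
updated = sgd_step(params, grads, trainable={"fc", "classifier"}, lr=0.5)
```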
Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in view of Ganguly, and in view of Nam, and in further view of Sheikh et al.  (hereafter Sheikh) “Learning Word Importance with the Neural-Bag-of-Words Model”.
Regarding claim 26
The combination of Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly teaches claim 24.
Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly do not teach wherein the trained non-recurrent neural network-based classifier is at least one bag of words (abbreviated BoW) network.
Sheikh teaches wherein the trained non-recurrent neural network-based classifier is at least one bag of words (abbreviated BoW) network. ([pg. 1, Col 1, line 1] “The Neural Bag-of-Words (NBOW) model performs classification”).
Mahasseni, Baluja, Windasari, Arlot, Nam, Ganguly, and Sheikh are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, Arlot, Nam, and Ganguly to incorporate the teaching of Sheikh to incorporate a Neural Bag-of-Words model as per Sheikh [pg. 1, Col 1, line 1] to improve computational speed and achieve useful classification results in Natural Language Processing Machine Learning tasks.
Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Yin et al. (hereafter Yin) “Comparative study of CNN and RNN for Natural Language Processing”, in view of Collobert et al. (hereafter Collobert) “A unified architecture for natural language processing: deep neural networks with multitask learning”, and in further view of Koo et al. (hereafter Koo) “Simple Semi-supervised Dependency Parsing”.
Regarding claim 27
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the machine classification task is at least one of: part-of-speech (abbreviated POS) tagging, chunking, dependency parsing, semantic relatedness, textual entailment.
Yin teaches part-of-speech (abbreviated POS) tagging and textual entailment ([pg. 3, Col 1, lines 23-47 - Col 2, lines 1-35] “Tasks, Sentiment Classification, Relation Classification, Textual Entailment, Answer Selection, Part-of-Speech Tagging”).
Mahasseni, Baluja, Windasari, Arlot, and Yin are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Yin to incorporate the methods to perform the Natural Language Processing tasks as per Yin [pg. 3, Col 1, lines 23-47 - Col 2, lines 1-35].
Collobert teaches chunking and semantic relatedness ([pg. 2a, Col 1, lines 13-40] “Part-of-speech tagging, chunking, semantic related words”).
Mahasseni, Baluja, Windasari, Arlot, and Collobert are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Collobert to incorporate the methods to perform the Natural Language Processing tasks as per Collobert [pg. 2a, Col 1, lines 13-40]  to expand the task capabilities to include Part-of-speech tagging, chunking, semantic related words.
Koo teaches dependency parsing ([Title] “Simple Semi-supervised Dependency Parsing”).
Mahasseni, Baluja, Windasari, Arlot, and Koo are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Koo to incorporate the methods to perform the Natural Language Processing tasks as per Koo [Title] to expand the task capabilities to include Dependency Parsing.
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, in view of Deng et al. (hereafter Deng) “Machine Learning Paradigms for Speech Recognition: An Overview”, in view of Anisimovich et al. (hereafter Anisimovich) (US Pat. No. 9772998), in view of Gambhir et al. (hereafter Gambhir) “Recent automatic text summarization techniques: a survey”, in view of Andreas et al. (hereafter Andreas) “Learning to Compose Neural Networks for Question Answering”, in view of You et al. (hereafter You) “Image captioning with semantic attention”, and in further view of Wang et al. (hereafter Wang (2)) “Tacotron: A fully end-to-end Text-to-Speech synthesis model”.
Regarding claim 28
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni, Baluja, Windasari, and Arlot do not teach wherein the machine classification task is at least one of: speech recognition, machine translation, text summarization, question answering, image captioning, text-to-speech (abbreviated TTS) synthesis.
Deng teaches speech recognition ([pg. 1, Col 1, line 1] “Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques”).
Mahasseni, Baluja, Windasari, Arlot, and Deng are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Deng to incorporate the methods as per Deng [pg. 1, Col 1, line 1] to expand the task capabilities to include speech recognition.
Anisimovich teaches machine translation ([Abstract] “The preferred embodiments provide an automated machine translation from one language to another.”).
Mahasseni, Baluja, Windasari, Arlot, and Anisimovich are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Anisimovich to incorporate the methods to perform the Natural Language Processing tasks as per Anisimovich [Abstract] to expand the task capabilities to include an automated machine translation from one language to another.
Gambhir teaches text summarization ([pg. 1, lines 3-4] “there is growing interest among the research community for developing new approaches to automatically summarize the text”).
Mahasseni, Baluja, Windasari, Arlot, and Gambhir are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Gambhir to incorporate the methods as per Gambhir [pg. 1, lines 3-4] to expand the task capabilities to include the ability to automatically summarize the text.
Andreas teaches question answering ([pg. 1, Col 1, lines 1-3] “We describe a question answering model that applies to both images and structured knowledge bases.”).
Mahasseni, Baluja, Windasari, Arlot, and Andreas are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Andreas to incorporate the methods to perform the Natural Language Processing tasks as per Andreas [pg. 1, Col 1, lines 1-3] to expand the task capabilities to include a question answering model.
You teaches image captioning ([pg. 2, Col 1, lines 21-23] “we propose a new image captioning approach that combines the top-down and bottom-up approaches through a semantic attention model.”).
Mahasseni, Baluja, Windasari, Arlot, and You are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of You to incorporate the methods as per You [pg. 2, Col 1, lines 21-23] to expand the task capabilities to include image captioning.
Wang (2) teaches text-to-speech (abbreviated TTS) synthesis ([pg. 1, lines 4-5] “we present Tacotron, an end-to-end generative text-to-speech model that synthesizes speech directly from characters.”).
Mahasseni, Baluja, Windasari, Arlot, and Wang (2) are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Wang (2) to incorporate the methods to perform the Natural Language Processing tasks as per Wang (2) [pg. 1, lines 4-5] to expand the task capabilities to include synthesizing speech directly from characters.
Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Wang, in view of Barry et al. (hereafter Barry) “Sentiment Analysis of Online Review using Bag-of-Words and LSTM Approaches”, in view of Sheikh, in view of Windasari, in view of Arlot, in view of Vo, in view of Nam, and in further view of Ganguly.
Regarding claim 29
Mahasseni teaches A neural network-based decision system ([pg. 8, col. 2, lines 4] “frame selection as a Markov Decision Process, and specify a Long Short-Term Memory (LSTM) network to model a policy for selecting the frames”) 
with processors operating in parallel to efficiently perform ([pg. 7, col. 2, lines 3-4] “Experiments are performed on an Intel quad core-i7 CPU and 16GB RAM on a single Tesla k80”).
Mahasseni does not teach a sentiment classification task on a sentence using either a trained recurrent long short-term memory (abbreviated LSTM) network or a trained bag of words (abbreviated BoW) network that generates a mean vector representation of a sentence by averaging token vector embeddings of the sentence, the system implementing actions comprising: generating a confusion matrix based on the trained LSTM and BoW networks' evaluation of validation sentences annotated with positive and negative sentiment labels; using the confusion matrix to identify a subset of validation sentences accurately classified only by the trained LSTM network and annotating the subset of validation sentences with a first model label identifying the trained LSTM network, annotating remaining of the validation sentences with a second model label identifying the trained BoW network, and storing the model-annotated validation sentences in a decision set; and constructing a decision system using the trained BoW network and training the decision system using the decision set to produce an output that specifies whether to use the trained LSTM network or the trained BoW network for classifying the sentence's sentiment.
Wang teaches wherein the machine classification task is sentiment classification and the inputs are sentences.  ([Title] “Sentiment Analysis” and [pg. 2, lines 4-5] “We develop an end-to-end and bottom up algorithm to effectively model sentence representation.”  The examiner notes “Sentiment Analysis” teaches “sentiment classification” and “sentence representation” teaches “the inputs are sentences”).
Mahasseni and Wang are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mahasseni to incorporate the teaching of Wang to perform sentiment analysis on sentences as per Wang [Title] and [pg. 2, lines 4-5] to expand the machine learning tasks to perform sentiment analysis at the sentence level.
Barry teaches using either a trained recurrent long short-term memory (abbreviated LSTM) network ([Title] “Sentiment Analysis of Online review using Bag-of-Words and LSTM Approaches” and [pg. 8, lines 1-2] “we run the Word2vec model to generate word embeddings on each of our datasets.  These embeddings are used as inputs to the LSTM to learn”, the examiner notes “inputs to the LSTM to learn” teaches “trained”).
use the trained LSTM network or the trained BoW network for classifying the sentence's sentiment.  ([pg. 1, lines 15]-[pg. 2, lines 1-10] “The popularity of such bag-of-words approaches is mainly due to their simplicity and efficiency, whilst having the ability to achieve very high accuracy.  Bag-of-words features are created by viewing the document as an unordered collection of words, which are then used to classify the document.  Despite their overall high success rates, there exist some downsides to using bag-of-words or n-gram approaches.  The main pitfall of such approaches is that they ignore long-range word ordering such that modifiers and their objects may be separated by many unrelated words (7).  As word order is lost, sentences with different meanings which use the same words will have similar representations.  Another key downside to using bag-of-words approaches is that they are unable to deal effectively with negation.” and [pg. 2, lines 25-29] “the addition of word embeddings to the field of NLP has enabled practitioners to use more advanced learning algorithms which can handle sequential data as inputs such as Recurrent Neural Networks (RNNs).  An important development in the field of RNNs was the introduction of the Long Short-Term Memory (henceforth LSTM) RNN”. The examiner notes Bag-of-Words and LSTM RNN are well documented in academic literature for sentiment analysis, and choosing to use [Specifications 0036] “a combination of computationally cheap, less-accurate bag of words (BoW) model and computationally expensive, more-accurate long short-term memory (LSTM) model to perform natural processing tasks such as sentiment analysis” is not novel.)
Mahasseni, Wang, and Barry are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Wang to incorporate the teaching of Barry to perform sentiment analysis on sentences using a LSTM or bag of words network as per Barry [Title] and [pg. 8, lines 1-2] to improve classification 
Sheikh teaches a trained bag of words (abbreviated BoW) network that generates a mean vector representation of a sentence by averaging token vector embeddings of the sentence, ([pg. 1, Col 1, line 1] “The Neural Bag-of-Words (NBOW) model performs classification” and [pg. 2, Col 2, lines 33-34] “the NBOW model is trained to minimize the categorical cross-entropy loss using a stochastic gradient descent algorithm”, and [pg. 1, Col 2, line 13-15] “The NBOW model takes an average of the word vectors in the input text and performs classification”).
Mahasseni, Wang, Barry and Sheikh are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry to incorporate the teaching of Sheikh to incorporate a Neural Bag-of-Words model as per Sheikh [pg. 1, Col 1, line 1] to improve computational speed and maintain a useful level of accurate classification results in sentiment analysis of input text.
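For illustration only, the averaging step recited in the claim and described in the quoted passage of Sheikh can be sketched as follows. This is a minimal sketch and is not taken from Sheikh or from the application: the function name, the toy two-dimensional embeddings, and the choice to skip tokens that have no embedding are all assumptions.

```python
from typing import Dict, List


def mean_sentence_vector(tokens: List[str],
                         embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the embedding vectors of the tokens in one sentence."""
    # Assumption: tokens without an embedding are skipped.
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in sentence")
    dim = len(vectors[0])
    # Component-wise mean over all token vectors.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
sentence_vector = mean_sentence_vector(["good", "movie"], toy_embeddings)
```

Because the mean discards token order, two sentences containing the same words map to the same vector, which is the limitation of bag-of-words approaches that the quoted passage of Barry describes.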
Windasari teaches generating a confusion matrix based on the trained LSTM and BoW networks' evaluation of ([pg. 3, Table III] “Confusion Matrix”, in which the True Positive, True Negative, False Positive, and False Negative results for a classification model are presented. The examiner notes that a person having ordinary skill in the art would be able to generate a confusion matrix with the LSTM and BoW classifier results.)
using the confusion matrix to identify a subset of validation sentences accurately classified only by the trained LSTM network and annotating the subset of validation sentences. (The examiner notes that, at the time of classification, a person having ordinary skill in the art would be able to log or track, for each output, the classifier used to process the associated input and the result (accurate or inaccurate), and, by using a confusion matrix and cross validation scheme, to generate one or more training and validation sets and subsets for training of a decision neural network.  Logging this data teaches annotating the validation inputs in the first and second models.)
Mahasseni, Wang, Barry, Sheikh and Windasari are analogous art because they are in similar fields of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work.  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, and Sheikh to incorporate the teaching of Windasari to generate confusion matrices and annotate the classification results of various models as per Windasari [pg. 3, Table III] to improve the organization of classifier results in preparation for cross validation data splits.
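As a point of reference, the four confusion matrix cells named in the cited table (True Positive, True Negative, False Positive, False Negative) can be tallied for a single binary classifier as sketched below. The sketch is illustrative only and is not taken from Windasari: the function name, the label strings, and the sample data are assumptions.

```python
from collections import Counter
from typing import List


def confusion_counts(gold: List[str], predicted: List[str],
                     positive: str = "positive") -> Counter:
    """Tally TP, TN, FP and FN for a binary sentiment classifier."""
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for g, p in zip(gold, predicted):
        if p == positive:
            # Predicted positive: correct only if the gold label is positive.
            counts["TP" if g == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the gold label is negative.
            counts["FN" if g == positive else "TN"] += 1
    return counts


# Hypothetical gold labels and predictions, for illustration only.
gold = ["positive", "negative", "positive", "negative"]
pred = ["positive", "positive", "negative", "negative"]
counts = confusion_counts(gold, pred)
```

Running the same tally once per classifier (for example, once for LSTM outputs and once for BoW outputs) yields one confusion matrix per model.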
Arlot teaches validation sentences ([pg. 3, lines 10-15] “Cross-validation (CV) is a popular strategy for algorithm selection.  The main idea behind CV is to split data, once or several times, for estimating the risk of each algorithm:  Part of data (the training sample) is used for training each algorithm, and the remaining part (the validation sample) is used for estimating the risk of the algorithm.”   The examiner notes that “to split data, once or several times” teaches “validation sentences” because splitting data makes a training set and validation set.)
using the confusion matrix to identify a subset of validation sentences accurately classified only by the trained LSTM network and annotating the subset of validation sentences with a first model label identifying the trained LSTM network, annotating remaining of the validation sentences with a second model label identifying the trained BoW network. (The examiner notes “to split data, once or several times” teaches “using the confusion matrix to identify a subset of validation sentences accurately classified only by the trained LSTM network and annotating the subset of validation sentences with a first model label identifying the trained LSTM network, annotating remaining of the validation sentences with a second model label identifying the trained BoW network;” because a person having ordinary skill in the art would understand how to use the classification results presented in a confusion matrix to split a data set into any number of training and validation data sets as part of a cross validation scheme.)
and storing the model-annotated validation sentences in a decision set. (The examiner notes that “split data, once or several times” has, as part of the task of splitting the data into training and validation sections, the act of storing the various permutations for further use, and so teaches “storing the model-annotated validation sentences in a decision set”.)
Mahasseni, Wang, Barry, Sheikh, Windasari and Arlot are analogous art because they manipulate data sets in the course of their work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, and Windasari to incorporate the teaching of Arlot to split a data set one or more ways as per Arlot [pg. 3, lines 10-15] to generate training and validation data sets to improve the training and efficacy evaluation of a machine learning model.
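The annotation and storage steps recited in the claim language discussed above can be pictured with the following sketch. It is an illustration under stated assumptions and does not reproduce any cited reference or the applicant's implementation: the function name, the "LSTM" and "BoW" label strings, and the sample sentences are hypothetical.

```python
from typing import List, Tuple


def build_decision_set(sentences: List[str], gold: List[str],
                       lstm_pred: List[str],
                       bow_pred: List[str]) -> List[Tuple[str, str]]:
    """Annotate each validation sentence with a model label.

    Sentences classified correctly only by the LSTM receive the first
    model label ("LSTM"); all remaining sentences receive the second
    model label ("BoW").
    """
    decision_set = []
    for sentence, g, l, b in zip(sentences, gold, lstm_pred, bow_pred):
        lstm_only = (l == g) and (b != g)
        decision_set.append((sentence, "LSTM" if lstm_only else "BoW"))
    return decision_set


# Hypothetical validation data, for illustration only.
sentences = ["s1", "s2", "s3", "s4"]
gold = ["positive", "negative", "positive", "negative"]
lstm_pred = ["positive", "negative", "negative", "positive"]
bow_pred = ["positive", "positive", "positive", "positive"]
decision_set = build_decision_set(sentences, gold, lstm_pred, bow_pred)
```

In this toy data only "s2" is classified correctly by the LSTM alone, so it is the only sentence carrying the first model label; the stored list of pairs plays the role of the decision set.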
Vo teaches annotated with positive and negative sentiment labels ([pg. 1, Col 2, lines 1-3] “In VS dataset, we collected 17,500 reviews/comments from Vietnamese e-commercial sites (i.e. TihnTe.vn, Tiki.vn, etc.) and labeled for positive/negative/neutral by three annotators.”).
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot and Vo are analogous art because they are in similar fields of endeavor, and/or are reasonably pertinent to the problem of using machine learning methods to perform their respective work. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, Windasari, and Arlot to incorporate the teaching of Vo to label training data with positive, negative, and neutral sentiment as per Vo to optimize the labeling and organization of sentiment analysis training data.
Nam teaches constructing a decision system using the trained BoW network ([Abstract] “Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation.  (…) We train each domain in the network iteratively to obtain generic target representations in the shared layers.  When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer.”, the examiner notes “we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer” teaches “constructing a decision system using the trained BoW network” , and [pg. 1, Col 2, 40-42]-[pg. 2, Col 1, lines 1-10] “When a test sequence is given, all the existing branches of binary classification layers, which were used in the training phase, are removed and a new single branch is constructed to compute target scores in the test sequence.  The new classification layer and the fully connected layers”, the examiner notes “training phase” teaches “trained”).
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo and Nam are analogous art because they are all focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, and Vo to incorporate the teaching of Nam to reorganize a neural network classifier as per Nam [Abstract] to improve the architecture of a neural network to align with a new task.
Ganguly teaches and training ([Abstract] “the best model can change based on most recent training performance results”, the examiner notes “training performance results” teaches “training”),
the decision system using the decision set to produce an output that specifies whether to use ([para 0018] “the decision component 110 is configured to predict or infer the likelihood, or probability, of an outcome given some input as a function of a predictive model or a set of predictive algorithms.  The predictive model, which can employ any number of statistical or machine learning techniques (…) neural networks”.  The examiner notes “The predictive model, which may employ any number of statistical or machine learning techniques (…) neural networks” teaches “decision neural network-based classifier”, and “the decision component 110 is configured to predict or infer the likelihood, or probability” teaches “output probabilities”, and “given some input” teaches “an input-by-input basis”).
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam and Ganguly are analogous art because they are all focused in the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, and Nam to incorporate the teaching of Ganguly to use predictive analytics to evaluate the performance metrics of models for predictive data processing as per Ganguly [para 0018] to improve the model evaluation process for a decision component in selecting between predictive models.
Claims 30-32 are rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Wang, in view of Barry, in view of Sheikh, in view of Windasari, in view of Arlot, in view of Vo, in view of Nam, in view of Ganguly, and in further view of Achin.
Regarding claim 30
The combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly teaches claim 29.
Barry teaches the trained LSTM network or the trained BoW network to classify a sentence's sentiment.  (pg. 1, lines 15)-(pg. 2, lines 1-10) “The popularity of such bag-of-words approaches is mainly due to their simplicity and efficiency, whilst having the ability to achieve very high accuracy.  Bag-of-words features are created by viewing the document as an unordered collection of words, which are then used to classify the document.  Despite their overall high success rates, there exist some downsides to using bag-of-words or n-gram approaches.  The main pitfall of such approaches is that they ignore long-range word ordering such that modifiers and their objects may be separated by many unrelated words (7).  As word order is lost, sentences with different meanings which use the same words will have similar representations.  Another key downside to using bag-of-words approaches is that they are unable to deal effectively with negation.” AND (pg. 2, lines 25-29) “the addition of word embeddings to the field of NLP has enabled practitioners to use more advanced learning algorithms which can handle sequential data as inputs such as Recurrent Neural Networks (RNNs).  An important development in the field of RNNs was the introduction of the Long Short-Term Memory (henceforth LSTM) RNN”. The examiner notes Bag-of-Words and LSTM RNN are well documented in academic literature for sentiment analysis, and choosing to use [Specifications 0036] “a combination of computationally cheap, less-accurate bag of words (BoW) model and computationally expensive, more-accurate long short-term memory (LSTM) model to perform natural processing tasks such as sentiment analysis” is not novel.
Mahasseni, Wang and Barry are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Wang to incorporate the teaching of Barry to integrate the use of Bag-of-Words and LSTM as per Barry [pg. 1, lines 15]-[pg. 2, lines 1-10] to improve computational speed and achieve highly accurate classification results in Natural Language Processing Machine Learning tasks.
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly do not teach “further including, during inference, using an output of the trained decision system to select either”.
Achin teaches further including, during inference, using an output of the trained decision system to select either ([Col 5, lines 35-43] “determining the suitability of the plurality of predictive modeling procedures comprises assigning suitability scores to the respective modeling procedures,(…) and where in selecting at least a subset of the predictive modeling procedures comprises selecting approximately a specific fraction of the predictive modeling procedures having highest suitability scores.” and [Col 22, lines 50-56] “space search engine may select the model with the highest score”. The examiner notes that “the model with the highest score” teaches “when the output probability of the second model class label is higher than that of the first model class label”).
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, Ganguly and Achin are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly to incorporate the teaching of Achin to select a model for processing specific data as per Achin [Col 5, lines 35-43] to improve the .
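The "select the model with the highest score" operation quoted from Achin can be sketched in a few lines. This is a hypothetical illustration, not Achin's procedure or the claimed decision system: the function name, the model names, and the score values are assumptions, and ties are resolved in favor of the first entry in the mapping.

```python
from typing import Dict


def select_model(scores: Dict[str, float]) -> str:
    """Return the name of the model with the highest score."""
    # max() returns the first key encountered among equal maxima.
    return max(scores, key=scores.get)


# Hypothetical per-sentence output probabilities from a decision system.
chosen = select_model({"LSTM": 0.3, "BoW": 0.7})
```

With these assumed values the second model label has the higher probability, so the BoW model is selected for that input.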
Regarding claim 31
The combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, Ganguly, and Achin teaches claim 30.
Barry teaches wherein the decision system selects the trained LSTM network when the sentence is a linguistically complex sentence.  ([pg. 1, lines 15]-[pg. 2, lines 1-10] “The popularity of such bag-of-words approaches is mainly due to their simplicity and efficiency, whilst having the ability to achieve very high accuracy.  Bag-of-words features are created by viewing the document as an unordered collection of words, which are then used to classify the document.  Despite their overall high success rates, there exist some downsides to using bag-of-words or n-gram approaches.  The main pitfall of such approaches is that they ignore long-range word ordering such that modifiers and their objects may be separated by many unrelated words (7).  As word order is lost, sentences with different meanings which use the same words will have similar representations.  Another key downside to using bag-of-words approaches is that they are unable to deal effectively with negation.” and [pg. 2, lines 25-29] “the addition of word embeddings to the field of NLP has enabled practitioners to use more advanced learning algorithms which can handle sequential data as inputs such as Recurrent Neural Networks (RNNs).  An important development in the field of RNNs was the introduction of the Long Short-Term Memory (henceforth LSTM) RNN”. The examiner notes the strengths and weaknesses of Bag-of-Words and LSTM RNN are well documented in academic literature and known in the field of sentence semantic analysis.  Choosing to classify a linguistically simple sentence using a Bag-of-Words classifier or using a LSTM to classify a linguistically complex sentence is not novel.)
Mahasseni, Wang and Barry are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Wang to incorporate the teaching of Barry to integrate the use of Bag-of-Words and LSTM as per Barry [pg. 1, lines 15]-[pg. 2, lines 1-10] to improve computational speed and achieve highly accurate classification results in Natural Language Processing Machine Learning tasks.
Regarding claim 32
The combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, Ganguly, and Achin teaches claim 30.
Barry teaches wherein the decision system selects the trained BoW network when the sentence is a linguistically simple sentence.  ([pg. 1, lines 15]-[pg. 2, lines 1-10] “The popularity of such bag-of-words approaches is mainly due to their simplicity and efficiency, whilst having the ability to achieve very high accuracy.  Bag-of-words features are created by viewing the document as an unordered collection of words, which are then used to classify the document.  Despite their overall high success rates, there exist some downsides to using bag-of-words or n-gram approaches.  The main pitfall of such approaches is that they ignore long-range word ordering such that modifiers and their objects may be separated by many unrelated words (7).  As word order is lost, sentences with different meanings which use the same words will have similar representations.  Another key downside to using bag-of-words approaches is that they are unable to deal effectively with negation.” and [pg. 2, lines 25-29] “the addition of word embeddings to the field of NLP has enabled practitioners to use more advanced learning algorithms which can handle sequential data as inputs such as Recurrent Neural Networks (RNNs).  An important development in the field of RNNs was the introduction of the Long Short-Term Memory (henceforth LSTM) RNN”. The examiner notes the strengths and weaknesses of Bag-of-Words and LSTM RNN are well documented in academic literature and known in the field of sentence semantic analysis.  Choosing to classify a linguistically simple sentence using a Bag-of-Words classifier or using a LSTM to classify a linguistically complex sentence is not novel.)
Mahasseni, Wang and Barry are analogous art because they are focused in the field of predictive analytics using Machine Learning. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni and Wang to incorporate the teaching of Barry to integrate the use of Bag-of-Words and LSTM as per Barry [pg. 1, lines 15]-[pg. 2, lines 1-10] to improve computational speed and achieve highly accurate classification results in Natural Language Processing Machine Learning tasks.
Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Baluja, in view of Windasari, in view of Arlot, and in further view of Anisimovich.
Regarding claim 33
The combination of Mahasseni, Baluja, Windasari, and Arlot teaches claim 1.
Mahasseni, Baluja, Windasari, and Arlot do not teach A non-transitory, computer-readable medium having computer executable instructions that implement the system of claim 1.
Anisimovich teaches A non-transitory, computer-readable medium having computer executable instructions that implement the system of claim 1.  ([Col 10, lines 29-31] “A non-transitory computer-readable medium having instructions stored therein that, when executed by at least one processor, cause the processor to :”)
Mahasseni, Baluja, Windasari, Arlot, and Anisimovich are analogous art because they are all focused on the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Baluja, Windasari, and Arlot to incorporate the teaching of Anisimovich, namely the use of a non-transitory computer-readable medium as per Anisimovich [Col 10, lines 29-31], to store instructions for execution.
Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Mahasseni, in view of Wang, in view of Barry, in view of Sheikh, in view of Windasari, in view of Arlot, in view of Vo, in view of Nam, in view of Ganguly, in further view of Anisimovich.
Regarding claim 34
The combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly teaches claim 29.
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly do not teach A non-transitory, computer-readable medium having computer executable instructions that implement the system of claim 29.
Anisimovich teaches A non-transitory, computer-readable medium having computer executable instructions that implement the system of claim 29.  ([Col 10, lines 29-31] “A non-transitory computer-readable medium having instructions stored therein that, when executed by at least one processor, cause the processor to :”).
Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, Ganguly, and Anisimovich are analogous art because they are all focused on the field of predictive analytics using Machine Learning.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Mahasseni, Wang, Barry, Sheikh, Windasari, Arlot, Vo, Nam, and Ganguly to incorporate the teaching of Anisimovich, namely the use of a non-transitory computer-readable medium as per Anisimovich [Col 10, lines 29-31], to store instructions for execution.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN S WALKER whose telephone number is (303)297-4479.  The examiner can normally be reached on Monday - Friday 0730-1700 (MT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANN LO, can be reached on 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/BENJAMIN WALKER/Examiner, Art Unit 2126
/LUIS A SITIRICHE/Primary Examiner, Art Unit 2126