DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 03/29/2022 has been entered. 

Response to Arguments
Regarding the rejection under 35 U.S.C. 103, applicant’s arguments with respect to claims 1-3, 5-14, and 16-27 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 17-21 and 24-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA  the applicant regards as the invention.
Claim 17 recites “weights are computed via learning a regularized neural network”. It is not clear what is meant by weights being computed via learning a regularized neural network. For the purpose of examination, “weights are computed via learning a regularized neural network” is interpreted to mean “training a regularized neural network”. Appropriate clarification/correction is required.
Claims 18-21 are rejected as being directly or indirectly dependent on rejected claim 17.
As for claims 24 and 25, each recites “attaching and training a plurality of linear probes on the flattened intermediate representations of a high performing neural network”. The term “high performing” is indefinite because the examiner cannot determine how high the performance of a neural network needs to be, and the specification does not define it. If “high performing neural network” refers to a specific neural network, applicant needs to clarify which neural network is being used. For examination, the examiner interprets “high performing neural network” as any neural network being trained. Appropriate clarification/correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 6-13, 16-23 and 25-27 are rejected under 35 USC 103 as being unpatentable over Goel et al. (US 2018/0308487A1) in view of Jin et al. (US 2016/0283814 A1) in further view of Crammer et al. (Confidence-Weighted Linear Classification for Text Categorization).

Regarding claim 1. 
Goel teaches a computer-implemented method for improving a [simple] model (see ¶ 44, and figure 4, LSTM-RNN attention model 304) using a confidence profile (see ¶ 44, “The input layer 402 along with the hidden layer 404 is used to generate an output vector Y1 at an output layer 406 representing a particular semantic meaning of a word”, i.e. where the output vector y1 corresponds to the confidence profile), 
the method comprising: generating, using a linear probe (see ¶ 44 and figure 3, word2vec model 302), confidence scores through flattened intermediate representations of a neural network (see ¶¶ 27-28, teaching the use of confidence scores to show the best match of the spoken words with the system's predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) is used in the semantic engine, which generates a confidence score for the best grammar match; this corresponds to generating confidence scores using a linear probe], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. a flattened intermediate representation); 
and setting weights of samples during a training of the [simple] model using the confidence scores of the intermediate representations that justify the weights [and minimize a loss of the simple model] (see ¶ 28, “the level of confidence or confidence score shows the best match of the spoken word with the system's predefined grammar or the list of keyword”, where theoretically-justified weighting of weights is interpreted as the best match of a sample, i.e. the confidence score shows the best match of the list of keyword) (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304. The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, where the LSTM-RNN attention model 304 (simple model) is trained using word2vec (i.e. the linear probe that generates the confidence scores)); [and retraining the simple model using the weights that are justified]. 
Goel teaches improvement to the training, see ¶ 42, “the dialogue engine 100 trains another neural network to learn the output tags for each of these vectors. The word embeddings are trained on very large amounts of text in the target language. However the classifier is subsequently trains on only a smaller number of tags that are available from the grammar”, which corresponds to improving the training process [where the loss of the simple model is increasing training and reducing error], but Goel does not teach a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model); minimizing a loss of the simple model; or retraining the simple model using the weights that are justified. 
Jin teaches a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model, therefore see ¶¶ 51-53 and ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the decision tree weight training model corresponds to the simple model); [minimize a loss of the simple model] and retraining the simple model using the weights that are justified (see ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the BP neural network trains model 1 to model N, which are then retrained using weights and marked up samples that are justified to assure the accuracy of the classification). 
Both Goel and Jin pertain to the problem of training using weights for enhanced classification efficiency and accuracy, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel and Jin such that the simple model is a decision tree model and the simple model is retrained using the weights that are justified. The motivation for doing so would be “together with the combinational use of the marked up samples for extracting features from the text line samples, the generated text line classifiers provide for enhanced classification efficiency and accuracy.” (see Jin Abstract).

Goel and Jin do not specifically teach minimizing a loss of the simple model.
Crammer teaches minimizing a loss of the simple model (see page 1895, section 3. Online Learning of Linear Classifiers, “The algorithm then updates its prediction rule and proceeds to the next round. For online evaluations, error is reported as the total loss i on the training data and in batch evaluations, error is reported on held out data”, also page 1896, formula 2 and 3, also see page 1896, “minimizing the divergence to the current weights” and formula 2 and 4 on the same page).
Goel, Jin and Crammer pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Jin and Crammer to use a loss function to minimize a loss of the simple model. The motivation for doing so would be to calculate the loss of the simple model and correct the training until the loss is minimized, see page 1892 introduction, “Weight confidence is formalized with a Gaussian distribution over weight vectors, which is updated for each new training example so that the probability of correct classification for that example under the updated distribution meets a specified confidence” (see Crammer page 1892, 1896-1898).
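
For illustration only, the limitation as interpreted above (linear probes on flattened intermediate representations produce confidence scores, the scores set per-sample weights, and a simple model is retrained on the weighted samples while a loss is minimized) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application, Goel, Jin or Crammer: the data, the logistic-regression probes, the averaging of probe confidences into weights, the depth-1 decision tree standing in for the simple model, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_linear_probe(feats, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe on one layer's flattened features (assumed probe form)."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def confidence(probe, x, y):
    """Probability the probe assigns to the true label y of sample x."""
    w, b = probe
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return p if y == 1 else 1.0 - p

def fit_weighted_stump(xs, labels, weights):
    """Depth-1 decision tree on a scalar input, chosen to minimize weighted 0-1 loss.

    Returns (weighted_loss, threshold, label_predicted_when_x_le_threshold).
    """
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):
            loss = sum(wt for x, y, wt in zip(xs, labels, weights)
                       if (left if x <= t else 1 - left) != y)
            if best is None or loss < best[0]:
                best = (loss, t, left)
    return best

# Hypothetical data: scalar inputs, binary labels, and two "layers" of
# flattened intermediate features standing in for a trained network.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
layer_feats = [
    [[x / 7.0] for x in xs],
    [[x / 7.0, (x / 7.0) ** 2] for x in xs],
]

# One probe per layer; each sample's weight is its mean true-label confidence.
probes = [train_linear_probe(f, labels) for f in layer_feats]
weights = [
    sum(confidence(p, f[i], labels[i]) for p, f in zip(probes, layer_feats)) / len(probes)
    for i in range(len(xs))
]

# Retrain the simple model on the weighted samples.
loss, threshold, left_label = fit_weighted_stump(xs, labels, weights)
```

In this sketch, samples on which the probes are less confident contribute less to the weighted loss, so the retrained stump is driven mainly by the samples the probes classify confidently.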

Regarding claim 2. 
Goel, Jin and Crammer teach the computer-implemented method of claim 1, 
Goel further teaches wherein the generating generates the confidence scores by: 
training the simple model on an original dataset (see ¶ 44, “a schematic representation of a word2vec model used for training an LSTM-RNN model with attention”); 
learning the weights for samples in the original dataset as a function of the simple model and the linear probes (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax is a function).

Regarding claim 6. 
Goel, Jin and Crammer teach the computer-implemented method of claim 1, 
Goel further teaches wherein the weights are computed via training a regularized neural network that inputs the same confidence scores of a selected linear probe (see ¶ 32, “the level of confidence or confidence score shows the best match of the spoken word with the system's predefined grammar or the list of keyword”, where theoretically-justified weighting of weights is interpreted as the best match of a sample, i.e. the confidence score shows the best match of the spoken word, also see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304. The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, where the LSTM-RNN attention model 304 (simple model) is trained using word2vec (i.e. the linear probe that generates the confidence scores)).

Regarding claim 7. 
Goel, Jin and Crammer teach the computer-implemented method of claim 6, 
Goel further teaches wherein a regularization term is set to keep the weights in the regularized neural network from going to zero when training the regularized neural network (see ¶¶ 27-28, teaching the use of confidence scores to show the best match of the spoken words with the system's predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) is used in the semantic engine, which generates a confidence score for the best grammar match [which prevents the weights from going to zero]; this corresponds to generating confidence scores using a linear probe]).

Regarding claim 8. 
Goel, Jin and Crammer teach the computer-implemented method of claim 6, 
Goel further teaches wherein a penalty is imposed on the weights in the learning to prevent the weights from diverging (see ¶¶ 27-28, teaching the use of confidence scores to show the best match of the spoken words with the system's predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) is used in the semantic engine, which generates a confidence score for the best grammar match [which prevents the weights from diverging]; this corresponds to generating confidence scores using a linear probe]).

Regarding claim 9. 
Goel, Jin and Crammer teach the computer-implemented method of claim 6, 
Goel further teaches wherein the regularized neural network is trained on batches of data (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors. Referring to FIG. 4, there are multiple input vectors W1V1, W1V2, . . . , W1Vn present at an input layer 402 that represents the possible candidate words appear in different context by different end user.”, where W1Vn are batches), and wherein the regularized neural network represents a function of all training samples (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax of multiple input vectors W1V1, W1V2, . . . , W1Vn is a function).

Regarding claim 10. 
Goel, Jin and Crammer teach the computer-implemented method of claim 6, 
Goel further teaches wherein the training alternates between minimizing two blocks of variables and when sub-problems have solutions and are differentiable, all limit points of the variables are shown to be stationary points as the learned weights (see ¶ 33, “The system utilizes GRXML or JSGF of ABNF format grammars to learn the one or more action tags and entities of the semantic engine, and also for enhancing a vocabulary based on the grammar model and a vocabulary based on the language model”, wherein enhancing vocabulary corresponds to limit points and also see ¶ 33, “The method further comprises step of extracting acoustic features from the input speech of the end user to identify language and/or accent the end user”), and wherein the simple model is trained with the corresponding learned weights (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”).

Regarding claim 11. 
Goel, Jin and Crammer teach the computer-implemented method of claim 1, 
Goel further teaches embodied in a cloud-computing environment (see ¶ 3, teaching the language model text may come from books, websites etc. therefore a delivery of service through internet is embodied i.e. cloud-computing environment).

Claim 12 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 1. Therefore the rejection of claim 1 above applies equally here. Jin also teaches the additional elements of claim 12 not recited in claim 1, namely the computer program product comprising a computer-readable storage medium having program instructions embodied therewith (see ¶ 69, “Regardless of being implemented using software, hardware, firmware or the combinations thereof, instruction code can be stored in any kind of computer readable media (for example, permanent or modifiable, volatile or non-volatile, solid or non-solid, fixed or changeable medium, etc.)”).
Claim 13 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 2. Therefore the rejection of claim 2 above applies equally here. 
Regarding claim 16. 
Goel, Jin and Crammer teach the non-transitory computer program product of claim 12, 
Goel further teaches wherein the weights are computed via training a regularized neural network that inputs the same confidence scores of a selected linear probe and weights of the samples that are set in the setting (see ¶ 32, “the level of confidence or confidence score shows the best match of the spoken word with the system's predefined grammar or the list of keyword”, where theoretically-justified weighting of weights is interpreted as the best match of a sample, i.e. the confidence score shows the best match of the spoken word, also see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304. The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, where the LSTM-RNN attention model 304 (simple model) is trained using word2vec (i.e. the linear probe that generates the confidence scores)).
Claim 17 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 6. Therefore the rejection of claim 6 above applies equally here. 
Claim 18 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 7. Therefore the rejection of claim 7 above applies equally here. 
Claim 19 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 8. Therefore the rejection of claim 8 above applies equally here. 
Claim 20 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 9. Therefore the rejection of claim 9 above applies equally here. 
Claim 21 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 10. Therefore the rejection of claim 10 above applies equally here. 
Claim 22 recites a system to perform the computer-implemented method recited in claim 1. Therefore the rejection of claim 1 above applies equally here. Jin also teaches the additional elements of claim 22 not recited in claim 1, namely a processor and a memory, the memory storing instructions (see ¶ 69, “medium can be implemented using, for example, programmable array logic (PAL), random access memory (RAM), programmable read only memory (PROM), read only memory (ROM), electrically erasable programmable ROM (EEPROM), magnetic storage, optical storage, digital versatile disc (DVD), or the like”, also see claim 8, comprising a processor).
Claim 23 recites a system to perform the computer-implemented method recited in claim 11. Therefore the rejection of claim 11 above applies equally here.
Regarding claim 25. 
Goel teaches a computer-implemented method for improving a [simple] model (see ¶ 44, and figure 4, LSTM-RNN attention model 304) using a confidence profile (see ¶ 44, “The input layer 402 along with the hidden layer 404 is used to generate an output vector Y1 at an output layer 406 representing a particular semantic meaning of a word”, i.e. where the output vector y1 corresponds to the confidence profile), 
the method comprising: generating, using a linear probe (see ¶ 44 and figure 3, word2vec model 302), confidence scores through flattened intermediate representations of a neural network (see ¶¶ 27-28, teaching the use of confidence scores to show the best match of the spoken words with the system's predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) is used in the semantic engine, which generates a confidence score for the best grammar match; this corresponds to generating confidence scores using a linear probe], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. a flattened intermediate representation); 
and wherein the generating generates the confidence scores by: attaching and training the linear probe on the flattened intermediate representations of a high performing neural network  (see figure 4 and ¶ 45 “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors. Referring to FIG. 4, there are multiple input vectors W1V1, W1V2, . . . , W1Vn present at an input layer 402 that represents the possible candidate words appear in different context by different end user… the recurrent neural network (RNN) uses that output class vector for the word”, [i.e. attaching and training all the word2vec multi-words in the RNN (high performing neural network)]);
training the [simple] model on an original dataset (see ¶ 44, “a schematic representation of a word2vec model used for training an LSTM-RNN model with attention”); 
learning the weights for examples in the original dataset as a function of the [simple] model and the linear probes (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax is a function); 
[and retraining the simple model on a final weighted dataset while minimizing a loss of the simple model].
Goel teaches improvement to the training, see ¶ 42, “the dialogue engine 100 trains another neural network to learn the output tags for each of these vectors. The word embeddings are trained on very large amounts of text in the target language. However the classifier is subsequently trains on only a smaller number of tags that are available from the grammar”, which corresponds to improving the training process [where the loss of the simple model is increasing training and reducing error], also see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”, but Goel does not teach a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model); or retraining the simple model on a final weighted dataset while minimizing a loss of the simple model.
Jin teaches a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model, therefore see ¶¶ 51-53 and ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the decision tree weight training model corresponds to the simple model); 
and retraining the simple model on a final weighted dataset while [minimizing a loss] of the simple model (see ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the BP neural network trains model 1 to model N, which are then retrained using weights and marked up samples that are justified to assure the accuracy of the classification). 
Both Goel and Jin pertain to the problem of training using weights for enhanced classification efficiency and accuracy, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel and Jin such that the simple model is a decision tree model and the simple model is retrained using the weights that are justified. The motivation for doing so would be “together with the combinational use of the marked up samples for extracting features from the text line samples, the generated text line classifiers provide for enhanced classification efficiency and accuracy.” (see Jin Abstract).

Goel and Jin do not specifically teach minimizing a loss of the simple model.
Crammer teaches minimizing a loss of the simple model (see page 1895, section 3. Online Learning of Linear Classifiers, “The algorithm then updates its prediction rule and proceeds to the next round. For online evaluations, error is reported as the total loss i on the training data and in batch evaluations, error is reported on held out data”, also page 1896, formula 2 and 3, also see page 1896, “minimizing the divergence to the current weights” and formula 2 and 4 on the same page).
Goel, Jin and Crammer pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Jin and Crammer to use a loss function to minimize a loss of the simple model. The motivation for doing so would be to calculate the loss of the simple model and correct the training until the loss is minimized, see page 1892 introduction, “Weight confidence is formalized with a Gaussian distribution over weight vectors, which is updated for each new training example so that the probability of correct classification for that example under the updated distribution meets a specified confidence” (see Crammer page 1892, 1896-1898).

Regarding claim 26. 
Goel, Jin and Crammer teach the computer-implemented method of claim 1, 
Jin further teaches wherein the simple model is not classified as a neural network (see ¶¶ 51-53 and ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the decision tree weight training model corresponds to the simple model, which is not classified as a neural network).
The motivation utilized in the combination of claim 1 applies equally well to claim 26.

Regarding claim 27. 
Goel, Jin and Crammer teach the computer-implemented method of claim 1, 
Jin further teaches wherein the simple model includes of a lasso model or decision tree model (see ¶¶ 51-53 and ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the decision tree weight training model corresponds to the simple model).
The motivation utilized in the combination of claim 1 applies equally well to claim 27.

Claim Rejections - 35 USC § 103

Claims 3, 5, 14 and 24 are rejected under 35 USC 103 as being unpatentable over Goel et al. (US 2018/0308487A1) in view of Jin et al. (US 2016/0283814 A1) in further view of Crammer et al. (Confidence-Weighted Linear Classification for Text Categorization) in further view of Mau et al. (“Gaussian Probabilistic Confidence Score for Biometric Applications”).

Regarding claim 3. 
Goel, Jin and Crammer teach the computer-implemented method of claim 2, 
Goel and Crammer do not teach wherein the function includes an area under curve (AUC) function.
Mau teaches wherein the function includes an area under curve (AUC) function (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).
Goel, Jin, Crammer and Mau pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Jin, Crammer and Mau to use the area under the curve function to calculate confidence scores. The motivation for doing so would be to calculate the area under the curve of the confidence scores so that the weights are trained based on the best confidence scores (see Mau last paragraph of page 4).
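
For illustration only, an area under the ROC curve of the kind Mau reports can be computed from a set of confidence scores and binary labels as sketched below. The sketch uses the standard rank-based form of the ROC AUC (the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half). The scores, labels and the function name are hypothetical and are not taken from Mau or from the application.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical confidence scores for five samples (1 = genuine match, 0 = non-match).
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)  # 5 of the 6 positive/negative pairs are ordered correctly
```

An AUC of 1.0 would mean every positive sample outscores every negative sample, while 0.5 corresponds to scores that carry no ranking information.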

Regarding claim 5. 
Goel, Jin and Crammer teaches the computer-implemented method of claim 1, 
Goel further teaches wherein the weights are computed…of the confidence scores of a selected linear probe and then justified in the justified weighting (¶¶ 27-28 teaches using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) uses the semantic engine which generates confidence score for best grammar corresponds to generating using linear probe to generate confidence scores], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. flatten intermediate representation).
Goel does not teach wherein the weights are computed via computing an area under a curve (AUC) of the confidence scores.
Mau teaches wherein the weights are computed via computing an area under a curve (AUC) of the confidence scores (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).
The motivation utilized in the combination of claim 3 applies equally to claim 5.
Claim 14 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 3. Therefore, the rejection of claim 3 above applies equally here.
Regarding claim 24. 
Goel teaches a computer-implemented method for improving a [simple] model (see ¶ 44, and figure 4, LSTM-RNN attention model 304) using a confidence profile (see ¶ 44, “The input layer 402 along with the hidden layer 404 is used to generate an output vector Y1 at an output layer 406 representing a particular semantic meaning of a word”, i.e. where the output vector y1 corresponds to the confidence profile), the method comprising: 
generating, using a linear probe (see ¶ 44 and figure 3, word2vec model 302), confidence scores through flattened intermediate representations of a neural network (see ¶¶ 27-28 teaches using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) uses the semantic engine which generates confidence score for best grammar corresponds to generating using linear probe to generate confidence scores], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. flatten intermediate representation);  
and wherein the generating generates the confidence scores by: attaching and training the linear probe on the flattened intermediate representations of a high performing neural network (see figure 4 and ¶ 45 “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors. Referring to FIG. 4, there are multiple input vectors W1V1, W1V2, . . . , W1Vn present at an input layer 402 that represents the possible candidate words appear in different context by different end user… the recurrent neural network (RNN) uses that output class vector for the word”, [i.e. attaching and training all the word2vec multi-words in the RNN (high performing neural network)]); 
training the [simple] model on an original dataset (see ¶ 44, “a schematic representation of a word2vec model used for training an LSTM-RNN model with attention”); 
learning the weights for samples in the original dataset as an [area under curve function] of the [simple] model and the linear probes (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax is a function); 
[and retraining the simple model on a final weighted dataset while minimizing a loss of the simple model].
Goel teaches improvement to the training (see ¶ 42, “the dialogue engine 100 trains another neural network to learn the output tags for each of these vectors. The word embeddings are trained on very large amounts of text in the target language. However the classifier is subsequently trains on only a smaller number of tags that are available from the grammar”, which corresponds to improving the training process [where the loss of the simple model is increasing training and reducing error]; also see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”), but Goel does not teach a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model); wherein the function is an area under a curve function; and retraining the simple model on a final weighted dataset while minimizing a loss of the simple model.
Jin teaches a simple model (Examiner note: the interpretation is based on new claim 27, a simple model being a decision tree model OR a lasso model; therefore see ¶¶ 51-53 and ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. the decision tree weight training model corresponds to the simple model); and retraining the simple model on a final weighted dataset while [minimizing a loss] of the simple model (see ¶ 64, “back propagation (BP) neural network is utilized for single model training by use of the extracted features. Since each type of text line samples can train a correspondent model, various types of text line samples train various models, each model can be designated as a decision tree. In the beginning, a weight is assigned for each decision tree, then the weight training is performed for the decision trees by use of a portion of the marked up samples such that each decision tree is assigned an appropriate weight to assure the accuracy of the classification.”, i.e. wherein the BP neural network trains model 1 to model N, which are then retrained using weights and marked up samples that are justified to assure accuracy of the classification).
Both Goel and Jin pertain to the problem of training using weights for enhanced classification efficiency and accuracy and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel and Jin such that the simple model is a decision tree model and the simple model is retrained using the justified weights. The motivation for doing so would be that “together with the combinational use of the marked up samples for extracting features from the text line samples, the generated text line classifiers provide for enhanced classification efficiency and accuracy.” (see Jin Abstract).
Goel and Jin do not specifically teach minimizing a loss of the simple model and wherein the function is an area under a curve (AUC).
Crammer teaches minimizing a loss of the simple model (see page 1895, section 3. Online Learning of Linear Classifiers, “The algorithm then updates its prediction rule and proceeds to the next round. For online evaluations, error is reported as the total loss i on the training data and in batch evaluations, error is reported on held out data”, also page 1896, formula 2 and 3, also see page 1896, “minimizing the divergence to the current weights” and formula 2 and 4 on the same page).
Goel, Jin and Crammer pertain to the problem of training using confidence scores and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Jin and Crammer to use a loss function to minimize a loss of the simple model. The motivation for doing so would be to calculate the loss of the simple model and correct the training until the loss is minimized, see page 1892, Introduction, “Weight confidence is formalized with a Gaussian distribution over weight vectors, which is updated for each new training example so that the probability of correct classification for that example under the updated distribution meets a specified confidence” (see Crammer pages 1892, 1896-1898).
Goel, Jin and Crammer do not teach wherein the function is an area under a curve (AUC).
Mau teaches wherein the function is an area under a curve (AUC) (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).
Goel, Jin, Crammer and Mau pertain to the problem of training using confidence scores and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Jin, Crammer and Mau to use an area under the curve function to calculate confidence scores. The motivation for doing so would be to calculate the area of all confidence scores in order to train the best weights based on the best confidence scores (see Mau last paragraph of page 4).
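For illustration only, one possible reading of the weighting and retraining limitations of claim 24 is sketched below in Python. Several assumptions are made that do not come from the application or from the cited references: the confidence profile of a sample is assumed to be the list of confidence scores that successive linear probes assign to its true class; the weight of a sample is assumed to be the trapezoidal area under that profile; and a one-feature decision stump (a depth-1 decision tree) with a weighted 0-1 loss stands in for the simple model. The names profile_auc and fit_weighted_stump and all data values are hypothetical.

```python
def profile_auc(profile):
    """Trapezoidal area under a confidence profile.

    profile: confidence scores from successive probes (assumed in [0, 1]).
    The probe index is rescaled to [0, 1], so the area is also in [0, 1].
    """
    if len(profile) < 2:
        raise ValueError("a profile needs scores from at least two probes")
    step = 1.0 / (len(profile) - 1)
    return sum((a + b) / 2 * step for a, b in zip(profile, profile[1:]))


def fit_weighted_stump(xs, ys, weights):
    """Fit the rule 'predict 1 if x > t else 0' by minimizing weighted 0-1 loss.

    Returns the chosen threshold t and the weighted loss it achieves.
    """
    # Candidate thresholds: below every sample, then at each distinct value.
    candidates = [min(xs) - 1.0] + sorted(set(xs))
    best_t, best_loss = None, float("inf")
    for t in candidates:
        loss = sum(w for x, y, w in zip(xs, ys, weights)
                   if (1 if x > t else 0) != y)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss


# Hypothetical dataset: one feature per sample and a binary label.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 1, 0, 1]

# Hypothetical confidence profiles: the probes are unsure about the second sample.
profiles = [[1.0, 1.0, 1.0], [0.1, 0.1, 0.1], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
weights = [profile_auc(p) for p in profiles]

# Training on the original (unweighted) dataset, then retraining on the weighted one.
t_plain, _ = fit_weighted_stump(xs, ys, [1.0] * len(xs))
t_weighted, _ = fit_weighted_stump(xs, ys, weights)
print(t_plain, t_weighted)
```

In this toy data the down-weighted second sample no longer dominates the loss, so the retrained stump moves its threshold from 1.0 to 3.0.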


Conclusion
Related art not relied upon in the current Office action:
MNIH et al. (US 20150095017 A1): A system and method are provided for learning natural language word associations using a neural network architecture. A word dictionary comprises words identified from training data consisting of a plurality of sequences of associated words. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IMAD M KASSIM whose telephone number is (571)272-2958. The examiner can normally be reached Monday-Friday, 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/I.K./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129