DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Regarding the 103 rejection, Applicant’s arguments with respect to claims 1-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Regarding the 112 rejection, the rejection of claims 1, 12 and 22 is maintained because it is not clear what the limitation “theoretically-justify the weights” means. See below for more details.
The rejection of claim 17 is maintained because no amendment or remark addressed the rejection. See below for the rejection.
A rejection is added for claim 25, which recites “learning the weights for examples in the original dataset as function for solving a neural network learnt of the simple model and the linear probes”. See below for the rejection and interpretation.

Examiner’s Remarks
As for the analysis of independent claims 1, 12, 22, 24 and 25 under 35 U.S.C. 101 (directed to an abstract idea without significantly more), the examiner interprets the limitation “setting weights of samples during a training of the simple model using the confidence scores of the intermediate representations that theoretically-justify the weights and minimize a loss of the simple model.” as improving the weights of the samples during a training of the simple model using the confidence scores of the intermediate representations. Because there is a 112(b) rejection requiring clarification of what is meant by “theoretically-justify the weights”, the examiner suggests fully clarifying how the weights are set and how the training is performed, because there may be a potential concern under 35 U.S.C. 101 that the claims are directed to an abstract idea.
As for claim 2, similarly recited in claims 13, 24, and 25, which recites “attaching and training a plurality of linear probes on the flattened intermediate representations of a high performing neural network”, the examiner interprets the high performing neural network as a deep neural network (DNN), as recited in ¶ 29, “a method where probes are added to the intermediate layers of a deep neural network (DNN)”, where the DNN is the high performing neural network. A high performing neural network is thus treated as one type of neural network.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-3, 5-14, 16-23 and 25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 1, 12 and 22 recite “theoretically-justify the weights”. It is not clear what is meant by “theoretically-justify the weights”. For the purpose of examination, “theoretically-justify the weights” is interpreted to mean “weights of samples”. Appropriate clarification/correction is required.
Claim 17 recites “weights are computed via learning a regularized neural network”. It is not clear what is meant by “weights are computed via learning a regularized neural network”. For the purpose of examination, “weights are computed via learning a regularized neural network” is interpreted to mean “training a regularized neural network”. Appropriate clarification/correction is required.
Claim 17 recites “justified in the justified weighting”. It is not clear what is meant by “justified in the justified weighting”. For the purpose of examination, “justified in the justified weighting” is interpreted to mean “stored in the weighting”. Appropriate clarification/correction is required.
Claim 25 recites “learning the weights for examples in the original dataset as function for solving a neural network learnt of the simple model and the linear probes”. It is not clear what is meant by “learning the weights for examples in the original dataset as function for solving a neural network learnt of the simple model and the linear probes”. For the purpose of examination, “learning the weights for examples in the original dataset as function for solving a neural network learnt of the simple model and the linear probes” is interpreted to mean “training weights for samples in the original dataset as a function learnt of simple model and linear probes”. Appropriate clarification/correction is required.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 6-13, 16-23 and 25 are rejected under 35 USC 103 as being unpatentable over Goel et al. (US 2018/0308487A1) in view of Crammer et al. (Confidence-Weighted Linear Classification for Text Categorization).

Regarding claim 1. 
Goel teaches a computer-implemented method for improving a simple model (see ¶ 44, and figure 4, LSTM-RNN attention model 304) using a confidence profile (see ¶ 44, “The input layer 402 along with the hidden layer 404 is used to generate an output vector Y1 at an output layer 406 representing a particular semantic meaning of a word”, i.e. where the output vector y1 corresponds to the confidence profile), the method comprising: 
generating, using a linear probe (see ¶ 44 and figure 3, word2vec model 302), confidence scores through flattened intermediate representations (see ¶¶ 27-28 teaches using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) uses the semantic engine which generates confidence score for best grammar corresponds to generating using linear probe to generate confidence scores], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. flatten intermediate representation); 
and setting weights of samples during a training of the simple model using the confidence scores of the intermediate representations that theoretically-justify the weights (see ¶ 28, “the level of confidence or confidence score shows the best match of the spoken word with the system's predefined grammar or the list of keyword”, where the theoretically-justified weighting of the weights is interpreted as the best match of a sample, i.e. the confidence score shows the best match of the list of keyword) (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304. The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, where the LSTM-RNN attention model 304 (simple model) is trained using word2vec (i.e. the linear probe that generates the confidence scores)).
Goel does not teach minimizing a loss of the simple model.
Crammer teaches minimizing a loss of the simple model (see page 1895, section 3. Online Learning of Linear Classifiers, “The algorithm then updates its prediction rule and proceeds to the next round. For online evaluations, error is reported as the total loss i on the training data and in batch evaluations, error is reported on held out data”, also page 1896, formula 2 and 3, also see page 1896, “minimizing the divergence to the current weights” and formula 2 and 4 on the same page).
Both Goel and Crammer pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel and Crammer to use a loss function to minimize a loss of the simple model. The motivation for doing so would be to calculate the loss of the simple model and correct the training until the loss is minimized, see page 1892 introduction, “Weight confidence is formalized with a Gaussian distribution over weight vectors, which is updated for each new training example so that the probability of correct classification for that example under the updated distribution meets a specified confidence” (see Crammer page 1892, 1896-1898).
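For illustration only, the limitation discussed above (setting per-sample weights and minimizing a loss of the simple model) can be sketched as a weighted loss minimization. The sketch below is not taken from the application, Goel, or Crammer; the one-feature logistic model, the function names `weighted_log_loss` and `train_weighted`, and the learning rate and step count are assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(params, xs, ys, ws):
    # Weighted average of per-sample log losses for a one-feature logistic model.
    a, b = params
    total = 0.0
    for x, y, w in zip(xs, ys, ws):
        p = min(max(sigmoid(a * x + b), 1e-12), 1.0 - 1e-12)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(ws)


def train_weighted(xs, ys, ws, lr=0.5, steps=200):
    # Plain gradient descent on the weighted loss (hypothetical hyperparameters).
    a, b = 0.0, 0.0
    wsum = sum(ws)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y, w in zip(xs, ys, ws):
            err = sigmoid(a * x + b) - y
            grad_a += w * err * x
            grad_b += w * err
        a -= lr * grad_a / wsum
        b -= lr * grad_b / wsum
    return a, b
```

In this sketch a sample with weight 0 contributes nothing to the gradient, so it has no influence on the trained parameters; larger weights give a sample proportionally more influence on the minimized loss.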

Regarding claim 2. 
Goel and Crammer teach the computer-implemented method of claim 1,
Goel further teaches wherein the generating generates the confidence scores by: attaching and training a plurality of linear probes on the flattened intermediate representations of a high performing neural network (see figure 4 and ¶ 45 “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors. Referring to FIG. 4, there are multiple input vectors W1V1, W1V2, . . . , W1Vn present at an input layer 402 that represents the possible candidate words appear in different context by different end user… the recurrent neural network (RNN) uses that output class vector for the word”, [i.e. attaching and training all the word2vec multi-words in the RNN (high performing neural network)]); 
training the simple model on an original dataset (see ¶ 44, “a schematic representation of a word2vec model used for training an LSTM-RNN model with attention”); 
learning the weights for samples in the original dataset as a function of the simple model and the linear probes (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax is a function); 
and retraining the simple model on a final weighted dataset (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”).
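As a reading aid for the claim language above, the following sketch shows one common way a linear probe can produce a confidence score from a flattened intermediate representation: flatten the activation, apply a linear map, and take the softmax probability of the true class. This is an assumption about what the limitation describes, not the method of the application or of Goel; the names `flatten`, `softmax` and `probe_confidence` and the probe parameters are hypothetical.

```python
import math


def flatten(activation):
    # Recursively flatten a nested list/tuple of numbers into a flat list of floats.
    if isinstance(activation, (list, tuple)):
        out = []
        for item in activation:
            out.extend(flatten(item))
        return out
    return [float(activation)]


def softmax(logits):
    # Subtract the maximum logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def probe_confidence(activation, weights, biases, true_class):
    # Linear probe: one weight row and one bias per class, applied to the
    # flattened activation; the confidence score is the true-class probability.
    x = flatten(activation)
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)[true_class]
```

Under this reading, attaching one such probe per intermediate layer and collecting the true-class probabilities for a sample would yield that sample's list of confidence scores across layers.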

Regarding claim 6. 
Goel and Crammer teach the computer-implemented method of claim 1,
Goel further teaches wherein the weights are computed via training a regularized neural network that inputs the same confidence scores of a selected linear probe and weights of the samples that are set in the setting (see ¶ 32, “the level of confidence or confidence score shows the best match of the spoken word with the system's predefined grammar or the list of keyword”, where the theoretically-justified weighting of the weights is interpreted as the best match of a sample, i.e. the confidence score shows the best match of the spoken word, also see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304. The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, where the LSTM-RNN attention model 304 (simple model) is trained using word2vec (i.e. the linear probe that generates the confidence scores)).

Regarding claim 7. 
Goel and Crammer teach the computer-implemented method of claim 6,
Goel further teaches wherein a regularization term is set to keep the weights in the regularized neural network from going to zero when training the regularized neural network (see ¶¶ 27-28, teaching using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) using the semantic engine, which generates the confidence score for the best grammar [preventing the weights from going to zero], corresponds to generating confidence scores using a linear probe]).

Regarding claim 8. 
Goel and Crammer teach the computer-implemented method of claim 6,
Goel further teaches wherein a penalty is imposed on the weights in the learning to prevent the weights from diverging (see ¶¶ 27-28, teaching using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) using the semantic engine, which generates the confidence score for the best grammar [preventing the weights from diverging], corresponds to generating confidence scores using a linear probe]).

Regarding claim 9. 
Goel and Crammer teach the computer-implemented method of claim 6,
Goel further teaches wherein the regularized neural network is trained on batches of data (see ¶ 44, “The word2vec model 302 models multi-word context with a …”), and wherein the regularized neural network represents a function of all training samples (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax of multiple input vectors W1V1, W1V2, . . . , W1Vn is a function).

Regarding claim 10. 
Goel and Crammer teach the computer-implemented method of claim 6,
Goel further teaches wherein the training alternates between minimizing two blocks of variables and when sub-problems have solutions and are differentiable, all limit points of the variables are shown to be stationary points as the learned weights (see ¶ 33, “The system utilizes GRXML or JSGF of ABNF format grammars to learn the one or more action tags and entities of the semantic engine, and also for enhancing a vocabulary based on the grammar model and a vocabulary based on the language model”, wherein enhancing vocabulary corresponds to limit points and also see ¶ 33, “The method further comprises step of extracting acoustic features from the input speech of the end user to identify language and/or accent the end user”), and wherein the simple model is trained with the corresponding learned weights (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”).

Regarding claim 11. 
Goel and Crammer teach the computer-implemented method of claim 1,
Goel further teaches embodied in a cloud-computing environment (see ¶ 3, teaching that the language model text may come from books, websites, etc.; delivery of a service through the internet is therefore embodied, i.e. a cloud-computing environment).

Claim 12 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 1. Therefore the rejection of claim 1 above applies equally here.
Claim 13 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 2. Therefore the rejection of claim 2 above applies equally here.
Claims 16-17 recite a non-transitory computer program product to perform the computer-implemented method recited in claim 6. Therefore the rejection of claim 6 above applies equally here.
Claim 18 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 7. Therefore the rejection of claim 7 above applies equally here.
Claim 19 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 8. Therefore the rejection of claim 8 above applies equally here.
Claim 20 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 9. Therefore the rejection of claim 9 above applies equally here.
Claim 21 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 10. Therefore the rejection of claim 10 above applies equally here.
Claims 22-23 recite a system to perform the computer-implemented method recited in claims 1 and 11. Therefore the rejection of claims 1 and 11 above applies equally here.
Claim 25 recites a method to perform the computer-implemented method recited in the combination of claims 1-2. Therefore the rejection of claims 1-2 above applies equally here.


Claims 3, 5, 14 and 24 are rejected under 35 USC 103 as being unpatentable over Goel et al. (US 2018/0308487A1) in view of Crammer et al. (Confidence-Weighted Linear Classification for Text Categorization) in further view of Mau et al. (“Gaussian Probabilistic Confidence Score for Biometric Applications”).

Regarding claim 3. 
Goel and Crammer teach the computer-implemented method of claim 2,
Goel and Crammer do not teach wherein the function includes an area under curve (AUC) function.
Mau teaches wherein the function includes an area under curve (AUC) function (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).
Goel, Crammer and Mau pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Crammer and Mau to use the function of area under the curve to calculate confidence scores. The motivation for doing so would be to calculate the area of all confidence scores in order to train the best weights based on the best confidence scores (see Mau last paragraph of page 4).
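For reference, the area under the curve of an ROC plot, as in the passage quoted from Mau, can be computed directly from confidence scores and binary labels. The sketch below uses the standard pairwise (rank-based) formulation; it is not Mau's code, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    # ROC AUC equals the probability that a randomly chosen positive sample
    # scores higher than a randomly chosen negative sample (ties count half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, scores that rank every positive above every negative give an AUC of 1.0, while scores that carry no ranking information give a value near 0.5.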

Regarding claim 5. 

Goel further teaches wherein the weights are computed…of the confidence scores of a selected linear probe and then justified in the justified weighting (¶¶ 27-28 teaches using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) uses the semantic engine which generates confidence score for best grammar corresponds to generating using linear probe to generate confidence scores], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. flatten intermediate representation).
Goel does not teach wherein the weights are computed via computing an area under a curve (AUC) of the confidence scores.
Mau teaches wherein the weights are computed via computing an area under a curve (AUC) of the confidence scores (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).
Both Goel and Mau pertain to the problem of training using confidence scores, thus being analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel and Mau to use the function of area under the curve to calculate confidence scores. The motivation for doing so would be 
Claim 14 recites a non-transitory computer program product to perform the computer-implemented method recited in claim 3. Therefore the rejection of claim 3 above applies equally here.
Regarding claim 24. 
Goel and Crammer teach a computer-implemented method for improving a simple model (see ¶ 44, and figure 4, LSTM-RNN attention model 304) using a confidence profile (see ¶ 44, “The input layer 402 along with the hidden layer 404 is used to generate an output vector Y1 at an output layer 406 representing a particular semantic meaning of a word”, i.e. where the output vector y1 corresponds to the confidence profile), the method comprising:
generating, using a linear probe (see ¶ 44 and figure 3, word2vec model 302), confidence scores through flattened intermediate representations (see ¶¶ 27-28 teaches using confidence scores to show best match of the spoken words with the system predefined grammar, also see ¶ 32, “wherein the semantic engine utilizes a grammar model and the language model to extract meaning for said one or more transcripts.”, also see ¶ 44, “The word2vec model 302, used in the semantic engine 104, is designed as a continuous bag of word (CBOW) by using a multilayer neural network with one or more hidden layers 404.”, [i.e. word2vec (linear probe) uses the semantic engine which generates confidence score for best grammar corresponds to generating using linear probe to generate confidence scores], also see ¶ 44, “The LSTM-RNN 304 encodes the input speech word sequence into an action or semantic tag using the intermediate word2vec representation”, i.e. flatten intermediate representation);
and wherein the generating generates the confidence scores by: attaching and training the linear probe on the flattened intermediate representations of a high performing neural network (see figure 4 and ¶ 45 “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors. Referring to FIG. 4, there are multiple input vectors W1V1, W1V2, . . . , W1Vn present at an input layer 402 that represents the possible candidate words appear in different context by different end user… the recurrent neural network (RNN) uses that output class vector for the word”, [i.e. attaching and training all the word2vec multi-words in the RNN (high performing neural network)]); 
training the simple model on an original dataset (see ¶ 44, “a schematic representation of a word2vec model used for training an LSTM-RNN model with attention”); 
learning the weights for samples in the original dataset as an area under curve function of the simple model and the linear probes (see ¶ 44, “The word2vec model 302 models multi-word context with a computationally efficient hierarchical softmax updates to learn the output vectors”, where softmax is a function); 
and retraining the simple model on a final weighted dataset (see ¶ 44, “The output vector Y1 of the word2vec model 302 is used for training a LSTM-RNN attention model 304”).
Goel does not teach wherein the function is an area under a curve (AUC), or minimizing a loss of the simple model.
Crammer teaches minimizing a loss of the simple model (see page 1895, section 3. Online Learning of Linear Classifiers, “The algorithm then updates its prediction rule and proceeds to the next round. For online evaluations, error is reported as the total loss i on the training data and in batch evaluations, error is reported on held out data”, also page 1896, formula 2 and 3, also see page 1896, “minimizing the divergence to the current weights” and formula 2 and 4 on the same page).
Goel and Crammer pertain to the problem of training using confidence scores, and are thus analogous. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to combine Goel, Mau and Crammer to use a loss function to minimize a loss of the simple model. The motivation for doing so would be to calculate the loss of the simple model and correct the training until the loss is minimized, see page 1892 introduction, “Weight confidence is formalized with a Gaussian distribution over weight vectors, which is updated for each new training example so that the probability of correct classification for that example under the updated distribution meets a specified confidence” (see Crammer page 1892, 1896-1898).
Goel and Crammer do not teach wherein the function is an area under a curve (AUC).
Mau teaches wherein the function is an area under a curve (AUC) (see page 3, “The area under the curve (AUC) from the ROC plot was 94.01% for MRH distance, 94.38% for Binomial confidence, and 94.64% for both Normal confidence and log Normal confidence”, also see figures 2-4).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IMAD M KASSIM whose telephone number is (571) 272-2958. The examiner can normally be reached Monday through Friday, 7:30 to 5:00.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J. Huntley can be reached on (303) 297 - 4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/IMAD KASSIM/Examiner, Art Unit 2125
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129