DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 18 and 19 are objected to because of the following informalities: “method” should be changed to “system” if these claims are intended to depend on claim 1. Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-13, 15-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Epstein et al. (US 2015/0348541) in view of Weng et al. (US 2012/0271631).
Claim 1
Epstein discloses a system comprising: at least one hardware processor; a memory device coupled to the at least one hardware processor; 
a first language model stored on the memory device; and a second language model stored on the memory device (See Fig. 1 with multiple language models); 
the at least one hardware processor operable to at least: 
run the first language model with an input set of words in a given sentence, the first language model outputting a first set of candidate words predicted to follow the input set of words in the given sentence, the first language model further outputting a score associated with each of the candidate words in the first set of candidate words ([0062] The computing system obtains output of the first language model for a set of input data, and also obtains output of the second language model for the set of input data (508). Various linguistic contexts, such as word sequences, may be provided to the models. Estimates or likelihood scores may be received from the models, representing likelihoods, indicated by the models, that certain words or class symbols will occur in the linguistic contexts that were provided. [0063] The computing system evaluates the output of the first language model and the output of the second language model (510). The computing system may determine a first score based on the output of the first language model, determine a second score based on the output of the second language model, and compare the first score and the second score. For example, the scores may be word error rate scores determined for the respective language models. As another example, the scores may be perplexity scores determined for the respective language models.);
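run the second language model with the input set of words in the given sentence, the second language model outputting a second set of candidate words predicted to follow the input set of words in the given sentence, the second language model further outputting a score associated with each of the candidate words in the second set of candidate words (The same citations as above are provided here. [0062] The computing system obtains output of the first language model for a set of input data, and also obtains output of the second language model for the set of input data (508). Various linguistic contexts, such as word sequences, may be provided to the models. Estimates or likelihood scores may be received from the models, representing likelihoods, indicated by the models, that certain words or class symbols will occur in the linguistic contexts that were provided. [0063] The computing system evaluates the output of the first language model and the output of the second language model (510). The computing system may determine a first score based on the output of the first language model, determine a second score based on the output of the second language model, and compare the first score and the second score. For example, the scores may be word error rate scores determined for the respective language models. As another example, the scores may be perplexity scores determined for the respective language models.);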
receive an actual word following the input set of words ([0044], Transcriptions of the utterances can be obtained, for example, by human transcription or human labeling or verification of a transcription, so that the ground truth about the actual content of the utterances is known.); 
responsive to determining that the actual word matches with a candidate word in the first set of candidate words, update a first cumulative tally associated with the first language model with the score associated with the candidate word in the first set matching the actual word ([0044], The output of the language models 132a-132n can be compared to the known transcriptions for the utterances to determine word error rates. [0045] As another example, the model evaluator 140 may determine cross-perplexity scores 144 for the language models 132a-132n. The test data 126 may include examples of language for which instances of certain semantic classes have been identified. Given various language sequences as input, the model evaluator 140 may determine how well each of the language models 132a-132n represents the test data 126. For example, the perplexity scores 144 may indicate, that given a set of valid language sequences, to what extent does a model predict these sequences to occur. Generally, perplexity is correlated with word error rate. [0046] The model evaluator 140 compares the scores for the various language models 132a-132n. For example, the model evaluator 140 may compare the word error rate scores 142 of the different language models 132a-132n to each other to determine which language model 132a-132n has the lowest error rate. [0053] In this manner, the best classes to represent each concept may be determined one by one, by repeating the processing of stages (C) to (E) with variations of a concept being tested each time. As a result, Examiner notes the score tally changes accordingly);
responsive to determining that the actual word matches with a candidate word in the second set of candidate words, update a second cumulative tally associated with the second language model with the score associated with the candidate word in the second set matching the actual word (The same citations provided in the above paragraph are repeated here. [0044], The output of the language models 132a-132n can be compared to the known transcriptions for the utterances to determine word error rates. [0045] As another example, the model evaluator 140 may determine cross-perplexity scores 144 for the language models 132a-132n. The test data 126 may include examples of language for which instances of certain semantic classes have been identified. Given various language sequences as input, the model evaluator 140 may determine how well each of the language models 132a-132n represents the test data 126. For example, the perplexity scores 144 may indicate, that given a set of valid language sequences, to what extent does a model predict these sequences to occur. Generally, perplexity is correlated with word error rate. [0046] The model evaluator 140 compares the scores for the various language models 132a-132n. For example, the model evaluator 140 may compare the word error rate scores 142 of the different language models 132a-132n to each other to determine which language model 132a-132n has the lowest error rate. [0053] In this manner, the best classes to represent each concept may be determined one by one, by repeating the processing of stages (C) to (E) with variations of a concept being tested each time. As a result, Examiner notes the score tally changes accordingly).
Still, Epstein may not explicitly disclose the following limitation: responsive to determining that the first cumulative tally and the second cumulative tally deviate by more than a pre-defined threshold, identify the actual word in the given sentence for flagging.
Weng teaches in [0056], “If all of the confidence scores for all of the language models are lower than the lowest confidence score in the respective acceptable range of confidence scores, then the system flags the result as unrecognized.”
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate speech recognition using multiple language models as taught by Weng with the language model generation system of Epstein, because doing so would have provided a way to analyze the generated confidence score for each of the models against the range of confidence scores associated with correctly recognized test data utterances (the "acceptable range of confidence scores") ([0055] of Weng).
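For illustration only, the tally and threshold limitations recited in claim 1 above may be sketched as follows. This is a hypothetical sketch and is not taken from the application, Epstein, or Weng. The function name flag_words, the dictionary-based stand-in for a language model, the example scores, and the choice to compare the tallies after every word are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a language model (an assumption for illustration):
# maps a context (tuple of preceding words) to candidate next words and scores.
Model = Dict[Tuple[str, ...], Dict[str, float]]


def flag_words(sentence: List[str], model_a: Model, model_b: Model,
               threshold: float) -> List[str]:
    """Return the actual words at which the two cumulative tallies deviate
    by more than the threshold."""
    tally_a = 0.0  # first cumulative tally
    tally_b = 0.0  # second cumulative tally
    flagged: List[str] = []
    for i in range(1, len(sentence)):
        context = tuple(sentence[:i])  # input set of words
        actual = sentence[i]           # actual word following the input set
        candidates_a = model_a.get(context, {})
        candidates_b = model_b.get(context, {})
        # Update a tally only when the actual word matches a candidate word.
        if actual in candidates_a:
            tally_a += candidates_a[actual]
        if actual in candidates_b:
            tally_b += candidates_b[actual]
        # Assumed comparison point: after each word is processed.
        if abs(tally_a - tally_b) > threshold:
            flagged.append(actual)
    return flagged


# Made-up example data.
sentence = ["we", "ship", "fast"]
model_a = {("we",): {"ship": 0.9}, ("we", "ship"): {"fast": 0.8}}
model_b = {("we",): {"ship": 0.8}, ("we", "ship"): {"slowly": 0.7}}
flagged = flag_words(sentence, model_a, model_b, threshold=0.5)
```

In this made-up example only the first model predicts "fast", so the tallies diverge at that word and it is the one identified for flagging.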
Claim 2
Epstein in view of Weng further suggests the system of claim 1, wherein the first language model and the second language model are artificial neural network models ([0031] of Epstein, machine learning algorithm [0032] FIG. 3 is a diagram that illustrates an example of training data 300 that may be used to generate a language model.). 
Claim 4
Epstein in view of Weng further teaches the system of claim 1, wherein the first language model is trained based on a first training data set including at least data determined to have proper usage of a language, and the second language model is trained based on a second training 0056] Multiple test data sets may be used to test the language models 132a-132n, where each test data set includes examples of a different type of context.).  
Claim 5
Epstein in view of Weng further teaches the system of claim 4, wherein the first training data set includes at least data associated with a category of submission, wherein the first language model is trained to output the first set of next words likely to be associated with language found in the category of submission ([0056] of Epstein, Multiple test data sets may be used to test the language models 132a-132n, where each test data set includes examples of a different type of context. The language model and associated class that provide the best performance for each test data set may be selected. From these results, the computing system 110 may generate a set of rules that indicate which classes should be used in which contexts. As an 
Claim 6
Epstein in view of Weng further teaches the system of claim 4, wherein the first training data set includes at least data associated with a category of submission grouped by a cohort, wherein the first language model is trained to output the first set of next words likely to be associated with language found in the category of submission grouped by the cohort (as provided above for claim 5 and, for claim 6, with respect to [0055]-[0056] of Epstein, a cohort could be a general location class versus a lower-level class).
Claim 7
Epstein in view of Weng further teaches the system of claim 1, wherein the given sentence is fed into the first language model and the second language model one word at a time ([0039] of Epstein, The language models 132a-132n may use any appropriate amount of context to generate likelihood scores. The language models 132a-132n may use a previous word, two previous words, or another amount of context to estimate the likelihood of occurrence of the next word in the sequence. Each of the language models 132a-132n is trained to estimate the 
Claim 8
Epstein in view of Weng further teaches the system of claim 1, wherein the at least one hardware processor is operable to cause flagging of the actual word in the given sentence (Weng teaches in [0056], “If all of the confidence scores for all of the language models are lower than the lowest confidence score in the respective acceptable range of confidence scores, then the system flags the result as unrecognized.”).
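For illustration only, the flagging rule quoted from [0056] of Weng may be sketched as follows. This is a hypothetical sketch, not Weng's implementation. The function name is_unrecognized, the representation of each acceptable range as a (lowest, highest) pair, and the example values are assumptions.

```python
from typing import Dict, Tuple


def is_unrecognized(scores: Dict[str, float],
                    acceptable: Dict[str, Tuple[float, float]]) -> bool:
    """Return True when every model's confidence score is below the lowest
    score of that model's acceptable range."""
    # An empty score set is treated as not flagged (an assumption).
    return bool(scores) and all(
        scores[name] < acceptable[name][0] for name in scores
    )


# Made-up example values for two hypothetical models.
ranges = {"grammar": (0.5, 1.0), "statistical": (0.4, 1.0)}
all_low = is_unrecognized({"grammar": 0.2, "statistical": 0.3}, ranges)
one_ok = is_unrecognized({"grammar": 0.2, "statistical": 0.6}, ranges)
```

Under these assumptions the result is flagged only when all models fall below their respective ranges, which is the condition stated in the quoted passage.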
Claim 9
Epstein in view of Weng further teaches the system of claim 1, wherein the at least one hardware processor is operable to cause highlighting of the actual word in the given sentence (Weng teaches in [0056], “If all of the confidence scores for all of the language models are lower than the lowest confidence score in the respective acceptable range of confidence scores, then the system flags the result as unrecognized.” Examiner interprets highlighting as a form of emphasis such as flagging, and not as a visual highlight on a display device. Further, Epstein teaches in [0077], Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback).
Claim 10
Epstein in view of Weng further teaches the system of claim 1, wherein the at least one hardware processor is operable to provide the first set of candidate words ([0062] of Epstein, The computing system obtains output of the first language model for a set of input data, and also obtains output of the second language model for the set of input data (508). Various linguistic 
Claim 11
Epstein in view of Weng further teaches the system of claim 10, wherein the at least one hardware processor is operable to cause a presentation of the first set of candidate words ([0077] of Epstein, Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback. [0047] of Weng, Each of the models in the family of grammar-based language models 124 and the family of statistical language models 128 is associated with a recognizer which at block 406 generates a recognized output and a confidence score for each of the respective models in the family of grammar-based language models and the family of statistical language models.).
Claims 12-13, 15-18
	These claims recite substantially the same limitations as those provided in claims 1-2, 4-7 above, and therefore they are rejected for the same reasons.
Claim 20
Epstein teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to:
run by the device, a first language model with an input set of words in a given sentence, the first language model outputting a first set of candidate words predicted to follow the input set of words in the given sentence; run by the device, a second language model with the input set of words in the given sentence, the second language model outputting a second set of candidate words predicted to follow the input set of words in the given sentence ([0062] The computing system obtains output of the first language model for a set of input data, and also obtains output of the second language model for the set of input data (508). Various linguistic contexts, such as word sequences, may be provided to the models. Estimates or likelihood scores may be received from the models, representing likelihoods, indicated by the models, that certain words or class symbols will occur in the linguistic contexts that were provided. [0063] The computing system evaluates the output of the first language model and the output of the second language model (510). The computing system may determine a first score based on the output of the first language model, determine a second score based on the output of the second language model, and compare the first score and the second score. For example, the scores may be word error rate scores determined for the respective language models. As another example, the scores may be perplexity scores determined for the respective language models.); and
Epstein teaches in [0077], Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. Still, Epstein may not clearly disclose the following limitation: based on the first set of candidate words, the second set of candidate words, and an actual next word following the input set of words, provide by the device, guidance for phrasing the given sentence.
Weng teaches in [0056], “If all of the confidence scores for all of the language models are lower than the lowest confidence score in the respective acceptable range of confidence scores, then the system flags the result as unrecognized.” Examiner interprets the flagging as a form of guidance.
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate speech recognition using multiple language models as taught by Weng with the language model generation system of Epstein, because doing so would have provided a way to analyze the generated confidence score for each of the models against the range of confidence scores associated with correctly recognized test data utterances (the "acceptable range of confidence scores") ([0055] of Weng).
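For illustration only, the comparison of perplexity scores mentioned in the quoted [0063] of Epstein may be sketched as follows. The sketch uses the standard definition of perplexity (the exponential of the average negative log-probability assigned to the observed words) and is not taken from Epstein. The function names, model labels, and probabilities are assumptions.

```python
import math
from typing import Dict, List


def perplexity(probs: List[float]) -> float:
    """Standard perplexity of a model over a test sequence, given the
    probability the model assigned to each observed word."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def lowest_perplexity(model_probs: Dict[str, List[float]]) -> str:
    """Return the label of the model with the lowest perplexity."""
    return min(model_probs, key=lambda name: perplexity(model_probs[name]))


# Made-up probabilities assigned by two hypothetical models to the same words.
probs_by_model = {"first": [0.5, 0.5], "second": [0.25, 0.25]}
best = lowest_perplexity(probs_by_model)
```

Lower perplexity indicates that a model assigned higher probability to the observed words, so under these made-up values the model labeled "first" scores better.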
Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Epstein et al. (US 2015/0348541) in view of Weng et al. (US 2012/0271631) and Lee (US 2020/0152180).
Claim 3
Epstein in view of Weng teaches the system of claim 2, except wherein the first language model and the second language model are recurrent neural network models.  
Lee teaches in [0066], The encoder and the decoder 120 may be a sequence-to-sequence encoder-decoder implemented by an encoder-decoder neural network. A neural network may be a deep neural network (DNN), as a non-limiting example. In such an example, the DNN may include one or more of a fully connected network, a deep convolutional network, a recurrent neural network (RNN).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate speech recognition as taught by Lee with the language model generation system of Epstein in view of Weng, because doing so would have enabled the 
Claim 14
	This claim recites substantially the same limitations as those provided in claim 3 above, and therefore it is rejected for the same reasons.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Epstein et al. (US 2015/0348541) in view of Weng et al. (US 2012/0271631) and Li et al. (US 2015/0332673).
Claim 19
Epstein in view of Weng further teaches the method of claim 1, except wherein the method further comprises causing a presentation of the first set of candidate words responsive to the flagged actual word being selected.
Li teaches causing a presentation of the first set of candidate words responsive to the flagged actual word being selected ([0139] The inventors have appreciated that techniques described herein may be beneficially applied to improve the accuracy of automatic speech recognition where errors might otherwise result due to recognition of tokens that are invalid or improbable when considered in view of their semantic classes. Some techniques described herein may penalize or otherwise discourage invalid or improbable recognition results while biasing recognition toward results that are more probable given the semantic information that can be detected. In some embodiments, when a likely misrecognition is detected, or when the SLM scores indicate that the user may truly have said an invalid token for a given semantic class, an alert may be generated to flag the situation for the user.).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS H MAUNG whose telephone number is (571)270-5690.  The examiner can normally be reached on Monday-Friday, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 1-(571) 272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR 






/THOMAS H MAUNG/            Primary Examiner, Art Unit 2654