DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2019-0092634, filed on 7/30/2019.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/23/2019 has been considered by the examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.


As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
Such claim limitation(s) is/are: 
“… a speech outputter for receiving a spoken sentence speech spoken by a user” (see claim 11, line 2)
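“a speech recognizer for performing speech recognition using an acoustic model and a language model stored in a speech database” (see claim 11)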
“a recognition failure cause analyzer configured to analyze whether a cause of the recognition failure is due to the acoustic model or the language model when the speech recognition fails” (see claims 6, 8, 11-14, 16)
“a speech recognition success determiner configured to determine whether speech recognition has been successful in the speech recognizer” (see claim 12)
“a speech recognition failure cause analyzer configured to store speech recognition failure data in the failure of the speech recognition and determine whether the failure cause is present in the acoustic model or the language model by analyzing the speech recognition failure data” (see claims 6-8, 12-14, 16)
“an acoustic model learner configured to add the recognition failure data to a learning database of the acoustic model and learn the acoustic model based on the added learning database of the acoustic model when the speech recognition failure cause is present in the acoustic model” (see claims 12, 17, 18)
“a language model learner configured to add the recognition failure data to a learning database of the language model and learn the language model based on the added learning database of the language model when the speech recognition failure cause is present in the language model” (see claims 12, 17, 18)
“at least one of a failure cause analyzer through searching a minimum weight of the acoustic model or a failure cause analyzer through measuring reliability of the acoustic model” (see claims 7, 8, 13, 15, 16)
“a performance estimator configured to evaluate performance of a result of machine-learning in the acoustic model learner and the language model learner” (see claim 17)
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 10 is rejected under 35 U.S.C. 101 because it is drawn to a “signal” per se, as recited in the preamble, and as such is non-statutory subject matter. Applicant’s Specification (Par. 00125) does not define the scope of the term “computer readable recording medium,” and the examples it lists (hard discs, floppy discs, magnetic tapes, CD-ROM, DVD, floptical disc, ROM, RAM, and flash memory) are introduced by the open-ended phrase “such as.” Hence, one of ordinary skill in the art can interpret the term to include both transitory and non-transitory signals. It does not appear that a claim reciting a signal encoded with functional descriptive material falls within any of the categories of patentable subject matter set forth in § 101. First, a claimed signal is clearly not a "process" under § 101 because it is not a series of steps. The other three § 101 classes of machine, compositions of matter and manufactures "relate to structural entities and can be grouped as 'product' claims in order to contrast them with process claims." 1 D. Chisum, Patents § 1.02 (1994).
Applicant’s Specification presents a broad definition of what the “computer readable recording medium” covers, one that encompasses transitory as well as non-transitory signals. Par. 00125 states: “The example embodiments described above may be implemented through computer programs executable through various components on a computer, and such computer programs may be recorded in computer-readable media. Here, the medium may include magnetic media such as hard discs, floppy discs, and magnetic tapes, optical media such as CD-ROM and DVD magneto-optical media such as floptical disc, and hardware devices specially configured to store and perform program codes, such as ROM, RAM, and flash memory.” The scope of the term “computer readable recording medium” is not defined, and the phrase “such as” is open-ended. Hence, the claim appears to be drawn to transitory signals, which are not patent-eligible subject matter. In order to overcome the present rejection, Applicant is advised to amend the claim using the following terminology: "non-transitory machine readable storage medium." This terminology is also discussed in the Official Gazette at 1351 OG 212.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1 and 10-12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kuo et al. (US20160253989A1) (hereinafter "Kuo").

Regarding claim 1, Kuo teaches a speech recognition method comprising: receiving a spoken sentence speech spoken by a user; (Par. 0001:"Speech recognition components typically employ speech recognition processes that analyze inputs representative of a user's speech in order to determine one or more appropriate actions associated with the spoken input.").
performing speech recognition using an acoustic model and a language model stored in a speech database; (Par. 0001:"Speech recognition components typically involve a large number of variables and modeling parameters", and Par. 0027:"… a speech recognition component 120 includes an acoustic model component 124", and Par. 0028:"... a speech recognition component 120 further includes a language model component 126.").
determining whether the speech recognition is successful; (Par. 0087:"FIG. 9A, in the example evaluation process 900, the cases that successfully pass the one or more dictionary [or spelling] check operations [at 930], and the cases that successfully pass the one or more emulation operations [at 922], are further analyzed using one or more force alignment operations at 938. In at least some implementations, the one or more force alignment operations at 938 include taking a text transcription of an audio segment [i.e. the reference result] and determining where in time one or more particular words occur in the audio segment, comparing those results with the speech recognition results from the speech recognition component [or “build”], and determining whether each case from the speech recognition component is acceptable [e.g., “pass”] or not acceptable [e.g., “fail”] from an alignment perspective. For example, in at least some implementations, the one or more force alignment operations at 938 may determine that a case is acceptable if at least a portion of the one or more words is reasonably closely aligned [e.g., over 50% aligned, over 75% aligned, etc.] with the reference result.").
storing speech recognition failure data when the speech recognition fails; (Par. 0032:" ... the output component 156 may store the recognition error diagnostics 160 on memory 110, or may output the recognition error diagnostics 160 via one or more output devices [e.g., display device, printer, etc.] for analysis and evaluation by the user, or may output the recognition error diagnostics 160 in any other suitable manner").
… analyzing whether a cause of the speech recognition failure is due to the acoustic model or the language model; (Par. 0036:"… the example evaluation process 400 includes preparing for recognition error diagnostics [RED] at 402. Preparing for recognition error diagnostics at 402 may include a user making one or more selections involved in a particular “build” of a speech recognition component [e.g., acoustic model, language model, lexicon, training data, etc.].", and Par. 0049:"... the one or more analysis operations that are performed on cases having recognition errors [at 510] may include performing one or more acoustic model scoring operations on cases having recognition errors at 516. Similar to the language model, an acoustic model may determines a probability [or score] that an associated segment of speech is a particular word or sequence of words.", and Par. 0052:"... for example, suggesting adjustments of engine settings, suggesting adjustments of language model parameters [failure due to LM], suggesting adjustments of acoustic model parameters [failure due to AM], suggesting supplementation of training data, or other suitable recommendations for possible correction of errors.").
and updating the acoustic model by adding the recognition failure data to a learning database of the acoustic model when the cause of the speech recognition failure is due to the acoustic model and machine-learning the acoustic model based on the added learning database of the acoustic model and updating the language model by adding the recognition failure data to a learning database of the language model when the cause of the speech recognition failure is due to the language model and machine-learning the language model based on the added learning database of the language model. (Par. 0075:"Alternately, the evaluation process 900 may optionally provide a recommendation to provide the speech recognition component with more data at 969, such as by adding words which are currently not in the component's lexicon, adding possessive words, adding [or modifying] one or more other words or word types suitable for resolving particular transcription errors, or other suitable recommendations. For example, if the one or more transcription error analysis operations at 966 indicate that a relatively high number of errors are occurring because one or more words are not in a language model [LM] of the speech recognition component, the evaluation process 900 may identify that deficiency and may recommend that the one or more relevant words be added into the language model, or that one or more pronunciations be added to the lexicon to address these errors.", and Par. 0089:"... the one or more acoustic model analysis operations includes one or more lexicon analysis operations at 983. In at least some implementations, the internal lexicon of a speech recognition process [or “build”] specifies which words in a language can be recognized or spoken, and defines how an acoustic model expects a word to be pronounced [typically using characters from a single phonetic alphabet]. The one or more lexicon analysis operations at 983 may assess whether a particular recognition error may be attributable to one or more deficiencies of the lexicon of the acoustic model, and if so, optionally provides one or more recommendations to correct or modify the lexicon accordingly at 984.").
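For illustration only, the control flow recited in claim 1 (recognize, determine success, store failure data, attribute the failure cause, then add the failure data to the corresponding learning database and retrain that model) can be sketched in Python. This is a minimal sketch under stated assumptions, not the applicant's or Kuo's implementation: the class and field names, the exact-match success test, and the cause-attribution rule (attribute the error to the language model when the acoustic model already scores the reference transcript at least as well as the wrong hypothesis, otherwise to the acoustic model) are all hypothetical, and retraining is delegated to a caller-supplied `train` function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailureRecord:
    """Hypothetical record of one recognized utterance."""
    audio_id: str
    reference: str    # correct transcript of the spoken sentence
    hypothesis: str   # transcript produced by the recognizer
    am_ref: float     # acoustic-model log score of the reference
    am_hyp: float     # acoustic-model log score of the hypothesis


# Hypothetical trainer hook: (model name, learning database) -> None
Trainer = Callable[[str, List[FailureRecord]], None]


@dataclass
class UpdatePipeline:
    failure_store: List[FailureRecord] = field(default_factory=list)
    am_db: List[FailureRecord] = field(default_factory=list)  # acoustic-model learning database
    lm_db: List[FailureRecord] = field(default_factory=list)  # language-model learning database

    @staticmethod
    def attribute_cause(rec: FailureRecord) -> str:
        # Assumed heuristic: if the acoustic model already prefers the reference,
        # attribute the wrong result to the language model; otherwise to the acoustic model.
        return "LM" if rec.am_ref >= rec.am_hyp else "AM"

    def handle(self, rec: FailureRecord, train: Trainer) -> str:
        if rec.hypothesis == rec.reference:   # simplistic success test
            return "success"
        self.failure_store.append(rec)        # store speech recognition failure data
        cause = self.attribute_cause(rec)
        if cause == "AM":
            self.am_db.append(rec)            # add to the AM learning database
            train("AM", self.am_db)           # retrain the AM on the enlarged database
        else:
            self.lm_db.append(rec)            # add to the LM learning database
            train("LM", self.lm_db)           # retrain the LM on the enlarged database
        return cause


# Usage: a failure that loses on acoustic score is routed to the acoustic model.
calls = []
pipeline = UpdatePipeline()
trainer = lambda name, db: calls.append((name, len(db)))
print(pipeline.handle(FailureRecord("u1", "call mom", "call tom", -20.0, -18.0), trainer))  # AM
```

A real system would replace the exact-match check with an alignment-based test such as the one Kuo describes in Par. 0087.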

Regarding claim 10, Kuo teaches a computer program stored in a computer readable recording medium for executing the method according to claim 1 using a computer. (Par. 0003:"... an apparatus for diagnosing speech recognition errors may include at least one processing component, and one or more computer-readable media operably coupled to the at least one processing component … computer-readable media may bear one or more instructions that, when executed by the at least one processing component, perform operations including at least: performing one or more speech recognition operations …").

Regarding claim 11, Kuo teaches a speech recognition device comprising: a speech outputter for receiving a spoken sentence speech spoken by a user; (Par. 0023:"An embodiment of a system 100 for performing speech recognition error diagnosis is shown in FIG. 1. In this implementation, the system 100 includes one or more processing components 102, and one or more input/output [I/O] [outputter] components 104 coupled to a memory 110 by a bus 106.", and Par. 0001:"Speech recognition components typically employ speech recognition processes that analyze inputs representative of a user's speech in order to determine one or more appropriate actions associated with the spoken input.").
a speech recognizer for performing speech recognition using an acoustic model and a language model stored in a speech database; (Par. 0001:"Speech recognition components typically involve a large number of variables and modeling parameters", and Par. 0027:"… a speech recognition component 120 further includes an acoustic model component 124", and Par. 0028:"... a speech recognition component 120 further includes a language model component 126.").
a recognition failure cause analyzer configured to analyze whether a cause of the recognition failure is due to the acoustic model or the language model when the speech recognition fails; (Par. 0038:"… the performing recognition error diagnostics [at 406] may include performing additional analysis operations on such cases to interpret errors, categorize …", and Par. 0036:"… the example evaluation process 400 includes preparing for recognition error diagnostics [RED] at 402. Preparing for recognition error diagnostics at 402 may include a user making one or more selections involved in a particular “build” of a speech recognition component [e.g., acoustic model, language model, lexicon, training data, etc.].", and Par. 0049:"... the one or more analysis operations [analyzer] that are performed on cases having recognition errors [at 510] may include performing one or more acoustic model scoring operations on cases having recognition errors at 516. Similar to the language model, an acoustic model may determines a probability [or score] that an associated segment of speech is a particular word or sequence of words.", and Par. 0052:"... for example, suggesting adjustments of engine settings, suggesting adjustments of language model parameters [failure due to LM], suggesting adjustments of acoustic model parameters [failure due to AM], suggesting supplementation of training data, or other suitable recommendations for possible correction of errors.").
and a controller (Fig. 3 component 150) configured to control the acoustic model or the language model of the speech recognizer of the speech recognizer to be updated based on the analyzed speech recognition failure cause. (Par. 0030:"FIG. 3 illustrates an embodiment of a speech recognition evaluation component 150. In this implementation, the speech recognition evaluation component 150 includes a control component 152, a recognition error diagnostics [or diagnosis] [RED] component 154, an output component 156, and an adjustment component 158. In at least some implementations, the control component 152 may receive one or more inputs for controlling the speech recognition evaluation component 150.", and Par. 0036:"… the example evaluation process 400 includes preparing for recognition error diagnostics [RED] at 402. Preparing for recognition error diagnostics at 402 may include a user making one or more selections involved in a particular “build” of a speech recognition component [e.g., acoustic model, language model, lexicon, training data, etc.].", and Par. 0049:"... the one or more analysis operations that are performed on cases having recognition errors [at 510] may include performing one or more acoustic model scoring operations on cases having recognition errors at 516. Similar to the language model, an acoustic model may determines a probability [or score] that an associated segment of speech is a particular word or sequence of words.", and Par. 0052:"... for example, suggesting adjustments of engine settings, suggesting adjustments of language model parameters [failure due to LM], suggesting adjustments of acoustic model parameters [failure due to AM], suggesting supplementation of training data, or other suitable recommendations for possible correction of errors.").

Regarding claim 12, Kuo teaches the speech recognition device of claim 11, wherein the recognition failure cause analyzer comprises: a speech recognition success determiner configured to determine whether speech recognition has been successful in the speech recognizer; (Figure 9 and Par. 0059:"A variety of suitable speech recognition components may be used for the execution of the developer's selected “build” options at 904 …", and Par. 0087:"FIG. 9A, in the example evaluation process 900, the cases that successfully pass the one or more dictionary [or spelling] check operations [at 930], and the cases that successfully pass the one or more emulation operations [at 922], are further analyzed using one or more force alignment operations at 938. In at least some implementations, the one or more force alignment operations at 938 include taking a text transcription of an audio segment [i.e. the reference result] and determining where in time one or more particular words occur in the audio segment, comparing those results with the speech recognition results from the speech recognition component [or “build”], and determining whether each case from the speech recognition component is acceptable [e.g., “pass”] or not acceptable [e.g., “fail”] from an alignment perspective. For example, in at least some implementations, the one or more force alignment operations at 938 may determine that a case is acceptable if at least a portion of the one or more words is reasonably closely aligned [e.g., over 50% aligned, over 75% aligned, etc.] with the reference result.").
a speech recognition failure cause analyzer configured to store speech recognition failure data in the failure of the speech recognition and determine whether the failure cause is present in the acoustic model or the language model by analyzing the speech recognition failure data; (Par. 0032:"... the output component 156 may store the recognition error diagnostics 160 on memory 110, or may output the recognition error diagnostics 160 via one or more output devices [e.g., display device, printer, etc.] for analysis and evaluation by the user, or may output the recognition error diagnostics 160 in any other suitable manner", and Par. 0036:"… the example evaluation process 400 includes preparing for recognition error diagnostics [RED] at 402. Preparing for recognition error diagnostics at 402 may include a user making one or more selections involved in a particular “build” of a speech recognition component [e.g., acoustic model, language model, lexicon, training data, etc.].", and Par. 0049:"... the one or more analysis operations that are performed on cases having recognition errors [at 510] may include performing one or more acoustic model scoring operations on cases having recognition errors at 516. Similar to the language model, an acoustic model may determines a probability [or score] that an associated segment of speech is a particular word or sequence of words.", and Par. 0052:"... for example, suggesting adjustments of engine settings, suggesting adjustments of language model parameters [failure due to LM], suggesting adjustments of acoustic model parameters [failure due to AM], suggesting supplementation of training data, or other suitable recommendations for possible correction of errors.").
an acoustic model learner configured to add the recognition failure data to a learning database of the acoustic model and learn the acoustic model based on the added learning database of the acoustic model when the speech recognition failure cause is present in the acoustic model; (Par. 0089:"... the one or more acoustic model analysis operations includes one or more lexicon analysis operations at 983. In at least some implementations, the internal lexicon of a speech recognition process [or “build”] specifies which words in a language can be recognized or spoken, and defines how an acoustic model expects a word to be pronounced [typically using characters from a single phonetic alphabet]. The one or more lexicon analysis operations at 983 may assess whether a particular recognition error may be attributable to one or more deficiencies of the lexicon of the acoustic model, and if so, optionally provides one or more recommendations to correct or modify the lexicon accordingly at 984.", and Par. 0090:"If it is determined that the recognition error may be correctable via one or more adjustments to the LTS parameter[s] [at 985], then the evaluation process 900 may recommend one or more adjustments [or fixes] to one or more LTS parameters of the acoustic model at 986.", and Par. 0131:"FIG. 12 is a diagram of an embodiment of a computer system [learner] environment 1200 for performing operations associated with evaluating speech recognition processes.").
a language model learner configured to add the recognition failure data to a learning database of the language model and learn the language model based on the added learning database of the language model when the speech recognition failure cause is present in the language model; (Par. 0075:"Alternately, the evaluation process 900 may optionally provide a recommendation to provide the speech recognition component with more data at 969, such as by adding words which are currently not in the component's lexicon, adding possessive words, adding [or modifying] one or more other words or word types suitable for resolving particular transcription errors, or other suitable recommendations. For example, if the one or more transcription error analysis operations at 966 indicate that a relatively high number of errors are occurring because one or more words are not in a language model [LM] of the speech recognition component, the evaluation process 900 may identify that deficiency and may recommend that the one or more relevant words be added into the language model, or that one or more pronunciations be added to the lexicon to address these errors.", and Par. 0090:"If it is determined that the recognition error may be correctable via one or more adjustments to the LTS parameter[s] [at 985], then the evaluation process 900 may recommend one or more adjustments [or fixes] to one or more LTS parameters of the acoustic model at 986.", and Par. 0131:"FIG. 12 is a diagram of an embodiment of a computer system [learner] environment 1200 for performing operations associated with evaluating speech recognition processes.").
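Kuo's pass/fail criterion relied on for the success determiner (Par. 0087: a case is acceptable if at least a portion of the words is aligned with the reference result, e.g., over 50% or over 75%) can be approximated as follows. This sketch is a simplification under stated assumptions: Kuo's force alignment locates words in time within the audio, whereas the code below only matches reference words against hypothesis words in order, using Python's standard `difflib.SequenceMatcher`. The function names and the default threshold of 0.5 are illustrative.

```python
from difflib import SequenceMatcher


def aligned_fraction(reference: str, hypothesis: str) -> float:
    """Approximate fraction of reference words matched, in order, by the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Sum the sizes of all matching word runs (the final dummy block has size 0).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def is_success(reference: str, hypothesis: str, threshold: float = 0.5) -> bool:
    # Kuo gives "over 50%" and "over 75%" only as example thresholds.
    return aligned_fraction(reference, hypothesis) > threshold


print(aligned_fraction("turn on the light", "turn on the night"))  # 0.75
print(is_success("call mom", "call tom"))                          # False (0.5 is not over 0.5)
```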


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Kuo, as applied to claim 1 above, and further in view of Han et al. (US20120095766A1) (hereinafter "Han").


Kuo does not teach the speech recognition method of claim 1, wherein the performing of the speech recognition comprises selecting a result of the highest final score among a plurality of speech recognition result candidates as a speech recognition result, wherein the final score is calculated by multiplying a score of the acoustic model by a weight and then adding a score of the language model.
Han teaches wherein the performing of the speech recognition comprises selecting a result of the highest final score among a plurality of speech recognition result candidates as a speech recognition result, (Par. 0092:"The outputter 146 may output a sentence having a highest integrated score or more than one sentence having relatively high integrated scores as results of the recognition of the input word string.").
wherein the final score is calculated by multiplying a score of the acoustic model by a weight and then adding a score of the language model. (Par. 0091:"The integrated score calculator 144 may calculate the integrated score of the input word string by adding up the acoustic model score and the bidirectional language model score of the input word string, which are provided by the acoustic model score calculator 142 and the bidirectional language model score calculation unit 130, respectively. The acoustic model score and the bidirectional language model score of the input word string may be the logarithmic values of the probabilities of the input word string calculated using the acoustic model and using the forward and backward language models 160 and 170. The integrated score calculator 144 1] may apply different weights to the acoustic model score and the bidirectional language model score of the input word string and 2] may thus adjust the ratio between the acoustic model score and language model score of the input word string in the integrated score of the input word string.").
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kuo in view of Han to select a result of the highest final score among a plurality of speech recognition result candidates as a speech recognition result, wherein the final score is calculated by multiplying a score of the acoustic model by a weight and then adding a score of the language model, in order to overcome the limits of speech recognition using a unidirectional language model that often fails to produce a proper sentence based on a combination of words that can be represented successfully simply using a given corpus, as evidenced by Han (see Par. 0059).
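The scoring rule recited in claim 2 (final score = weight * acoustic-model score + language-model score, with the highest-scoring candidate selected) can be illustrated with the short Python sketch below. The candidate strings, scores, and weights are made up for illustration; following Han (Par. 0091), the scores are treated as log probabilities. Han allows separate weights on both terms; fixing the language-model weight at 1 gives the form recited in claim 2.

```python
from typing import List, Tuple

# (candidate text, acoustic-model log score, language-model log score); values are hypothetical
Candidate = Tuple[str, float, float]


def final_score(am_score: float, lm_score: float, am_weight: float) -> float:
    """Claim 2 form: weight the acoustic-model score, then add the language-model score."""
    return am_weight * am_score + lm_score


def select_result(candidates: List[Candidate], am_weight: float) -> str:
    """Return the text of the candidate with the highest final score."""
    best = max(candidates, key=lambda c: final_score(c[1], c[2], am_weight))
    return best[0]


candidates = [
    ("recognize speech", -40.0, -8.0),
    ("wreck a nice beach", -38.0, -15.0),
]
print(select_result(candidates, am_weight=1.0))  # recognize speech   (-48.0 vs -53.0)
print(select_result(candidates, am_weight=4.0))  # wreck a nice beach (-168.0 vs -167.0)
```

Raising the acoustic weight shifts the decision toward the acoustically better candidate, which corresponds to the ratio adjustment Han describes.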

Claims 9, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kuo, as applied to claims 1 and 12 above, and further in view of Konopka et al. (US20030036903A1) (hereinafter "Konopka").

Regarding claims 9, 17, and 18, Kuo teaches a speech recognition method and device, as discussed above.
Regarding claim 9, Kuo does not teach the speech recognition method of claim 1, wherein the updating of the acoustic model comprises evaluating performance of a result of machine-learning the learned acoustic model and updating the acoustic model when the improvement of the speech recognition performance is confirmed, and wherein the updating of the language model comprises evaluating performance of a result of machine-learning the learned language model and updating the language model when the improvement of the speech recognition performance is confirmed.
Konopka teaches wherein the updating of the acoustic model comprises evaluating performance of a result of machine-learning the learned acoustic model and updating the acoustic model when the improvement of the speech recognition performance is confirmed, and wherein the updating of the language model comprises evaluating performance of a result of machine-learning the learned language model and updating the language model when the improvement of the speech recognition performance is confirmed. (Par. 0001:”In this document, the term “speech models” refers collectively to those components of a speech recognition system that reflect the language which the system models, including for example, acoustic speech models, pronunciation entries, grammar models.”, and Par. 0036:” Preferably, the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kuo in view of Konopka such that updating the acoustic model comprises evaluating performance of a result of machine-learning the learned acoustic model and updating the acoustic model when the improvement of the speech recognition performance is confirmed, in order to improve speech recognition of 

Regarding claim 17, Kuo does not teach the speech recognition device of claim 12, further comprising: a performance estimator configured to evaluate performance of a result of machine-learning in the acoustic model learner and the language model learner. 
Konopka teaches the speech recognition device of claim 12, further comprising: a performance estimator configured to evaluate performance of a result of machine-learning in the acoustic model learner and the language model learner (Par. 0001: “In this document, the term ‘speech models’ refers collectively to those components of a speech recognition system that reflect the language which the system models, including for example, acoustic speech models, pronunciation entries, grammar models.”; and Par. 0036: “Preferably, the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Kuo in view of Konopka to evaluate performance of a result of machine-learning in the acoustic model learner and the language model learner, in order to improve speech recognition of utterances from the class of 

Regarding claim 18, Kuo does not teach the speech recognition device of claim 17, wherein when it is confirmed that the speech recognition performance evaluated by the performance estimator is improved, the controller controls the acoustic model or the language model of the speech recognizer of the speech recognizer to be updated to a model learned by the acoustic model learner or the language model learner.
Konopka teaches the speech recognition device of claim 17, wherein when it is confirmed that the speech recognition performance evaluated by the performance estimator is improved, the controller controls the acoustic model or the language model of the speech recognizer of the speech recognizer to be updated to a model learned by the acoustic model learner or the language model learner (Par. 0001: “In this document, the term ‘speech models’ refers collectively to those components of a speech recognition system that reflect the language which the system models, including for example, acoustic speech models, pronunciation entries, grammar models.”; and Par. 0036: “Preferably, the speech recognition apparatus at the user's site uses both the updated speech models and the original speech models stored thereat to determine respective best matches to new utterances. If subsequent utterances are determined to be better matched to the updated speech models, the original speech models are replaced by the updated models.”).
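For illustration only, the replacement condition quoted from Konopka Par. 0036 can be sketched as below. Konopka keeps both the original and updated models, scores subsequent utterances against each, and replaces the original only if the updated models match better. This is a hypothetical Python sketch. `choose_model` and the `match_score` callback are illustrative names. Averaging the per-utterance scores is also an assumption, because Konopka does not specify how "better matched" is aggregated.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")


def choose_model(
    original: Model,
    updated: Model,
    utterances: Sequence[str],
    match_score: Callable[[Model, str], float],  # hypothetical: higher = better match
) -> Model:
    """Return the updated model only if it matches subsequent utterances better."""
    if not utterances:
        # No evidence of improvement yet; keep the original model.
        return original
    n = len(utterances)
    orig_avg = sum(match_score(original, u) for u in utterances) / n
    upd_avg = sum(match_score(updated, u) for u in utterances) / n
    # Replace only when the improvement is confirmed (strictly better average).
    return updated if upd_avg > orig_avg else original
```

Requiring a strictly better average means that a tie keeps the original model. This is one conservative reading of "better matched."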


Allowable Subject Matter
Claims 3-8 and 13-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim 3 recites “The speech recognition method of claim 2, wherein the determining of whether the failure cause is due to the acoustic model or the language model comprises: changing weights for the score of the acoustic model; re-extracting speech recognition results for the speech recognition failure data according to the changed weights; calculating speech recognition error rates between the re-extracted speech recognition results and an input spoken text; determining a speech recognition result of which an error rate is the minimum among the calculated speech recognition error rates; determining an acoustic model weight of the speech recognition result of which the error rate is the minimum; and comparing the determined weight with a previously set weight to determine whether the failure cause is an 
Claim 4 depends from claim 3 and is therefore also allowable by virtue of its dependency on allowable claim 3.
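For illustration only, the weight-sweeping steps recited in claim 3 can be sketched as follows. The sketch makes several assumptions:
- Each N-best candidate is a (text, acoustic-model score, language-model score) tuple.
- The error rate is word error rate, that is, word-level edit distance divided by reference length.
- The final score uses the weighted sum described for Han above.

All function names and the tolerance parameter are hypothetical. The sketch stops at comparing the minimum-error weight with the previously set weight. It does not assume how that comparison is mapped to an acoustic-model or language-model failure cause.

```python
from typing import List, Sequence, Tuple

Cand = Tuple[str, float, float]  # (text, am_score, lm_score), hypothetical layout


def word_error_rate(hyp: str, ref: str) -> float:
    # Word-level Levenshtein distance divided by the reference length.
    h, r = hyp.split(), ref.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (rw != hw))
        prev = cur
    return prev[-1] / max(len(r), 1)


def sweep_am_weight(cands: List[Cand], ref_text: str,
                    weights: Sequence[float]) -> Tuple[float, float]:
    # For each candidate weight, re-extract the best result and score its error rate.
    best_w, best_err = weights[0], float("inf")
    for w in weights:
        text = max(cands, key=lambda c: w * c[1] + c[2])[0]
        err = word_error_rate(text, ref_text)
        if err < best_err:  # keep the first weight reaching the minimum error
            best_w, best_err = w, err
    return best_w, best_err


def weight_deviates(best_w: float, preset_w: float, tol: float = 1e-9) -> bool:
    # Compare the minimum-error weight with the previously set weight.
    return abs(best_w - preset_w) > tol
```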

Claim 5 recites “The speech recognition method of claim 1, wherein the analyzing of the speech recognition failure data to determine whether the failure cause is due to the acoustic model or the language mode comprises: calculating an output of the acoustic model representing a probability distribution in each class for a given input value of the acoustic model; calculating an entropy for the output value every frame input to measure the reliability of the acoustic model; calculating an average of the calculated entropies; comparing whether the average of the entropies is greater than a threshold; and determining the failure cause as 
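For illustration only, the entropy-based reliability measure recited in claim 5 can be sketched as follows. This minimal Python sketch assumes that each frame's acoustic-model output is a list of class posterior probabilities summing to 1, and it uses natural-log Shannon entropy. The function names are hypothetical. A higher average entropy means the frame-level posteriors are more spread out, so the acoustic model is less certain. The sketch reports only whether the average exceeds the threshold. It does not assume how the claim maps that result to a failure cause.

```python
import math
from typing import Sequence


def frame_entropy(probs: Sequence[float]) -> float:
    # Shannon entropy (natural log) of one frame's class posterior distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)


def average_entropy(frames: Sequence[Sequence[float]]) -> float:
    if not frames:
        raise ValueError("at least one frame is required")
    return sum(frame_entropy(f) for f in frames) / len(frames)


def exceeds_threshold(frames: Sequence[Sequence[float]], threshold: float) -> bool:
    # True when the average per-frame entropy is greater than the threshold.
    return average_entropy(frames) > threshold
```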

Claim 6 recites “The speech recognition method of claim 1, wherein the analyzing of the speech recognition failure data to determine whether the failure cause is due to the acoustic model or the language mode comprises: analyzing the speech recognition failure data by a plurality of speech recognition failure cause analyzer, multiplying an output of each speech recognition failure cause analyzer by an output weight, and comparing a final value obtained by 
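For illustration only, the weighted combination of analyzer outputs recited in claim 6 can be sketched as follows. The sketch assumes that the final value is the sum of the weighted analyzer outputs and that this value is compared against a threshold. The function names and threshold are hypothetical and are not taken from the application.

```python
from typing import Sequence


def fused_score(outputs: Sequence[float], weights: Sequence[float]) -> float:
    if len(outputs) != len(weights):
        raise ValueError("each analyzer output needs exactly one output weight")
    # Assumption: the final value is the sum of each analyzer output times its weight.
    return sum(o * w for o, w in zip(outputs, weights))


def fused_exceeds(outputs: Sequence[float], weights: Sequence[float],
                  threshold: float) -> bool:
    # Compare the combined final value against a (hypothetical) threshold.
    return fused_score(outputs, weights) > threshold
```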


Claims 7 and 8 depend from claim 6 and are therefore also allowable by virtue of their dependency on allowable claim 6.
Claim 13 recites “The speech recognition device of claim 12, wherein the speech recognizer is configured to select a result of the highest final score among a plurality of speech 

Claims 14, 15, and 16 depend from claim 13 and are therefore also allowable for substantially similar reasons.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Kim et al. (US20190005947A1) teaches a speech recognition method that includes obtaining an original learning data set for the recognition target language, constructing a target label by dividing the text information included in each piece of original learning data into letter units, and building an acoustic model based on a deep neural network by learning the learning speech data included in each piece of original learning data and the target label corresponding to the learning speech data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689.  The examiner can normally be reached on Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/D.A./             Examiner, Art Unit 2656                                                                                                                                                                                           
/Paras D Shah/             Primary Examiner, Art Unit 2659                                                                                                                                                                                           

05/20/2021