DETAILED ACTION
This office action is in response to Applicant’s submission filed on 8/17/2021. Claims 1 – 4, 6 – 14, and 16 – 20 are pending in the application. Claims 5 and 15 are cancelled. As such, claims 1 – 4, 6 – 14, and 16 – 20 have been examined.

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d). The certified copy has been filed in parent Application No. CN201810949733.4, filed on 8/20/2018.
Drawings
The drawings filed on 2/1/2019 have been accepted and considered by the examiner.

Response to Argument
Applicant's arguments filed with respect to the 35 USC §101 rejections raised in the previous office action have been fully considered and are persuasive. The claimed invention, as currently amended, overcomes the 35 USC §101 rejections. Therefore, the 35 USC §101 rejections are withdrawn.
With respect to the rejection of claims 1-3, 9-13, 19 and 20 under 35 U.S.C. §102(a)(1) as being anticipated by Prabhavalkar et al. (U.S. Patent Application 
Claims 1-3, 9-13, 19 and 20 were rejected under 35 USC 102(a)(2) as being anticipated by U.S. patent publication 2020/0027. 
Claim 1 recites in part: 
receiving a speech utterance; 
inferring a sequence of phonemes from the speech utterance according to an acoustic model; 
computing a plurality of transcription hypotheses from the sequence of phonemes; 
computing, according to a first model, a first model score for the transcription; 
computing, according to a second model, a second model score for the transcription; 
computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; 
choosing one of the transcriptions based on the hybrid score; and outputting the chosen hypothesis as the transcription of the speech. 
Prabhavalkar does not disclose the invention as recited in claim 1. Prabhavalkar discloses has a first model (a LAS model) and a second model (an FST-based LM). Prabhavalkar describes an interpolation between the two models that includes weighting the second model by an LM weight A (lambda). ¶ [0127] However, Prabhavalkar does not disclose or suggest a way of choosing A. An ordinarily skilled practitioner in the art would assume A to be a constant. In particular, there is no indication that a weight in Prabhavalkar's system can be dynamic. 
For at least these reasons, Prabhavalkar fails to anticipate every element of claim 1, and claim 1 is patentable over Prabhavalkar. Independent claim 11 includes similar patentable elements as claim 1 and should be patentable for at least the same reasons. 
Furthermore, claims 2-4, 7-10, 12-14, and 17-20 recite specific ways in of implementing a dynamic weight, none of which Prabhavalkar contemplates. 
For the foregoing reasons, and by virtue of their dependence from a novel claim, dependent claims 2-3, 9-10, 12-13, 19 and 20 are allowable under 35 USC 102. Their rejections should be withdrawn. 
Claims 5 and 15 are cancelled and so their rejections should be withdrawn. 
Rejection under 35 U.S.C. §103
Claims 4, 7, 14 and 17 are rejected under 35 U.S.C. §103(a) as allegedly being unpatentable over Prabhavalkar in view of U.S. patent publication no. 2012/0271617 (Nakajima). Claims 6 and 16 are rejected under 35 U.S.C. §103(a) as allegedly being unpatentable over Prabhavalkar in view of "Joint Unsupervised Adaptation of N-gram and RNN Language Models," December 2017 (Masumura). Claims 8 and 18 are rejected under 35 U.S.C. §103(a) as allegedly being 
Regarding claim 4, 7, 14, and 17, Nakajima proposes adapting a likelihood score of a model based on rule-based semantic analysis. However, claims 4, and 14 are directed to using semantic information to change the weighting between models. Computing weighting between language models, to the extent that it might be known as an advanced technique in speech recognition could be done by multiple methods. Doing so according to semantic information is not obvious just because Nakajima uses semantic information in computing scores from a model. 
Claims 7 and 17 refer to using rule-based logic to generate interpolation weights. 
Nakajima uses rule-based grammar analysis of semantic information to provide inputs that affect the scores output by a model. Rules for computing interpolation weights would be different types of rules altogether from the grammar rules that Nakajima proposes to use for affecting language model scores. 
As described with regard to rejections under 35 USC 102, Prabhavalkar does not anticipate independent claim 1 or claim 11. Nakajima, Masamura, and Lee fail to cure the deficiencies of Prabhavalkar with respect to claims 4, 6-8, 14, and 16-18. 
For all of the reasons set forth above, claims 4, 6-8, 14 and 16-18 are non-obvious at least by virtue of their depending from independent claims 1 and 11, and should be patentable as being dependent on an allowable base claim in addition to the patentably distinguishing elements they each recite. 

Toward the end of page 9, Applicant states: Prabhavalkar does not disclose the invention as recited in claim 1. Prabhavalkar discloses has a first model (a LAS model) and a second model (an FST-based LM). Prabhavalkar describes an interpolation between the two models that includes weighting the second model by an LM weight λ. ¶ [0127] However, Prabhavalkar does not disclose or suggest a way of choosing λ. An ordinarily skilled practitioner in the art would assume λ to be a constant. In particular, there is no indication that a weight in Prabhavalkar's system can be dynamic.
Examiner respectfully disagrees that “An ordinarily skilled practitioner in the art would assume λ to be a constant”. Unless explicitly mentioned by Prabhavalkar, I am 
Furthermore, the amendment, which introduced the new limitations “a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; choosing one of the transcriptions based on the hybrid score; and outputting the chosen hypothesis as the transcription of the speech”, necessitated a new ground of rejection in which Gillick et al. (U.S. Patent Publication number: US6167377A), hereinafter referred to as Gillick, provides the necessary support to maintain the rejection. Gillick teaches (Gillick, Col. 1, Line 63 - Col. 2, Line 1: “The invention provides a dynamic interpolation technique for use in combining scores from a collection of language models to produce a single language model score. The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.”, and Col. 5, Line 34 - 37: “As discussed below, the recognizer 215 uses a dynamic interpolation technique to combine the results of a collection of language models to produce a single language model result.”, and Col. 2, Line 13 - 26: “The combination expression may combine the language model results using combination weights associated with the language models. For example, when the language model results are numerical scores that indicate likelihoods associated with candidates, the combination expression may multiply the numerical score for a language model by a combination weight associated with the language model. The combination expression may be adjusted by adjusting the combination weights using language model results associated with the selected candidate.”, and Col. 1, Line 65 - Col. 2, Line 1: “The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.”, and Col. 16, Line 20 - 25: “This process may be referred to as interpolating the two language models using the interpolation weights λ1, λ2. The recognizer 215 uses the combined language model scores for the candidate words as one factor in identifying a candidate word wx that best corresponds to a user's utterance or a portion of the utterance [step 1315]”).
Examiner believes any concerns Applicant may have regarding the interpolation being dynamic are resolved when Gillick's teachings are considered. Consequently, Examiner respectfully disagrees and finds Applicant’s argument moot in view of Prabhavalkar and Gillick, as mentioned supra.
For at least the reasons provided supra, Applicant’s arguments are found not persuasive. The rejections of claims 1 and 11, previously made under 35 U.S.C. §102(a)(1), are therefore sustained, now under 35 U.S.C. §103, and updated accordingly.
With respect to the art rejections of dependent claims 2 – 4, 6 – 10, 12 – 14, and 16 – 20 under 35 U.S.C. §103, to the extent that said claims are argued on the same rationale presented in the remarks filed 8/17/2021 with respect to independent claims 1 and 11, Examiner respectfully notes as follows. For at least the same reasons provided supra in the response directed to claims 1 and 11, Examiner likewise respectfully disagrees, and Applicant’s arguments are also found not persuasive. Consequently, the rejections of claims 1 – 4, 6 – 14, and 16 – 20 are sustained.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.


Claims 1 – 3, 9 – 13, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar (US20200027444A1) and Gillick et al. (US6167377A) (hereinafter "Gillick").

Prabhavalkar was applied in the previous Office Action.
Regarding claim 1, Prabhavalkar teaches a method of speech transcription, the method comprising: computing a transcription hypothesis from a sequence of phonemes (Par. 0027:” ...determining, based on output of the speech recognition model in response to processing of input data for the training example, a set of hypotheses using beam search decoding; identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”).
receiving a speech utterance; (Prabhavalkar, Par. 0008:” receiving audio data for an utterance”)
inferring a sequence of phonemes from the speech utterance according to an acoustic model; (Prabhavalkar, Par. 0008:” The method includes receiving audio data for an utterance, providing features indicative of acoustic characteristics of the utterance as input to an encoder neural network, and processing an output of the encoder neural network using an attender neural network to generate a context vector.”).
computing a plurality of transcription hypotheses from the sequence of phonemes; (Prabhavalkar, Par. 0027:” identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”).
computing, according to a first model, a first model score for the transcription; (Prabhavalkar, Par. 0020:” In some implementations, generating a transcription for the utterance comprises: generating language model scores for the multiple candidate transcriptions using a language model; and determining the transcription based on the language model scores generated using the language model”).
computing, according to a second model, a second model score for the transcription (Prabhavalkar, Par. 0127:” For example, a log-linear interpolation can be done between the LAS model [First model] and a finite-state transducer [FST]-based LM [Second model] trained to go from graphemes to words at each step [same transcription] of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”).
Prabhavalkar does not teach computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; choosing one of the transcriptions based on the hybrid score; and outputting the chosen hypothesis as the transcription of the speech.

Gillick teaches computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; (Gillick, "Col. 1, Line 63 - Col. 2 Line 1 :”The invention provides a dynamic interpolation technique for use in combining scores from a collection of language models to produce a single language model score. The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.”, and Col. 5, Line 34-37:”As discussed below, the recognizer 215 uses a dynamic interpolation technique to combine the results of a collection of language models to produce a single language model result.”, and Col. 2, Line 13 – 26:”The combination expression may combine the language model results using combination weights associated with the language models. For example, when the language model results are numerical scores that indicate likelihoods associated with candidates, the combination expression may multiply the numerical score for a language model by a combination weight associated with the language model. The combination expression may be adjusted by adjusting the combination weights using language model results associated with the selected candidate.”, and Col. 1, Line 65 - Col. 2 Line 1:”The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.").
choosing one of the transcriptions based on the hybrid score; and (Gillick, Col. 16, Line 20 - 25:”This process may be referred to as interpolating the two language models using the interpolation weights λ1, λ2. The recognizer 215 uses the combined language model scores for the candidate words as one factor in identifying a candidate word wx that best corresponds to a user's utterance or a portion of the utterance [step 1315].”).
outputting the chosen hypothesis as the transcription of the speech. (Gillick, Col. 16, Line 45 - 48:”the recognizer may use the dynamic interpolation technique to update the interpolation weights after a series of k words (w1, w2 . . . wk) have been identified as best recognition candidates.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar in view of Gillick to interpolate between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic, in order to permit speech recognition systems to make more effective use of a variety of language models, as evidenced by Gillick (see Col. 2, lines 2-3).
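For illustration only, the limitation at issue (a hybrid score formed from two weighted model scores, with at least one weight being dynamic, followed by selection of a hypothesis) can be sketched as below. This is a minimal sketch and not a characterization of any reference's or Applicant's actual implementation: the `Hypothesis` type, the example scores, and the word-presence rule in `dynamic_weights` are all hypothetical and are introduced solely to contrast a constant weight with a weight that varies per hypothesis.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Hypothesis:
    text: str
    first_score: float   # hypothetical log-probability from the first model
    second_score: float  # hypothetical log-probability from the second model


WeightFn = Callable[[Hypothesis], Tuple[float, float]]


def hybrid_score(hyp: Hypothesis, weight_fn: WeightFn) -> float:
    """Interpolate the two model scores with per-model weights."""
    w1, w2 = weight_fn(hyp)
    return w1 * hyp.first_score + w2 * hyp.second_score


def choose_transcription(hyps: Sequence[Hypothesis], weight_fn: WeightFn) -> Hypothesis:
    """Pick the hypothesis with the highest hybrid score."""
    return max(hyps, key=lambda h: hybrid_score(h, weight_fn))


def static_weights(hyp: Hypothesis) -> Tuple[float, float]:
    # Constant weights: the same values for every hypothesis.
    return 0.5, 0.5


def dynamic_weights(hyp: Hypothesis) -> Tuple[float, float]:
    # Hypothetical dynamic rule (an assumption for illustration): trust the
    # second model more when the hypothesis contains the word "call".
    w2 = 0.8 if "call" in hyp.text.split() else 0.2
    return 1.0 - w2, w2
```

With the hypothetical inputs `Hypothesis("call mom", -5.0, -0.5)` and `Hypothesis("cold mom", -2.0, -2.0)`, the constant weights select "cold mom" while the dynamic rule selects "call mom", which shows how a hypothesis-dependent weight can change the chosen transcription.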

Regarding claim 2, which depends from claim 1, Prabhavalkar further teaches “wherein at least one interpolation weight is conditioned on the content of the transcription”. (Prabhavalkar, Par. 0051:” ... each chunk having a first predetermined number of speech frames representing speech occurring before speech content being predicted at the current time step and a second predetermined number of speech frames representing speech occurring after the speech content being predicted in the current time step.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”)

Regarding claim 3, which depends from claim 2, Prabhavalkar further teaches “wherein the conditioning is based on word presence”. (Prabhavalkar, Par. 0054:” In some implementations, the speech recognition model is configured to output at least one probability distribution for each chunk, wherein the probability distribution can indicate an element that does not correspond to a word element as a most likely prediction.”, and Par. 0044:”…. generating the transcription for the utterance comprises using beam search decoding to generate one or more candidate transcriptions based on the word element [word presence] scores”).


score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η”). Note: the negative sign before lambda in the following equation is acting as a compression factor.
$$ y^{*} = \underset{y}{\operatorname{argmin}} \; -\log P(y \mid x) - \lambda \log P_{LM}(x) - \eta \, \mathrm{coverage} \qquad (7) $$

Regarding claim 10, which depends from claim 1, Prabhavalkar further teaches “wherein the interpolation generates the hybrid score using a weighted sum function”. (Prabhavalkar, Par. 0010:” In some implementations, the context vector is a weighted sum of multiple encoder outputs for the utterance.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion.”)
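For illustration only, the log-linear interpolation of equation 7 as quoted from Prabhavalkar, Par. 0127, can be read as a weighted sum of log-domain terms minimized over the candidate transcriptions. The sketch below assumes each candidate carries a LAS probability, an external LM probability, and a coverage value. The dictionary keys, function names, and numeric values are hypothetical and do not appear in the reference.

```python
import math
from typing import Dict, Sequence


def fusion_cost(las_prob: float, lm_prob: float, coverage: float,
                lm_weight: float, coverage_weight: float) -> float:
    """Cost of one candidate: -log P(y|x) - lambda * log P_LM - eta * coverage."""
    return (-math.log(las_prob)
            - lm_weight * math.log(lm_prob)
            - coverage_weight * coverage)


def decode(candidates: Sequence[Dict], lm_weight: float,
           coverage_weight: float) -> Dict:
    """Return the candidate with the lowest cost (the argmin in equation 7)."""
    return min(
        candidates,
        key=lambda c: fusion_cost(c["las_prob"], c["lm_prob"], c["coverage"],
                                  lm_weight, coverage_weight),
    )
```

In this sketch `lm_weight` plays the role of λ and `coverage_weight` the role of η. Both are passed in as fixed values for a given call, so any variation of λ across utterances or hypotheses would have to be supplied by the caller.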


instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 504, the storage device 506, or memory on processor 502”, and Par. 0027:” ...determining, based on output of the speech recognition model in response to processing of input data for the training example, a set of hypotheses using beam search decoding; identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”) 
receiving a speech utterance; (Prabhavalkar, Par. 0008:” receiving audio data for an utterance”)
computing a plurality of transcription hypotheses from a sequence of phonemes inferred from an acoustic model; (Prabhavalkar, Par. 0027:” identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”, and Par. 0008:” providing features indicative of acoustic characteristics of the utterance as input to an encoder neural network”).
computing, according to a first model, a first model score for the transcription; (Prabhavalkar, Par. 0020:” In some implementations, generating a transcription for the utterance comprises: generating language model scores for the multiple transcriptions using a language model; and determining the transcription based on the language model scores generated using the language model”).
computing, according to a second model, a second model score for the transcription (Prabhavalkar, Par. 0127:” For example, a log-linear interpolation can be done between the LAS model [First model] and a finite-state transducer [FST]-based LM [Second model] trained to go from graphemes to words at each step [same transcription] of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”).
Prabhavalkar does not teach computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; choosing one of the transcriptions based on the hybrid score; and outputting the chosen hypothesis as the transcription of the speech.

Gillick teaches computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic; (Gillick, "Col. 1, Line 63 - Col. 2 Line 1 :”The invention provides a dynamic interpolation technique for use in combining scores from a collection of language models to produce a single language model score. The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.”, and Col. 5, Line 34-37:”As discussed below, the recognizer 215 uses a dynamic interpolation technique to combine the results of a collection of language models to produce a single language model result.”, and Col. 2, Line 13 – 26:”The combination expression may combine the language model results using combination weights associated with the language models. For example, when the language model results are numerical scores that indicate likelihoods associated with candidates, the combination expression may multiply the numerical score for a language model by a combination weight associated with the language model. The combination expression may be adjusted by adjusting the combination weights using language model results associated with the selected candidate.”, and Col. 1, Line 65 - Col. 2 Line 1:”The dynamic interpolation technique dynamically assigns weights to the scores of each model in a way that emphasizes the most effective language models.").
choosing one of the transcriptions based on the hybrid score; and (Gillick, Col. 16, Line 20 - 25:”This process may be referred to as interpolating the two language models using the interpolation weights λ1, λ2. The recognizer 215 uses the combined language model scores for the candidate words as one factor in identifying a candidate word wx that best corresponds to a user's utterance or a portion of the utterance [step 1315].”).
outputting the chosen hypothesis as the transcription of the speech. (Gillick, Col. 16, Line 45 - 48:”the recognizer may use the dynamic interpolation technique to update the interpolation weights after a series of k words (w1, w2 . . . wk) have been identified as best recognition candidates.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar in view of Gillick to interpolate between the first model score and second model score using interpolation weights, where a first interpolation weight is applied to the first model score and a second interpolation weight is applied to the second model score and at least one of the weights is dynamic, in order to permit speech recognition systems to make more effective use of a variety of language models, as evidenced by Gillick (see Col. 2, lines 2-3).
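For illustration only, the passages of Gillick quoted above describe updating the interpolation weights after a series of k words have been identified as best recognition candidates, so that the most effective language models are emphasized. The quoted passages do not set out the update rule itself. The sketch below therefore assumes a standard expectation-maximization style re-estimation of mixture weights, which is an assumption made for this example and is not presented as Gillick's disclosed method. All function names and values are hypothetical.

```python
from typing import List, Sequence


def interpolate(weights: Sequence[float], probs: Sequence[float]) -> float:
    """Combined score for one word: sum over models of weight * model probability."""
    return sum(w * p for w, p in zip(weights, probs))


def update_weights(weights: Sequence[float],
                   probs_per_word: Sequence[Sequence[float]]) -> List[float]:
    """Re-estimate weights from the k most recently recognized words.

    probs_per_word[t][i] is the probability model i assigned to the t-th
    recognized word. Assumed EM-style rule: each new weight is the average
    share of the combined score contributed by that model.
    """
    if not probs_per_word:
        return list(weights)  # nothing recognized yet, keep current weights
    totals = [0.0] * len(weights)
    for probs in probs_per_word:
        mix = interpolate(weights, probs)
        for i, (w, p) in enumerate(zip(weights, probs)):
            totals[i] += w * p / mix
    k = len(probs_per_word)
    return [t / k for t in totals]
```

Starting from equal weights, a model that consistently assigns higher probability to the recognized words receives a larger weight after the update, and the updated weights still sum to one.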

Regarding claim 12, which depends from claim 11, Prabhavalkar further teaches “wherein at least one interpolation weight is conditioned on the content of the transcription”. (Prabhavalkar, Par. 0051:” ... each chunk having a first predetermined number of speech frames representing speech occurring before speech content being predicted at the current time step and a second predetermined number of speech frames representing speech occurring after the speech content being predicted in the current time step.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”)

Regarding claim 13, which depends from claim 12, Prabhavalkar further teaches “wherein the conditioning is based on word presence”. (Prabhavalkar, Par. 0054:” In some implementations, the speech recognition model is configured to output at least one probability distribution for each chunk, wherein the probability distribution can indicate an element that does not correspond to a word element as a most likely prediction.”, and Par. 0044:”…. generating the transcription for the utterance comprises using beam search decoding to generate one or more candidate transcriptions based on the word element [word presence] scores”).

Regarding claim 19, which depends from claim 11, Prabhavalkar further teaches “wherein the first model score and second model score are generated using a function that compresses the values of the first model score and the second model score”, (Prabhavalkar, Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η”).
$$ y^{*} = \underset{y}{\operatorname{argmin}} \; -\log P(y \mid x) - \lambda \log P_{LM}(x) - \eta \, \mathrm{coverage} \qquad (7) $$


Regarding claim 20, which depends from claim 11, Prabhavalkar further teaches “wherein the interpolation generates the hybrid score using a weighted sum function”. (Prabhavalkar, Par. 0010:” In some implementations, the context vector is a weighted sum of multiple encoder outputs for the utterance.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion.”)

Claims 4, 7, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar (US20200027444A1) and Gillick (US6167377A) as applied to claims 2, 1, 12, and 11, respectively, and further in view of Nakajima (US20120271617A1).

Nakajima was applied in the previous Office Action.
Regarding claim 4, Prabhavalkar and Gillick teach a method of speech transcription.

With respect to claim 4, Nakajima teaches wherein the conditioning is based on semantic information (Nakajima, Par. 0024: “… Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood. In some instances, however, the likelihood that a particular word or phrase occurs in a particular context depends on the frequency of previous uses of the word or phrase, regardless of the semantic or grammatical accuracy of the word or phrase”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Nakajima to include semantic information in order to improve recognition accuracy, since ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).

Regarding claim 14, Prabhavalkar and Gillick teach a method of speech transcription.

With respect to claim 14, Nakajima teaches wherein the conditioning is based on semantic information (Nakajima, Par. 0024: “… Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood. In some instances, however, the likelihood that a particular word or phrase occurs in a particular context depends on the frequency of previous uses of the word or phrase, regardless of the semantic or grammatical accuracy of the word or phrase”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Nakajima to include semantic information in order to improve recognition accuracy, since ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).




Prabhavalkar and Gillick do not teach wherein the conditioning is based on semantic information, or wherein the interpolation weights are generated using rule-based logic.
With respect to claim 7, Nakajima teaches wherein the interpolation weights are generated using rule-based logic (Nakajima, Par. 0024: “… Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Nakajima to include rule-based logic in order to improve recognition accuracy, since ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).

Regarding claim 17, Prabhavalkar and Gillick teach a method of speech transcription.

With respect to claim 17, Nakajima teaches wherein the interpolation weights are generated using rule-based logic (Nakajima, Par. 0024: “… Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Nakajima to include rule-based logic in order to improve recognition accuracy, since ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).


Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar (US20200027444A1) and Gillick (US6167377A) as applied to claims 1 and 11, respectively, and further in view of Masumura et al., “Joint Unsupervised Adaptation of N-gram and RNN Language Models”, Dec 2017 (hereinafter “Masumura”).

Masumura was applied in the previous Office Action.
Regarding claim 6, Prabhavalkar and Gillick teach a method of speech transcription.
Prabhavalkar and Gillick do not teach wherein the first model is an n-gram model and the second model is a neural network, wherein the first model score and second model score are generated using a function that compresses the values of the first model score and the second model score.
With respect to claim 6, Masumura further teaches wherein the first model is an n-gram model and the second model is a neural network (Masumura, Section III, Page 1589: “In language modeling, mixture models are composed by combining two or more LMs trained from disparate sources with mixture weights. The mixture models were often introduced for domain adaptation or unsupervised adaptation. N-gram mixture models and RNN mixture models are shown in Fig. 1.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Masumura to use an n-gram model as the first model and a neural network as the second model, and to compress the values of the first model score and the second model score, in order to show joint unsupervised adaptation method outperformed a method where no modeling was adapted and one that 

Regarding claim 16, Prabhavalkar and Gillick teach a method of speech transcription.
Prabhavalkar and Gillick do not teach wherein the first model is an n-gram model and the second model is a neural network, wherein the first model score and second model score are generated using a function that compresses the values of the first model score and the second model score.
With respect to claim 16, Masumura further teaches wherein the first model is an n-gram model and the second model is a neural network (Masumura, Section III, Page 1589: “In language modeling, mixture models are composed by combining two or more LMs trained from disparate sources with mixture weights. The mixture models were often introduced for domain adaptation or unsupervised adaptation. N-gram mixture models and RNN mixture models are shown in Fig. 1.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Masumura to use an n-gram model as the first model and a neural network as the second model, and to compress the values of the first model score and the second model score, in order to show joint unsupervised adaptation method outperformed a method where no modeling was adapted and one that .

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar (US20200027444A1) and Gillick (US6167377A) as applied to claims 1 and 11, respectively, and further in view of Jihyun Lee (US 20200160838 A1) (hereinafter “Lee”).

Lee was applied in the previous Office Action.
Regarding claim 8, Prabhavalkar and Gillick teach a method of speech transcription.
Prabhavalkar and Gillick do not teach wherein the interpolation weights are generated using a neural network.
Lee teaches wherein the interpolation weights are generated using a neural network (Lee, Par. 0030: “… implement a decoder configured to determine a first score of candidate texts based on the encoded speech, implement a weight determiner configured to determine weights for each of the respective language models based on an output of the encoder, determine a second score for the candidate texts based on the respective language models, apply the weights to the second score of the candidate texts obtained from the respective language models to obtain a weighted second score, …, based on a sum of the first score and the weighted second score corresponding to the target candidate text.”, and Par. 0035: “Each of the encoder, the decoder, and the weight determiner may be implemented on a neural network”, and Par. 0060: “… accurate result of speech recognition by dynamically determining a weight to be applied to an output of at least one language model based on a situation …, adjust a combination weight to be applied to an output of a language model based on the domain obtained through the classification, and thus effectively adjust an influence of the language model on a result of the speech recognition based on the domain of the speech input”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Lee to use a neural network to generate the interpolation weights, in order to extract a feature value from the input speech and determine the weight using a neural network-based weight determiner configured to output a weight corresponding to the extracted feature value, as evidenced by Lee (see Par. 0027).

Regarding claim 18, Prabhavalkar and Gillick teach a method of speech transcription.
Prabhavalkar and Gillick do not teach wherein the interpolation weights are generated using a neural network.
Lee teaches wherein the interpolation weights are generated using a neural network (Lee, Par. 0030: “… implement a decoder configured to determine a first score of candidate texts based on the encoded speech, implement a weight determiner configured to determine weights for each of the respective language models based on an output of the encoder, determine a second score for the candidate texts based on the respective language models, apply the weights to the second score of the candidate texts obtained from the respective language models to obtain a weighted second score, …, based on a sum of the first score and the weighted second score corresponding to the target candidate text.”, and Par. 0035: “Each of the encoder, the decoder, and the weight determiner may be implemented on a neural network”, and Par. 0060: “… accurate result of speech recognition by dynamically determining a weight to be applied to an output of at least one language model based on a situation …, adjust a combination weight to be applied to an output of a language model based on the domain obtained through the classification, and thus effectively adjust an influence of the language model on a result of the speech recognition based on the domain of the speech input”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar and Gillick in view of Lee to use a neural network to generate the interpolation weights, in order to extract a feature value from the input speech and determine the weight using a neural network-based weight determiner configured to output a weight corresponding to the extracted feature value, as evidenced by Lee (see Par. 0027).
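
For illustration only, the arrangement quoted from Lee (a weight determiner that outputs weights for the language models, followed by a sum of the first score and the weighted second score) might be sketched as below. This is a simplified sketch under stated assumptions, not Lee's implementation: a single linear layer followed by a softmax stands in for the neural network-based weight determiner, and the function names, feature vector, and parameter values are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def determine_weights(features, weight_matrix, bias):
    """One linear layer plus softmax: a stand-in for a neural weight determiner.

    features: feature values extracted from the input speech (assumed given).
    weight_matrix, bias: hypothetical trained parameters, one row per language model.
    """
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weight_matrix, bias)
    ]
    return softmax(logits)


def combined_score(first_score, lm_scores, lm_weights):
    """First (decoder) score plus the weighted sum of the language model scores."""
    weighted_second = sum(w * s for w, s in zip(lm_weights, lm_scores))
    return first_score + weighted_second


# With all-zero parameters the determiner gives every language model equal weight.
weights = determine_weights([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
score = combined_score(-1.0, [-2.0, -4.0], weights)
```

Because the weights depend on the input features, different utterances yield different weights, which is the sense in which the weights in this sketch are dynamic.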


Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Willett (U.S. Patent Application No. US20200043468A1) teaches (Par. 0047): “an example method of controlling the parameters during a decoding process. A method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition (302), receiving speech at an automatic speech recognition system (304), applying, by the automatic speech recognition system, the set of parameters to process the speech to yield text (306), and outputting the text from the automatic speech recognition system (308). In one aspect, the parameters in the set of parameters can include hyper parameters and/or can be used to replace one or more previously fixed parameters in the system. The set of parameters can include one or more of a word insertion penalty, a language model scale, an acoustic model scale, a silence prior penalty, a silence prior, a word penalty, and a beam pruning width, a duration model scale, other search pruning control parameters, and language model interpolation weights. Any parameter now existing or introduced in future signal processing can be used to train the model for use in estimating a set of parameters for processing the speech. In one aspect, the method can be dynamic and occur continuously while receiving and processing speech, and in another aspect, the method can be performed in batch mode. Hybrid aspects are also contemplated where parameters are partially estimated for updating at the decoder and partially remain fixed.”
Behzadi (U.S. Patent Application No. US20180012594A1) teaches (Par. 0039): “the initial language model 130a and the adjusted language model 130b as different dynamic states of the language model 130, in some implementations, the ASRM 110 may instead generate a new language model that provides the interpolated to transcription scores for the particular n-grams that correspond to the follow-up queries specified for the initial voice query 104a. In such implementations, the ASRM 110 may be capable of dynamically selecting a particular language model, from among a plurality of available language models, to generate a transcription for a subsequent voice query based on various types of data associated with the user 102 as described herein.”
Akbacak (U.S. Patent Application No. US20150370787A1) teaches (Par. 0073): “a pair of language models which are most similar to each other with respect to some metric are merged iteratively (combined with equal weights, in one embodiment). A symmetric Kullback Leibler distance, which is typically used to compute distance between two probability distributions, or a similar metric may be used. In another embodiment, K-Means clustering is applied, where the candidate language models are first separated into N bins for N number of clusters. A language model is computed using linear interpolation of the language models inside it. Each language model is then moved to the bin which is the most similar, again using some distance or similarity metric.”
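
For illustration only, the symmetric Kullback-Leibler distance and the equal-weight linear interpolation mentioned in the Akbacak passage might be sketched as below. This is a simplified sketch under stated assumptions, not Akbacak's method: each language model is reduced to a hypothetical unigram distribution over a shared vocabulary with nonzero probabilities, and the function names and probability values are invented for the example.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) for two distributions over the same vocabulary.

    Assumes every word in p also has a nonzero probability in q.
    """
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler distance: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def interpolate(p, q, weight=0.5):
    """Linear interpolation of two distributions; equal weights by default."""
    return {w: weight * p[w] + (1.0 - weight) * q[w] for w in p}


# Hypothetical unigram language models over a two-word vocabulary.
lm_a = {"call": 0.9, "play": 0.1}
lm_b = {"call": 0.1, "play": 0.9}

distance = symmetric_kl(lm_a, lm_b)
merged = interpolate(lm_a, lm_b)
```

A merging procedure of the kind described would compute such a distance for every pair of models and interpolate the closest pair; that loop is omitted here.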
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DARIOUSH AGAHI/ Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/ Primary Examiner, Art Unit 2656