DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on 8/1/2019 has/have been considered by the examiner.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter without significantly more. The claims as a whole, considering all claim elements both individually and in combination, do not amount to significantly more than an abstract idea.
	
	Independent claims 1 and 11 recite: “A method of speech transcription, the method comprising: computing a transcription hypothesis from a sequence of phonemes; 
The limitations “computing a transcription hypothesis”, “computing according to a first model”, “computing according to a second model”, and “computing a hybrid score by interpolation”, as drafted, cover mathematical (computational) algorithm activities; as such, they all point to an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, claim 11 recites a non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for speech transcription as per the independent claim. For example, Par. 0049 (see also FIG. 9) of the as-filed specification states: “computing system 900, one or more processors 910 and main memory 920, a mass storage device 930, portable storage medium drive(s) 940, output devices 950, user input devices 960, a graphics display 970, and peripheral devices 980 are connected to each other by a single bus 995. However, the components may be connected through one or more data transport means. The processor unit 910 and main memory 920 may be connected via a local microprocessor bus, and the mass storage device 930, peripheral device(s) 980, portable storage device 940, and display system 970 may be connected via one or more input/output (I/O) buses. [0051] Mass storage device 930, which may be implemented with a magnetic disk drive, an optical disk drive, a flash drive, or other device, is a non-volatile storage 
Furthermore, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is, due to its lack of specificity, considered a generic computer (or processor); see Par. 0057 of the Applicant’s specification: “the computer system 900 of FIG. 9 can be a personal computer, hand held computing device, smart phone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device”. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Moreover, the limitations in the claims noted above, taken individually or as an ordered set, do not amount to significantly more than the judicial exception. As such, they are directed to an abstract idea, as discussed, which performs mathematical concept activity. Thus, neither the additional elements nor the limitations, taken individually or as an ordered set, amount to significantly more than the judicial exception. The claims are not patent eligible.
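	Claims 2 and 12 are directed toward a mathematical concept. The claims recite that the dynamic variable is conditioned on the content of the transcription, which is a mathematical concept that can be carried out with a generic computer. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.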
	Claims 3 and 13 are directed toward a mathematical concept. The claims recite that the conditioning is based on word presence. The presence of a word is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
	Claims 5 and 15 are directed toward a mathematical concept. The claims further recite computing a second transcription hypothesis from the sequence of phonemes, wherein the dynamic variable depends on the content of the second hypothesized transcription. The transcription is based on a mathematical algorithm that is carried out by a generic computer. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
Claims 6 and 16 are directed toward a mathematical concept. The claims recite that the first model is an n-gram model and the second model is a neural network. A language model of any sort is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
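Claims 7 and 17 are directed toward a mathematical concept. The claims recite that the interpolation weights are generated using rule-based logic. Interpolation is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.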
Claims 8 and 18 are directed toward a mathematical concept. The claims recite that the interpolation weights are generated using a neural network. Interpolation is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
Claims 9 and 19 are directed toward a mathematical concept. The claims recite that the first model score and second model score are generated using a function that compresses the values of the first model score and the second model score. Scoring and modification of scoring is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
Claims 10 and 20 are directed toward a mathematical concept. The claims recite that the interpolation generates the hybrid score using a weighted sum function. Interpolation and modification of hybrid scoring is a mathematical concept, and an algorithm performed on it directs the claims toward an abstract concept. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims are directed toward an abstract idea. The claims are not patent eligible.
Therefore, claims 1-20 are not patent eligible under 35 USC 101.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

 (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.



Claims 1 – 3, 5, 9 – 13, 15, 19, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Prabhavalkar et al. (US2020/0027444A1)(hereinafter "Prabhavalkar").

Regarding claim 1, Prabhavalkar teaches a method of speech transcription, the method comprising: computing a transcription hypothesis from a sequence of phonemes (Par. 0027:” ...determining, based on output of the speech recognition model in response to processing of input data for the training example, a set of hypotheses using beam search decoding; identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”).
computing, according to a first model, a first model score for the transcription; (Par. 0020:” In some implementations, generating a transcription for the utterance comprises: generating language model scores for the multiple candidate transcriptions using a language model; and determining the transcription based on the language model scores generated using the language model”).
computing, according to a second model, a second model score for the transcription (Par. 0127:” For example, a log-linear interpolation can be done between the LAS model [First model] and a finite-state transducer [FST]-based LM [Second model] trained to go from graphemes to words at each step [same transcription] of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”).
computing a hybrid score by interpolation between the first model score and second model score using interpolation weights, where the interpolation weights are in dependence upon a dynamic variable. (Par. 0127:” For example, a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term [dynamic variable] to promote longer transcripts and weighted by η”).
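For illustration only, the interpolation recited in the last limitation of claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application or from Prabhavalkar: the function names (interpolation_weights, hybrid_score), the use of hypothesis length as the dynamic variable, and all numeric constants are hypothetical.

```python
def interpolation_weights(num_words: int) -> tuple[float, float]:
    """Return (first_weight, second_weight) for a hypothesis of num_words words.

    Hypothetical rule: the second model is trusted more as the hypothesis
    grows. Hypothesis length stands in for the "dynamic variable".
    """
    second_weight = min(0.8, 0.2 + 0.1 * num_words)  # assumed schedule, capped at 0.8
    return 1.0 - second_weight, second_weight


def hybrid_score(first_score: float, second_score: float, num_words: int) -> float:
    """Interpolate two model scores with weights that depend on num_words."""
    first_weight, second_weight = interpolation_weights(num_words)
    return first_weight * first_score + second_weight * second_score
```

With these assumed constants, a two-word hypothesis is scored with weights of roughly 0.6 and 0.4, while a ten-word hypothesis is scored with weights of roughly 0.2 and 0.8, so the same pair of model scores can yield different hybrid scores as the dynamic variable changes.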

Regarding claim 11, Prabhavalkar teaches: a non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for speech transcription, the method comprising: computing a transcription hypothesis from a sequence of phonemes; (Par. 0140:” The computer program product instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 504, the storage device 506, or memory on processor 502”, and Par. 0027:” ...determining, based on output of the speech recognition model in response to processing of input data for the training example, a set of hypotheses using beam search decoding; identifying an n-best list of hypotheses based on probabilities indicated by the output of the speech recognition model”). As a result, claim 11 is rejected as anticipated by Prabhavalkar under section 102 for the same reasons as claim 1.

Regarding claims 2 and 12, which depend from claims 1 and 11, respectively, Prabhavalkar further teaches “wherein the dynamic variable is conditioned on the content of the transcription”. (Par. 0051:” ... each chunk having a first predetermined number of speech frames representing speech occurring before speech content being predicted at the current time step and a second predetermined number of speech frames representing speech occurring after the speech content being predicted in the current time step.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”)

each chunk, wherein the probability distribution can indicate an element that does not correspond to a word element as a most likely prediction.”, and Par. 0044:”…. generating the transcription for the utterance comprises using beam search decoding to generate one or more candidate transcriptions based on the word element [word presence] scores”).

Regarding claims 5 and 15, which depend from claims 1 and 11, respectively, Prabhavalkar further teaches “further comprising computing a second transcription hypothesis from the sequence of phonemes wherein the dynamic variable depends on the content of the second hypothesized transcription”. (Par. 0051: “… successively processing [second transcription] chunks that each include different sets of speech frames with the speech recognition model, wherein the speech recognition model is configured to predict a variable number of word elements for each chunk processed”, and Par. 0127: “In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η.”)

[Equation image: media_image1.png]



Regarding claims 9 and 19, which depend from claims 1 and 11, respectively, Prabhavalkar further teaches “wherein the first model score and second model score are generated using a function that compresses the values of the first model score and the second model score” (Par. 0127: “a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion. In equation 7 below, p(y|x) is the score from the LAS model, which is combined with a score coming from an external LM p.sub.LM(x) weighted by an LM weight λ, and a coverage term to promote longer transcripts and weighted by η”). Note: the negative sign before lambda in the following equation acts as a compression factor.

[Equation image: media_image1.png]
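As a general illustration of what a function that compresses score values can look like, the sketch below maps probabilities to log scores, so that values spanning several orders of magnitude fall into a much narrower numeric range. This is an assumption made for illustration; it is not asserted to be the function of the claims or the equation of Prabhavalkar, and the names compress and compressed_scores are hypothetical.

```python
import math


def compress(probability: float) -> float:
    """Map a probability in (0, 1] to a log score (a compressing function)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log(probability)


def compressed_scores(first_prob: float, second_prob: float) -> tuple[float, float]:
    """Apply the same compressing function to both model scores."""
    return compress(first_prob), compress(second_prob)
```

Under this assumption, probabilities of 0.001 and 0.000001 differ by a factor of 1000 before compression, but their log scores differ only by a factor of 2.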

Regarding claims 10 and 20, which depend from claims 1 and 11, respectively, Prabhavalkar further teaches “wherein the interpolation generates the hybrid score using a weighted sum function”. (Par. 0010:” In some implementations, the context vector is a weighted sum of multiple encoder outputs for the utterance.”, and Par. 0127:” a log-linear interpolation can be done between the LAS model and a finite-state transducer (FST)-based LM trained to go from graphemes to words at each step of the beam search, also known as shallow fusion.”)


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 4, 7, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar as applied to claims 2, 1, 12, and 11, respectively, and further in view of Nakajima et al. (US20120271617A1) (hereinafter "Nakajima").

Regarding claims 4 and 14, Prabhavalkar teaches a method of speech transcription.
Prabhavalkar does not teach wherein the conditioning is based on semantic information.

With respect to claims 4 and 14, Nakajima teaches wherein the conditioning is based on semantic information (Par. 0024:” … Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood. In some instances, however, the likelihood that a particular word or phrase occurs in a particular context depends on the frequency of previous uses of the word or phrase, regardless of the semantic or grammatical accuracy of the word or phrase”).

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar in view of Nakajima to include semantic information in order to improve recognition accuracy, because ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).

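Regarding claims 7 and 17, Prabhavalkar teaches a method of speech transcription.
Prabhavalkar does not teach wherein the interpolation weights are generated using rule-based logic.
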
With respect to claims 7 and 17, Nakajima teaches wherein the interpolation weights are generated using rule-based logic (Par. 0024:” .... Words or phrases that, according to the semantic or grammar rules of a given language, are semantically or grammatically incorrect, may be associated with a lower likelihood. Words or phrases that, according to the semantic or grammar rules of the given language, are semantically or grammatically correct, may be associated with a higher likelihood”).

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar in view of Nakajima to include rule-based logic in order to improve recognition accuracy, because ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts, as evidenced by Nakajima (see Par. 0003).

Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar as applied to claims 1 and 11, respectively, and further in view of Masumura et al., “Joint Unsupervised Adaptation of N-gram and RNN Language Models”, Dec 2017 (hereinafter “Masumura”).

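Regarding claims 6 and 16, Prabhavalkar teaches a method of speech transcription.
Prabhavalkar does not teach wherein the first model is an n-gram model and the second model is a neural network.
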
With respect to claims 6 and 16, Masumura teaches wherein the first model is an n-gram model and the second model is a neural network (Section III, Page 1589: “In language modeling, mixture models are composed by combining two or more LMs trained from disparate sources with mixture weights. The mixture models were often introduced for domain adaptation or unsupervised adaptation. N-gram mixture models and RNN mixture models are shown in Fig. 1.”)
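The combination of two or more language models with mixture weights described in the quoted passage can be sketched, in general terms, as a linear mixture of the component probabilities. This is a minimal sketch assuming a plain linear mixture with weights that sum to one; the function name mixture_probability and the example probabilities are hypothetical and are not drawn from Masumura.

```python
def mixture_probability(component_probs: list[float],
                        mixture_weights: list[float]) -> float:
    """Combine per-model word probabilities using mixture weights.

    component_probs[i] is the probability assigned to the same word by
    language model i (for example, an n-gram LM and an RNN LM).
    """
    if len(component_probs) != len(mixture_weights):
        raise ValueError("one weight is required per component model")
    if abs(sum(mixture_weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * p for w, p in zip(mixture_weights, component_probs))
```

For example, if an n-gram model assigns a word probability 0.02 and an RNN model assigns it 0.10, weights of 0.25 and 0.75 give a mixture probability of approximately 0.08.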

Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Prabhavalkar in view of Masumura to use an n-gram model as the first model and a neural network as the second model, and to compress the values of the first model score and the second model score, because the joint unsupervised adaptation method outperformed both a method in which no model was adapted and methods that adapt only the n-gram model or only the RNN model, as evidenced by Masumura (see Section 6, page 1591).


Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Prabhavalkar as applied to claims 1 and 11, respectively, and further in view of Jihyun LEE (US 20200160838 A1) (hereinafter “Lee”).

Regarding claims 8 and 18, Prabhavalkar teaches a method of speech transcription.
Prabhavalkar does not teach wherein the interpolation weights are generated using a neural network.
Lee teaches wherein the interpolation weights are generated using a neural network. (Par. 0030:” … implement a decoder configured to determine a first score of candidate texts based on the encoded speech, implement a weight determiner configured to determine weights for each of the respective language models based on an output of the encoder, determine a second score for the candidate texts based on the respective language models, apply the weights to the second score of the candidate texts obtained from the respective language models to obtain a weighted second score, …., based on a sum of the first score and the weighted second score corresponding to the target candidate text.”, and Par. 0035:”Each of the encoder, the decoder, and the weight determiner may be implemented on a neural network”, and Par. 0060:” … accurate result of speech recognition by dynamically determining a weight to be applied to an output of at least one language model based on a situation …., adjust a combination weight to be applied to an output of a language model based on the domain obtained through the classification, and thus effectively adjust an influence of the language model on a result of the speech recognition based on the domain of the speech input”).
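The scoring flow described in the quoted passages of Lee (a first score from a decoder, per-language-model second scores, weights produced by a weight determiner, and a sum of the first score and the weighted second score) can be sketched as below. This is a minimal sketch under stated assumptions: the weight_logits argument merely stands in for the output of a neural-network weight determiner, the softmax normalization is an assumption, and the names softmax and final_score are hypothetical rather than Lee's.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Normalize raw outputs into positive weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_score(decoder_score: float,
                lm_scores: list[float],
                weight_logits: list[float]) -> float:
    """Add the decoder (first) score to the weighted language model (second) score.

    weight_logits is a hypothetical stand-in for what a neural weight
    determiner would output for the current speech input.
    """
    weights = softmax(weight_logits)
    weighted_second_score = sum(w * s for w, s in zip(weights, lm_scores))
    return decoder_score + weighted_second_score
```

With equal logits the two language model scores are averaged, so a decoder score of -1.0 and language model scores of -2.0 and -4.0 yield a final score of -4.0; changing the logits shifts the influence toward one language model or the other.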



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689.  The examiner can normally be reached on Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications 






/D.A./             Examiner, Art Unit 2656
/Paras D Shah/             Primary Examiner, Art Unit 2659

02/19/2021