Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR10-2020-0088919, filed on 07/17/2020.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 11/30/2020 has been considered by the examiner.
Drawings
The drawing submitted on 11/30/2020 has been considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a receiver configured to…”  in claim 9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-2, 6-7, 9-10, 14-15 and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ganong et al. (US 2015/0248883 A1).

Regarding Claims 1 and 9, Ganong et al. teach: A speech signal processing method, comprising: receiving an input token (phone or phoneme) that is based on a speech signal; calculating first probability values respectively corresponding to candidate output tokens (multiple recognition results, each of which includes a string of words and a confidence value) based on the input token ([0041] For example, in some probabilistic processes, the acoustic model(s) 102A may be used to identify a probability that a sound used in the speech input is a particular phone or phoneme of a language and to identify potential strings of words that correspond to the phones/phonemes, and the language model(s) 102B may be used to determine, based on how words or phrases are commonly used and arranged in the language in general, or in a particular domain, a probability that each of the strings of words might correspond to the speech input and thereby identify the most likely strings. [0042] In performing such a probabilistic process, an ASR system 102 may yield multiple recognition results, each of which includes a string of words and a confidence value. For example, the ASR system 102 might order the recognition results according to the confidence of the ASR system 102 that a result is a correct representation of the content of the speech input and then select at most N results from the ordered recognition results. The recognition results, however formatted, may include or represent a "top result" or "most likely result," which the ASR system 102 has identified as being most likely to be a correct representation of the content of the speech input, and N-1 alternative recognition results that the ASR system has identified as the results that are next most likely, after the top result, to be a correct representation of the speech input.); adjusting at least one of the first probability values based on a priority (incorrect word string or the significance of consequences of misrecognizing or criteria) of each of the first probability values ([0024] A top recognition result produced by the ASR system for the speech input may be the result that the probabilities indicate is most likely to be correct. Applicants have recognized and appreciated that when the top result of the ASR system includes the types of errors discussed above, in some cases one or more of the alternative recognition results produced by the ASR system may be more accurate in some ways and not include one or more of the errors. Continuing to use the examples above, while the top recognition result that the ASR system identified as most likely to be correct may erroneously include the word "malignant," in some cases one or more of the alternative results that were identified by the ASR system as possible (but less likely to be correct) recognition results may correctly include the word "nonmalignant." [0031] In other embodiments, however, the likelihood may be adjusted according to the significance of consequences that may occur in a domain if the string of words is incorrect (e.g., the significance of consequences of misrecognizing "malignant").); and processing the speech signal based on an adjusted probability value obtained by the adjusting ([0029] The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. [0030] To make the determination, the word/phrase of the set that appears in the result may be iteratively replaced with each of the other words/phrases of the set. The string of words that results from each of the replacements may then be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in the language and/or domain to which the language model corresponds.).
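For illustration only, the replace-and-rescore procedure quoted from Ganong et al. at [0029]-[0031] can be sketched as follows. This is a minimal sketch and not the reference's implementation: the function name, the toy language model, and the multiplicative significance weight are all hypothetical, since the quoted passages state only that the likelihood "may be adjusted" according to the significance of consequences.

```python
from typing import Callable, Dict, List, Tuple


def rescore_with_substitutions(
    top_result: List[str],
    confusable_set: List[str],
    lm_score: Callable[[List[str]], float],
    significance: Dict[str, float],
) -> List[Tuple[List[str], float]]:
    """Swap each confusable word in top_result for the other members of the
    set, score every new string with lm_score, and scale the score by a
    hypothetical significance weight for the substituted word."""
    rescored = []
    for i, word in enumerate(top_result):
        if word not in confusable_set:
            continue
        for alt in confusable_set:
            if alt == word:
                continue
            candidate = top_result[:i] + [alt] + top_result[i + 1:]
            # Assumption: significance acts as a simple multiplier (default 1.0).
            likelihood = lm_score(candidate) * significance.get(alt, 1.0)
            rescored.append((candidate, likelihood))
    # Most likely substituted string first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy unigram "language model" with made-up values, for demonstration only.
TOY_LM = {"malignant": 0.2, "nonmalignant": 0.6}


def toy_lm_score(words: List[str]) -> float:
    score = 1.0
    for w in words:
        score *= TOY_LM.get(w, 1.0)
    return score


result = rescore_with_substitutions(
    ["the", "mass", "is", "malignant"],
    ["malignant", "nonmalignant"],
    toy_lm_score,
    {"nonmalignant": 1.5},
)
```

In this toy run the only substituted string ends in "nonmalignant"; a real system would compare its adjusted likelihood against that of the original top result.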

Regarding Claims 2 and 10, Ganong et al. teach: The speech signal processing method of claim 1, wherein the adjusting comprises: determining whether a first probability value corresponding to a first candidate output token from among the candidate output tokens is included in a predetermined priority (N best results from ordered recognition results); and adjusting the first probability value based on a result of the determining (See rejection of claim 1 and  [0030] To make the determination, the word/phrase of the set that appears in the result may be iteratively replaced with each of the other words/phrases of the set. The string of words that results from each of the replacements may then be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in the language and/or domain to which the language model corresponds. [0031] In other embodiments, however, the likelihood may be adjusted according to the significance of consequences that may occur in a domain if the string of words is incorrect (e.g., the significance of consequences of misrecognizing "malignant"). [0042] For example, the ASR system 102 might order the recognition results according to the confidence of the ASR system 102 that a result is a correct representation of the content of the speech input and then select at most N results from the ordered recognition results. The N best results produced in this way may formatted as an "N best" list of recognition results, as a lattice of recognition results, or in any other suitable manner.).
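As an illustrative reading of the limitation recited in claims 2 and 10, the sketch below checks whether a candidate token's probability falls within a predetermined priority (here taken to be the top-k ranks) and adjusts it only in that case. The function name, the `top_k` and `scale` parameters, the renormalization step, and the `<eos>` token are assumptions made for the example; neither the claims nor the cited paragraphs prescribe them.

```python
from typing import Dict


def adjust_if_in_priority(
    probs: Dict[str, float], token: str, top_k: int, scale: float
) -> Dict[str, float]:
    """Return a copy of probs in which token's probability is multiplied by
    scale when token ranks within the top_k candidates, then renormalized so
    the values sum to 1."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    adjusted = dict(probs)
    if token in ranked[:top_k]:
        # Assumption: the adjustment is a multiplicative scaling.
        adjusted[token] = probs[token] * scale
    total = sum(adjusted.values())
    return {tok: p / total for tok, p in adjusted.items()}


first_probs = {"<eos>": 0.5, "a": 0.3, "b": 0.2}
# "<eos>" ranks first, so it is within the top 2 and gets scaled down.
in_priority = adjust_if_in_priority(first_probs, "<eos>", top_k=2, scale=0.5)
# "b" ranks third, outside the top 2, so nothing changes.
not_in_priority = adjust_if_in_priority(first_probs, "b", top_k=2, scale=0.5)
```

The choice to renormalize keeps the adjusted values interpretable as probabilities; an implementation could equally leave them unnormalized if only their ordering matters.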

Regarding Claims 6 and 14, Ganong et al. teach: The speech signal processing method of claim 1, wherein the processing comprises: outputting a text based on the speech signal (See rejection of claim 1 and [0024] In many ASR systems, the recognition result returned is the "best guess" of the ASR system of text that corresponds to a speech input, and that the ASR system may have produced one or more other alternative recognition results that the ASR system identified as possibly correct. Many ASR systems operate based on probabilistic processes, such that the ASR system may produce multiple possible recognition results, each of which is associated with a probability of being a correct output. A top recognition result produced by the ASR system for the speech input may be the result that the probabilities indicate is most likely to be correct.).

Regarding Claims 7 and 15, Ganong et al. teach: The speech signal processing method of claim 1, wherein the processing comprises: calculating second probability values respectively corresponding to candidate output tokens based on the input token; and determining an output token based on the first probability values and the second probability values ([0041] For example, in some probabilistic processes, the acoustic model(s) 102A may be used to identify a probability that a sound used in the speech input is a particular phone or phoneme of a language and to identify potential strings of words that correspond to the phones/phonemes, and the language model(s) 102B may be used to determine, based on how words or phrases are commonly used and arranged in the language in general, or in a particular domain, a probability that each of the strings of words might correspond to the speech input and thereby identify the most likely strings. [0042] In performing such a probabilistic process, an ASR system 102 may yield multiple recognition results, each of which includes a string of words and a confidence value.).

Regarding Claim 17, Ganong et al. teach: A speech signal processing method, comprising: receiving an input token that is based on a speech signal; calculating first probability values respectively corresponding to first candidate output tokens based on the input token; calculating second probability values respectively corresponding to second candidate output tokens based on the input token (multiple recognition results, each of which includes a string of words and a confidence value) ([0041] For example, in some probabilistic processes, the acoustic model(s) 102A may be used to identify a probability that a sound used in the speech input is a particular phone or phoneme of a language and to identify potential strings of words that correspond to the phones/phonemes, and the language model(s) 102B may be used to determine, based on how words or phrases are commonly used and arranged in the language in general, or in a particular domain, a probability that each of the strings of words might correspond to the speech input and thereby identify the most likely strings. [0042] In performing such a probabilistic process, an ASR system 102 may yield multiple recognition results, each of which includes a string of words and a confidence value. For example, the ASR system 102 might order the recognition results according to the confidence of the ASR system 102 that a result is a correct representation of the content of the speech input and then select at most N results from the ordered recognition results. The recognition results, however formatted, may include or represent a "top result" or "most likely result," which the ASR system 102 has identified as being most likely to be a correct representation of the content of the speech input, and N-1 alternative recognition results that the ASR system has identified as the results that are next most likely, after the top result, to be a correct representation of the speech input.); generating a nonverbal token (newly-created string of words as a replacement or other word string from alternative results) and a probability corresponding to the nonverbal token based on at least one of the first probability values and the second probability values; and processing the speech signal based on the probability corresponding to the nonverbal token ([0029] The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. To make the determination, the word/phrase of the set that appears in the result may be iteratively replaced with each of the other words/phrases of the set. The string of words that results from each of the replacements may then be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in the language and/or domain to which the language model corresponds.).

Regarding Claim 18, Ganong et al. teach: The speech signal processing method of claim 17, wherein the generating comprises: duplicating (replaced alternative corresponding words/phrases from the alternative recognition results or from the set), into the second candidate output tokens, a nonverbal token included in the first candidate output tokens; and duplicating, into the second probability values, a probability corresponding to the nonverbal token and included in the first probability values (See rejection of claim 17, [0029] The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. [0030] To make the determination, the word/phrase of the set that appears in the result may be iteratively replaced with each of the other words/phrases of the set. The string of words that results from each of the replacements may then be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in the language and/or domain to which the language model corresponds.).

Regarding Claim 19, Ganong et al. teach: The speech signal processing method of claim 17, wherein the generating comprises: duplicating, into the second candidate output tokens, a nonverbal token included in the first candidate output tokens; and mapping a greatest value among the second probability values to a probability corresponding to the duplicated nonverbal token (See the rejection of claim 17).

Regarding Claim 20, Ganong et al. teach:  The speech signal processing method of claim 17, further comprising: determining whether the nonverbal token is registered among the second candidate output tokens ([0030] As a third example, such sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with an acoustic model or a language model to evaluate a top recognition result to determine whether the top recognition result includes potential significant errors. For example, the top recognition result may be evaluated using sets of words/phrases to determine whether any of the words/phrases appear in the result. [0035] Determining whether a recognition result includes an unlikely word or phrase may be carried out in any suitable manner. For example, a set of unlikely words/phrases may be maintained for a domain and the words/phrases in the recognition result may be compared to the set to determine whether any of the words/phrases are unlikely in the domain and indicative of potential errors. As another example, a language model for the domain may be maintained that includes, for words and phrases, a value indicating a likelihood of the words or phrases appearing in speech or text related to the domain. In some embodiments, when the value of the domain language model for a word or phrase of a recognition result is below a threshold, the word or phrase may be identified as being unlikely to appear in the domain and indicative of a potential error.).
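The threshold test described in the quoted paragraph [0035] of Ganong et al. can be illustrated with a short sketch. The function name, the dictionary representation of the domain language model, the example values, and the treatment of unseen words as having a value of 0.0 are assumptions for the example and are not taken from the reference.

```python
from typing import Dict, List


def flag_unlikely_words(
    words: List[str], domain_lm: Dict[str, float], threshold: float
) -> List[str]:
    """Return the words whose domain language model value is below threshold.
    Assumption: a word absent from domain_lm is scored 0.0."""
    return [w for w in words if domain_lm.get(w, 0.0) < threshold]


# Made-up domain values for demonstration only.
domain_lm = {"the": 0.9, "scan": 0.7, "shows": 0.8, "banana": 0.01}
flagged = flag_unlikely_words(["the", "scan", "shows", "banana"], domain_lm, 0.05)
```

Here only "banana" falls below the threshold and would be treated as indicative of a potential recognition error.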

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ganong et al. in view of Acero et al. (AU 2018/266284 B2).

Regarding Claims 3 and 11, Ganong et al. do not teach: The speech signal processing method of claim 2, wherein the first candidate output token is a token corresponding to an end of a sentence.
Acero et al. teach: candidate output token is a token corresponding to an end of a sentence ([0248] The plurality of candidate text representations are determined by performing speech recognition on the stream of audio. Each candidate text representation is associated with a respective speech recognition confidence score. The speech recognition confidence scores can indicate the confidence that a particular candidate text representation is the correct text representation of the user utterance. In addition, the speech recognition confidence scores can indicate the confidence of any determined word in a candidate text representation of the plurality of candidate text representations. In some examples, the plurality of candidate text representations are the n-best candidate text representations having the n-highest speech recognition confidence scores. [0250] In some examples, block 1108 is performed in real-time as the user utterance is being received at block 1104. In some examples, speech recognition is performed automatically upon receiving the stream of audio. In particular, words of the user utterance are decoded and transcribed as each portion of the user utterance is received. [0254] For example, the plurality of candidate text representations of block 1108 can be analyzed to determine whether an end-of-sentence condition is detected in the one or more candidate text representations. In some examples, the end-of-sentence condition is detected if the ending portions of the one or more candidate text representations match a predetermined sequence of words. In some examples, a language model is used to detect an end-of-sentence condition in the one or more candidate text representations.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ganong et al. to include the teaching of Acero et al. above in order to indicate the confidence that a particular candidate text representation is the correct text representation of the user utterance.

Claim(s) 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ganong et al. in view of Yoshida (WO 2011/016129 A1).

Regarding Claims 8 and 16, Ganong et al. teach: determining, to be the output token, a candidate output token having a greatest weighted sum (confidence value) from among the candidate output tokens ([0042] Through applying the models 102A, 102B, an ASR system 102 that implements such a probabilistic process can determine a result of a speech recognition process that is a string of one or more words that might correspond to the speech input. The ASR system 102 also produces, for the result, a confidence of the ASR system 102 (which may be represented as a probability on a scale of 0 to 1, as a percentage from 0 percent to 100 percent, or in any other way) that the result is a correct representation of the speech input. In performing such a probabilistic process, an ASR system 102 may yield multiple recognition results, each of which includes a string of words and a confidence value. When an ASR system 102 produces multiple recognition results, the ASR system 102 might order and filter the results in some way so as to output N results as the results of the speech recognition process, where N is an integer of two or more. For example, the ASR system 102 might order the recognition results according to the confidence of the ASR system 102 that a result is a correct representation of the content of the speech input and then select at most N results from the ordered recognition results. The N best results produced in this way may formatted as an "N best" list of recognition results, as a lattice of recognition results, or in any other suitable manner. 
The recognition results, however formatted, may include or represent a "top result" or "most likely result," which the ASR system 102 has identified as being most likely to be a correct representation of the content of the speech input, and N-1 alternative recognition results that the ASR system has identified as the results that are next most likely, after the top result, to be a correct representation of the speech input.)
Ganong et al. do not teach: The speech signal processing method of claim 7, wherein the determining comprises: obtaining a weighted sum of each of the first probability values and each of the second probability values. 
Yoshida teaches: obtaining a weighted sum of each of the first probability values and each of the second probability values (Outline Configuration: Then, the first recognition unit 3 recognizes a recognition word (hereinafter referred to as “recognition word WR1”) as a recognition result and a logarithmized likelihood corresponding thereto (hereinafter referred to as “likelihood L1”). Is supplied to the recognition result processing unit 5. Then, the second recognizing unit 4 recognizes a recognized word (hereinafter referred to as “recognized word WR2”) as a recognition result and a logarithmic likelihood corresponding thereto (hereinafter referred to as “likelihood L2”). Supply to part 5. Then, the recognition result processing unit 5 sorts the weighted likelihoods Lw1 and Lw2 together, thereby recognizing the recognized words WR1 and WR2 (hereinafter collectively referred to as “recognized words WR”) and the like.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ganong et al. to include the teaching of Yoshida above in order to determine a final recognition result using collectively weighted likelihoods.
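For illustration, the combination addressed in the rejection of claims 8 and 16 (a weighted sum of first and second probability values, followed by selection of the candidate with the greatest weighted sum) can be sketched as below. The function name, the weights, and the example values are hypothetical; the sketch is not drawn from Ganong et al., Yoshida, or the application.

```python
from typing import Dict, Tuple


def select_by_weighted_sum(
    first: Dict[str, float], second: Dict[str, float], w1: float, w2: float
) -> Tuple[str, Dict[str, float]]:
    """Combine two probability tables by weighted sum over their shared
    candidate tokens and return the token with the greatest combined score."""
    shared = first.keys() & second.keys()
    scores = {tok: w1 * first[tok] + w2 * second[tok] for tok in shared}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up first and second probability values for two candidate tokens.
first_probs = {"cat": 0.6, "cap": 0.4}
second_probs = {"cat": 0.3, "cap": 0.7}
best_token, combined = select_by_weighted_sum(first_probs, second_probs, 0.5, 0.5)
```

With equal weights, "cap" scores 0.55 against 0.45 for "cat", so the second table overrides the first table's preference; changing the weights shifts that balance.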
Allowable Subject Matter
Claims 4-5 and 12-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 21-23 are allowed.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record, alone or in combination, fails to teach, for claim 21, “obtaining a weighted sum of each of the first probability values and each of the second probability values, in response to the first probability value being included in the predetermined priority; and processing the speech signal based on the weighted sum to output a text, wherein the first neural network is based on a language model, and the second neural network is based on a speech recognition model.”
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Zang et al. (EP 3520036 B1) teach: In response to determining that the ordered concatenation of the sampled valid tokens for previous output positions and the sampled valid token for the current output position is a valid decomposition of the target sequence, the system adjusts the current values of the parameters of the neural network to increase the likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions. In some implementations, adjusting the current values of the parameters of the neural network involves performing an iteration of a neural network training procedure to increase a logarithm of a product of the respective scores for each token in the valid decomposition in the score distribution for the output position corresponding to the position of the token in the sampled valid decomposition. For example, the neural network training procedure may be backpropagation or backpropagation through time.
The prior art of record Ravera et al. (EP 0955628 A2) teach: This known system performs a first recognition by means of hidden Markov models, providing a list of the N best recognition hypotheses (for instance: 20), i.e. of the N sentences that have the highest probability to be the sentence being actually uttered, along with their likelihood scores. The Markov recognition stage also provides for a phonetic segmentation of each hypothesis and transfers the segmentation result to a second recognition stage, based on a neural network. This stage performs recognition starting from the phonetic segments supplied by the first Markov step and provides in turn a list of hypotheses, each associated with a likelihood score, according to the neural recognition technique. Both scores are then linearly combined so as to form a single list, and the best hypothesis originating from such a combination is chosen as recognized utterance.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656