DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Final Office Action mailed 2/8/2022, applicant has submitted an amendment filed 4/8/2022.
Claims 1, 4-8, and 11-13 have been amended.
Response to Arguments
The claims approved by Applicant (see Examiner’s Amendment, below) are in allowable form (i.e. the 112 rejections of the Final Office Action mailed 2/8/2022 are overcome).  
Due to time constraints, this Notice of Allowability is being sent, but some things worth noting about the claims include the following:
The claim total is currently 3 independent claims and 18 claims (i.e. there is “space” for 2 more equivalent claims if Applicant would like to add 2 more equivalent dependent claims for claim 12 and/or claim 13).
There is no equivalent for claim 10 among the equivalent dependent claims for claim 12 (i.e. Applicant could add an equivalent dependent claim for claim 10 [depending on claim 19 and including the “wherein the device operation status further includes…” clause of claim 10] since there is still “space” for 2 more equivalent claims).
Claim 20 depends on claim 12 and not on claim 19 (i.e. claim 11 could also be broadened to depend on claim 1 [including amending “the utterance situation information” in lines 3-4 of claim 11 to recite --an utterance situation information--]).

EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given by Jong Wan Suh on 4/22/2022.

The application has been amended as follows: 

Make the following amendments relative to the claims filed 4/8/2022.

1. (Currently Amended) An artificial intelligence apparatus for recognizing speech of a user, comprising:
a microphone; and
a processor configured to:
obtain, via the microphone, speech data including speech of a user
for each of a plurality of sections of the speech data:
calculate a probability for each of a plurality of words
determine, using a speech recognition log, a frequency weight for each of the plurality of words
for each of the plurality of words, calibrate the probability by multiplying the calculated probability by the determined frequency weight
convert the speech data into text by (i) selecting, for each section of the plurality of sections, a word calibrated probability and (ii) combining the selected words and

	perform control based on the text

2-3. (Canceled)

4. (Currently Amended) The artificial intelligence apparatus of claim 1, wherein the processor is configured to:
normalize the calibrated probabilities
convert the speech data into the text by selecting, for each section of the plurality of sections, a word having a highest normalized calibrated probability

5. (Currently Amended) The artificial intelligence apparatus of claim 1, wherein the processor is configured to:
check a usage frequency of a word
set, based on usage frequencies of words, a frequency weight of a word to be higher than a frequency weight of another word

6. (Currently Amended) The artificial intelligence apparatus of claim 5, wherein the processor is configured to:
determine a private frequency weight for a word
determine a public frequency weight for a word
calibrate a probability for a word
wherein the private speech recognition log includes speech recognition logs in the artificial intelligence apparatus, and
wherein the public speech recognition log includes speech recognition logs in an external apparatus

7. (Currently Amended) The artificial intelligence apparatus of claim 6, wherein the processor is configured to:
determine an integrated frequency weight by calculating a weighted sum of a private frequency weight of a word and a public frequency weight of a word
calibrate a probability

8. (Currently Amended) The artificial intelligence apparatus of claim 1, wherein the processor is configured to: 
determine utterance situation information corresponding to an utterance situation, and
calibrate a frequency weight for a word using the utterance situation information


9. (Currently Amended) The artificial intelligence apparatus of claim 8, 
wherein the utterance situation information includes at least one of a type of the artificial intelligence apparatus, an installation location of the artificial intelligence apparatus, a location of the user, a main user, and a device operation status, and
wherein the device operation status includes operation status information of the artificial intelligence apparatus.

10. (Previously Presented) The artificial intelligence apparatus of claim 9, further comprising a communication circuit configured to communicate with at least one external device,
wherein the device operation status further includes operation status information of the at least one external device.

11. (Currently Amended) The artificial intelligence apparatus of claim 10, wherein the processor is configured to:
determine a calibration weight based on
calibrate a frequency weight by multiplying a frequency weight by the calibration weight



12. (Currently Amended) A computer-implemented method of recognizing speech of a user, the method comprising:
obtaining, via a microphone, speech data including speech of a user
for each of a plurality of sections of the speech data:
calculating a probability for each of a plurality of words
determining, using a speech recognition log, a frequency weight for each of the plurality of words
for each of the plurality of words, calibrating the probability by multiplying the calculated probability by the determined frequency weight
converting the speech data into text by (i) selecting, for each section of the plurality of sections, a word calibrated probability and (ii) combining the selected words and

performing control based on the text

13. (Currently Amended) A non-transitory processor-readable medium having recorded a program for performing a method of recognizing speech of a user, the method comprising:
obtaining, via a microphone, speech data including speech of a user
for each of a plurality of sections of the speech data:
calculating a probability for each of a plurality of words
determining, using a speech recognition log, a frequency weight for each of the plurality of words
for each of the plurality of words, calibrating the probability by multiplying the calculated probability by the frequency weight
converting the speech data into text by (i) selecting, for each section of the plurality of sections, a word calibrated probability and (ii) combining the selected words and

performing control based on the text

14. (New) The computer-implemented method of claim 12, further comprising:
normalizing the calibrated probabilities, and
converting the speech data into the text by selecting, for each section of the plurality of sections, a word having a highest normalized calibrated probability.

15. (New) The computer-implemented method of claim 12, further comprising:
checking a usage frequency of a word using the speech recognition log, and
setting, based on usage frequencies of words, a frequency weight of a word to be higher than a frequency weight of another word.

16. (New) The computer-implemented method of claim 15, further comprising:
determining a private frequency weight for a word using a private speech recognition log, 
determining a public frequency weight for a word using a public speech recognition log, and
calibrating a probability for a word using the private frequency weight and the public frequency weight,
wherein the private speech recognition log includes speech recognition logs in an artificial intelligence apparatus, and
wherein the public speech recognition log includes speech recognition logs in an external apparatus.

17. (New) The computer-implemented method of claim 16, further comprising:
determining an integrated frequency weight by calculating a weighted sum of a private frequency weight of a word and a public frequency weight of a word, and
calibrating a probability using the integrated frequency weight.

18. (New) The computer-implemented method of claim 12, further comprising: 
determining utterance situation information corresponding to an utterance situation, and
calibrating a frequency weight for a word using the utterance situation information.

19. (New) The computer-implemented method of claim 18, 
wherein the utterance situation information includes at least one of a type of an artificial intelligence apparatus, an installation location of the artificial intelligence apparatus, a location of the user, a main user, and a device operation status, and
wherein the device operation status includes operation status information of the artificial intelligence apparatus.

20. (New) The computer-implemented method of claim 12, further comprising:
determining a calibration weight based on utterance situation information, and
calibrating a frequency weight by multiplying a frequency weight by the calibration weight.


Allowable Subject Matter
Claims 1 and 4-20 are allowed.
The following is an examiner’s statement of reasons for allowance:

As per Claim(s) 1 (and similarly claims 12-13, and consequently claims 4-11, which depend on claim 1), the prior art of record does not teach or suggest the combination of all limitations in claim(s) 1, including (i.e. in combination with the remaining limitations in claim[s] 1) An artificial intelligence apparatus for recognizing speech of a user, comprising: a microphone; and a processor configured to: obtain, via the microphone, speech data including speech of a user, calculate a probability for each of a plurality of words in each of sections of the speech data (i.e. each/every section has a plurality of words), determine a frequency weight for each of the plurality of words in each of the sections using a speech recognition log, calibrate the probability by multiplying the calculated probability by the frequency weight, convert the speech data into text by combining words having a highest probability for each of the sections, generate a speech recognition result based on the text, and perform control corresponding to the generated speech recognition result.
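The word-by-word calibration summarized above can be sketched as follows. This is a minimal illustration only: the function names, the count-based weight formula (1 plus the word's relative frequency in the log), and the sample data are assumptions made for the example and are not taken from the application or the claims.

```python
from collections import Counter


def frequency_weights(log_words, vocabulary):
    """Hypothetical weight per word: 1 + relative frequency in the recognition log."""
    counts = Counter(log_words)
    total = sum(counts.values()) or 1  # avoid division by zero on an empty log
    return {word: 1.0 + counts[word] / total for word in vocabulary}


def recognize(sections, log_words):
    """Convert per-section candidate probabilities into text.

    sections: list of dicts mapping candidate word -> calculated probability.
    """
    selected = []
    for probs in sections:
        weights = frequency_weights(log_words, probs.keys())
        # Calibrate each word's probability by multiplying by its own weight.
        calibrated = {word: p * weights[word] for word, p in probs.items()}
        # Select the word with the highest calibrated probability for this section.
        selected.append(max(calibrated, key=calibrated.get))
    # Combine the selected words into the recognized text.
    return " ".join(selected)


sections = [
    {"turn": 0.6, "tune": 0.4},
    {"on": 0.45, "in": 0.55},
]
log = ["turn", "on", "on", "on"]
text = recognize(sections, log)
```

In this made-up example, "in" has the higher uncalibrated probability in the second section, but the log-derived weight for "on" makes "on" the selected word after calibration.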
2016/0005402 teaches “Automatic speech recognizers typically generate a set of alternative hypotheses 134 (i.e., candidate words) for each recognized region in an audio stream. For example, when the automatic transcription system 128a attempts to recognize the spoken word "knot," the system 128a may generate a list of alternative hypotheses 134 consisting of the words "knot," "not," "naught," and "nit," in that order. The system 128a typically associates a confidence measure with each hypothesis representing a degree of confidence that the hypothesis accurately represents the corresponding audio region. The final output of an automatic speech recognizer, such as the draft transcript 124, typically only includes the best hypothesis (i.e., the hypothesis having the highest confidence measure) for each corresponding region in the audio stream 102. If, however, the draft transcript 124 includes information about competing hypotheses, or if the relevance identifier 112 otherwise has access to the competing hypotheses 134, the relevance identifier 112 may use such competing hypothesis information 134 to generate the relevance score R 114” (paragraph 41) and “For example, the relevance identifier 112 may identify the prior relevance R.sub.H of the competing hypothesis having the highest prior relevance of all competing hypotheses for the current document region (step 504). In the example above, in which the competing hypotheses are "knot," "not," "naught," and "nit," the word "not" most likely has the highest prior relevance. In such a case, the relevance identifier 112 may use the prior relevance of the word "not" as the value of R.sub.H even though the word "not" does not appear in the draft transcript 124. Elevating the relevance of the word "knot" in this way may be useful because it is important to bring the word to the attention of the proofreader 126 in the event that the highly-relevant word "not" was misrecognized as "knot."” (paragraph 42).  
This reference suggests generating a highest scoring hypothesis for each of a plurality of regions in an audio stream, where each region is a spoken word (paragraph 41 describes where a set of hypotheses are generated “for each recognized region” and then describe an example where a list of alternative hypotheses is generated when the system attempts to recognize “the spoken word ‘knot’”).  “regions”, in this reference, appear to be important regions or regions that are likely to have been transcribed incorrectly (paragraph 10), but even assuming the regions are not adjacent to each other, paragraph 41 describes where the best hypothesis for each corresponding region are still part of a draft transcript (and are thus combined with other words of the draft transcript to form a draft transcript “speech recognition result”).
2014/0343935 teaches “Outputting the final recognition results may include determining the final scores of the one or more word candidates for the time span based on the probability values and reliabilities of the one or more word candidates for the time span, and outputting a word candidate having a highest value for the time span as one of the final recognition results” (paragraph 24) and “The final recognition result output unit 30 outputs final recognition results based on the results of the speech recognition and verification unit 32. The final recognition result output unit 30 determines final scores based on the probability values and reliabilities of the one or more word candidates for each time span. Furthermore, the final recognition result output unit 30 may output a word candidate having the highest value for each time span as a final recognition result. That is, the final recognition result output unit 30 may search all the paths of a word lattice, may determine a path having the highest value, and may present the determined path as a final recognition result” (paragraph 41).  This reference appears to describe generating final recognition results, one for each of a plurality of time spans, based on a highest scoring word candidate for each time span.  This reference does not appear to describe where a single result is generated by combining the highest scoring candidates for each time span (but this may not be required for claim 3 of this application)
It would not necessarily be obvious to combine the n-gram occurrence frequency weighting of Lloyd with the scoring of each candidate for each time span/region in the previous two references (and vice versa) because Lloyd suggests weighting an entire transcription’s score based on word-specific weights, not where each word’s score is weighted by a respective word-specific weight (i.e. not where the weighting is word-by-word).
10387568 teaches “The routine 400A then proceeds to operation 414, where a keyword score KS for the given candidate keyword 132 is calculated. In one implementation, the keyword score KS is calculated as KS=.SIGMA..sub.i.di-elect cons.keywordWS.sub.i, where the summation runs through every word contained in the given candidate keyword 132. The keyword score 134 can also be calculated as a weighted sum of the word scores for the words contained in the candidate keyword 132 and the weight for each word score can be utilized to reflect the importance of the corresponding word. From operation 414, the routine 400A proceeds to operation 416, where it ends” and “A keyword can be a unigram, i.e. consisting of a single word, or a multi-gram, i.e. consisting of multiple words”.  This reference is directed to extracting keywords from documents, but does at least suggest where a score for a multi-word keyword is calculated based on a weighted sum of word scores for words in a keyword.  In this reference, weights are used to reflect importance of a corresponding word (not frequency).
2014/0278407 teaches “If the candidate transcription was found, the computing system 110 determines a probability score for the candidate transcription using the first component (308). If the candidate transcription was not found, the computing system determines a probability score for the candidate transcription using a second component of the language model (310). For example, the computing system 110 can "back off" to a generalized n-gram model. The computing system 110 normalizes the probability score from the second component (312), for example, by multiplying the probability score by a weighting value that calibrates the second component relative to the first component. The computing system 110 then evaluates the candidate transcription using the probability score (314), which is either a score from the first component or a normalized score from the second component” (paragraph 97) and “The computing system 110 can generate a language model 120 that includes two components, a first component that can assign scores to a defined set of language sequences, and a second component that can assign scores to any language sequence” (paragraph 23).
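The two-component scoring quoted from 2014/0278407 can be illustrated with a short sketch. The names `first_component`, `second_component`, and `backoff_weight`, and the toy scoring function, are hypothetical and chosen only for this example; the reference does not specify them.

```python
def general_score(candidate):
    """Hypothetical second component: can score any word sequence."""
    return 0.5 ** len(candidate.split())


def score_candidate(candidate, first_component, second_component, backoff_weight):
    """Score a candidate transcription with back-off.

    first_component: dict of scores for a defined set of language sequences.
    second_component: callable that can score any sequence.
    backoff_weight: value that calibrates the second component relative to the first.
    """
    if candidate in first_component:
        return first_component[candidate]
    # Candidate not found: back off and normalize by multiplying by the weight.
    return second_component(candidate) * backoff_weight


first_component = {"turn on the light": 0.2}
found = score_candidate("turn on the light", first_component, general_score, 0.4)
backed_off = score_candidate("turn on", first_component, general_score, 0.4)
```

Here the weight is applied to the score of an entire candidate transcription from the second component, rather than to each word's probability individually.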
2019/0258717 teaches “The curating of the knowledge continues at step 795 where the scores are interpreted to determine whether the initial interpretation is reliable and, when reliable, adds the initial interpretation to a knowledge database. For example, a weighting approach is utilized to aggregate scores to produce a confidence level. For instance, weighting factors are multiplied by each component of the scores (e.g., an alignment component, the source reliability component, an information age component) to produce intermediate scores for aggregation to produce the conference level. The confidence level indicates that the initial interpretation is reliable when the confidence level is greater than a confidence threshold” and “A third interpreting step includes determining the confidence level based on a weighting approach and scores for the confirming entigen groups and other scores for the disconfirming entigen groups. For example, the processing module multiplies each score by a weighting factors for confirming and disconfirming to produce an intermediate confidence level and aggregates the intermediate confidence levels to produce the confidence level”
10609037 teaches “the scores may be combined in a weighted combination, such as a weighted sum, by multiplying each score by a corresponding weight and summing the weighted values to determine an aggregate correspondence score. The weights may be determined with a variety of different techniques. In some cases, the weights may be hand tuned manually, for instance, by a system administrator or developer. In some cases, the weights may be changed dynamically based on feedback, such as feedback indicative of false positives or false negatives in the determination of block 34” and “In some cases, certain attributes are expected to be more indicative of attacks than others, as some attributes are expected to change relatively frequently and are less indicative of the second computing device being impersonated. Accordingly, some embodiments may determine a score for each pair of attributes compared between the known profile and the observe profile and that score may be weighted based on the likelihood of differences indicating an attack. In some cases, the score is a binary value of zero or one indicating whether to values match exactly, like a device maker name, or a binary value of zero or one indicating whether a given operating system version has decremented or incremented between capturing the known profile and the observed profile. In some cases, the score is a cardinal value indicating differences between known and observed attributes, like a number of increments between application versions, or an edit distance between strings”;
2020/0243092 (foreign priority date precedes effective date of this application) teaches “The converter 30H converts the speech data into a character string for every word string by adopting a word string having the highest appearance probability among a set of the word strings which are obtained by arranging the read morphemes in time series”.
2008/0208582 teaches “in speech than simply the recognition of each word spoken. As a result, while conventional speech recognition techniques focus on identifying a set of alternatives for each spoken word, determining which alternative has the highest probability of actually being the word that was spoken and discarding the remainder of the alternatives, embodiments of the invention recognize that the discarded alternatives may have significant value in identifying”.  These references appear to teach (but also teach away from) word-by-word recognition.

	Upon further search (in response to the amendment filed 12/22/2021):
10210860 teaches “In example neural network 1400 an output layer 1410 is provided that outputs the probability that the input corresponds to the associated word represented by the output node. In step 1420, corresponding to customization layer 207, the probabilities are adjusted by dividing by the frequency of the word in the general training set and multiplying by the frequency of the word in the custom training set. The resulting values are used as the new word probabilities, and the word with the highest probability after customization is selected as the output of the neural network. The effect of the customization is, roughly, to remove the prior for the word from the general domain and replace it with the prior for the word from the custom domain” (col. 25, lines 9-22).  Col. 1, lines 44-61 suggests where speech recognition probabilities are calibrated by frequency of occurrence of a respective word in a custom dataset and frequency of occurrence of a respective word in a general training set.  Col. 22, lines 20-57 appear to describe where a custom dataset is a set of frequencies, classifications, phonemes, audio features, pronunciations, and new words, and where a custom domain only has a list of frequent words and their frequencies.  Col. 22, lines 58-67 appear to describe where custom domain is speech recognition data that is produced by performing speech recognition on phone calls of a particular company (as opposed to general speech recordings).  Col. 23, lines 1-13 teaches away from a particular way of customization for a custom domain, but does not teach away from custom domains themselves.  Col. 24, line 54 – col. 25, line 8 describes where a “training set” or a list of frequent words and their frequencies for a custom domain is provided, and customization is performed for frequent words that are unseen or have low frequencies in a general model.
Col. 24, lines 4-29 describes where a training set for a custom domain comprises audio files and corresponding text transcripts (which suggests where the training set in col. 24, line 54 – col. 25, line 8 can be a “log”/record of words spoken in the audio files).  Col. 4, lines 30-45 describes where speech recognition output is a word-by-word transcription of input audio (which, together with col. 25, lines 9-22, suggests where the word-by-word transcription is made by combining words with the highest probability for each of a plurality of sections of input audio).  This reference appears to be directed to customizing a neural network prior to performing speech recognition by adding a customization layer, not to determining and applying weights to probabilities calculated for words of microphone-obtained user speech (i.e. this reference appears to be directed to adding a customization layer that weights output layer probabilities during training of a speech recognition system, not determining frequency weights during the running of a speech recognition system).
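The adjustment quoted from col. 25, lines 9-22 of 10210860 (divide by the general-set frequency, multiply by the custom-set frequency, select the highest result) can be sketched as below. The function name `customize` and all numeric values are hypothetical and are used only to illustrate the arithmetic.

```python
def customize(probs, general_freq, custom_freq):
    """Replace the general-domain prior of each word with the custom-domain prior.

    probs: output-layer probability per word.
    general_freq / custom_freq: word frequency in the general / custom training set.
    """
    adjusted = {
        word: p / general_freq[word] * custom_freq[word]
        for word, p in probs.items()
    }
    # The word with the highest adjusted value is selected as the output.
    best = max(adjusted, key=adjusted.get)
    return adjusted, best


probs = {"call": 0.6, "claim": 0.4}
general_freq = {"call": 0.02, "claim": 0.01}
custom_freq = {"call": 0.01, "claim": 0.03}
adjusted, best = customize(probs, general_freq, custom_freq)
```

With these sample values, "call" is adjusted to about 0.3 and "claim" to about 1.2, so "claim" is selected even though its output-layer probability was lower.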
2007/0083374 teaches “This example demonstrates the desirability for a voice processing software to detect a sudden change in the use of a historically uncommon word or phrase (e.g., "Ivan"), and to dynamically adjust the probabilities of that word in the language model being used by the voice processing system. As a result of implementing such detection and adjustment, the number of errors that occur during a voice recognition session may be decreased. Additionally, after some amount of time the language model may dynamically re-adjust the probability if the frequency of use of the anomaly word again becomes infrequent” (paragraph 41).  This reference appears to adjust the language model probability, and does not appear to specifically calibrate a determined probability for a word.
7505969 teaches “As shown, method includes providing a products database having product records containing information regarding an associated product in step 62, parsing a document in step 64, and determining word scores of the words in the document based on the frequency of the words in the document in step 66. The word scores are adjusted by predetermined weightings corresponding to the use of each word in the document in step 68, and a keyword query search string is constructed using words having the highest word scores in step 70. In step 72, the product records of the products database are searched to identify products satisfying the keyword query search string, and in step 74, product scores are assigned to the identified products based on matches to the keyword query search string” (col. 17, lines 17-42).  This reference suggests generating a keyword query search string using words having the highest weight-adjusted word scores.  This reference appears to be directed to matching products to a document, and not to speech recognition.
2017/0357632 teaches “Once the electronic device 900 has adjusted probabilities assigned to words, the electronic device 900 thereafter identifies (e.g., selects) one or more words corresponding to any number of highest weighted probabilities and provides the identified words as candidate words 920. Optionally, only words having weighted probabilities exceeding a threshold are identified as candidate words 920” (paragraph 260).  This reference appears to describe weighting probabilities for words based on language.
2008/0270138 teaches “Weight given to a word or other textual content by the time-based weighting algorithm and/or the frequency-based weighting algorithm can increase the probability that the ASR algorithm will use the word. In an exemplary embodiment, each word weighted by the frequency-based weighting algorithm can receive the same amount of weight. Similarly, each word weighted by the time-based weighting algorithm can receive the same amount of weight. Alternatively, different words can receive different weights and/or decaying time periods based on word characteristics. For example, the frequency-based weighting algorithm can give more weight to a word with a unique phonetic sequence than a word with a more common phonetic sequence. Other word characteristics can include phonetic length of the word, the frequency with which the word occurs in the visual (or audiovisual) content or textual metadata content, and/or the time interval during which the word appears. In an alternative embodiment, the weighting algorithms may not be used such that all of the selected textual content has the same likelihood of being used by the ASR algorithm” (paragraph 57).  This reference describes where different words can have different weights.
	2005/0192802 teaches “In one embodiment, if ambiguity exists in the output of the word based disambiguating engine 107, a phrase based disambiguating engine 113 further checks the result against the phrase list 115, which may include word bi-grams, trigrams, etc. One or more previously recognized words may be combined with the current word to match with the phrases in the phrase list 115. The usage frequency of the phrases can be used to modify the probabilities of matching for the word candidates to generate the phrase candidates and their associated probabilities of matching 117. Even when no ambiguity exists, the phrase based disambiguating engine may be used to predict the next word based on the previously recognized word and the phrase list 115” (paragraph 45).

	Upon further search (in response to the amendment filed 4/8/22):
	2016/0267902 teaches where adjustment of statistical weights may occur concurrently with processing of a recognized word (paragraph 52).  Adjustment of statistical weights and probabilities based on frequency appears to be a separate process relative to generating a transcript of input speech (see e.g. paragraph 57), and does not appear to be used to generate probabilities used to determine words of a transcript of input speech.
	2010/0114571 teaches weighting appearance frequencies by degrees of reliability (paragraphs 133 and 152).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN whose telephone number is (571)272-4249. The examiner can normally be reached M-F 12:00 PM - 8:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL can be reached on (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





EY 4/27/2022
/ERIC YEN/Primary Examiner, Art Unit 2658