Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) were submitted on November 24, 2020 and June 11, 2021. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.	
Response to Arguments and Amendments
The amendment filed on August 15, 2022 has been entered. Claims 1-20 are pending in the application. Applicant has amended claims 1-2, 4, 6, 8, 10-13, 15, and 17-20.
Applicant argues that Rao fails to disclose “…identify confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section, and based on the identified confusion probabilities and the identified similarities, identify one of the plurality of candidate characters as an utterance character of the character section” as recited in claim 1. The examiner agrees with this assertion.
Applicant argues that Rao and Bai do not teach or suggest the feature of “assigning a lower weight to a confusion probability of the recognition character in which the pause section exists than when no pause section exists” as recited in claim 4. The examiner agrees with this assertion.
Applicant’s arguments with respect to the 35 U.S.C. 101 rejections for claims 19-20 have been considered and are persuasive. Accordingly, these rejections have been withdrawn. 
Applicant’s arguments with respect to the 35 U.S.C. 102 and 103 rejections of claims 1-3, 5-12, and 14-20 have been considered but are moot because they are directed to the amended claim language, which is addressed under the new grounds of rejection below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically taught as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-12, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rao (U.S. Publication No. 20080120102) in view of Qin (U.S. Patent No. 6513005).
Regarding claim 1, Rao discloses an electronic device ([0048] - The system itself may exist as Software or may be implemented on a computing device), comprising:
a voice receiver (Figure 3A – Receive Speech Input 301); 
and a processor configured to ([0048] - may include memory (ROM, RAM etc.), storage, processors (fixed-point, floating-point etc.), interface ports, and other hardware components):
obtain a recognition character converted from a character section of a user voice input received through the voice receiver (Figure 3A – Partial Spelling Input 306);
However, Rao does not disclose an electronic device, comprising:
a processor configured to:
identify confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section;
and based on the identified confusion probabilities and the identified similarities, identify one of the plurality of candidate characters.
Qin does teach an electronic device, comprising:
a processor configured to:
identify confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”);
and based on the identified confusion probabilities and the identified similarities, identify one of the plurality of candidate characters (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rao to incorporate the teachings of Qin in order to implement an electronic device, comprising: a processor configured to: identify confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section; and based on the identified confusion probabilities and the identified similarities, identify one of the plurality of candidate characters. Doing so allows for high error correction efficiency (Qin Col. 4, lines 26-33).
Regarding claim 2, Rao in view of Qin teaches all of the limitations as in claim 1, above. 
Rao discloses the electronic device, wherein the processor is configured to: convert the user voice input received through the voice receiver into a character string ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize),
and divide the character string into each character ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize).
Regarding claim 5, Rao in view of Qin teaches all of the limitations as in claim 1, above. Rao discloses the electronic device, further comprising:
a memory, wherein the processor is configured to store history information associated with an identification result of the candidate character in the memory (Figure 9A – Store word + pronunciation in as new word and compile network; [0048] - may include memory (ROM, RAM etc.), storage, processors (fixed-point, floating-point etc.), interface ports, and other hardware components).
Regarding claim 6, Rao in view of Qin teaches all of the limitations as in claim 5, above. 
Rao discloses the electronic device, wherein the processor is configured to identify the confusion probabilities of the plurality of candidate characters based on a confusion matrix ([0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected. [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores).
Regarding claim 7, Rao in view of Qin teaches all of the limitations as in claim 6, above. 
Rao discloses the electronic device, wherein the processor is configured to update the confusion matrix based on the history information associated with the identification result of the candidate character (Figure 9A – Store word + pronunciation in as new word and compile network; [0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected).
Regarding claim 8, Rao in view of Qin teaches all of the limitations as in claim 1, above. 
Rao discloses the electronic device, wherein the processor is configured to identify the similarities of the plurality of candidate characters for the acoustic feature of the character section based on acoustic feature models of a plurality of pre-stored candidate characters (Figure 1 – SP-2: Wave or features corresponding to word to be predicted; [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores).
Regarding claim 9, Rao in view of Qin teaches all of the limitations as in claim 8, above. 
Rao discloses the electronic device, wherein the processor is configured to update an acoustic feature model among the acoustic feature models based on the history information associated with the identification result of the candidate character (Figure 1 – Acoustic Model, ASR System 121 – Inputs: Language Model with A.DAT + Acoustic Model + Features from SP-1 or SP-2).
Regarding claim 10, Rao in view of Qin teaches all of the limitations as in claim 1, above. 
Rao discloses the electronic device, wherein the processor is configured to obtain correction probabilities by applying the confusion probabilities which are identified based on a confusion matrix for the plurality of candidate characters and the similarities for the acoustic feature of the character section which are identified based on acoustic feature models for a plurality of pre-stored candidate characters ([0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected; [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores; [0093] - the user's original utterance is stored by the ASR engine for further processing and correction of errors).
Regarding claim 11, Rao discloses a method for controlling an electronic device ([0009] - method for speech-to-text prediction of spoken text), comprising:
obtaining a recognition character converted from a character section of a user voice input received through a voice receiver (Figure 3A – Partial Spelling Input 306);
However, Rao does not teach a method for controlling an electronic device, comprising:
identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section;
and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters.
Qin does teach a method for controlling an electronic device, comprising:
identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”);
and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rao to incorporate the teachings of Qin in order to implement a method for controlling an electronic device, comprising: identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section; and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters. Doing so allows for high error correction efficiency (Qin Col. 4, lines 26-33).
Regarding claim 12, Rao in view of Qin teaches all of the limitations as in claim 11, above. 
Rao discloses the method, further comprising: 
converting the user voice input received through the voice receiver into a character string ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize);
and dividing the character string into each character ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize).
Regarding claim 14, Rao in view of Qin teaches all of the limitations as in claim 11, above. 
Rao discloses the method, further comprising: storing history information associated with an identification result of the candidate character (Figure 9A – Store word + pronunciation in as new word and compile network; [0048] - may include memory (ROM, RAM etc.), storage, processors (fixed-point, floating-point etc.), interface ports, and other hardware components).
Regarding claim 15, Rao in view of Qin teaches all of the limitations as in claim 14, above.
Rao discloses the method, further comprising: identifying the confusion probabilities based on a confusion matrix for the plurality of candidate characters ([0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected. [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores).
Regarding claim 16, Rao in view of Qin teaches all of the limitations as in claim 15, above. 
Rao discloses the method, further comprising: updating history information associated with the identification result of the candidate character (Figure 9A – Store word + pronunciation in as new word and compile network; [0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected).
Regarding claim 17, Rao in view of Qin teaches all of the limitations as in claim 11, above. 
Rao discloses the method, further comprising: identifying the similarities of the plurality of candidate characters for the acoustic feature of the character section based on acoustic feature models of a plurality of pre-stored candidate characters (Figure 1 – SP-2: Wave or features corresponding to word to be predicted; [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores).
Regarding claim 18, Rao in view of Qin teaches all of the limitations as in claim 11, above.
Rao discloses the method, further comprising: obtaining correction probabilities by applying the confusion probabilities which are identified based on a confusion matrix for the plurality of candidate characters and the similarities for the acoustic feature of the character section which are identified based on acoustic feature models for a plurality of pre-stored candidate characters ([0082] - a combination of longer and more frequent words may be grouped together and using the so-called confusion matrix (well-known in speech recognition) the top 200 words may be selected; [0083] - the maximum-likelihood probability is computed using a combination of acoustic scores (calculated using dynamic programming as in ViterbiSearch) and the language model scores; [0093] - the user's original utterance is stored by the ASR engine for further processing and correction of errors).
Regarding claim 19, Rao discloses a non-transitory computer-readable storage medium in which a computer program executable by a computer is stored ([0157] - storage medium 1613), wherein the computer is configured to execute an operation of:
obtaining a recognition character converted from a character section of a user voice input received through a voice receiver (Figure 3A – Partial Spelling Input 306);
However, Rao does not disclose a non-transitory computer-readable storage medium in which a computer program executable by a computer is stored, wherein the computer is configured to execute an operation of:
identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section;
and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters.
Qin does teach a non-transitory computer-readable storage medium in which a computer program executable by a computer is stored, wherein the computer is configured to execute an operation of:
identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”);
and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters (Col. 4, lines 26-33 – “Step 208: the acoustic models obtained from Step 207 are then combined with character and word level language models, leading to a probabilistic evaluation of the likelihood figures of candidate characters (or words) during the Sequential Stroke input process. These integrated models are used to rank and order the current Set of candidates in the Stroke based input System for high error correction efficiency”; Col. 7, lines 18-21 – “The combined acoustic (confusion matrix) and language models will be used to rank the current active candidates and order them according to their overall likelihood value”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rao to incorporate the teachings of Qin in order to implement a non-transitory computer-readable storage medium in which a computer program executable by a computer is stored, wherein the computer is configured to execute an operation of: identifying confusion probabilities of a plurality of candidate characters in association with the obtained recognition character, and similarities of the plurality of candidate characters for an acoustic feature of the character section; and based on the identified confusion probabilities and the identified similarities, identifying one of the plurality of candidate characters. Doing so allows for high error correction efficiency (Qin Col. 4, lines 26-33).
Regarding claim 20, Rao in view of Qin teaches all of the limitations as in claim 19, above.
Rao discloses the computer-readable storage medium, wherein the computer executes an operation of:
converting the user voice input received through the voice receiver into a character string ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize);
and dividing the character string into each character ([0063] - the result of the ASR engine is displayed along with the N-best choices (block 220). If the user inputs an additional letter (block 222), then the system can either use that additional letter to further reduce the active lexicon (block 224) and re-recognize).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Rao (U.S. Publication No. 20080120102) in view of Qin (U.S. Patent No. 6513005) and further in view of Bai (U.S. Publication No. 20210127003).
Regarding claim 3, Rao in view of Qin teaches all of the limitations as in claim 2, above.
However, Rao in view of Qin does not teach the electronic device, wherein the processor is configured to analyze whether a pause section exists between characters of the character string.
Bai does teach the electronic device, wherein the processor is configured to analyze whether a pause section exists between characters of the character string ([0110] - The classification of the sound signals may sometimes include other categories, such as a pause in the middle of voice).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rao in view of Qin to incorporate the teachings of Bai in order to implement the electronic device, wherein the processor is configured to analyze whether a pause section exists between characters of the character string. Doing so allows the system to identify and discriminate sounds for human-machine interaction and sounds for non-human-machine interaction accurately, thereby increasing the accuracy and intelligence of interactive voice-control, and improving the user experience of human-machine interaction (Bai [0026]).
Allowable Subject Matter
Claims 4 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance: dependent claims 4 and 13 contain the limitation “assigning a lower weight to a confusion probability of the recognition character in which the pause section exists than when no pause section exists”. As of the effective filing date of the application, this limitation was not anticipated by the prior art, and it would not have been obvious to one of ordinary skill in the art to combine elements of the prior art to meet this limitation.
The closest prior art, Bai (U.S. Publication No. 20210127003), whether alone or in combination with the other cited references, fails to anticipate or render obvious the above-described limitation.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Czyryba (U.S. Patent No. 11127394) teaches a method and system for high accuracy keyphrase detection for low resource devices. Pudipeddi (U.S. Publication No. 20210119152) teaches data parallelism in distributed training of artificial intelligence models. Vermeulen (U.S. Patent No. 7720683) teaches a method and apparatus for specifying and performing speech recognition operations.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ethan Daniel Kim, whose telephone number is (571) 272-1405. The examiner can normally be reached Monday - Friday, 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658