DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on December 22, 2021 has been entered.
 

Response to Amendment
This office action is responsive to applicant’s remarks received on December 22, 2021. Claims 1-18, 20 and newly added Claim 21 are pending.


Response to Arguments
Applicant’s arguments with respect to the amended claims filed on December 22, 2021 have been fully considered but they are not persuasive.

A:  Applicant’s Remarks
For Applicant’s remarks, see “Applicant Arguments/Remarks Made in an Amendment” filed on December 22, 2021.

A:  Examiner’s Response
The gist of Applicant’s arguments is that the cited references, either alone or in combination, do not teach, disclose or suggest determining speech errors indicating a mismatch between a word the user intends to say and what the user actually says for the audio input using at least a sensitivity level.
Examiner understands Applicant’s arguments but respectfully disagrees. In accordance with Applicant’s specification at Paragraph 0013, a speech error is a mismatch between what a user intends to say and what the user actually says. D’Amato ‘572 teaches keyword matching, comparison and evaluation of results, false positives, cross-checking of results, and a final intent determination based on keywords. For instance, D’Amato ‘572 at Paragraphs 0162-0164 teaches, discloses or suggests determining speech errors indicating a mismatch between a word the user intends to say and what the user actually says for the audio input using at least a sensitivity level. For example,
Paragraph 0163 teaches that some error in performing keyword matching is expected. Within examples, the local NLU may generate a confidence score when determining an intent, which indicates how closely the transcribed words in the signal SASR match the corresponding keywords in the library of the local NLU. Performing an operation according to a determined intent is based on the confidence score for keywords matched in the signal SASR. For instance, the NMD 703 may perform an operation according to a determined intent when the confidence score for a given sound exceeds a given threshold value (e.g., 0.5 on a scale of 0-1, indicating that the given 
Moreover, Paragraph 0164 teaches that keyword matching can be performed via NLUs of two or more different NMDs on a local network, and the results can be compared or otherwise combined to cross-check the results, thereby increasing confidence and reducing the rate of false positives. For example, a first NMD may identify a keyword in voice input with a first confidence score. A second NMD may separately perform keyword detection on the same voice input (either by separately capturing the same user speech or by receiving sound input data from the first NMD transmitted over the local area network). The second NMD may transmit the results of its keyword matching to the first NMD for comparison and evaluation. If, for example, the first and second NMD each identified the same keyword, a false positive is less likely. If, by contrast, the first and second NMD each identified a different keyword (or if one did not identify a keyword at all), then a false positive is more likely, and the first NMD may decline to take further action. In some embodiments, the identified keywords and/or any associated confidence scores can be compared between the two NMDs to make a final intent determination. With this said, the cited references teach, disclose or suggest Applicant’s claimed invention. As a result, it is submitted that the present application is not in condition for allowance.
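
For illustration only, the threshold test of Paragraph 0163 and the cross-check of Paragraph 0164 may be sketched as follows. This is a minimal sketch and not D’Amato ‘572's implementation: the names Detection and final_intent are hypothetical, and the 0.5 default simply mirrors the example threshold given in Paragraph 0163.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Result of keyword matching on one device (hypothetical structure)."""
    keyword: Optional[str]  # None means no keyword was identified
    confidence: float       # score on a scale of 0-1


def final_intent(first: Detection, second: Detection,
                 threshold: float = 0.5) -> Optional[str]:
    """Return the keyword to act on, or None to decline further action."""
    # A score at or below the threshold does not trigger the operation.
    if first.keyword is None or first.confidence <= threshold:
        return None
    # A different keyword (or none) on the second device makes a
    # false positive more likely, so the first device declines to act.
    if second.keyword != first.keyword:
        return None
    return first.keyword
```

Under these assumptions, two devices that agree on a keyword yield that keyword, while a disagreement or a low first score yields no action.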


Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the 
2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-18, 20 & 21 are rejected under 35 U.S.C. 103 as being unpatentable over Doyle (US 7,668,710 B2 hereinafter, Doyle ‘710) in combination with D’Amato et al. (US 20210035572 A1 hereinafter, D’Amato ‘572).
Regarding claim 12; Doyle ‘710 discloses a system (Fig. 9A, System 910) 
comprising: 
a processing system (Fig. 9A, CPU 901);
a storage system (Fig. 9A, Memory 902);
and instructions stored on the storage system that when executed by the processing system (i.e. Computer software 920 is, typically, stored in storage media 906 and is loaded into memory 902 prior to execution. Computer software 920 may comprise system software 921 and application software 222. System software 921 includes control software such as an operating system that controls the low-level operations of computing system 910. Column 22, lines 50-62);
direct the processing system to at least: receive audio input comprising one or more spoken words (Fig. 3, Step 310 i.e. At step 310, a human operator retrieves recorded information about a call by selecting an individual call from list box 210. Column 10, lines 11-25);
determine speech errors for the audio input using at least a sensitivity level (Fig. 5, Step 510 i.e. application software 222 is utilized to improve the quality of voice recognition services provided by voice recognition gateway 100. To accomplish this, at step 510 in Fig. 5, application software 222 causes the system to analyze system logs, such as call log 136, transcription log 430, and error log 440, to determine the accuracy or efficiency level of the system.  System settings, such as 
determine whether an amount and type of speech errors requires adjustment to the sensitivity level (i.e. Using the information recorded in system logs, the system determines and assigns an accuracy and efficiency level to the existing voice recognition system at step 520. Based on the calculated accuracy and efficiency levels, at step 530, the system determines if the voice recognition system needs to be adjusted to produce better results. Column 12, lines 19-34)
adjust the sensitivity level to a second sensitivity level based on the amount and type of the speech errors (i.e. The voice recognition system settings, including the confidence threshold, are adjusted so that the number of false accepts and false reject type errors are minimized and the number of out-of-grammar correct rejects are maximized. The system calculates the recognition accuracy rate based on the number of correct accepts and correct rejects recorded in the system logs. By analyzing the accuracy results, the system may adjust the confidence threshold to improve recognition accuracy. Column 18, lines 20-39)
the second sensitivity level being different than the sensitivity level (i.e. The system confidence threshold is set so that out-of-grammar correct reject rate is at least 50% and the in-grammar false reject rate is equal to twice the in-grammar false accept rate. Column 18, lines 20-39)
and re-determine the speech errors for the audio input using at least the second sensitivity level (Fig. 5, Steps 550-560 i.e. Once the system has determined the sources of error, then at step 550 the system hypothesizes solutions that can resolve system inefficiencies and prevent certain errors from occurring. Exemplifying methods for hypothesizing solutions for various sources of error are illustrated in FIGS. 7A through 7B. Once the system has hypothesized one or more solutions, the system at step 560 reconfigures the voice recognition system based on one or more hypothesized solutions. The system may, for example, add or delete certain phonetic definitions or pronunciations to the recognition grammar or modify threshold settings depending on the types of errors detected and the solutions hypothesized. Column 12, lines 49-60)
Doyle ‘710 teaches most of the subject matter described above, except that Doyle ‘710 describes an input as a user’s utterance, whereas Applicant’s application uses an audio input. For example, the Abstract of Applicant’s application teaches that the sensitivity feature can receive audio input comprising one or more spoken words. Doyle ‘710 teaches at the Abstract that the voice recognition information comprises a recognized voice command associated with the user 
Although Doyle ‘710, at Fig. 5, Step 510 (Column 11, lines 45-65), teaches determining speech errors for the audio input using at least a sensitivity level, Doyle ‘710 does not expressly disclose determining speech errors indicating a mismatch between what the user intends to say and what the user actually says for the audio input using at least a sensitivity level.
D’Amato ‘572 discloses determining speech errors indicating a mismatch between a word the user intends to say and what the user actually says for the audio input using at least a sensitivity level (i.e. Some error in performing keyword matching is expected. Within examples, the local NLU may generate a confidence score when determining an intent, which indicates how closely the transcribed words in the signal SASR match the corresponding keywords in the library of the local NLU. Performing an operation according to a determined intent is based on the confidence score for keywords matched in the signal SASR. For instance, the NMD 703 may perform an operation according to a determined intent when the confidence score for a given sound exceeds a given threshold value (e.g., 0.5 on a scale of 0-1, indicating that the given sound is more likely than not the command keyword). Conversely, when the confidence score for a given intent is at or below the given threshold value, the NMD 703 does not perform the operation according to the determined intent. The identified keywords and/or any associated confidence scores can be compared between the two NMDs to make a final intent determination. Paragraphs 0162-0164).
(D’Amato ‘572 at “Field Of The Disclosure”). 
	At the time the invention was effectively filed, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Doyle ‘710 by adding determining speech errors indicating a mismatch between what the user intends to say and what the user actually says for the audio input using at least a sensitivity level as taught by D’Amato ‘572. The modification would have been advantageous because processing the user’s intent, as opposed to only what the user said, processes what the user actually means. Hence, the user would be satisfied with the execution of the output. Therefore, it would have been obvious to combine Doyle ‘710 with D’Amato ‘572 to obtain the invention as specified.
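
For illustration only, the confidence threshold criterion quoted above from Doyle ‘710 (Column 18, lines 20-39) may be sketched as a selection over candidate thresholds. This is a sketch under stated assumptions and not Doyle ‘710's implementation: the function choose_threshold, the dictionary keys, and the idea of picking the candidate closest to the stated ratio are all hypothetical.

```python
def choose_threshold(rows):
    """Pick a confidence threshold from measured rates (all rates 0-1).

    Each row is a dict with hypothetical keys: threshold,
    oog_correct_reject, ig_false_reject, ig_false_accept.
    """
    # Keep candidates whose out-of-grammar correct reject rate is at least 50%.
    eligible = [r for r in rows if r["oog_correct_reject"] >= 0.5]
    if not eligible:
        return None
    # Prefer the candidate whose in-grammar false reject rate is closest
    # to twice its in-grammar false accept rate.
    best = min(
        eligible,
        key=lambda r: abs(r["ig_false_reject"] - 2 * r["ig_false_accept"]),
    )
    return best["threshold"]
```

The rates would come from system logs such as those Doyle ‘710 describes; no particular log format is assumed here.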

Regarding claim 13; Doyle ‘710 discloses wherein the second sensitivity level is lower than the sensitivity level (See Appendixes A & B i.e. Appendixes A & B show a list that reflects the recognition accuracy level of the system at various confidence thresholds. Columns 23-24).

Regarding claim 14; Doyle ‘710 discloses wherein the second sensitivity level is higher than the sensitivity level (See Appendixes A & B i.e. Appendixes A & B show a list that reflects the recognition accuracy level of the system at various confidence thresholds. Columns 23-24).

Regarding claim 15; Doyle ‘710 discloses wherein the instructions to determine the speech errors for the audio input based on at least the sensitivity level direct the processing system to: obtain a speech score for each spoken word of the audio input, wherein the speech score comprises a mispronunciation score, a repetition score, an insertion score, a substitution score, an omission score, a hesitation score, or a combination thereof (i.e. For example, if "read it" is being confused 
and apply the sensitivity level to the obtained speech score to determine the speech errors for the audio input (i.e. The voice recognition system settings, including the confidence threshold, are adjusted so that the number of false accepts and false reject type errors are minimized and the number of out-of-grammar correct rejects are maximized. The system calculates the recognition accuracy rate based on the number of correct accepts and correct rejects recorded in the system logs. By analyzing the accuracy results, the system may adjust the confidence threshold to improve recognition accuracy. Column 18, lines 20-39)

Regarding claim 16; Doyle ‘710 discloses wherein the instructions to apply the sensitivity level to the obtained speech score to determine the speech errors direct the processing system to: for each word of the one or more spoken words in the audio input, determine if the speech score is above a threshold value, the threshold value being set by the sensitivity level (i.e. Another source of error may be a confidence threshold level that is too high or too low. In some embodiments a source of error is determined to be a high confidence threshold, if a high rate of IGFR type errors are detected. Column 3, line 66 thru Column 4, line 16)
and if the speech score is above the threshold value, flagging the word of the one or more spoken words in the audio input as having a speech error (i.e. In a voice recognition system with high confidence threshold setting, even a slight difference in acoustic similarity can cause the voice recognition system to reject a user utterance it otherwise should have accepted, leading to an IGFR. Column 3, line 66 thru Column 4, line 16).
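
For illustration only, the limitation of claim 16 (a threshold value set by the sensitivity level, with each word flagged when its speech score is above that threshold) may be sketched as follows. This is a minimal sketch: the THRESHOLDS mapping, its values, and the function name flag_speech_errors are hypothetical and are taken from neither the claims nor the cited references.

```python
# Hypothetical mapping: a higher sensitivity level sets a lower threshold,
# so more words are flagged.
THRESHOLDS = {"low": 0.8, "medium": 0.5, "high": 0.2}


def flag_speech_errors(word_scores, sensitivity_level):
    """Return the words whose speech score is above the threshold.

    word_scores is a list of (word, score) pairs with scores on a 0-1 scale.
    """
    threshold = THRESHOLDS[sensitivity_level]
    return [word for word, score in word_scores if score > threshold]


scores = [("the", 0.1), ("quick", 0.6), ("fox", 0.3)]
first_pass = flag_speech_errors(scores, "medium")
# Re-determining with a second, different sensitivity level.
second_pass = flag_speech_errors(scores, "high")
```

With these assumed values, the second pass flags at least as many words as the first because the higher sensitivity lowers the threshold.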

Regarding claim 1; Claim 1 contains substantially the same subject matter as claim 12. Therefore, claim 1 is rejected on the same grounds as claim 12.

Regarding claim 2; Claim 2 contains substantially the same subject matter as claim 13. Therefore, claim 2 is rejected on the same grounds as claim 13.

Regarding claim 3; Doyle ‘710 discloses wherein the amount of speech errors is 25% and the type of speech errors is a mispronunciation error (See Appendixes A & B i.e. Appendixes A & B show a list that reflects the recognition accuracy level of the system at various confidence thresholds. Appendixes A & B also show the percentages Correct/Reject. Columns 23-24).

Regarding claim 4; Claim 4 contains substantially the same subject matter as claim 14. Therefore, claim 4 is rejected on the same grounds as claim 14.

Regarding claim 5; Doyle ‘710 discloses wherein the amount of speech errors is none (See Appendixes A & B i.e. Appendixes A & B show a list that reflects the recognition accuracy level of the system at various confidence thresholds. Appendixes A & B also show wherein the speech errors are none (0%). Columns 23-24)
and the type of speech errors is a mispronunciation error (i.e. For example, if "read it" is being confused with "delete it" because the two phrases are acoustically similar, then the system would, for example, remove "delete it" from the grammar's vocabulary and substitute it with "get rid of it". The phrase "get rid of it" is not acoustically similar to "read it" and therefore cannot be as easily confused by the system. Column 15, lines 49-55).

Regarding claim 6; Claim 6 contains substantially the same subject matter as claim 15. Therefore, claim 6 is rejected on the same grounds as claim 15.

Regarding claim 7; Claim 7 contains substantially the same subject matter as claim 15. Therefore, claim 7 is rejected on the same grounds as claim 15.

Regarding claim 8; Doyle ‘710 discloses wherein the obtaining of the speech score for each spoken word of the audio input comprises: communicating the audio input to a speech service (i.e. A modified version of LDA can be used to find the closest alignment of two phoneme strings, generating an acoustic similarity score in the process. For example, the chart provided immediately below is a Levenshtein phoneme alignment for the word "operating" with its canonical pronunciation along the vertical axis and a recognized phoneme string based on what the user said along the horizontal axis. The score in the upper right corner represents the "cost of matching" the two words. In this example, the cost is 4. It can be noted that "cl" represents an unvoiced or silent phoneme. Column 16, lines 1-18)
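
For illustration only, the "cost of matching" two phoneme strings may be sketched with the standard Levenshtein distance computed over sequences of phoneme symbols. This is a minimal sketch of the unmodified algorithm with unit costs, not the modified version Doyle ‘710 describes, and the function name alignment_cost is hypothetical.

```python
def alignment_cost(reference, recognized):
    """Levenshtein distance between two sequences of phoneme symbols.

    Insertions, deletions and substitutions each cost 1 (an assumption;
    a modified algorithm could weight acoustically similar phonemes less).
    """
    # prev[j] holds the cost of aligning the processed reference prefix
    # with the first j recognized symbols.
    prev = list(range(len(recognized) + 1))
    for i, ref in enumerate(reference, start=1):
        curr = [i]
        for j, rec in enumerate(recognized, start=1):
            substitution = prev[j - 1] + (ref != rec)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]
```

A lower cost indicates a closer alignment between the canonical pronunciation and the recognized phoneme string; identical sequences have a cost of 0.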

Regarding claim 9; Claim 9 contains substantially the same subject matter as claim 16. Therefore, claim 9 is rejected on the same grounds as claim 16.

Regarding claim 10; Doyle ‘710 discloses surfacing a visual indication of each of the determined speech errors in an application (See Appendixes A & B i.e. Appendixes A & B show a visual representation of the speech errors at Columns 23-24. The LDA chart also shows a visual representation of a simple metric for testing the similarity of strings at Column 16).

Regarding claim 11; Doyle ‘710 discloses determining one or more reading or speaking tools that correspond to the determined speech errors; and providing the one or more reading or speaking tools for a display of an application (i.e. Transcription software 420 can be utilized to view and analyze each entry in call log 136 to determine whether a user utterance was properly recognized, and if not, the possible reasons for the improper recognition. When transcription software 420 is executed, a GUI, such as that shown in Fig. 2, is provided to the human operator. The GUI may include several text and list boxes that display information about entries in call log 136. Boxes 210, 220, 230, 240, 250, 255, and 260 are exemplifying interface tools that can be utilized to implement transcription software 420's GUI. Column 9, lines 49-59)

Regarding claim 17; Claim 17 contains substantially the same subject matter as claim 12. Therefore, claim 17 is rejected on the same grounds as claim 12. However, claim 17 further 

Regarding claim 18; Claim 18 contains substantially the same subject matter as claim 15. Therefore, claim 18 is rejected on the same grounds as claim 15.

Regarding claim 20; Claim 20 contains substantially the same subject matter as claim 6. Therefore, claim 20 is rejected on the same grounds as claim 6.

Regarding claim 21; Doyle ‘710 discloses wherein the speech error is an oral reading miscue made while the user is performing an oral reading (i.e. voice recognition information produced by a voice recognition system in response to recognizing a user utterance is analyzed. The voice recognition information comprises a recognized voice command associated with the user utterance and a reference to an audio file that includes the user utterance. Based on the analysis, a recognition error may be identified and the source of the error determined. See Abstract)


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ. whose telephone number is (571)270-1581. The examiner can normally be reached 9-5 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy P. Goddard can be reached on 517-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Acting SPE
Art Unit 2677



/MARCUS T RILEY/Primary Examiner, Art Unit 2677