DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/20/2021 was filed after the mailing date of the Non-Final Rejection on 07/09/2021.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment
This communication is responsive to the applicant’s amendment dated 10/12/2021.  The applicant(s) amended claims 1, 3, and 20 and canceled claim 21.

Response to Arguments
Applicant's arguments with respect to claims 1-20 have been considered but are moot in view of the new ground(s) of rejection because the arguments pertain to the newly amended limitations.

Claim Rejections - 35 USC § 103
Claims 1-3, 7, 12-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ljolje (US 20110077942 A1) in view of Romano et al. (US 20160019885 A1), and further in view of Loukina et al. (US 20150248898 A1).

Regarding claims 1 and 20, Ljolje teaches:
“a determination unit configured to determine at least one factor that causes an error in speech recognition, on a basis of a result of the speech recognition and information regarding an utterance” (par. 0025; ‘The system 100 first detects via a processor a misrecognized speech query from a user (202).’ ‘In some cases, the system can detect instead a situation which is likely to cause misrecognized speech, such as a prompt where a user's speech was previously misrecognized. The system 100 then determines a tendency of the user to repeat speech queries based on previous user interactions (204). The system 100 can determine the user's tendency based on the user's usage history, a user profile, similarities between the user and others who are likely to repeat speech queries, geographic data, social network information, time of day, type of query, background noise, and/or any other relevant information.’).
However, Ljolje does not expressly teach:
“a notification control unit configured to cause the at least one determined factor to be notified”; 
“wherein the notification control unit causes the at least one determined factor to be notified by indicating a degree of confidence of the result of the speech recognition for each word or phrase in the result of the speech recognition,” 
“wherein the degree of confidence of the result of the speech recognition for each word or phrase is expressed by a size of a color-coded display region on which the word or phrase is superimposed,” and
“wherein the determination unit and the notification control unit are each implemented via at least one processor.”

Romano teaches:
 “a notification control unit configured to cause the at least one determined factor to be notified” (par. 0055; ‘In some embodiments, the confidence level may also be visually indicated in the word cloud 410, such as by color, shade, font type, size, or other appearance quality of the word or phrase in the word cloud.’);
“wherein the notification control unit causes the at least one determined factor to be notified by indicating a degree of confidence of the result of the speech recognition for each word or phrase in the result of the speech recognition” (par. 0055; ‘In some embodiments, the confidence level may also be visually indicated in the word cloud 410, such as by color, shade, font type, size, or other appearance quality of the word or phrase in the word cloud.’);
“wherein the degree of confidence of the result of the speech recognition for each word or phrase is expressed by a size of a color-coded display region [[on which the word or phrase is superimposed]]” (par. 0055; ‘In some embodiments, the confidence level may also be visually indicated in the word cloud 410, such as by color, shade, font type, size, or other appearance quality of the word or phrase in the word cloud.’);
and
“wherein the determination unit and the notification control unit are each implemented via at least one processor” (par. 0023; ‘Similarly, while description as provided herein refers to a computing system 1200 and a processing system 1206, it is to be recognized that implementations of such systems can be performed using one or 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ljolje’s methods of determining causes for misrecognized speech by incorporating Romano’s word cloud 410 in order to express the confidence level visually in a similar manner. The combination would provide a way to indicate how likely it is that the word or phrase was identified and classified correctly. (Romano: par. 0055)
Ljolje and Romano do not expressly teach a word or phrase superimposed on a region, as in:
“wherein the degree of confidence of the result of the speech recognition for each word or phrase is expressed by a size of a color-coded display region on which the word or phrase is superimposed” (emphasis added)
Loukina teaches “color-coded display region on which the word or phrase is superimposed” (par. 0030; ‘At 502, a first display is provided that highlights words having word intelligibility scores less than a threshold value. At 504, a second display uses highlighting and question marks to identify words having word intelligibility scores less than a threshold value. In a third example at 506, different color highlights or a gradient of colors are used to indicate word intelligibility scores.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the method of visually indicating confidence levels taught by Ljolje in view of Romano by incorporating Loukina’s different color highlights or gradients of colors used to indicate word intelligibility scores in order to express the 

Regarding claim 2 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the determination unit determines the factor for each word or phrase in the result of the speech recognition” (Ljolje: par. 0029; ‘The user 302 interrupts yet again and says "no . . . Kevin Weel-yuuumms" 316 in order to further accentuate the misrecognized portions.’).

Regarding claim 3 (dep. on claim 2), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the notification control unit causes the at least one determined factor to be notified in association with each word or phrase in the result of the speech recognition” (Romano: par. 0055; ‘In some embodiments, the confidence level may also be visually indicated in the word cloud 410, such as by color, shade, font type, size, or other appearance quality of the word or phrase in the word cloud.’).

Regarding claim 7 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the information regarding an utterance includes information showing environmental noise, and the determination unit determines the factor derived from environmental noise on a basis of the information showing the environmental noise” (Ljolje: par. 0025; ‘The system 100 can determine the user's tendency based on the user's usage history, a user profile, similarities between the user and others who are likely to repeat speech queries, geographic data, social network information, time of day, type of query, background noise, and/or any other relevant information.’).

Regarding claim 12 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the information regarding an utterance includes information regarding speech recognition processing corresponding to the result of the speech recognition, and the determination unit determines the factor derived from the speech recognition processing on a basis of the information regarding the speech recognition” (Ljolje: par. 0025; ‘The system 100 first detects via a processor a misrecognized speech query from a user (202).’).

Regarding claim 13 (dep. on claim 12), the combination of Ljolje in view of Romano and Loukina further teaches:


Regarding claim 14 (dep. on claim 12), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the determination unit determines, as the factor, an utterance being hard to recognize” (Ljolje: par. 0025; ‘The system 100 first detects via a processor a misrecognized speech query from a user (202).’ ‘In some cases, the system can detect instead a situation which is likely to cause misrecognized speech, such as a prompt where a user's speech was previously misrecognized. The system 100 then determines a tendency of the user to repeat speech queries based on previous user interactions (204). The system 100 can determine the user's tendency based on the user's usage history, a user profile, similarities between the user and others who are likely to repeat speech queries, geographic data, social network information, time of day, type of query, 

Regarding claim 15 (dep. on claim 12), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the determination unit determines, as the factor, the degree of confidence of the result of the speech recognition being smaller than a threshold” (Ljolje: par. 0025; ‘The system can detect misrecognized speech by comparing a speech recognition confidence score to a misrecognition threshold, for example.’).

Regarding claim 16 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the notification control unit causes the factor to be visually notified” (Romano: par. 0055; ‘In some embodiments, the confidence level may also be visually indicated in the word cloud 410, such as by color, shade, font type, size, or other appearance quality of the word or phrase in the word cloud.’). 

Regarding claim 17 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein the notification control unit causes the factor to be auditorily notified” (the Examiner takes official notice. Audio feedback is well-known in the art. Therefore, it would have been obvious to provide the confidence levels taught by Ljolje in view of Romano and Loukina through audio feedback.).  

Regarding claim 18 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches:
“wherein, in a case where a plurality of the factors are determined, the notification control unit selects one factor from the plurality of factors, and causes the selected factor to be notified” (Ljolje: par. 0025; ‘The system 100 first detects via a processor a misrecognized speech query from a user (202).’ ‘In some cases, the system can detect instead a situation which is likely to cause misrecognized speech, such as a prompt where a user's speech was previously misrecognized. The system 100 then determines a tendency of the user to repeat speech queries based on previous user interactions (204). The system 100 can determine the user's tendency based on the user's usage history, a user profile, similarities between the user and others who are likely to repeat speech queries, geographic data, social network information, time of day, type of query, background noise, and/or any other relevant information.’ It would have been obvious to notify a selected factor.).

Claims 4-6 and 8-11 are rejected under 35 U.S.C. 103 as being unpatentable over Ljolje in view of Romano and Loukina as applied to claim 1 above, and further in view of Kim et al. (US 20080101556 A1).

Regarding claim 4 (dep. on claim 1), Ljolje in view of Romano and Loukina does not teach:
“wherein the information regarding an utterance includes information showing sound volume of an utterance, and the determination unit determines the factor derived 
Kim teaches:
“wherein the information regarding an utterance includes information showing sound volume of an utterance, and the determination unit determines the factor derived from sound volume on a basis of the information showing the sound volume of the utterance” (par. 0009; ‘Further, the present invention provides an apparatus and method for analyzing potential failure reasons that may cause speech recognition failures in a speech recognition process, such as noise, a transmission error, speech volume, speech rate and so forth, and automatically providing an analysis result to a user.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ljolje in view of Romano and Loukina’s method of determining error conditions by incorporating Kim’s method of analyzing potential failure reasons that may cause speech recognition failures in order to determine the factor derived from sound volume on a basis of the information showing the sound volume of the utterance. The combination improves recognition environments, which results in an improvement in true recognition rate. (Kim: par. 0094)

Regarding claim 5 (dep. on claim 4), the combination of Ljolje in view of Romano, Loukina, and Kim further teaches:
“wherein the determination unit determines, as the factor, the sound volume being too large” (Kim: par. 0019; ‘More specially, the memory unit 150 stores a first loudness level for identifying loudly spoken speech, a second loudness level for 

Regarding claim 6 (dep. on claim 4), the combination of Ljolje in view of Romano, Loukina, and Kim further teaches:
“wherein the determination unit determines, as the factor, the sound volume being too small” (Kim: par. 0019; ‘More specially, the memory unit 150 stores a first loudness level for identifying loudly spoken speech, a second loudness level for identifying quietly spoken speech, a first rate level for identifying rapidly spoken speech, and a second rate level for identifying slowly spoken speech.’).

Regarding claim 8 (dep. on claim 7), the combination of Ljolje in view of Romano, Loukina, and Kim further teaches:
“wherein the information regarding an utterance further includes information showing sound volume of an utterance, and the determination unit determines the factor derived from environmental noise on a basis of the information showing the sound volume of the utterance and the information showing the environmental noise” (Kim: par. 0019; ‘More specially, the memory unit 150 stores a first loudness level for identifying loudly spoken speech, a second loudness level for identifying quietly spoken speech, a first rate level for identifying rapidly spoken speech, and a second rate level for identifying slowly spoken speech.’).

Regarding claim 9 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina does not expressly teach:
“wherein the information regarding an utterance includes information showing utterance speed, and the determination unit determines the factor derived from utterance speed on a basis of the information showing the utterance speed.”
Kim teaches:
“wherein the information regarding an utterance includes information showing utterance speed, and the determination unit determines the factor derived from utterance speed on a basis of the information showing the utterance speed” (par. 0086; ‘It must be possible to derive the probability that a speech recognition result is valid, by using an energy level analysis result, an ambient noise estimation result and a speech rate check-result.’). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ljolje in view of Romano and Loukina’s method of determining error conditions by incorporating Kim’s probability that a speech recognition result is valid in order to determine a factor derived from utterance speed on a basis of information showing utterance speed. The combination improves recognition environments, which results in an improvement in true recognition rate. (Kim: par. 0094)

Regarding claim 10 (dep. on claim 9), the combination of Ljolje in view of Romano, Loukina, and Kim further teaches:
“wherein the determination unit determines the factor derived from utterance speed by comparing the utterance speed shown by the information showing the 

Regarding claim 11 (dep. on claim 10), the combination of Ljolje in view of Romano, Loukina, and Kim further teaches:
“wherein the standard value is one of a value associated with a speaker, a value settled on a basis of an attribute of the speaker, and a value independent of the speaker” (Kim: par. 0046; ‘The speech rate analysis function refers to a function of estimating the number of syllables in speech data spoken by a user and presenting a result of analyzing an speech rate according to the corresponding number of syllables.’).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Ljolje in view of Romano and Loukina as applied to claim 1 above, and further in view of Gandhi et al. (US 20040015351 A1).

Regarding claim 19 (dep. on claim 1), the combination of Ljolje in view of Romano and Loukina further teaches a plurality of factors.
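However, Ljolje in view of Romano and Loukina does not expressly teach:
“wherein, in a case where a plurality of the factors are determined, the notification control unit causes the plurality of factors to be notified while being switched.”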
Gandhi teaches:
“wherein, in a case where a plurality of the factors are determined, the notification control unit causes the plurality of factors to be notified while being switched” (par. 0036; ‘For example, such can be the case where, as determined by the field speech recognition system, an audio signal includes too much noise for reliable recognition to occur. Other error conditions also can be noted such as where the user speaks over a voice prompt as indicated by "[spoke too soon]" and where only silence is detected as indicated by "[silence]".’; par. 0038; ‘The method can begin in a state wherein a field speech recognition system has compiled a transaction log specifying text results and parameters such as date and time information and any other configuration parameters and failure conditions the speech recognition system is capable of logging.’).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the error conditions taught by Ljolje in view of Romano and Loukina by incorporating the failure conditions logging taught by Gandhi such that the plurality of factors are notified while being switched. The combination allows for determinations as to whether the field speech recognition system is properly configured for the audio environments from which the speech recognition system receives speech. (Gandhi: par. 0044)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK VILLENA, whose telephone number is (571) 270-3191. The examiner can normally be reached 10 am - 6 pm EST, Monday through Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARK VILLENA
Examiner
Art Unit 2658



/MARK VILLENA/Examiner, Art Unit 2658