DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 9/04/2019. Claims 1-20 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
	

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 15-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of Kurihara, K., Goto, M., Ogata, J., Matsusaka, Y., & Igarashi, T. (2007, November). Presentation sensei: a presentation training system using speech and image processing. In Proceedings of the 9th international conference on Multimodal interfaces (pp. 358-365).
Regarding claim 1, Audhkhasi teaches a data processing system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of (see Audhkhasi, Fig. 4, 401, 407, 409): receiving a transcript for the audio data, the transcript including a plurality of words spoken during the speech rehearsal session (see Audhkhasi, col. 4, lines 52-55, teaching that in block 307 the recording is passed through a standard Automatic Speech Recognition (ASR) system to obtain the word-level hypothesis); determining a number of syllables in each of the plurality of words (see Audhkhasi, col. 5, lines 11-13: Block 311 involves developing lexical features from the speech sample, for example, the phone level hypotheses and word level hypotheses; interpreted as the number of syllables in each of the words); calculating a speaking rate based at least in part on the number of syllables (see Audhkhasi, col. 6, lines 5-6: Upon completing 311 the method proceeds to 313 to compute the rate of speech for the speaker); determining if the speaking rate is within a threshold range (see Audhkhasi, col. 8, lines 14-21: The fluency score serves as a simple guideline for evaluating the fluency of an individual. It may be used for various purposes, including for example, to gauge the speaker's progress in mastering a new language, in the evaluation process of an employee, to help the speaker improve his/her language skills, to help make a decision in the hiring process, or other such purposes relating to one's mastery of spoken language skills; the fluency score in the evaluation process is interpreted as the speaking rate within a threshold range); enabling display of a notification on a display device in real time, if the speaking rate falls outside the threshold range (see Audhkhasi, col. 8, lines 47-50 and 62-65: The feedback may be formatted to display the speaker's detrimental language characteristics in the order in which they contribute towards the speaker's disfluency); and receiving audio data from a speech rehearsal session (see Audhkhasi, col. 2, lines 9-10, teaching gathering a speech sample from the person to be evaluated; col. 4, lines 30-32: the speaker may speak into a microphone suitable for recording voice or other audio). However, Audhkhasi fails to teach receiving audio data from a speech rehearsal session over a network, the speech rehearsal session being performed for a digital presentation. Kurihara teaches receiving audio data from a speech rehearsal session over a network, the speech rehearsal session being performed for a digital presentation (see Kurihara, pg. 360, sect. 4.1: The Presentation Sensei system consists of several modules connected by a network (Figure 2). The audio analysis module continuously analyses the input signal from a microphone and provides the integration module with the results of the utterance duration detection, pitch (F0) detection, and filled pause detection).
Audhkhasi and Kurihara are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi on automatically evaluating a person's spoken fluency and providing a score that quantifies the fluency of the speaker with the teachings of Kurihara on recommendations for improving the delivery of a presentation, in order to reduce the number of fillers in presentations (see Kurihara, pg. 359, sect. 1).
	Regarding claim 2, Audhkhasi in view of Kurihara teach the data processing system of claim 1.  Audhkhasi further teaches wherein the transcript includes metadata from which a time period for a duration of the audio data may be calculated (see Audhkhasi, Col 6 lines 7-12 Once the ASR processing is complete the rate of speech can be determined. The rate of speech calculation is based on transcript of lexical features resulting from the ASR processing. The duration of the phones may be used to compute rate of speech).
Regarding claim 3, Audhkhasi in view of Kurihara teach the data processing system of claim 2. Kurihara further teaches wherein the speaking rate is calculated based at least in part on the time period (see Kurihara, pg. 361, sect. 4.4: The recognition results (a series of moras) and their corresponding utterance durations are sent to the integration module. The speaking rate excluding silence is calculated by dividing the number of moras by the duration).
Audhkhasi and Kurihara are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi on automatically evaluating a person's spoken fluency and providing a score that quantifies the fluency of the speaker with the teachings of Kurihara on recommendations for improving the delivery of a presentation, in order to reduce the number of fillers in presentations (see Kurihara, pg. 359, sect. 1).
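For illustration only, the rate computation quoted from Kurihara for claim 3 (number of moras divided by the utterance duration) and the threshold-range check recited in claim 1 can be sketched in Python as follows. The function names, the choice of seconds as the time unit, and the example range are assumptions made for this sketch; they do not appear in either reference or in the claims.

```python
def speaking_rate(unit_count: int, duration_seconds: float) -> float:
    """Return speech units (e.g. moras or syllables) per second.

    Assumption: duration_seconds already excludes silence.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return unit_count / duration_seconds


def within_range(rate: float, low: float, high: float) -> bool:
    """Check whether a rate lies inside a closed threshold range."""
    return low <= rate <= high


# Hypothetical example: 12 units spoken over 4 seconds of speech.
rate = speaking_rate(12, 4.0)
# The range 2.0 to 5.0 units per second is an arbitrary placeholder.
needs_notification = not within_range(rate, 2.0, 5.0)
```

In this sketch a notification would be raised only when `needs_notification` is true, which corresponds to the rate falling outside the range.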
Regarding claim 4, Audhkhasi in view of Kurihara teach the data processing system of claim 1.  Audhkhasi further teaches wherein the number of syllables is determined by detecting a number of syllable nuclei in the plurality of words (see Audhkhasi, Col 5, lines 25-40 The "Repetition" form of disfluency can be located by detecting instances of closely occurring N-grams, that is, the closely-occurring exact and inexact repeat N-grams. An N-gram is said to be closely-occurring if the distance between consecutive occurrences of a N-gram is less than W words, where W is a predefined variable. The optimal value of the variable W can be learned from training data or other empirical results. A typical range for the variable W is 1 to 5 words although it could be different for different training datasets or domains; N-gram interpreted as syllable nuclei).
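For illustration only, the repetition-detection passage quoted above from Audhkhasi (an N-gram is closely occurring if the distance between consecutive occurrences is less than W words) can be sketched in Python as follows. The sketch handles exact repeats only and measures distance between the starting word positions of the two occurrences; both choices, along with the function name, are assumptions made for this sketch and are not taken from the reference.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def closely_occurring_ngrams(
    words: List[str], n: int, w: int
) -> List[Tuple[NGram, int, int]]:
    """Find exact repeats of an n-gram whose start positions are < w words apart.

    Returns (ngram, previous_start, current_start) for each such repeat.
    """
    last_seen: Dict[NGram, int] = {}
    hits: List[Tuple[NGram, int, int]] = []
    for i in range(len(words) - n + 1):
        gram = tuple(words[i : i + n])
        previous = last_seen.get(gram)
        if previous is not None and i - previous < w:
            hits.append((gram, previous, i))
        last_seen[gram] = i  # remember the most recent occurrence
    return hits


# Hypothetical transcript fragment, lower-cased and tokenized on whitespace.
tokens = "i i think that that is".split()
repeats = closely_occurring_ngrams(tokens, n=1, w=3)
```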
Regarding claim 15, Audhkhasi teaches a method for providing speech rehearsal assistant during a presentation rehearsal comprising: receiving audio data from a speech rehearsal session over a network (see Audhkhasi, col. 2, lines 9-10: part of the process involves gathering a speech sample from the person to be evaluated, by having the person speak, ad hoc, on a given topic, or by engaging the person in a conversation, or by recording a live conversation of the person); receiving a transcript for the audio data, the transcript including a plurality of words spoken during the speech rehearsal session (see Audhkhasi, col. 4, lines 52-55, teaching that in block 307 the recording is passed through a standard Automatic Speech Recognition (ASR) system to obtain the word-level hypothesis); calculating a real time speaking rate for the speech rehearsal session (see Audhkhasi, col. 6, lines 5-6: Upon completing 311 the method proceeds to 313 to compute the rate of speech for the speaker); determining if the speaking rate is within a threshold range (see Audhkhasi, col. 8, lines 14-21: The fluency score serves as a simple guideline for evaluating the fluency of an individual. It may be used for various purposes, including for example, to gauge the speaker's progress in mastering a new language, in the evaluation process of an employee, to help the speaker improve his/her language skills, to help make a decision in the hiring process, or other such purposes relating to one's mastery of spoken language skills; the fluency score in the evaluation process is interpreted as the speaking rate within a threshold range); detecting utterance of a filler phrase or sound during the speech rehearsal session using at least in part a machine learning model trained for identifying filler phrases and sounds in a text (see Audhkhasi, col. 4, lines 61-67: In block 307 the recording is passed through a standard Automatic Speech Recognition (ASR) system. Once the initial speech recognition processing is complete in 307 the method proceeds to 309. In block 309 the prosodic features of the speech sample are calculated. The prosodic features of interest generally include filled-pause and amount of silence based features. The filled-pauses features may be detected using measures based on the stability of the formants of the speech signal; the filled-pause and silence are interpreted as the filler sound, and the ASR system as the machine learning model); upon at least one of determining the speaking rate falls outside the threshold range or detecting the utterance of the filler phrase or sound, enabling real time display of a notification on a display device (see Audhkhasi, col. 8, lines 47-50 and 62-65: The feedback may be formatted to display the speaker's detrimental language characteristics in the order in which they contribute towards the speaker's disfluency).
Kurihara further teaches upon at least one of determining the speaking rate falls outside the threshold range or detecting the utterance of the filler phrase or sound, enabling real time display of a notification on a display device (see Kurihara, pg. 360, sect. 4.2: The online feedback provides the presenter with short term statistics of the indices during a presentation. “Real time monitor (Figure 3 left)” indicates the indices as they are. They are used by the presenter to check the recent status visually. “Alerts” are the notifications of 6 kinds of information shown in Figure 3 right).
Audhkhasi and Kurihara are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi on automatically evaluating a person's spoken fluency and providing a score that quantifies the fluency of the speaker with the teachings of Kurihara on recommendations for improving the delivery of a presentation, in order to reduce the number of fillers in presentations (see Kurihara, pg. 359, sect. 1).
Regarding claim 16, Audhkhasi in view of Kurihara teach the method of claim 15. Audhkhasi further teaches wherein the transcript includes metadata from which a time period for a duration of the audio data may be calculated and the speaking rate is calculated based at least in part on the time period (see Audhkhasi, col. 6, lines 7-12: Once the ASR processing is complete the rate of speech can be determined. The rate of speech calculation is based on transcript of lexical features resulting from the ASR processing. The duration of the phones may be used to compute rate of speech).
Regarding claim 17, Audhkhasi in view of Kurihara teach the method of claim 15. Kurihara further teaches wherein the speaking rate is calculated based in part on a number of syllables detected in the plurality of words (see Kurihara, pg. 361, sect. 4.4: The speech recognition module executes a mora-based speech recognition from the presenter’s voice input using a microphone. The recognition results (a series of moras) and their corresponding utterance durations are sent to the integration module. The speaking rate excluding silence is calculated by dividing the number of moras by the duration).
Audhkhasi and Kurihara are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi on automatically evaluating a person's spoken fluency and providing a score that quantifies the fluency of the speaker with the teachings of Kurihara on recommendations for improving the delivery of a presentation, in order to reduce the number of fillers in presentations (see Kurihara, pg. 359, sect. 1).
Regarding claim 18, Audhkhasi in view of Kurihara teach the data processing system of claim 1.  Audhkhasi further teaches wherein the number of syllables is determined by detecting a number of syllable nuclei in the plurality of words (see Audhkhasi, Col 5, lines 25-40 The "Repetition" form of disfluency can be located by detecting instances of closely occurring N-grams, that is, the closely-occurring exact and inexact repeat N-grams. An N-gram is said to be closely-occurring if the distance between consecutive occurrences of a N-gram is less than W words, where W is a predefined variable. The optimal value of the variable W can be learned from training data or other empirical results. A typical range for the variable W is 1 to 5 words although it could be different for different training datasets or domains; N-gram interpreted as syllable nuclei).
Regarding claim 20, Audhkhasi in view of Kurihara teach the method of claim 15. Audhkhasi further teaches detecting disfluency during the speech rehearsal session based at least in part on the audio data (see Audhkhasi, col. 6, lines 12-20: Block 315 involves hypothesizing the disfluency characteristics from the speaker's speech sample. Some of the examples of disfluency characteristics are: (a) unnaturally long and/or frequent silent pauses (i.e., silences) in the speech signal, (b) insertions of filled-pauses like "ahh", "umm" and/or vowel-extensions like "theee", (c) frequent use of a word/phrase during the speaker's turn (e.g., "basically", "you know"), (d) frequent and closely occurring repetitions of exact and/or in-exact N-grams, and (e) a combination of two or more of the above characteristics).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of Kurihara, K., Goto, M., Ogata, J., Matsusaka, Y., & Igarashi, T. (2007, November). Presentation sensei: a presentation training system using speech and image processing. In Proceedings of the 9th international conference on Multimodal interfaces (pp. 358-365), further in view of S. Shangavi, S. Jeyamaalmarukan, A. Jathevan, M. Umatharsini and P. Samarasinghe, "Self-Speech Evaluation with Speech Recognition and Gesture Analysis," 2018 National Information Technology Conference (NITC), 2018, pp. 1-7.
Regarding claim 5, Audhkhasi in view of Kurihara teach the data processing system of claim 4, however fail to teach the syllable nuclei is detected by examining a plurality of parameters of the audio data, the plurality of parameters including pitch and intensity. However, Shangavi teaches the syllable nuclei is detected by examining a plurality of parameters of the audio data, the plurality of parameters including pitch and intensity (see Shangavi, pg. 5, sect II D , c Display Variation in the Speech Transcript, in this phase system converts audio into text format to show the places in the speech transcript where pitch and volume variations are used. Microsoft Cognitive Services [18] is used for audio to text conversion. When detecting vocal variations (Pitch, Volume) separately in a speech, the system identifies the time period of the variation and shows those places in the speech transcript ).
Audhkhasi, Kurihara and Shangavi are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi and Kurihara on automatically evaluating a person's spoken fluency and on recommendations for improving the delivery of a presentation with the teachings of Shangavi on tracking vocal variations, in order to give feedback on the modulation in the sounds formed by speaking (see Shangavi, sect. I).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of Kurihara, K., Goto, M., Ogata, J., Matsusaka, Y., & Igarashi, T. (2007, November). Presentation sensei: a presentation training system using speech and image processing. In Proceedings of the 9th international conference on Multimodal interfaces (pp. 358-365), further in view of S. Shangavi, S. Jeyamaalmarukan, A. Jathevan, M. Umatharsini and P. Samarasinghe, "Self-Speech Evaluation with Speech Recognition and Gesture Analysis," 2018 National Information Technology Conference (NITC), 2018, pp. 1-7, further in view of Lu, US Patent Application Publication 2009/0089062.
Regarding claim 6, Audhkhasi in view of Kurihara further in view of Shangavi teach the data processing system of claim 5. Kurihara further teaches determining an utterance time based on the audio data (see Kurihara, pg. 361, sect. 4.4: The speech recognition module executes a mora-based speech recognition from the presenter’s voice input using a microphone. The recognition results (a series of moras) and their corresponding utterance durations are sent to the integration module).
Audhkhasi, Kurihara and Shangavi are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi and Shangavi on automatically evaluating a person's spoken fluency with the teachings of Kurihara on recommendations for improving the delivery of a presentation, in order to reduce the number of fillers in presentations (see Kurihara, pg. 359, sect. 1). However, Audhkhasi in view of Kurihara and Shangavi fail to teach calculating the speaking rate based at least in part on the number of syllable nuclei and the utterance time. Lu teaches calculating the speaking rate based at least in part on the number of syllable nuclei and the utterance time (see Lu, [0016]: during operation of the embodiment shown in FIG. 1, the Speech Speed Determining Logic 30 counts the number of syllables identified in the input speech, and continuously determines a current syllable rate for the input speech).
Audhkhasi, Kurihara, Shangavi and Lu are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi, Kurihara and Shangavi on automatically evaluating a person's spoken fluency and presentation skills with the public speaking evaluation tool of Lu, in order to help a user practice public speaking (see Lu, [0004]).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of Kurihara, K., Goto, M., Ogata, J., Matsusaka, Y., & Igarashi, T. (2007, November). Presentation sensei: a presentation training system using speech and image processing. In Proceedings of the 9th international conference on Multimodal interfaces (pp. 358-365), further in view of Spizzo, US Patent Application Publication 2015/0310852.
Regarding claim 7, Audhkhasi in view of Kurihara teach the data processing system of claim 1. However, Audhkhasi in view of Kurihara fail to teach wherein the threshold range is determined based on historical information relating to a user who is conducting the speech rehearsal session. Spizzo teaches wherein the threshold range is determined based on historical information relating to a user who is conducting the speech rehearsal session (see Spizzo, [0034]: FIG. 3 illustrates example 300 of pre-defined criteria for rating speech effectiveness, in accordance with an embodiment of the present invention. In this embodiment, each type of speech problem is given a weight factor based on the speaking mode. A threshold basis is defined for each speech problem, and a threshold or rating is calculated for each speech problem; interpreted as historical information relating to the user).
Audhkhasi, Kurihara and Spizzo are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi and Kurihara on automatically evaluating a person's spoken fluency and presentation skills with the speech analysis and rating tool of Spizzo, in order to improve speech effectiveness (see Spizzo, [0005]).
Claims 8-9 and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of S. Shangavi, S. Jeyamaalmarukan, A. Jathevan, M. Umatharsini and P. Samarasinghe, "Self-Speech Evaluation with Speech Recognition and Gesture Analysis," 2018 National Information Technology Conference (NITC), 2018, pp. 1-7.
Regarding claim 8, Audhkhasi teaches a data processing system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of (see Audhkhasi, Fig. 4, 401, 407, 409): receiving audio data from a speech rehearsal session over a network (see Audhkhasi, col. 2, lines 9-10: part of the process involves gathering a speech sample from the person to be evaluated, by having the person speak, ad hoc, on a given topic, or by engaging the person in a conversation, or by recording a live conversation of the person); receiving a transcript for the audio data, the transcript including a plurality of words spoken during the speech rehearsal session (see Audhkhasi, col. 4, lines 52-55, teaching that in block 307 the recording is passed through a standard Automatic Speech Recognition (ASR) system to obtain the word-level hypothesis); detecting utterance of a filler phrase or sound during the speech rehearsal session using at least in part a machine learning model trained for identifying filler phrases and sounds in a text (see Audhkhasi, col. 4, lines 61-67: In block 307 the recording is passed through a standard Automatic Speech Recognition (ASR) system. Once the initial speech recognition processing is complete in 307 the method proceeds to 309. In block 309 the prosodic features of the speech sample are calculated. The prosodic features of interest generally include filled-pause and amount of silence based features. The filled-pauses features may be detected using measures based on the stability of the formants of the speech signal; the filled-pause and silence are interpreted as the filler sound, and the ASR system as the machine learning model); upon detecting the utterance of the filler phrase or sound, enabling real time display of a notification on a display device (see Audhkhasi, col. 8, lines 47-50 and 62-65: The feedback may be formatted to display the speaker's detrimental language characteristics in the order in which they contribute towards the speaker's disfluency); wherein detecting the utterance of the filler phrase or sound is done based on at least one of the transcript of the audio data or the audio data (see Audhkhasi, col. 5, lines 11-24, teaching that in Block 311 the ASR involves performing a Viterbi search to match the neural network output scores to target words assumed to be in the input speech in order to determine the word that was most likely uttered. The relative frequency of various words is computed to find out speaker-specific discourse-markers (e.g., "you know," "basically," "I mean" and so on); the filled-pauses and other disfluency indicators are interpreted as filler words). Shangavi further teaches detecting utterance of a filler phrase or sound during the speech rehearsal session using at least in part a machine learning model trained for identifying filler phrases and sounds in a text (see Shangavi, pg. 4, sections II B and III B: In order to identify the filler words, the system should convert speech to text and the process is done through the audio transcription. The audio should be in “.wav format”, sample rate-16 000 kHz and audio encoding 16 bits. We used Microsoft cognitive Service to identify the filler words [19]. Filler words are highlighted, and numbers of highlighted words are shown in interface. The system will identify the filler words and count the number of times spoken. The system is tested by Word Error Rate (WER) and Word Accuracy. Those are the common metrics of the performance of speech recognition or a machine translation system; Microsoft Cognitive Service is interpreted as the machine learning model to identify the filler words).
Audhkhasi and Shangavi are considered to be analogous to the claimed invention because they relate to evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi on automatically evaluating a person's spoken fluency with the self-evaluation toolkit teachings of Shangavi, in order to give feedback on the speech delivery of a user (see Shangavi, sect. I).
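For illustration only, the transcript-based step quoted from Shangavi for claim 8 (identify the filler words in the transcribed text and count the number of times each is spoken) can be sketched in Python as follows. The sketch substitutes a fixed word list and whole-word matching for the trained model and the speech-to-text service described in the references; the word list and the function name are assumptions made for this sketch.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical filler list; the discourse markers are those quoted from Audhkhasi.
FILLERS = ("um", "uh", "you know", "basically", "i mean")


def count_fillers(transcript: str, fillers: Iterable[str] = FILLERS) -> Counter:
    """Count whole-word, case-insensitive occurrences of each filler phrase."""
    text = transcript.lower()
    counts: Counter = Counter()
    for filler in fillers:
        # \b anchors keep "um" from matching inside longer words.
        pattern = r"\b" + re.escape(filler) + r"\b"
        found = len(re.findall(pattern, text))
        if found:
            counts[filler] = found
    return counts


# Hypothetical transcript text.
result = count_fillers("Um, you know, I mean it is basically, um, done.")
```

A fixed list cannot tell a filler use of "you know" from a meaningful one, which is the kind of disambiguation the references assign to prosodic features or a trained model.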
Regarding claim 9, Audhkhasi in view of Shangavi teach the data processing system of claim 8. Audhkhasi further teaches wherein detecting the utterance of the filler phrase or sound based on the audio data includes examining parameters of the audio data including pitch or intensity (see Audhkhasi, col 7 lines 49-53 In block 317 prosodic features are used to evaluate hypothesized disfluency characteristics. For example, prosodic features may be used to disambiguate whether a likely discourse marker or other disfluency characteristic is a contributing part of the sentence, or simply contributes to disfluency ;  col 6 lines 12-20 Block 315 involves hypothesizing the disfluency characteristics from the speaker's speech sample. Some of the examples of disfluency characteristics are: (a) unnaturally long and/or frequent silent pauses (i.e., silences) in the speech signal, (b) insertions of filled-pauses like "ahh", "umm" and/or vowel-extensions like "theee", (c) frequent use of a word/phrase during the speaker's turn (e.g., "basically", "you know"), (d) frequent and closely occurring repetitions of exact and/or in-exact N-grams, and (e) a combination of two or more of the above characteristics).
Regarding claim 11, Audhkhasi in view of Shangavi teach the data processing system of claim 8. Audhkhasi further teaches wherein the machine learning model is a natural language processing model utilized to identify if a word is part of a phrase or sentence (see Audhkhasi, col. 5 lines 11-17, Block 311 involves developing lexical features from the speech sample, for example, the phone level hypotheses and word level hypotheses. The ASR system may use a neural network to classify features into phonetic-based categories at each frame. Typically, ASR involves performing a Viterbi search to match the neural network output scores to target words assumed to be in the input speech in order to determine the word that was most likely uttered).
Regarding claim 12, Audhkhasi in view of Shangavi teach the data processing system of claim 8. Audhkhasi further teaches wherein the executable instructions, when executed by the processor, further cause the data processing system to detect disfluency during the speech rehearsal session based at least in part on the audio data (see Audhkhasi, col 6 lines 12-20 Block 315 involves hypothesizing the disfluency characteristics from the speaker's speech sample. Some of the examples of disfluency characteristics are: (a) unnaturally long and/or frequent silent pauses (i.e., silences) in the speech signal, (b) insertions of filled-pauses like "ahh", "umm" and/or vowel-extensions like "theee", (c) frequent use of a word/phrase during the speaker's turn (e.g., "basically", "you know"), (d) frequent and closely occurring repetitions of exact and/or in-exact N-grams, and (e) a combination of two or more of the above characteristics ).
Regarding claim 13, Audhkhasi in view of Shangavi teach the data processing system of claim 12. Audhkhasi further teaches wherein detecting disfluency includes detecting an inflection point (see Audhkhasi, col 7 lines 49-53 In block 317 prosodic features are used to evaluate hypothesized disfluency characteristics. For example, prosodic features may be used to disambiguate whether a likely discourse marker or other disfluency characteristic is a contributing part of the sentence, or simply contributes to disfluency; interpreted to detecting inflection point).
Regarding claim 14, Audhkhasi in view of Shangavi teach the data processing system of claim 12. Audhkhasi further teaches wherein the notification includes a notice about the detected disfluency (see Audhkhasi, col. 8, lines 47-50, col 8 lines 62-65 The feedback may be formatted to display the speaker's detrimental language characteristics in the order in which they contribute towards the speaker's disfluency).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of S. Shangavi, S. Jeyamaalmarukan, A. Jathevan, M. Umatharsini and P. Samarasinghe, "Self-Speech Evaluation with Speech Recognition and Gesture Analysis," 2018 National Information Technology Conference (NITC), 2018, pp. 1-7, further in view of Kaushik, L., Sangwan, A., & Hansen, J. H. (2015). Laughter and filler detection in naturalistic audio.
Regarding claim 10, Audhkhasi in view of Shangavi teach the data processing system of claim 9; however, they fail to teach detecting the utterance of the filler phrase or sound includes examining parameters of the audio data including pitch, intensity, or frequency using a deep neural network. Kaushik teaches detecting the utterance of the filler phrase or sound includes examining parameters of the audio data including pitch, intensity, or frequency using a deep neural network (see Kaushik, pg. 2509, sects. 1 and 2: the baseline system uses 141-dimensional (141-d) openSMILE [29] feature set, which includes MFCCs (Mel-filter Cepstral Coefficients), F0, and voicing probabilities. The features are used to train a 3-way DNN (Deep Neural Network) classifier. Particularly, the DNN classifier is trained to distinguish between garbage, filler and laughter. Garbage refers to all non-filler and non-laughter frames (and it includes speech). Laughter and fillers are signals that evolve in time and frequency, and classifiers that can detect and exploit joint time-frequency patterns are likely to model the signal better and deliver superior performance).
Audhkhasi, Shangavi and Kaushik are considered to be analogous to the claimed invention because they relate to the evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi and Shangavi on automatically evaluating a person's spoken fluency with a self-evaluation toolkit to incorporate the neural-network filler and laughter classification system taught by Kaushik, in order to improve the automatic detection of laughter and fillers in verbal speech (see Kaushik, pg. 2509, sect. 1).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Audhkhasi et al., US Patent 8,457,967, in view of Kurihara, K., Goto, M., Ogata, J., Matsusaka, Y., & Igarashi, T. (2007, November). Presentation sensei: a presentation training system using speech and image processing. In Proceedings of the 9th international conference on Multimodal interfaces (pp. 358-365), further in view of Kaushik, L., Sangwan, A., & Hansen, J. H. (2015). Laughter and filler detection in naturalistic audio.
Regarding claim 19, Audhkhasi in view of Kurihara teaches the data processing system of claim 15, but fails to teach that detecting the utterance of the filler phrase or sound includes examining parameters of the audio data including pitch, intensity, or frequency using a deep neural network. However, Kaushik teaches that detecting the utterance of the filler phrase or sound includes examining parameters of the audio data including pitch, intensity, or frequency using a deep neural network (see Kaushik, pg. 2509, sects. 1 and 2: the baseline system uses 141-dimensional (141-d) openSMILE [29] feature set, which includes MFCCs (Mel-filter Cepstral Coefficients), F0, and voicing probabilities. The features are used to train a 3-way DNN (Deep Neural Network) classifier. Particularly, the DNN classifier is trained to distinguish between garbage, filler and laughter. Garbage refers to all non-filler and non-laughter frames (and it includes speech). Laughter and fillers are signals that evolve in time and frequency, and classifiers that can detect and exploit joint time-frequency patterns are likely to model the signal better and deliver superior performance).
Audhkhasi, Kurihara and Kaushik are considered to be analogous to the claimed invention because they relate to the evaluation of a speaker’s spoken communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Audhkhasi and Kurihara on automatically evaluating a person's spoken fluency with a self-evaluation toolkit to incorporate the neural-network filler and laughter classification system taught by Kaushik, in order to improve the automatic detection of laughter and fillers in verbal speech (see Kaushik, pg. 2509, sect. 1).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Bassemir, US Patent 9,792,908, teaches an analysis engine that generates new tools and services by training the analysis engine to generate the analysis report according to the selected speech delivery analysis criteria (see Bassemir, Fig. 2).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI, whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday, 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656