DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This office action is in response to Applicant’s reply filed Nov. 20, 2021. Claims 4 and 14 have been amended. Claims 1-19 are pending.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 9/19/22 has been entered.
 
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-2, 5, 7-12, 15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Publication No. 2014/0278421 A1 to Komissarchik et al. (hereinafter “Komissarchik”) in view of US Publication No. 2004/0166480 A1 to Wen et al. (hereinafter “Wen”).
Concerning claim 1, Komissarchik discloses A computer-implemented method for analysing an audio signal representing speech of a user and for providing feedback to the user based on the speech (¶ [0010], ¶ [0012], ¶ [0020]), comprising: 
generating a representative sequence of phonemes based at least in part on the received body of text (¶ [0012] (“Such representative word sets preferably include all phonemes and triphones needed to correctly pronounce a predetermined percentage of the words in the non-native language.”), Fig. 5C, ¶ [0045], ¶ [0047]-[0052], ¶ [0088]); 
receiving an input audio signal, the input audio signal including a recording of a user reading the body of text (Fig. 3 (40, 41), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42”)); 
identifying audio components in the input audio signal, and creating a mapping between the audio components and corresponding phonemes in the representative sequence of phonemes (Fig. 3 (42), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42 of ASR 16 returns list of recognition results 43”), ¶ [0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052] describing the algorithm used to map the received audio to the most closely matched node star, ¶ [0012]); 
based on the mapping, comparing respective audio components in the input audio signal to an expected audio component for a corresponding phoneme in the sequence of phonemes (¶ [0045], ¶ [0047-0052] (“If the highest ranked result 44 a . . . 44 n matches center phrase 45, the pronunciation is deemed acceptable … If none of the top N results (e.g., N=4) match center phrase 45 or the ray phrases for that center phrase, the pronunciation is deemed unacceptable … If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”) The center phrase corresponds to the phonemes that make up the expected audio component while the ray phrases correspond to phonemes that make up likely mispronunciations.); 
based on the comparison, determining a score for each audio component indicating a level of similarity between the respective audio component in the input audio signal and the expected audio component for the corresponding phoneme (¶ [0044-0045] (“In the illustrative embodiment of FIG. 3, ASR engine 42 provides up to 10 results ordered by a confidence score. In some cases not only the order of results but the confidence scores for each of the results may be provided. The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052]); 
based on the respective scores for each audio component, identifying in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme (¶ [0044-0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything.”), ¶ [0047-0052], ¶ [0062-0064] (“feedback for this particular mispronunciation will be provided to the user if this mispronunciation has a higher score than other mispronunciations.”), ¶ [0074-0075]); 
based on the identified mispronunciation of a particular phoneme, identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme (¶ [0033-0039] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user, gender, lisp, ASR confusions, reliability of recognition, ability to devise the reason for mispronunciation”), ¶ [0066-0076] (“Still referring to FIG. 4, process 51 of determining the reliability of a node in the curriculum is defined by the ability of a user to consistently improve his or her pronunciation of the node's phrase and the ability of the present invention to provide correct feedback on mispronunciation.”) Nodes are built to reliably identify mispronunciations that are common features of a user’s speech. Additionally, “persistent errors in basic pronunciation skills” may be recognized over time and corrected by alternative training tracks and feedback (¶ [0025])); and 
providing feedback to the user based on the identified feature of the user's speech (¶ [0012], ¶ [0027], ¶ [0032],  ¶ [0051] (“If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”), ¶ [0062-0064], ¶ [0066-0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”)).  
Komissarchik does not specifically disclose receiving a body of text; however, Wen discloses receiving a body of text (¶ [0026], ¶ [0027]).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving a body of text, as disclosed by Wen, into the system of Komissarchik in order to make a more user-friendly system that can manipulate the types of words used.
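For illustration only, the decision rules quoted above from Komissarchik ¶ [0047-0052] can be sketched as follows. This is a minimal sketch, not the reference's implementation: the function name `classify_utterance`, the feedback strings, and the reading of "ranked lower than a ray phrase by N" as a rank gap of at least N are assumptions, and the cases elided by the ellipses in the quotation are returned as "undetermined" rather than guessed. The 'bat'/'bad' example words are taken from the node star described in ¶ [0032].

```python
from typing import Dict, List, Optional, Tuple


def classify_utterance(
    results: List[str],
    center: str,
    ray_feedback: Dict[str, str],
    n: int = 4,
) -> Tuple[str, Optional[str]]:
    """Apply the three quoted rules to a ranked list of ASR results.

    results      -- recognition results, best first
    center       -- the expected (center) phrase
    ray_feedback -- hypothetical map from each ray phrase to its feedback text
    n            -- the N of the quoted rules (e.g., N=4)
    """
    # Rule 1: the highest ranked result matches the center phrase.
    if results and results[0] == center:
        return "acceptable", None

    # Rule 2: none of the top N results match the center or any ray phrase.
    top_n = results[:n]
    if not any(r == center or r in ray_feedback for r in top_n):
        return "unacceptable", None

    # Rule 3 (assumed reading): the center phrase ranks at least N places
    # below the highest ranked ray phrase, or is absent from the list.
    ray_ranks = [i for i, r in enumerate(results) if r in ray_feedback]
    if ray_ranks:
        best_ray = ray_ranks[0]
        center_rank = results.index(center) if center in results else float("inf")
        if center_rank - best_ray >= n:
            return "mispronounced", ray_feedback[results[best_ray]]

    # Cases not covered by the quoted excerpt are left undecided.
    return "undetermined", None


# Hypothetical feedback texts for the rays of the 'bat' node star.
RAYS = {
    "bad": "final consonant voiced",
    "bet": "vowel raised",
    "pat": "initial consonant devoiced",
    "pet": "vowel and initial consonant changed",
}
```

Under these assumptions, a result list of `["bad", "pat", "bet", "pet", "bat"]` for the center phrase "bat" selects the feedback for the highest ranked ray phrase, "bad".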

Concerning claim 2, Komissarchik discloses wherein analysing the received body of text and generating a representative sequence of phonemes comprises using a phonetic dictionary and/or rules of pronunciation for a language or accent of the body of the text (¶[0032] (“Each ASR has its own peculiarities and potentially different dictionaries, so the results may be different for different ASR's.”), ¶ [0038] The automatic speech recognition system uses a dictionary as part of the system to analyze the received body of text.).  

Concerning claim 5, Komissarchik discloses wherein identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme includes: identifying the type of mispronunciation of the particular phoneme the user mispronounces, the type of mispronunciation including at least one of: mispronunciation of the particular phoneme in a particular phoneme context, wherein a phoneme occurs in a particular phoneme context when it occurs in a particular position within a word and/or adjacent to another particular phoneme; and mispronunciation of the particular phoneme by substitution, wherein mispronunciation of the particular phoneme involves pronouncing the particular phoneme as a different phoneme (¶ [0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”), ¶ [0032] (describing mappings to common substitutions of phonemes, for example “node star 33 corresponds to the word ‘bat’ and includes rays for the mispronunciations ‘bad’, ‘bet’, ‘pat’ and ‘pet.’”), ¶ [0047-0052], Fig. 5G (feedback describes mispronouncing the phoneme “ah” as a different phoneme), ¶ [0088]).  

Concerning claim 7, Komissarchik discloses wherein providing feedback to the user based on the identified feature of the user's speech includes at least one of: displaying a message to the user, the message informing the user of the identified feature of the user's speech; providing the user with a video or audio tutorial, the video or audio tutorial relating to the identified feature of the user's speech; and providing the user with an exercise, the exercise relating to the identified feature of the user's speech (Fig. 5g (showing feedback in the form of a message, an exercise in the form of another pronunciation attempt after reviewing the feedback, an audio tutorial for the pronunciation), ¶ [0088] (“FIG. 5 g is a sample phrase recognition page 70, which displays the results of user attempts to pronounce a phrase and feedback of the system with regard to pronunciation errors.”)).  
Concerning claim 8, Komissarchik does not disclose wherein the body of text is input by the user; however, Wen discloses this limitation (¶ [0026], ¶ [0027]).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving a body of text, as disclosed by Wen, into the system of Komissarchik in order to make a more user-friendly system that can manipulate the types of words used.

Concerning claim 9, Komissarchik discloses wherein the step of identifying audio components in the input audio signal, and creating a mapping is done in real or near to real-time as the input audio signal is received (¶ [0011] (“the inventive system and methods enable the student to acquire knowledge and skills to pronounce properly phonemes and sequences of phones like triphones in real time. Instead of requiring study of the non-native language in a classroom environment, the system and methods of the present invention enable a student to invoke and use the system in real-time situations using mobile communications devices, such as cell phones and wireless hotspots.”)).    

Concerning claim 10, Komissarchik discloses A non-transitory recording medium, readable by a computer and having recorded thereon a computer program configured to perform the method of claim 1 (see claim 1, ¶ [0015] (“FIGS. 1 a and 1 b are, respectively, a schematic diagram of the system of the present invention comprising software modules programmed to operate on a computer system of conventional design”)).  

Concerning claim 11, Komissarchik discloses an apparatus for analysing an audio signal representing speech of a user and for providing feedback to the user based on the speech (¶ [0010], ¶ [0012], ¶ [0020]), the apparatus comprising: 
a text-analyser configured to at least generate a representative sequence of phonemes based at least in part on the received body of text (¶ [0012] (“Such representative word sets preferably include all phonemes and triphones needed to correctly pronounce a predetermined percentage of the words in the non-native language.”), Fig. 5C, ¶ [0045], ¶ [0047]-[0052], ¶ [0088]); 
an audio-input receiver configured to receive an input audio signal, the input audio signal including a recording of a user reading the body of text (Fig. 3 (40, 41), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42”)); 
an audio-mapper configured to identify audio components in the input audio signal, and create a mapping between the audio components and corresponding phonemes in the representative sequence of phonemes (Fig. 3 (42), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42 of ASR 16 returns list of recognition results 43”), ¶ [0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052] describing the algorithm used to map the received audio to the most closely matched node star, ¶ [0012]); 
a comparator configured to, based on the mapping, compare respective audio components in the input audio signal to an expected audio component for a corresponding phoneme in the sequence of phonemes (¶ [0045], ¶ [0047-0052] (“If the highest ranked result 44 a . . . 44 n matches center phrase 45, the pronunciation is deemed acceptable … If none of the top N results (e.g., N=4) match center phrase 45 or the ray phrases for that center phrase, the pronunciation is deemed unacceptable … If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”) The center phrase corresponds to the phonemes that make up the expected audio component while the ray phrases correspond to phonemes that make up likely mispronunciations.); 
a scorer configured to, based on the comparison, determine a score for each audio component indicating a level of similarity between the respective audio component in the input audio signal and the expected audio component for the corresponding phoneme (¶ [0044-0045] (“In the illustrative embodiment of FIG. 3, ASR engine 42 provides up to 10 results ordered by a confidence score. In some cases not only the order of results but the confidence scores for each of the results may be provided. The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052]); 
a pattern identifier configured to, based on the respective scores for each audio component, identify in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme (¶ [0044-0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything.”), ¶ [0047-0052], ¶ [0062-0064] (“feedback for this particular mispronunciation will be provided to the user if this mispronunciation has a higher score than other mispronunciations.”), ¶ [0074-0075]); 
a mispronunciation feature identifier configured to, based on the identified mispronunciation of a particular phoneme, identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme (¶ [0033-0039] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user, gender, lisp, ASR confusions, reliability of recognition, ability to devise the reason for mispronunciation”), ¶ [0066-0076] (“Still referring to FIG. 4, process 51 of determining the reliability of a node in the curriculum is defined by the ability of a user to consistently improve his or her pronunciation of the node's phrase and the ability of the present invention to provide correct feedback on mispronunciation.”) Nodes are built to reliably identify mispronunciations that are common features of a user’s speech. Additionally, “persistent errors in basic pronunciation skills” may be recognized over time and corrected by alternative training tracks and feedback (¶ [0025])); and a feedback module configured to provide feedback to the user based on the identified feature of the user's speech (¶ [0012], ¶ [0027], ¶ [0032],  ¶ [0051] (“If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”), ¶ [0062-0064], ¶ [0066-0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”)).  
Komissarchik does not specifically disclose receiving a body of text; however, Wen discloses receiving a body of text (¶ [0026], ¶ [0027]).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate receiving a body of text, as disclosed by Wen, into the system of Komissarchik in order to make a more user-friendly system that can manipulate the types of words used.

Concerning claim 12, Komissarchik discloses wherein the text analyser analyses the received body of text and generates a representative sequence of phonemes using a phonetic dictionary and/or rules of pronunciation for a language or accent of the body of the text (¶[0032] (“Each ASR has its own peculiarities and potentially different dictionaries, so the results may be different for different ASR's.”), ¶ [0038] The automatic speech recognition system uses a dictionary as part of the system to analyze the received body of text.).  
  
Concerning claim 15, Komissarchik discloses wherein the pattern identifier identifies a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme by: identifying the type of mispronunciation of the particular phoneme the user mispronounces, the type of mispronunciation including at least one of: mispronunciation of the particular phoneme in a particular phoneme context, wherein a phoneme occurs in a particular phoneme context when it occurs in a particular position within a word and/or adjacent to another particular phoneme; and mispronunciation of the particular phoneme by substitution, wherein mispronunciation of the particular phoneme involves pronouncing the particular phoneme as a different phoneme (¶ [0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”), ¶ [0032] (describing mappings to common substitutions of phonemes, for example “node star 33 corresponds to the word ‘bat’ and includes rays for the mispronunciations ‘bad’, ‘bet’, ‘pat’ and ‘pet.’”), ¶ [0047-0052], Fig. 5G (feedback describes mispronouncing the phoneme “ah” as a different phoneme), ¶ [0088]).  
  
Concerning claim 17, Komissarchik discloses wherein the feedback module provides feedback to the user based on the identified feature of the user's speech by at least one of: displaying a message to the user, the message informing the user of the identified feature of the user's speech; providing the user with a video or audio tutorial, the video or audio tutorial relating to the identified feature of the user's speech; and providing the user with an exercise, the exercise relating to the identified feature of the user's speech (Fig. 5g (showing feedback in the form of a message, an exercise in the form of another pronunciation attempt after reviewing the feedback, an audio tutorial for the pronunciation), ¶ [0088] (“FIG. 5 g is a sample phrase recognition page 70, which displays the results of user attempts to pronounce a phrase and feedback of the system with regard to pronunciation errors.”)).

Concerning claim 18, Komissarchik discloses wherein the body of text is input by the user (¶ [0029] (“Curriculum 11 may be updated and enhanced at any time. For example… some nodes and tracks may be generated manually and/or automatically derived from topic descriptions or articles about topics selected by the user.”) The user may manually enter nodes or a body of text to be read.).  

Concerning claim 19, Komissarchik discloses wherein the step of identifying audio components in the input audio signal, and creating a mapping is done in real or near to real-time as the input audio signal is received (¶ [0011] (“the inventive system and methods enable the student to acquire knowledge and skills to pronounce properly phonemes and sequences of phones like triphones in real time. Instead of requiring study of the non-native language in a classroom environment, the system and methods of the present invention enable a student to invoke and use the system in real-time situations using mobile communications devices, such as cell phones and wireless hotspots.”)).

Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Komissarchik in view of Wen, and further in view of U.S. Publication No. 2019/0130894 to Jin (hereinafter “Jin”). 
Concerning claim 3, Komissarchik does not expressly disclose the use of timestamps, but Jin teaches wherein creating the mapping between audio components and corresponding phonemes in the representative sequence of phonemes comprises determining a plurality of timestamps, each timestamp being associated with a phoneme in the representative sequence of phonemes (¶ [0071]); wherein the timestamp for a particular phoneme in the representative sequence of phonemes represents a point in time in the input audio signal at which an audio component is expected to align with its corresponding phoneme in the representative sequence of phonemes (¶ [0071] (“target voice waveform 132 and voice transcript 134 may be received at alignment module 230. Alignment module 230 may perform a force alignment process to align target voice waveform 132 to voice transcript 134. Alignment module 230 may then generate target phoneme alignment map 224”), ¶ [0074] (“a P2FA forced alignment module is utilized. The inputs to a forced alignment module (e.g., 230) may be both a sequence of phonemes and an audio waveform. According to one embodiment, an output generated by a forced alignment module (e.g., 230) may be a sequence of timestamps indicating when each phoneme begins and ends.”)).  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik to add timestamped phonemes for performing the mapping, as taught in Jin. Because both references teach methods and systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. Both references also need to solve the problem of accurately mapping spoken words to text and associated sounds or phonemes. 
A POSITA would have been motivated to combine Komissarchik and Jin because timestamp data could have been predictably added alongside the audio and phoneme data in the system of Komissarchik. Further, a POSITA would have wanted to use the added timestamp data to map the phonemes actually spoken more accurately to the expected phonemes in the representative sequence. 
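As an illustrative sketch only, the forced-alignment output described in Jin ¶ [0074] (a sequence of timestamps indicating when each phoneme begins and ends) can be represented as a list of phoneme records and used to cut the input audio into per-phoneme audio components. The sketch does not perform forced alignment itself; the names `AlignedPhoneme` and `slice_components`, the phoneme labels, and the toy sample values are assumptions introduced here and do not appear in either reference.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AlignedPhoneme:
    """One entry of a (hypothetical) forced-alignment result."""
    phoneme: str    # phoneme from the representative sequence
    start_s: float  # time at which the phoneme begins, in seconds
    end_s: float    # time at which the phoneme ends, in seconds


def slice_components(
    samples: Sequence[float],
    sample_rate: int,
    alignment: List[AlignedPhoneme],
) -> List[Tuple[str, Sequence[float]]]:
    """Map each phoneme to the audio samples between its timestamps."""
    components = []
    for item in alignment:
        # Convert timestamps to sample indices; rounding avoids
        # off-by-one errors caused by floating-point products.
        lo = int(round(item.start_s * sample_rate))
        hi = int(round(item.end_s * sample_rate))
        components.append((item.phoneme, samples[lo:hi]))
    return components


# Toy input: one second of "audio" at 10 samples per second, with an
# assumed alignment for the three phonemes of the word 'bat'.
SAMPLES = list(range(10))
ALIGNMENT = [
    AlignedPhoneme("B", 0.0, 0.3),
    AlignedPhoneme("AE", 0.3, 0.7),
    AlignedPhoneme("T", 0.7, 1.0),
]
COMPONENTS = slice_components(SAMPLES, 10, ALIGNMENT)
```

Each resulting pair ties a phoneme of the representative sequence to the span of the input audio signal in which it is expected, which is the mapping recited in the claim.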

Concerning claim 13, Komissarchik discloses the apparatus of claim 11 (see claim 11). Komissarchik does not expressly disclose the use of timestamps, but Jin teaches wherein the audio-mapper creates the mapping between audio components and corresponding phonemes in the representative sequence of phonemes by determining a plurality of timestamps, each timestamp being associated with a phoneme in the representative sequence of phonemes (¶ [0071]); wherein the timestamp for a particular phoneme in the representative sequence of phonemes represents a point in time in the input audio signal at which an audio component is expected to align with its corresponding phoneme in the representative sequence of phonemes (¶ [0071] (“target voice waveform 132 and voice transcript 134 may be received at alignment module 230. Alignment module 230 may perform a force alignment process to align target voice waveform 132 to voice transcript 134. Alignment module 230 may then generate target phoneme alignment map 224”), ¶ [0074] (“a P2FA forced alignment module is utilized. The inputs to a forced alignment module (e.g., 230) may be both a sequence of phonemes and an audio waveform. According to one embodiment, an output generated by a forced alignment module (e.g., 230) may be a sequence of timestamps indicating when each phoneme begins and ends.”)).  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Komissarchik to add timestamped phonemes for performing the mapping, as taught in Jin. Because both references teach methods and systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. Both references also need to solve the problem of accurately mapping spoken words to text and associated sounds or phonemes. 
A POSITA would have been motivated to combine Komissarchik and Jin because timestamp data could have been predictably added alongside the audio and phoneme data in the system of Komissarchik. Further, a POSITA would have wanted to use the added timestamp data to map the phonemes actually spoken more accurately to the expected phonemes in the representative sequence.

Claims 4, 6, 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Komissarchik in view of Wen, and further in view of U.S. Publication No. 2018/0268728 to Burdis (hereinafter “Burdis”). 
Concerning claim 4, Komissarchik discloses the computer-implemented method of claim 1 (see claim 1). Komissarchik discloses using scores and comparing the scores to multiple phoneme scores to determine whether a mispronunciation has occurred (¶ [0048-0052]), but does not expressly disclose averaging scores. Burdis teaches wherein identifying in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme comprises: calculating an average score from the scores for each audio component corresponding to the particular phoneme (¶ [0070] (“the score module 206 assigns a score to each syllable of a user's voice response based on the accuracy with which the user pronounces each sound of the syllable. For instance, the analysis module 204 may compare sounds or phonemes of the user's voice response to the sounds or phonemes of the predefined response to determine whether the user is accurately pronouncing the phonemes. The accuracy may be determined according to how similar the user's pronunciation of the phonemes is compared to the pronunciation of the phonemes of the predefined voice response, which may be based on a comparison of the speech signatures for each phoneme. The score module 206 then assigns the syllable a score based on the scores assigned to each phoneme that comprises the syllable. For example, the score module 206 may aggregate phoneme scores, average phoneme scores, and/or the like.”)); and determining, by reference of the average score either to a threshold value or to the average scores assigned to audio components corresponding to other phonemes, that the user mispronounces the particular phoneme corresponding to the particular phoneme (¶ [0068] (“if the analysis module 204 determines that a sound or syllable of the user's voice response is within a threshold range of the correspond sound or syllable of the predefined response, then the score module 204 may assign that particular sound or syllable a score that indicates how similar a sound or syllable of the user's voice response is to the predefined response.”), ¶ [0070], ¶ [0071] (“The score module 206, in a further embodiment, identifies syllables that the user pronounced accurately. The score module 206, for example, may identify an accurately pronounced syllable in response to the score assigned to the syllable satisfying a predetermined threshold syllable score.”)).  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik to use averaged scores, as taught in Burdis, in the algorithm of either Komissarchik, Burdis, or both to determine mispronunciations. Because both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of an average score could have been predictably substituted in the system of Komissarchik for a single unaveraged score. 
Alternatively, the approach of Burdis, in which an average score is taken and compared to a threshold, could have been predictably added to the system of Komissarchik to confirm whether a mispronunciation has occurred. A POSITA would have wanted to use an average score rather than a single score to determine whether a mispronunciation is occurring consistently and therefore requires feedback.
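The averaging-and-threshold determination discussed above can be illustrated with a minimal sketch. It assumes similarity scores in the range 0 to 1 and a hypothetical threshold of 0.6; neither value, nor the function name `flag_mispronounced`, comes from Komissarchik or Burdis. The 'L' and 'R' labels echo the l/r confusion example of Komissarchik ¶ [0033].

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def flag_mispronounced(
    scored: Iterable[Tuple[str, float]],
    threshold: float = 0.6,  # assumed cut-off, for illustration only
) -> Tuple[Dict[str, float], List[str]]:
    """Average the per-occurrence scores of each phoneme and flag the
    phonemes whose average falls below the threshold."""
    by_phoneme = defaultdict(list)
    for phoneme, score in scored:
        by_phoneme[phoneme].append(score)

    # One average per phoneme, taken over every audio component
    # that was mapped to that phoneme.
    averages = {p: mean(s) for p, s in by_phoneme.items()}

    # A low average indicates a consistent mispronunciation,
    # as opposed to a single poorly scored occurrence.
    flagged = sorted(p for p, avg in averages.items() if avg < threshold)
    return averages, flagged


# Hypothetical scores for five audio components of one recording.
SCORES = [("L", 0.9), ("R", 0.4), ("R", 0.5), ("L", 0.8), ("R", 0.3)]
AVERAGES, FLAGGED = flag_mispronounced(SCORES)
```

With these assumed values, only "R" is flagged, because its scores are low across all three occurrences rather than in a single one.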

Concerning claim 6, Komissarchik discloses wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text (¶ [0033] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user … Each of these factors may be described as follows: Native language. Native speakers of one language often will have similar mispronunciation errors when attempting to pronounce a specific second language. For example, native Japanese speakers tend to confuse English ‘l’ and ‘r’ sounds, which are virtually indistinguishable in Japanese. This mispronunciation error can cause considerable confusion to native English speakers.”) The method in Komissarchik takes into account the likely accent of the student.).  To the extent Komissarchik does not disclose wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text, Burdis teaches this limitation (¶ [0050] (“In certain embodiments, the method 500 compares 508 one or more characteristics of the response to one or more corresponding characteristics of a predefined response. For example, the one or more characteristics may include sounds (phonemes), syllables, accents, emphases, grammar, punctuation, and/or the like. In some embodiments, the method 500 determines 510 a score for each characteristic of the response based on the comparison with the predefined response.”) The characteristics of the predefined response or expected audio component include an accent.). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik for the phonemes of the expected audio component to be based on a particular accent as taught in Burdis. 
Because both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of a particular accent when creating the curriculum, including the expected audio, could have been predictably implemented in Komissarchik when building and modifying the curriculum. The method of Komissarchik is already designed to work for a number of different languages, and the same method could build curricula for local accents or dialects. A POSITA would have wanted to incorporate particular accents for students aiming to sound like native speakers of a particular region.

Concerning claim 14, Komissarchik discloses using scores and comparing the scores to multiple phoneme scores to determine whether a mispronunciation has occurred (¶ [0048-0052]), but does not expressly disclose averaging scores. Burdis teaches wherein the audio mapper identifies in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme by: calculating an average score from the scores for each audio component corresponding to the particular phoneme (¶ [0070] (“the score module 206 assigns a score to each syllable of a user's voice response based on the accuracy with which the user pronounces each sound of the syllable. For instance, the analysis module 204 may compare sounds or phonemes of the user's voice response to the sounds or phonemes of the predefined response to determine whether the user is accurately pronouncing the phonemes. The accuracy may be determined according to how similar the user's pronunciation of the phonemes is compared to the pronunciation of the phonemes of the predefined voice response, which may be based on a comparison of the speech signatures for each phoneme. The score module 206 then assigns the syllable a score based on the scores assigned to each phoneme that comprises the syllable. For example, the score module 206 may aggregate phoneme scores, average phoneme scores, and/or the like.”)); and determining, by reference of the average score either to a threshold value or to the average scores assigned to audio components corresponding to other phonemes, that the user mispronounces the particular phoneme corresponding to the particular phoneme (¶ [0068] (“if the analysis module 204 determines that a sound or syllable of the user's voice response is within a threshold range of the correspond sound or syllable of the predefined response, then the score module 204 may assign that particular sound or syllable a score that indicates how similar a sound or syllable of the user's voice response is to the predefined response.”), ¶ [0070], ¶ [0071] (“The score module 206, in a further embodiment, identifies syllables that the user pronounced accurately. The score module 206, for example, may identify an accurately pronounced syllable in response to the score assigned to the syllable satisfying a predetermined threshold syllable score.”)).  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik to use averaged scores, as taught in Burdis, in the algorithm of either Komissarchik, Burdis, or both to determine mispronunciations. Since both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of an average score could have been predictably substituted in the system of Komissarchik for a single unaveraged score.
Alternatively, the system of Burdis, in which an average score is taken and compared to a threshold, could have been predictably added to the system of Komissarchik to confirm whether a mispronunciation has occurred. A POSITA would want to use an average score rather than a single score to determine whether a mispronunciation is occurring consistently and therefore requires feedback.
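
For illustration only, the averaging-and-threshold determination discussed above can be sketched as follows. This sketch is not taken from Komissarchik or Burdis; the function names, the 0-to-1 score scale, and the 0.6 threshold value are hypothetical assumptions chosen to show why an averaged score flags a consistent mispronunciation while a single low score does not.

```python
from collections import defaultdict
from statistics import mean


def average_scores(scored_components):
    """Group (phoneme, score) pairs by phoneme and average each group.

    `scored_components` is a hypothetical list of per-audio-component
    scores, one entry per occurrence of a phoneme in the input audio.
    """
    grouped = defaultdict(list)
    for phoneme, score in scored_components:
        grouped[phoneme].append(score)
    return {phoneme: mean(scores) for phoneme, scores in grouped.items()}


def mispronounced_phonemes(scored_components, threshold=0.6):
    """Return phonemes whose average score falls below the threshold.

    The threshold of 0.6 is an assumed value for illustration.
    """
    averages = average_scores(scored_components)
    return sorted(p for p, avg in averages.items() if avg < threshold)


# Hypothetical scores: "r" is consistently low, "l" has one low outlier.
observations = [
    ("r", 0.4), ("r", 0.5), ("r", 0.45),
    ("l", 0.9), ("l", 0.5), ("l", 0.95),
    ("s", 0.95),
]

flagged = mispronounced_phonemes(observations)
```

Under these assumed values only "r" is flagged: the single low score for "l" is offset by its other occurrences, so it does not by itself indicate a pattern of mispronunciation.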

Concerning claim 16, Komissarchik discloses wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text (¶ [0033] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user … Each of these factors may be described as follows: Native language. Native speakers of one language often will have similar mispronunciation errors when attempting to pronounce a specific second language. For example, native Japanese speakers tend to confuse English ‘l’ and ‘r’ sounds, which are virtually indistinguishable in Japanese. This mispronunciation error can cause considerable confusion to native English speakers.”) The method in Komissarchik takes into account the likely accent of the student.).  To the extent Komissarchik does not disclose wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text, Burdis teaches this limitation. (¶ [0050] (“In certain embodiments, the method 500 compares 508 one or more characteristics of the response to one or more corresponding characteristics of a predefined response. For example, the one or more characteristics may include sounds (phonemes), syllables, accents, emphases, grammar, punctuation, and/or the like. In some embodiments, the method 500 determines 510 a score for each characteristic of the response based on the comparison with the predefined response.”) The characteristics of the predefined response or expected audio component include an accent.) It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik for the phonemes of the expected audio component to be based on a particular accent as taught in Burdis.
Since both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of a particular accent when creating the curriculum, including the expected audio, could have been predictably implemented in Komissarchik when building and modifying the curriculum. The method of Komissarchik is already designed to work for a number of different languages, and the same method could build curriculums for local accents or dialects. A POSITA would want to incorporate particular accents for students aiming to sound like native speakers of a particular region.

Response to Arguments
Applicant's arguments filed 9/19/21 regarding 35 U.S.C. 102 and 103 have been fully considered but are moot in view of the new grounds of rejection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MALINA D BLAISE whose telephone number is (571)270-3398. The examiner can normally be reached Mon. - Thurs. 7:00 am - 5:00 pm (PT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xuan Thai, can be reached at 571-272-7147. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MALINA D. BLAISE
Primary Examiner
Art Unit 3715



/MALINA D. BLAISE/Primary Examiner, Art Unit 3715