DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is in response to Applicant’s reply filed Nov. 20, 2021. Claims 4 and 14 have been amended. Claims 1-19 are pending.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1-2, 5, 7-12, 15, and 17-19 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by U.S. Patent Pub. No. 20140278421 (“Komissarchik”).
Concerning claim 1, Komissarchik discloses A computer-implemented method for analysing an audio signal representing speech of a user and for providing feedback to the user based on the speech (¶ [0010], ¶ [0012], ¶ [0020]), comprising: 
analysing a received body of text to generate a representative sequence of phonemes (¶ [0012] (“Such representative word sets preferably include all phonemes and triphones needed to correctly pronounce a predetermined percentage of the words in the non-native language.”), Fig. 5C, ¶ [0088]); 
receiving an input audio signal, the input audio signal including a recording of a user reading the body of text (Fig. 3 (40, 41), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42”)); 
identifying audio components in the input audio signal, and creating a mapping between the audio components and corresponding phonemes in the representative sequence of phonemes (Fig. 3 (42), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42 of ASR 16 returns list of recognition results 43”), ¶ [0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052] describing the algorithm used to map the received audio to the most closely matched node star, ¶ [0012]); 
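based on the mapping, comparing respective audio components in the input audio signal to an expected audio component for a corresponding phoneme in the sequence of phonemes (¶ [0045], ¶ [0047-0052] (“If the highest ranked result 44 a . . . 44 n matches center phrase 45, the pronunciation is deemed acceptable … If none of the top N results (e.g., N=4) match center phrase 45 or the ray phrases for that center phrase, the pronunciation is deemed unacceptable … If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”) The center phrase corresponds to the phonemes that make up the expected audio component while the ray phrases correspond to phonemes that make up likely mispronunciations.); 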
based on the comparison, determining a score for each audio component indicating a level of similarity between the respective audio component in the input audio signal and the expected audio component for the corresponding phoneme (¶ [0044-0045] (“In the illustrative embodiment of FIG. 3, ASR engine 42 provides up to 10 results ordered by a confidence score. In some cases not only the order of results but the confidence scores for each of the results may be provided. The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052]); 
based on the respective scores for each audio component, identifying in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme (¶ [0044-0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything.”), ¶ [0047-0052], ¶ [0062-0064] (“feedback for this particular mispronunciation will be provided to the user if this mispronunciation has a higher score than other mispronunciations.”), ¶ [0074-0075]); 
based on the identified mispronunciation of a particular phoneme, identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme (¶ [0033-0039] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user, gender, lisp, ASR confusions, reliability of recognition, ability to devise the reason for mispronunciation”), ¶ [0066-0076] (“Still referring to FIG. 4, process 51 of determining the reliability of a node in the curriculum is defined by the ability of a user to consistently improve his or her pronunciation of the node's phrase and the ability of the present invention to provide correct feedback on mispronunciation.”) Nodes are built to reliably identify mispronunciations that are common features of a user’s speech. Additionally, “persistent errors in basic pronunciation skills” may be recognized over time and corrected by alternative training tracks and feedback (¶ [0025])); and 
providing feedback to the user based on the identified feature of the user's speech (¶ [0012], ¶ [0027], ¶ [0032],  ¶ [0051] (“If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”), ¶ [0062-0064], ¶ [0066-0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”)).  
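For illustration only, the ranking rule quoted above from Komissarchik ¶ [0047-0052] can be sketched as follows. This is a simplified reading of the quoted passages and is not code from the reference. The function name, the return labels, and the dictionary of ray feedback are hypothetical. The reading of “ranked lower than a ray phrase by N” as a rank gap of at least N (or the center phrase being absent from the list) is an assumption, and cases elided from the quotation are reported as "undetermined".

```python
def classify_utterance(results, center, ray_feedback, n=4):
    """Apply the three quoted rules to an ASR result list.

    results      -- ASR hypotheses, best first (hypothetical input shape)
    center       -- the expected (center) phrase
    ray_feedback -- dict mapping each ray phrase to its feedback text
    n            -- the N of the quoted rules (e.g., N=4)
    """
    if not results:
        return ("unacceptable", None)

    # Rule 1: the highest ranked result matches the center phrase.
    if results[0] == center:
        return ("acceptable", None)

    # Rule 2: none of the top N results match the center or a ray phrase.
    top = results[:n]
    if center not in top and not any(r in ray_feedback for r in top):
        return ("unacceptable", None)

    # Rule 3 (assumed reading): the center phrase trails the highest
    # ranked ray phrase by at least N positions, or is missing entirely.
    ray_rank = next((i for i, r in enumerate(results) if r in ray_feedback), None)
    center_rank = results.index(center) if center in results else None
    if ray_rank is not None and (center_rank is None or center_rank - ray_rank >= n):
        return ("mispronounced", ray_feedback[results[ray_rank]])

    # Remaining cases are not covered by the quoted passages.
    return ("undetermined", None)


rays = {"bad": "final 't' voiced as 'd'", "bet": "vowel raised"}
print(classify_utterance(["bad", "pat", "x", "y", "bat"], "bat", rays))
```

The sketch returns the feedback text associated with the highest ranked ray phrase, which is how the quoted passage ties the identified mispronunciation to the feedback displayed to the user.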
Concerning claim 2, Komissarchik discloses The computer-implemented method of claim 1, wherein analysing the received body of text and generating a representative sequence of phonemes comprises using a phonetic dictionary and/or rules of pronunciation for a language or accent of the body of the text (¶[0032] (“Each ASR has its own peculiarities and potentially different dictionaries, so the results may be different for different ASR's.”), ¶ [0038] The automatic speech recognition system uses a dictionary as part of the system to analyze the received body of text.).  
Concerning claim 5, Komissarchik discloses The computer implemented method of claim 1, wherein identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme includes: 
identifying the type of mispronunciation of the particular phoneme the user mispronounces, the type of mispronunciation including at least one of: mispronunciation of the particular phoneme in a particular phoneme context, wherein a phoneme occurs in a particular phoneme context when it occurs in a particular position within a word and/or adjacent to another particular phoneme; and mispronunciation of the particular phoneme by substitution, wherein mispronunciation of the particular phoneme involves pronouncing the particular phoneme as a different phoneme (¶ [0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”), ¶ [0032] describing mappings to common substitutions of phonemes for example “node star 33 corresponds to the word ‘bat’ and includes rays for the mispronunciations ‘bad’, ‘bet’, ‘pat’ and ‘pet.’”), ¶ [0047-0052] Fig. 5G (feedback describes mispronouncing the phoneme “ah” as a different phoneme), ¶[0088]).  

Concerning claim 7, Komissarchik discloses The computer implemented method according to claim 1, wherein providing feedback to the user based on the identified feature of the user's speech includes at least one of: displaying a message to the user, the message informing the user of the identified feature of the user's speech; providing the user with a video or audio tutorial, the video or audio tutorial relating to the identified feature of the user's speech; and providing the user with an exercise, the exercise relating to the identified feature of the user's speech (Fig. 5g (showing feedback in the form of a message, an exercise in the form of another pronunciation attempt after reviewing the feedback, an audio tutorial for the pronunciation), ¶ [0088] (“FIG. 5 g is a sample phrase recognition page 70, which displays the results of user attempts to pronounce a phrase and feedback of the system with regard to pronunciation errors.”)).  
Concerning claim 8, Komissarchik discloses The computer implemented method according to claim 1, wherein the body of text is input by the user (¶ [0029] (“Curriculum 11 may be updated and enhanced at any time. For example… some nodes and tracks may be generated manually and/or automatically derived from topic descriptions or articles about topics selected by the user.”) The user may manually enter nodes or a body of text to be read.).  
Concerning claim 9, Komissarchik discloses The computer implemented method of claim 1, wherein the step of identifying audio components in the input audio signal, and creating a mapping is done in real or near to real-time as the input audio signal is received (¶ [0011] (“the inventive system and methods enable the student to acquire knowledge and skills to pronounce properly phonemes and sequences of phones like triphones in real time. Instead of requiring study of the non-native language in a classroom environment, the system and methods of the present invention enable a student to invoke and use the system in real-time situations using mobile communications devices, such as cell phones and wireless hotspots.”)).    
Concerning claim 10, Komissarchik discloses A non-transitory recording medium, readable by a computer and having recorded thereon a computer program configured to perform the method of claim 1 (see claim 1, ¶ [0015] (“FIGS. 1 a and 1 b are, respectively, a schematic diagram of the system of the present invention comprising software modules programmed to operate on a computer system of conventional design”)).  
Concerning claim 11, Komissarchik discloses An apparatus for analysing an audio signal representing speech of a user and for providing feedback to the user based on the speech (¶ [0010], ¶ [0012], ¶ [0020]), the apparatus comprising: 
a text-analyser configured to analyse a received body of text to generate a representative sequence of phonemes (¶ [0012] (“Such representative word sets preferably include all phonemes and triphones needed to correctly pronounce a predetermined percentage of the words in the non-native language.”), Fig. 5C, ¶ [0088]); 
an audio-input receiver configured to receive an input audio signal, the input audio signal including a recording of a user reading the body of text (Fig. 3 (40, 41), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42”)); 
an audio-mapper configured to identify audio components in the input audio signal, and create a mapping between the audio components and corresponding phonemes in the representative sequence of phonemes (Fig. 3 (42), ¶ [0032] (“Referring now to FIG. 3, when a user pronounces phrase 40 from a particular node, resulting utterance 41 is supplied to automatic speech recognition engine 42 of ASR 16 returns list of recognition results 43”), ¶ [0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052] describing the algorithm used to map the received audio to the most closely matched node star, ¶ [0012]); 
a comparator configured to, based on the mapping, compare respective audio components in the input audio signal to an expected audio component for a corresponding phoneme in the sequence of phonemes (¶ [0045], ¶ [0047-0052] (“If the highest ranked result 44 a . . . 44 n matches center phrase 45, the pronunciation is deemed acceptable … If none of the top N results (e.g., N=4) match center phrase 45 or the ray phrases for that center phrase, the pronunciation is deemed unacceptable … If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”) The center phrase corresponds to the phonemes that make up the expected audio component while the ray phrases correspond to phonemes that make up likely mispronunciations.); 
a scorer configured to, based on the comparison, determine a score for each audio component indicating a level of similarity between the respective audio component in the input audio signal and the expected audio component for the corresponding phoneme (¶ [0044-0045] (“In the illustrative embodiment of FIG. 3, ASR engine 42 provides up to 10 results ordered by a confidence score. In some cases not only the order of results but the confidence scores for each of the results may be provided. The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything. In accordance with one aspect of the invention, the algorithm attempts to find the ray emanating from the node star that best matches the ASR results.”), ¶ [0047-0052]); 
a pattern identifier configured to, based on the respective scores for each audio component, identify in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme (¶ [0044-0045] (“The following algorithm takes the ASR results for the utterance spoken for phrase 45 of a node and using the corresponding node star (see FIG. 2 c) returns a score and provides feedback on what was mispronounced, if anything.”), ¶ [0047-0052], ¶ [0062-0064] (“feedback for this particular mispronunciation will be provided to the user if this mispronunciation has a higher score than other mispronunciations.”), ¶ [0074-0075]); 
a mispronunciation feature identifier configured to, based on the identified mispronunciation of a particular phoneme, identifying a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme (¶ [0033-0039] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user, gender, lisp, ASR confusions, reliability of recognition, ability to devise the reason for mispronunciation”), ¶ [0066-0076] (“Still referring to FIG. 4, process 51 of determining the reliability of a node in the curriculum is defined by the ability of a user to consistently improve his or her pronunciation of the node's phrase and the ability of the present invention to provide correct feedback on mispronunciation.”) Nodes are built to reliably identify mispronunciations that are common features of a user’s speech. Additionally, “persistent errors in basic pronunciation skills” may be recognized over time and corrected by alternative training tracks and feedback (¶ [0025])); and 
a feedback module configured to provide feedback to the user based on the identified feature of the user's speech (¶ [0012], ¶ [0027], ¶ [0032],  ¶ [0051] (“If center phrase 45 is ranked lower than a ray phrase by N (e.g., N=4) in list of recognition results 43, the user feedback corresponding to the highest ranked ray phrase is selected and displayed to the user.”), ¶ [0062-0064], ¶ [0066-0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”)).  
Concerning claim 12, Komissarchik discloses The apparatus of claim 11, wherein the text analyser analyses the received body of text and generates a representative sequence of phonemes using a phonetic dictionary and/or rules of pronunciation for a language or accent of the body of the text (¶[0032] (“Each ASR has its own peculiarities and potentially different dictionaries, so the results may be different for different ASR's.”), ¶ [0038] The automatic speech recognition system uses a dictionary as part of the system to analyze the received body of text.).  
  
Concerning claim 15, Komissarchik discloses The apparatus of claim 11, wherein the pattern identifier identifies a feature of the user's speech that requires direction to more accurately pronounce the particular phoneme by: 
identifying the type of mispronunciation of the particular phoneme the user mispronounces, the type of mispronunciation including at least one of: mispronunciation of the particular phoneme in a particular phoneme context, wherein a phoneme occurs in a particular phoneme context when it occurs in a particular position within a word and/or adjacent to another particular phoneme; and mispronunciation of the particular phoneme by substitution, wherein mispronunciation of the particular phoneme involves pronouncing the particular phoneme as a different phoneme (¶ [0076] (“For each ray and each substitution, omission of or addition into the phonetic transcription of the feedback text is supplied from a list of feedbacks built in advance for a corresponding mispronunciation.”), ¶ [0032] describing mappings to common substitutions of phonemes for example “node star 33 corresponds to the word ‘bat’ and includes rays for the mispronunciations ‘bad’, ‘bet’, ‘pat’ and ‘pet.’”), ¶ [0047-0052] Fig. 5G (feedback describes mispronouncing the phoneme “ah” as a different phoneme), ¶[0088]).  
  
Concerning claim 17, Komissarchik discloses The apparatus according to claim 11, wherein the feedback module provides feedback to the user based on the identified feature of the user's speech by at least one of: displaying a message to the user, the message informing the user of the identified feature of the user's speech; providing the user with a video or audio tutorial, the video or audio tutorial relating to the identified feature of the user's speech; and providing the user with an exercise, the exercise relating to the identified feature of the user's speech (Fig. 5g (showing feedback in the form of a message, an exercise in the form of another pronunciation attempt after reviewing the feedback, an audio tutorial for the pronunciation), ¶ [0088] (“FIG. 5 g is a sample phrase recognition page 70, which displays the results of user attempts to pronounce a phrase and feedback of the system with regard to pronunciation errors.”)).  
Concerning claim 18, Komissarchik discloses The apparatus according to claim 11, wherein the body of text is input by the user (¶ [0029] (“Curriculum 11 may be updated and enhanced at any time. For example… some nodes and tracks may be generated manually and/or automatically derived from topic descriptions or articles about topics selected by the user.”) The user may manually enter nodes or a body of text to be read.).  
Concerning claim 19, Komissarchik discloses The apparatus according to claim 11, wherein the step of identifying audio components in the input audio signal, and creating a mapping is done in real or near to real-time as the input audio signal is received (¶ [0011] (“the inventive system and methods enable the student to acquire knowledge and skills to pronounce properly phonemes and sequences of phones like triphones in real time. Instead of requiring study of the non-native language in a classroom environment, the system and methods of the present invention enable a student to invoke and use the system in real-time situations using mobile communications devices, such as cell phones and wireless hotspots.”)).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


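The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.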
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 3 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Komissarchik in view of U.S. Patent Pub. No. 20190130894 (“Jin”). 
Concerning claim 3, Komissarchik discloses The computer-implemented method of claim 1 (see claim 1). Komissarchik does not expressly disclose the use of timestamps but Jin teaches wherein creating the mapping between audio components and corresponding phonemes in the representative sequence of phonemes comprises determining a plurality of timestamps, each timestamp being associated with a phoneme in the representative sequence of phonemes ((¶ [0071]); wherein the timestamp for a particular phoneme in the representative sequence of phonemes represents a point in time in the input audio signal at which an audio component is expected to align with its corresponding phoneme in the representative sequence of phonemes (¶ [0071] (“target voice waveform 132 and voice transcript 134 may be received at alignment module 230. Alignment module 230 may perform a force alignment process to align target voice waveform 132 to voice transcript 134. Alignment module 230 may then generate target phoneme alignment map 224”), ¶ [0074] (“a P2FA forced alignment module is utilized. The inputs to a forced alignment module (e.g., 230) may be both a sequence of phonemes and an audio waveform. According to one embodiment, an output .  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik for addition of timestamped phonemes to perform mapping as taught in Jin. Since both references teach methods and systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process the references are from the same field of endeavor. Both references also need to solve the problem of accurately mapping spoken words to text and associated sounds or phonemes. 
A POSITA would have been motivated to combine Komissarchik and Jin because the use of timestamp data along with the audio and phoneme data could have been predictably added to the system of Komissarchik. Further, a POSITA would want to use the added timestamp data to more accurately map the spoken phonemes to the expected phonemes in the representative sequence and the actual phoneme. 
Concerning claim 13, Komissarchik discloses The apparatus of claim 11 (see claim 1). Komissarchik does not expressly disclose the use of timestamps, but Jin teaches wherein the audio-mapper creates the mapping between audio components and corresponding phonemes in the representative sequence of phonemes by determining a plurality of timestamps, each timestamp being associated with a phoneme in the representative sequence of phonemes (¶ [0071]); wherein the timestamp for a particular phoneme in the representative sequence of phonemes represents a point in time in the input audio signal at which an audio component is expected to align with its corresponding phoneme in the representative sequence of phonemes (¶ [0071] (“target voice waveform 132 and voice transcript 134 may be received at alignment module 230. Alignment module 230 may perform a force alignment process to align target voice waveform 132 to voice transcript 134. Alignment module 230 may then generate target phoneme alignment map 224”), ¶ [0074] (“a P2FA forced alignment module is utilized. The inputs to a forced alignment module (e.g., 230) may be both a sequence of phonemes and an audio waveform. According to one embodiment, an output .  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Komissarchik to add timestamped phonemes to perform mapping as taught in Jin. Since both references teach methods and systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. Both references also need to solve the problem of accurately mapping spoken words to text and associated sounds or phonemes. A POSITA would have been motivated to combine Komissarchik and Jin because the use of timestamp data along with the audio and phoneme data could have been predictably added to the system of Komissarchik. Further, a POSITA would want to use the added timestamp data to more accurately map the spoken phonemes to the expected phonemes in the representative sequence and the actual phoneme.

Claims 4, 6, 14 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Komissarchik in view of U.S. Patent Pub. No. 20180268728 (“Burdis”). 
Concerning claim 4, Komissarchik discloses The computer implemented method of claim 1 (see claim 1). Komissarchik discloses using scores and comparing the scores to multiple phoneme scores to determine whether a mispronunciation has occurred (¶ [0048-0052]), but does not expressly disclose calculating an average score. Burdis teaches wherein identifying in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme comprises: calculating an average score from the scores for each audio component corresponding to the particular phoneme (¶ [0070] (“the score module 206 assigns a score to each syllable of a user's voice response based on the accuracy with which the user pronounces each sound of the syllable. For instance, the analysis module 204 may compare sounds or phonemes of the user's voice response to the sounds or phonemes of the predefined response to determine whether the user is accurately pronouncing the phonemes. The accuracy may be determined according to how similar the user's pronunciation of the phonemes is compared to the pronunciation of the phonemes of the predefined voice response, which may be based on a comparison of the speech signatures for each phoneme. The score module 206 then assigns the syllable a score based on the scores assigned to each phoneme that comprises the syllable. For example, the score module 206 may aggregate phoneme scores, average phoneme scores, and/or the like.”)); and determining, by reference of the average score either to a threshold value or to the average scores assigned to audio components corresponding to other phonemes, that the user mispronounces the particular phoneme corresponding to the particular phoneme (¶ [0068] (“if the analysis module 204 determines that a sound or syllable of the user's voice response is within a threshold range of the correspond sound or syllable of the predefined response, then the score module 204 may assign that particular sound or syllable a score that indicates how similar a sound or syllable of the user's voice response is to the predefined response.”), ¶ [0070], ¶[0071] (“The score module 206, in a further embodiment, identifies syllables that the user pronounced accurately. The score module 206, for example, may identify an accurately pronounced syllable in response to the score assigned to the syllable satisfying a .  It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik to use averaged scores, as taught in Burdis, in the algorithm of either Komissarchik, Burdis, or both to determine mispronunciations. Since both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of an average score could have been predictably substituted in the system of Komissarchik for a single unaveraged score. 
Alternatively, the system of Burdis, in which an average score is taken and compared to a threshold, could have been predictably added to the system of Komissarchik to confirm whether a mispronunciation has occurred. A POSITA would want to use an average score rather than a single score to determine whether a mispronunciation is occurring consistently and therefore requires feedback.
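For illustration only, the averaging and threshold comparison recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Burdis, Komissarchik, or the application: the function name, the (phoneme, score) input shape, the 0-to-1 score range, and the 0.6 threshold are all hypothetical.

```python
from collections import defaultdict


def flag_mispronounced(scored_components, threshold=0.6):
    """Average the scores per phoneme and flag low averages.

    scored_components -- iterable of (phoneme, score) pairs, one per audio
                         component; scores assumed to lie in [0, 1]
    threshold         -- assumed cut-off below which a phoneme is flagged
    """
    by_phoneme = defaultdict(list)
    for phoneme, score in scored_components:
        by_phoneme[phoneme].append(score)

    # One average per phoneme, taken over every occurrence in the recording.
    averages = {p: sum(s) / len(s) for p, s in by_phoneme.items()}

    # A phoneme is flagged only when its average falls below the threshold,
    # so a single poor occurrence does not by itself trigger a flag.
    flagged = sorted(p for p, avg in averages.items() if avg < threshold)
    return averages, flagged


scores = [("r", 0.4), ("r", 0.5), ("l", 0.9), ("l", 0.8), ("ae", 0.7)]
averages, flagged = flag_mispronounced(scores)
print(flagged)
```

Averaging over all occurrences of a phoneme is what distinguishes a consistent mispronunciation from an isolated one, which is the rationale given above for preferring an average score to a single score.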
Concerning claim 6, Komissarchik discloses The computer implemented method according to claim 1 (see claim 1), wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text (¶ [0033] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user … Each of these factors may be described as follows: Native language. Native speakers of one language often will have similar mispronunciation errors when attempting to pronounce a specific second language. For example, native Japanese speakers tend to confuse English ‘l’ and ‘r’ sounds, which are virtually indistinguishable in Japanese. This mispronunciation error can cause considerable confusion to native English speakers.”) The .  To the extent Komissarchik does not disclose wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text, Burdis teaches this limitation (¶ [0050] (“In certain embodiments, the method 500 compares 508 one or more characteristics of the response to one or more corresponding characteristics of a predefined response. For example, the one or more characteristics may include sounds (phonemes), syllables, accents, emphases, grammar, punctuation, and/or the like. In some embodiments, the method 500 determines 510 a score for each characteristic of the response based on the comparison with the predefined response.”) The characteristics of the predefined response or expected audio component include an accent.). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik so that the phonemes of the expected audio component are based on a particular accent as taught in Burdis. 
Since both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of a particular accent when creating the curriculum, including the expected audio, could have been predictably implemented in Komissarchik when building and modifying the curriculum. The method of Komissarchik is already designed to work for a number of different languages, and the same method could build curriculums for local accents or dialects. A POSITA would want to incorporate particular accents for students aiming to sound like native speakers of a particular region.
Concerning claim 14, Komissarchik discloses The apparatus of claim 11 (see claim 1). Komissarchik discloses using scores and comparing the scores to multiple phoneme scores to wherein the audio mapper identifies in the input audio signal a pattern of audio components where the user mispronounces a particular phoneme by: calculating an average score from the scores for each audio component corresponding to the particular phoneme (¶ [0070] (“the score module 206 assigns a score to each syllable of a user's voice response based on the accuracy with which the user pronounces each sound of the syllable. For instance, the analysis module 204 may compare sounds or phonemes of the user's voice response to the sounds or phonemes of the predefined response to determine whether the user is accurately pronouncing the phonemes. The accuracy may be determined according to how similar the user's pronunciation of the phonemes is compared to the pronunciation of the phonemes of the predefined voice response, which may be based on a comparison of the speech signatures for each phoneme. The score module 206 then assigns the syllable a score based on the scores assigned to each phoneme that comprises the syllable. For example, the score module 206 may aggregate phoneme scores, average phoneme scores, and/or the like.”)); and determining, by reference of the average score either to a threshold value or to the average scores assigned to audio components corresponding to other phonemes, that the user mispronounces the particular phoneme corresponding to the particular phoneme (¶ [0068] (“if the analysis module 204 determines that a sound or syllable of the user's voice response is within a threshold range of the correspond sound or syllable of the predefined response, then the score module 204 may assign that particular sound or syllable a score that indicates how similar a sound or syllable of the user's voice response is to the predefined response.”), ¶ [0070], ¶ [0071] (“The score module 206, in a further embodiment, identifies syllables that the user pronounced accurately. The score module 206, for example, may identify …”)).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik to add the use of averaged scores, as taught in Burdis, in the algorithm of either Komissarchik, Burdis, or both to determine mispronunciations. Because both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of an average score could have been predictably substituted in the system of Komissarchik for a single unaveraged score. Alternatively, the system of Burdis, where an average score is taken and compared to a threshold, could have been predictably added to the system of Komissarchik to confirm whether a mispronunciation has occurred.
A POSITA would want to use an average score rather than a single score to determine if a mispronunciation is occurring consistently and therefore requires feedback.  
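For illustration only, the averaging and comparison steps recited in claim 14 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Komissarchik, Burdis, or the application: the function names, the 0-to-1 score scale, the example threshold of 0.6, and the margin of 0.2 used for the comparison against other phonemes are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_scores_by_phoneme(scored_components):
    """Return the mean score per phoneme from (phoneme, score) pairs."""
    grouped = defaultdict(list)
    for phoneme, score in scored_components:
        grouped[phoneme].append(score)
    return {phoneme: mean(scores) for phoneme, scores in grouped.items()}


def mispronounced_phonemes(scored_components, threshold=None, margin=0.2):
    """Flag phonemes whose average score indicates a consistent error.

    If `threshold` is given, a phoneme is flagged when its average falls
    below that value. Otherwise it is flagged when its average falls more
    than `margin` (an assumed parameter) below the mean of the averages
    of all other phonemes.
    """
    averages = average_scores_by_phoneme(scored_components)
    flagged = []
    for phoneme, avg in averages.items():
        if threshold is not None:
            if avg < threshold:
                flagged.append(phoneme)
        else:
            others = [a for p, a in averages.items() if p != phoneme]
            if others and avg < mean(others) - margin:
                flagged.append(phoneme)
    return sorted(flagged)


# Hypothetical scores: several audio components per phoneme.
scores = [("r", 0.4), ("r", 0.5), ("l", 0.9), ("l", 0.8), ("s", 0.95)]
print(mispronounced_phonemes(scores, threshold=0.6))
print(mispronounced_phonemes(scores))
```

In this sketch a single low score does not by itself flag a phoneme; only the average over all of that phoneme's audio components is compared, which mirrors the stated rationale of detecting a mispronunciation that occurs consistently.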
Concerning claim 16, Komissarchik discloses The apparatus according to claim 11, wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text (¶ [0033] (“The number and content of the rays emanating from the node star is defined by a number of factors, including the native language of a user … Each of these factors may be described as follows: Native language. Native speakers of one language often will have similar mispronunciation errors when attempting to pronounce a specific second language. For example, native Japanese speakers tend to confuse English ‘l’ and ‘r’ sounds, which are virtually indistinguishable in Japanese. This mispronunciation error can cause …”)). To the extent Komissarchik does not disclose wherein the expected audio component of the phonemes represents a particular accent of a language of the body of text, Burdis teaches this limitation (¶ [0050] (“In certain embodiments, the method 500 compares 508 one or more characteristics of the response to one or more corresponding characteristics of a predefined response. For example, the one or more characteristics may include sounds (phonemes), syllables, accents, emphases, grammar, punctuation, and/or the like. In some embodiments, the method 500 determines 510 a score for each characteristic of the response based on the comparison with the predefined response.”); the characteristics of the predefined response or expected audio component include an accent). It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Komissarchik so that the phonemes of the expected audio component are based on a particular accent, as taught in Burdis.
Because both references teach methods of teaching proper language pronunciation that involve systems for mapping phonemes with audio and transcripts as part of a speech recognition and analysis process, the references are from the same field of endeavor. A POSITA would have been motivated to combine Komissarchik and Burdis because the use of a particular accent when creating the curriculum, including the expected audio, could have been predictably implemented in Komissarchik when building and modifying the curriculum. The method of Komissarchik is already designed to work for a number of different languages, and the same method could build curriculums for local accents or dialects. A POSITA would want to incorporate particular accents for students aiming to sound like native speakers of a particular region.
Response to Arguments
Applicant’s arguments and amendments filed Nov. 20, 2021, with respect to the 35 U.S.C. 112(b) rejections of claims 4 and 14 have been fully considered and are persuasive.  The 35 U.S.C. 112(b) rejection of claims 4 and 14 has been withdrawn. 
Applicant’s arguments filed Nov. 20, 2021 regarding 35 U.S.C. 102 and 103 have been fully considered but they are not persuasive. Komissarchik discloses “analysing a received body of text to generate a representative sequence of phonemes.” Komissarchik paragraph 12 discloses “representative word sets,” which fit within the broadest reasonable interpretation of a body of text. Further, the “representative word sets preferably include all phonemes and triphones needed to correctly pronounce a predetermined percentage of the words in the non-native language,” which would require receipt of the text and analysis at some point to determine a sequence of phonemes. These representative phonemes and the body of text are displayed to the user as described with reference to Figures 5a-5j. Regarding disclosure of “creating a mapping between the audio components and corresponding phonemes,” Komissarchik discloses a mapping between the audio of a user and potential mispronunciations in the node stars to provide feedback to users (¶ [0045], ¶ [0047-0052]). This process involves mapping each term to the correct node, and then mapping the audio from the user’s pronunciation to either the correct phonemes of the node star or the ray that best represents the mispronounced individual sound or phoneme. Similarly, this mapping process involves “comparing respective audio components in the input audio signal to an expected audio component for a corresponding phoneme in the sequence of phonemes.” Where the highest match or mapping corresponds to the center phrase 45, the received audio input corresponds to the expected audio component. ¶ [0047-0052]. If, instead, the center phrase 45 is ranked lower than a ray phrase by N, there is a mismatch between . 

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELIZABETH V JOHNSON whose telephone number is (313)446-6616.  The examiner can normally be reached on 7-4:30 Mon-Thurs.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kang Hu, can be reached on (571) 270-1344.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/E.V.J./Examiner, Art Unit 3715
February 26, 2022



/MALINA D. BLAISE/Primary Examiner, Art Unit 3715