DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/14/2022 has been entered.
 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 2, 4-12, and 14-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant has amended the claims to recite converting, with one or more speakers, the transformed electrical audio signal into a transformed sound of the user's voice in real time while the user is making the sound, wherein the transformed sound is outputted by the one or more speakers in such a way as to be audible to only the user to reduce stuttering by the user. A new search was conducted and identified Rashid, which teaches a fluency aid, and in particular a fluency aid for use by persons suffering from a stammer or other speech-related conditions to aid fluency of speaking, see par. [0001]. The fluency aid is further operable to emit a sound which is derived from the user's own voice. This is known as masked auditory feedback (MAF). MAF may be used in addition to the output masking sound described above, in that the masking signal may be combined with a voice signal representing the user's voice. According to one or more examples of the present aspects, the fluency aid is operable to derive a voice signal based on the user's own voice (which may have been detected by the voice detector), wherein the voice signal and the masking signal are combined to produce a masked auditory feedback (MAF) signal, see par. [0013-0014]. According to an example of a further aspect, there is provided a telephone, headphones, acoustic noise cancelling headphones, smart watch, or other portable device comprising the fluency aid as described above. These and any other wearable devices may include a fluency aid as described above, see par. [0037].
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 9-11, 14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Florencio U.S. PAP 2010/0195812 in view of Rashid U.S. PAP 2019/0231583.

Regarding claim 1, Florencio teaches a method comprising:
converting sound of a user's voice into an electrical audio signal (apply an audio transform to an audio portion of communications that are propagated by way of the multiparty communication session, see par. [0006]); 
transforming the electrical audio signal to produce a transformed electrical audio signal (audio transformation component 120 that can apply audio transform 122 to an audio portion of communication 118, see par. [0032]); 
converting, with one or more speakers, the transformed electrical audio signal into transformed sound of the user's voice, wherein the transformed sound is outputted by the one or more speakers in such a way as to be audible to the user (communications 118 over public channel 114 can be output by fixed loudspeakers, whereas communications 118 over private channel 116 can be output to the proper headset, see par. [0037]),  
wherein the transforming is performed in each of a set of modes, which set includes a first mode, in which the transforming causes the transformed sound of the user's voice to have a whispered sound effect (the architecture can apply a whisper transform to private communications in order to, e.g., indicate the communication is not public, see par. [0007, 0034]), 
and a second mode, in which the transforming causes the transformed sound of the user's voice to have a reverberation sound effect (a user may decide to make a superior's voice stronger or more reverberant, e.g., to indicate a degree of importance, see par. [0046]), 
and each mode in the set occurs during a time period in which no other mode in the set occurs (alternatively, the user might decide to apply, say, a Mickey Mouse transform to the superior's voice to, e.g. lighten the mood, see par. [0046]);
taking measurements of stuttering by the user during each of multiple time windows in which the transforming occurs (speech characteristics that indicate timidity, uncertainty, or the like might also be deemed inappropriate to some or all targets of communication 118, or associated behaviors such as stuttering, see par. [0040]); 
detecting, based on the measurements, an increase in the stuttering (indicia of anger, insecurity, stuttering or the like can be identified, see par. [0068]); 
and in response to the detecting of the increase, changing which mode of transforming is occurring, by changing from one mode in the set to another mode in the set to reduce stuttering by the user (At reference numeral 904, speech processing techniques can be employed for suppressing or changing the identified vocal characteristic in order to effectuate the emotion transform, see par. [0068]).
However, Florencio does not teach converting, with one or more speakers, the transformed electrical audio signal into a transformed sound of the user's voice in real time while the user is making the sound, wherein the transformed sound is outputted by the one or more speakers in such a way as to be audible to only the user to reduce stuttering by the user.
In a similar field of endeavor, Rashid teaches a fluency aid, and in particular a fluency aid for use by persons suffering from a stammer or other speech-related conditions to aid fluency of speaking, see par. [0001]. The fluency aid is further operable to emit a sound which is derived from the user's own voice. This is known as masked auditory feedback (MAF). MAF may be used in addition to the output masking sound described above, in that the masking signal may be combined with a voice signal representing the user's voice. According to one or more examples of the present aspects, the fluency aid is operable to derive a voice signal based on the user's own voice (which may have been detected by the voice detector), wherein the voice signal and the masking signal are combined to produce a masked auditory feedback (MAF) signal, see par. [0013-0014]. According to an example of a further aspect, there is provided a telephone, headphones, acoustic noise cancelling headphones, smart watch, or other portable device comprising the fluency aid as described above. These and any other wearable devices may include a fluency aid as described above, see par. [0037].
It would have been obvious to one of ordinary skill in the art to combine the Florencio invention with the teachings of Rashid for the benefit of aiding fluency of speaking, see Rashid par. [0001].
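For illustration only, the measure, detect, and switch sequence recited in the final three limitations of claim 1 can be sketched as below. The per-window stutter score, the comparison of each window against the preceding one, and the round-robin choice of the next mode are assumptions made for this sketch; neither the claim nor the cited references specify them, and the function names are hypothetical.

```python
from typing import List, Optional

# The two modes named in claim 1; a real set could contain more.
MODES = ["whisper", "reverberation"]


def next_mode(current: str) -> str:
    """Pick another mode round-robin (an assumption of this sketch)."""
    i = MODES.index(current)
    return MODES[(i + 1) % len(MODES)]


def run_windows(stutter_scores: List[float], start_mode: str = "whisper") -> List[str]:
    """Return the mode that is active in each time window.

    stutter_scores[k] is a hypothetical stuttering measurement taken in
    window k. When it exceeds the measurement from the previous window,
    the mode is changed for the following window.
    """
    mode = start_mode
    previous: Optional[float] = None
    active: List[str] = []
    for score in stutter_scores:
        active.append(mode)  # exactly one mode is active per window
        if previous is not None and score > previous:
            mode = next_mode(mode)  # increase detected: change modes
        previous = score
    return active
```

Because only one entry is recorded per window, the sketch also reflects the limitation that each mode occurs during a time period in which no other mode occurs.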
Regarding claim 4, Florencio teaches the method of claim 1, further comprising performing a speaker identification algorithm to determine whether a voice is the user's voice (auditory cues to aid in identifying a particular speaker, see par. [0042]);
and the transforming is performed only for time intervals in which the user is speaking (audio transformation component 120 can intelligently determine or infer which type of audio transform 122 to apply as well as when and/or for whom to apply audio transforms 122, see par. [0055]).
Regarding claim 9, Florencio teaches the method of claim 1, further comprising changing, in response to input from the user: which mode in the set of modes is employed in the transforming (audio transformation component can apply a variety of audio transforms based upon a type of target audience, see par. [0034]); or one or more parameters of a mode in the set of modes (applying the whisper transform for private communications may be turned on only by users who desire to employ the feature. Similarly, a user may decide to make a superior's voice stronger or more reverberant, e.g., to indicate a degree of importance. Additionally or alternatively, the user might decide to apply, say, a Mickey Mouse transform to the superior's voice to, e.g., lighten the mood, see par. [0046]).
Regarding claim 10, Florencio teaches the method of claim 1, further comprising changing, in accordance with a selection made by a computer: which mode in the set of modes is employed in the transforming; or one or more parameters of a mode in the set of modes (a method for enriching multiparty communication environments by way of emotion, ambience, or pace transforms is illustrated. At reference numeral 902, an emotion transform can be applied when an identified vocal characteristic associated with an emotion or behavior is deemed inappropriate for the target audience. For example, indicia of anger, insecurity, stuttering or the like can be identified. At reference numeral 904, speech processing techniques can be employed for suppressing or changing the identified vocal characteristic in order to effectuate the emotion transform, see par. [0068]).
Regarding claim 11, Florencio teaches an apparatus (processor, see par. [0022]) comprising:
 a microphone that is configured to convert sound of a user's voice into an electrical audio signal (microphone, see par. [00814]); 
 a digital signal processor that is programmed to perform a transformation which transforms the electrical audio signal into a transformed electrical audio signal (processor, see par. [0022]); and 
 one or more speakers that are configured to convert the transformed electrical audio signal into transformed sound of the user's voice in such a way that the transformed sound is audible to the user (apply an audio transform to an audio portion of communications that are propagated by way of the multiparty communication session, see par. [0006]; audio transformation component 120 that can apply audio transform 122 to an audio portion of communication 118, see par. [0032]); 
wherein the transforming is performed in each of a set of modes, which set includes  a first mode, in which the transforming causes the transformed sound of the user's voice to have a whispered sound effect (the architecture can apply a whisper transform to private communications in order to, e.g., indicate the communication is not public, see par. [0007, 0034]), 
and  a second mode, in which the transforming causes the transformed sound of the user's voice to have a reverberation sound effect (a user may decide to make a superior's voice stronger or more reverberant, e.g., to indicate a degree of importance, see par. [0046]), 
and each mode in the set occurs during a time period in which no other mode in the set occurs (alternatively, the user might decide to apply, say, a Mickey Mouse transform to the superior's voice to, e.g. lighten the mood, see par. [0046]);
wherein the apparatus is configured to:
 take measurements of stuttering by the user during each of multiple time windows in which the transforming occurs (speech characteristics that indicate timidity, uncertainty, or the like might also be deemed inappropriate to some or all targets of communication 118, or associated behaviors such as stuttering, see par. [0040]); 
 detect, based on the measurements, an increase in the stuttering (indicia of anger, insecurity, stuttering or the like can be identified, see par. [0068]); 
and  in response to the detecting of the increase, changing which mode of transforming is occurring, by changing from one mode in the set to another mode in the set to reduce the stuttering (At reference numeral 904, speech processing techniques can be employed for suppressing or changing the identified vocal characteristic in order to effectuate the emotion transform, see par. [0068]).
However, Florencio does not teach converting, with one or more speakers, the transformed electrical audio signal into a transformed sound of the user's voice in real time while the user is making the sound, wherein the transformed sound is outputted by the one or more speakers in such a way as to be audible to only the user to reduce stuttering by the user.
In a similar field of endeavor, Rashid teaches a fluency aid, and in particular a fluency aid for use by persons suffering from a stammer or other speech-related conditions to aid fluency of speaking, see par. [0001]. The fluency aid is further operable to emit a sound which is derived from the user's own voice. This is known as masked auditory feedback (MAF). MAF may be used in addition to the output masking sound described above, in that the masking signal may be combined with a voice signal representing the user's voice. According to one or more examples of the present aspects, the fluency aid is operable to derive a voice signal based on the user's own voice (which may have been detected by the voice detector), wherein the voice signal and the masking signal are combined to produce a masked auditory feedback (MAF) signal, see par. [0013-0014]. According to an example of a further aspect, there is provided a telephone, headphones, acoustic noise cancelling headphones, smart watch, or other portable device comprising the fluency aid as described above. These and any other wearable devices may include a fluency aid as described above, see par. [0037].
It would have been obvious to one of ordinary skill in the art to combine the Florencio invention with the teachings of Rashid for the benefit of aiding fluency of speaking, see Rashid par. [0001].
Regarding claim 14, Florencio teaches the apparatus of claim 11, wherein the apparatus is configured to: perform a speaker identification algorithm to determine whether a voice is the user's voice (auditory cues to aid in identifying a particular speaker, see par. [0042]); and perform the transformation only for time intervals in which the user is speaking (audio transformation component 120 can intelligently determine or infer which type of audio transform 122 to apply as well as when and/or for whom to apply audio transforms 122, see par. [0055]).
Regarding claim 19, Florencio teaches the apparatus of claim 11, wherein: the apparatus further includes one or more computers; and the one or more computers are programmed to analyze a user's voice to determine when to change which mode in the set of modes is employed in the transforming (audio transformation component can apply a variety of audio transforms based upon a type of target audience, see par. [0034]); or one or more parameters of a mode in the set of modes (applying the whisper transform for private communications may be turned on only by users who desire to employ the feature. Similarly, a user may decide to make a superior's voice stronger or more reverberant, e.g., to indicate a degree of importance. Additionally or alternatively, the user might decide to apply, say, a Mickey Mouse transform to the superior's voice to, e.g., lighten the mood, see par. [0046]).
Regarding claim 20, Florencio teaches the apparatus of claim 11, wherein: the apparatus further includes one or more computers; and the one or more computers are programmed to accept data indicative of a user's input and to output, in accordance with the user's input, instructions to change which mode in the set of modes is employed in the transforming, or to change one or more parameters of a mode in the set of modes (a method for enriching multiparty communication environments by way of emotion, ambience, or pace transforms is illustrated. At reference numeral 902, an emotion transform can be applied when an identified vocal characteristic associated with an emotion or behavior is deemed inappropriate for the target audience. For example, indicia of anger, insecurity, stuttering or the like can be identified. At reference numeral 904, speech processing techniques can be employed for suppressing or changing the identified vocal characteristic in order to effectuate the emotion transform, see par. [0068]).

Claims 2, 6-8, 12, and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Florencio U.S. PAP 2010/0195812, in view of Rashid U.S. PAP 2019/0231583, further in view of Yoshizawa U.S. PAP 2006/0193671.
Regarding claim 2, Florencio teaches the method of claim 1, wherein: the set of modes also includes a third mode (audio transformation component can apply emotion transform, see par. [0040]).
However, Florencio in view of Rashid does not teach that, in the third mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a superposition of the user's voice and one or more pitch-shifted versions of the user's voice that are sounded simultaneously with the user's voice, each of the one or more pitch-shifted versions being shifted in pitch, relative to the user's voice, by a frequency interval that occurs between notes of a chord in a chromatic musical scale.
In the same field of endeavor Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200]. 
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
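For illustration only, the claim 2 limitation of pitch-shifted versions offset "by a frequency interval that occurs between notes of a chord in a chromatic musical scale" can be expressed numerically as below. The sketch assumes 12-tone equal temperament, in which a shift of n semitones multiplies frequency by 2^(n/12), and it assumes a major triad (offsets of 4 and 7 semitones) as the chord; the claim does not name a tuning system or a particular chord, and the function names are hypothetical.

```python
from typing import List, Sequence


def chord_ratios(semitone_offsets: Sequence[int] = (4, 7)) -> List[float]:
    """Frequency ratios for the pitch-shifted copies.

    Assumes 12-tone equal temperament: n semitones -> ratio 2 ** (n / 12).
    The default offsets (4, 7) are the upper notes of a major triad.
    """
    return [2.0 ** (n / 12.0) for n in semitone_offsets]


def chord_frequencies(f0_hz: float, semitone_offsets: Sequence[int] = (4, 7)) -> List[float]:
    """Fundamental of the unshifted voice followed by each shifted copy."""
    return [f0_hz] + [f0_hz * r for r in chord_ratios(semitone_offsets)]
```

Under these assumptions, a voice with a 220 Hz fundamental would be superposed with copies near 277.2 Hz and 329.6 Hz.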
Regarding claim 6, Florencio teaches the method of claim 1, further comprising repeatedly sampling fundamental frequency of the user's voice during the transforming, which fundamental frequency changes over time during the transforming (speech processing techniques can relate to preprocessing based on pitch, frequency and so forth, see par. [0070]).
However, Florencio in view of Rashid does not teach that the set of modes also includes a third mode; and that, in the third mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a superposition of at least the user's voice and a changing musical note, in such a way that each time that the changing note is sounded, the note is equal in pitch to fundamental frequency of the user's voice as most recently sampled, and the note changes over time due to the fundamental frequency of the user's voice changing over time.
In the same field of endeavor, Yoshizawa teaches a method of restoring the voice of the two friends by using an audio restoration apparatus of the present invention. In the example of FIG. 4, a mixed audio where the friends' voices, the noises of cars and the voices of surrounding people are overlapped corresponds to the mixed audio S101, and the two friends' voices correspond to the restored audio S106 to be generated. The audio restoration unit 108A is an example audio restoration unit which restores the whole audio to be restored made up of the missing part audio and the audios of the parts other than the missing part, using at least one of the phoneme sequence, character sequence and musical note sequence which have been generated, see par. [0107]. Segment the musical audio into domains having the audio characteristics to be extracted, based on an audio structure change, and extract the real audio characteristics of the audio to be restored from the domains. In addition, in order to detect a melody change, it extracts the audio structure information likewise the audio structure analysis unit 104B. Subsequently, it can previously classify the domains into groups based on a melody having the same audio characteristics such as audio color and audio volume, and detect a melody change based on the groups to which the extracted audio structures belong, see par. [0135].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
Regarding claim 7, Yoshizawa teaches the method of claim 6, wherein, each time that the changing note is sounded, the note comprises a sound that is a recording of, or that is synthesized to emulate, a note produced by an acoustic string instrument, an acoustic wind instrument, or an acoustic percussion instrument (it extracts these audio characteristics using a presentation method based on a Musical Instrument Digital Interface (MIDI), see par. [0139]).
Regarding claim 8, Florencio teaches the method of claim 1, further comprising repeatedly sampling fundamental frequency of the user's voice during the transforming, which fundamental frequency changes over time during the transforming (speech processing techniques can relate to preprocessing based on pitch, frequency and so forth, see par. [0070]).
However, Florencio in view of Rashid does not teach that the set of modes also includes a third mode; and that, in the third mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a changing, pitch-shifted version of the user's voice, in such a way that the changing, pitch-shifted version has a fundamental frequency that is, at any given time, equal in pitch to a note in a chromatic musical scale, which note is nearest in frequency to the fundamental frequency of the user's voice as most recently sampled, and the fundamental frequency of the pitch-shifted version changes over time due to the fundamental frequency of the user's voice changing over time.
In the same field of endeavor Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200]. 
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133]. Segment the musical audio into domains having the audio characteristics to be extracted, based on an audio structure change, and extract the real audio characteristics of the audio to be restored from the domains. In addition, in order to detect a melody change, it extracts the audio structure information likewise the audio structure analysis unit 104B. Subsequently, it can previously classify the domains into groups based on a melody having the same audio characteristics such as audio color and audio volume, and detect a melody change based on the groups to which the extracted audio structures are belong, see par. [0135].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
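For illustration only, the claim 8 limitation of a note in a chromatic scale "nearest in frequency to the fundamental frequency of the user's voice as most recently sampled" can be sketched as below. The sketch assumes 12-tone equal temperament with a 440 Hz reference pitch and measures nearness on a logarithmic (semitone) scale; the claim specifies none of these, and the function name is hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference pitch for the chromatic scale


def nearest_chromatic_hz(f0_hz: float) -> float:
    """Snap a sampled fundamental frequency to the nearest chromatic note.

    Assumes equal temperament: note n semitones from A4 has frequency
    A4_HZ * 2 ** (n / 12). Nearness is judged in semitones (log scale).
    """
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    n = round(12 * math.log2(f0_hz / A4_HZ))  # nearest whole semitone
    return A4_HZ * 2.0 ** (n / 12.0)
```

Calling this function on each new sample of the fundamental frequency yields a target pitch that changes over time as the user's voice changes, which is the behavior the limitation describes.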
Regarding claim 12, Florencio teaches the apparatus of claim 11, wherein the apparatus is configured in such a way that: the set of modes also includes a third mode (audio transformation component can apply emotion transform, see par. [0040]).
However, Florencio in view of Rashid does not teach that, in the third mode, the transformation causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a superposition of the user's voice and one or more pitch-shifted versions of the user's voice that are sounded simultaneously with the user's voice, each of the one or more pitch-shifted versions being shifted in pitch, relative to the user's voice, by a frequency interval that occurs between notes of a chord in a chromatic musical scale.
In the same field of endeavor, Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200].
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
Regarding claim 16 Florencio teaches the apparatus of claim 11, wherein:  the apparatus is configured to repeatedly sample fundamental frequency of the user's voice during the transformation, which fundamental frequency changes over time during the transformation.
However, Florencio in view of Rashid does not teach that the apparatus is configured in such a way that the set of modes also includes a third mode, and in the third mode the transformation causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a superposition of at least the user's voice and a changing musical note, each time that the changing note is sounded, the note is equal in pitch to fundamental frequency of the user's voice as most recently sampled, and the note changes over time due to the fundamental frequency of the user's voice changing over time.
In the same field of endeavor Yoshizawa teaches a method of restoring the voice of the two friends by using an audio restoration apparatus of the present invention. In the example of FIG. 4, a mixed audio where the friends' voices, the noises of cars and the voices of surrounding people are overlapped corresponds to the mixed audio S101, and the two friends' voices correspond to the restored audio S106 to be generated. The audio restoration unit 108A is an example audio restoration unit which restores the whole audio to be restored made up of the missing part audio and the audios of the parts other than the missing part, using at least one of the phoneme sequence, character sequence and musical note sequence which have been generated, see par. [0107]. Segment the musical audio into domains having the audio characteristics to be extracted, based on an audio structure change, and extract the real audio characteristics of the audio to be restored from the domains. In addition, in order to detect a melody change, it extracts the audio structure information in the same manner as the audio structure analysis unit 104B. Subsequently, it can classify the domains in advance into groups based on a melody having the same audio characteristics such as audio color and audio volume, and detect a melody change based on the groups to which the extracted audio structures belong, see par. [0135].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].

Regarding claim 17 Yoshizawa teaches the apparatus of claim 16, wherein the apparatus is configured in such a way that each time that the changing note is sounded, the note comprises a sound that is a recording of, or that is synthesized to emulate, a note produced by an acoustic string instrument, an acoustic wind instrument, or an acoustic percussion instrument (it extracts these audio characteristics using a presentation method based on a Musical Instrument Digital Interface (MIDI), see par. [0139]).
Regarding claim 18 Florencio teaches the apparatus of claim 11, wherein:  the apparatus is configured to repeatedly sample fundamental frequency of the user's voice during the transformation, which fundamental frequency changes over time during the transformation.
However, Florencio in view of Rashid does not teach that the apparatus is configured in such a way that the set of modes also includes a third mode, and in the third mode the transformation causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise a changing, pitch-shifted version of the user's voice, the changing, pitch-shifted version has a fundamental frequency that is, at any given time, equal in pitch to a note in a chromatic musical scale, which note is nearest in frequency to the fundamental frequency of the user's voice as most recently sampled, and the fundamental frequency of the pitch-shifted version changes over time due to the fundamental frequency of the user's voice changing over time.
In the same field of endeavor Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200]. 
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133]. Segment the musical audio into domains having the audio characteristics to be extracted, based on an audio structure change, and extract the real audio characteristics of the audio to be restored from the domains. In addition, in order to detect a melody change, it extracts the audio structure information in the same manner as the audio structure analysis unit 104B. Subsequently, it can classify the domains in advance into groups based on a melody having the same audio characteristics such as audio color and audio volume, and detect a melody change based on the groups to which the extracted audio structures belong, see par. [0135].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
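For illustration only, the claim 18 limitation discussed above (a pitch-shifted voice whose fundamental frequency follows the chromatic note nearest the most recently sampled fundamental frequency) can be sketched as follows. This is a minimal sketch and is not taken from Florencio, Rashid, or Yoshizawa. It assumes 12-tone equal temperament with A4 = 440 Hz and measures "nearest" on a logarithmic (semitone) scale; the function names `nearest_chromatic_note` and `track_snapped_pitch` are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; not stated in the claims or references
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_chromatic_note(f0_hz):
    """Return (MIDI note number, note name, note frequency in Hz) nearest to f0_hz."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    # Distance from A4 (MIDI 69) in semitones, rounded to a whole semitone.
    midi = round(69 + 12 * math.log2(f0_hz / A4_HZ))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    freq = A4_HZ * 2 ** ((midi - 69) / 12)
    return midi, name, freq


def track_snapped_pitch(f0_samples_hz):
    """Map each successive f0 sample to the target pitch of the shifted voice."""
    return [nearest_chromatic_note(f)[2] for f in f0_samples_hz]
```

Under these assumptions, as the sampled fundamental frequency drifts, the target pitch stays on one note until the sample crosses the midpoint (in semitones) between two adjacent notes.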
Claim(s) 5 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Florencio U.S. PAP 2010/00195812, in view of Rashid U.S. PAP 2019/0231583, further in view of Rudzicz WO 2013/013319 A1.

Regarding claim 5 Florencio teaches the method of claim 1, wherein:  the method further comprises repeatedly sampling fundamental frequency of the user's voice during the transforming, which fundamental frequency changes over time during the transforming (speech processing techniques can relate to preprocessing based on pitch, frequency and so forth, see par. [0070]).
However Florencio in view of Rashid does not teach  the set of modes also includes a third mode; and  in the third mode the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user, to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice, the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (iii) the chord remains constant between each temporally adjoining pair of pseudobeats, and (iv) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set.
In a similar field of endeavor Rudzicz teaches an acoustic transformation system and method. A specific embodiment is the transformation of acoustic speech signals produced by speakers with speech disabilities in order to make those utterances more intelligible to typical listeners. These modifications include the correction of tempo or rhythm, the adjustment of formant frequencies in sonorants, the removal or adjustment of aberrant voicing, the deletion of phoneme insertion errors, and the replacement of erroneously dropped phonemes. These methods may also be applied to general correction of musical or acoustic sequences, see abstract.
Rudzicz teaches the set of modes also includes a third mode; and in the third mode the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user (a system for transforming an acoustic signal is provided, the system comprising an acoustic transformation engine operable to apply one or more transformations to the acoustic signal in accordance with one or more transformation rules configured to determine the correctness of each of one or more temporal segments of the acoustic signal, see par. [0010]), to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice (an example of the results of this morphing technique may have three identified formants shifted to their expected frequencies. The indicated black lines labelled F1, F2, F3, and F4 are example formants, which are concentrations of high energy within a frequency band over time and which are indicative of the sound being uttered. The locations of these formants being changed changes the way the utterance sounds, see par. [0059]), the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (iii) the chord remains constant between each temporally adjoining pair of pseudobeats (in addition to the modification of frequency characteristics that modify one note or chord to sound more like another note or chord (e.g., key changes), these modifications can also be used to correct for aberrant tempo, to insert notes or chords that were accidentally omitted, or to delete notes or chords that were accidentally inserted, see par. [0075]), and (iv) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set (The spectrogram or other frequency-based or frequency-derived (e.g. cepstral) representation of the acoustic signal may be obtained with a fast Fourier transform (FFT), linear predictive coding, or other such method (typically by analyzing short windows of the time signal), see par. [0039, 0042]).
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Rudzicz for the benefit of determining a correctness of one or more temporal segments and improving a listener's ability to comprehend speech, see par. [0056].
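For illustration only, the pseudobeat timing recited in claim 5 (each pseudobeat after the first occurs at the earliest build-up in amplitude once a specified interval has elapsed since the most recent pseudobeat) can be sketched as follows. This is a minimal sketch and is not taken from any cited reference. It assumes that a "build-up" is a sample-to-sample rise in an amplitude envelope of at least `rise_threshold`, and that the initial pseudobeat is the first such build-up; both are assumptions, and the name `find_pseudobeats` and its parameters are hypothetical.

```python
def find_pseudobeats(envelope, times, min_interval_s, rise_threshold):
    """Return the timestamps (in seconds) at which pseudobeats occur.

    envelope: amplitude envelope samples of the user's voice.
    times: timestamps in seconds, one per envelope sample.
    min_interval_s: the specified temporal interval between pseudobeats.
    rise_threshold: assumed minimum rise that counts as a build-up.
    """
    beats = []
    for i in range(1, len(envelope)):
        is_buildup = envelope[i] - envelope[i - 1] >= rise_threshold
        if not is_buildup:
            continue
        # The initial pseudobeat has no predecessor. Later pseudobeats must
        # wait until min_interval_s has elapsed since the most recent one.
        if not beats or times[i] - beats[-1] >= min_interval_s:
            beats.append(times[i])
    return beats
```

In such a sketch the chord would be recomputed only at the returned timestamps and held unchanged between them, which corresponds to the recited requirement that the chord remain constant between adjoining pseudobeats.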

Regarding claim 15 Florencio in view of Rashid does not teach the apparatus of claim 11, wherein:  the apparatus is configured to repeatedly sample fundamental frequency of the user's voice during the transformation, which fundamental frequency changes over time during the transformation; and  the apparatus is configured in such a way that the set of modes also includes a third mode, and in the third mode  the transformation causes the transformed sound, which is outputted by the one or more speakers and is audible to the user, to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other,  the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice,  the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (D) the chord remains constant between each temporally adjoining pair of pseudobeats, and (E) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set.
In the same field of endeavor Rudzicz teaches the set of modes also includes a third mode; and in the third mode the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user (a system for transforming an acoustic signal is provided, the system comprising an acoustic transformation engine operable to apply one or more transformations to the acoustic signal in accordance with one or more transformation rules configured to determine the correctness of each of one or more temporal segments of the acoustic signal, see par. [0010]), to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice (an example of the results of this morphing technique may have three identified formants shifted to their expected frequencies. The indicated black lines labelled F1, F2, F3, and F4 are example formants, which are concentrations of high energy within a frequency band over time and which are indicative of the sound being uttered. The locations of these formants being changed changes the way the utterance sounds, see par. [0059]), the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (iii) the chord remains constant between each temporally adjoining pair of pseudobeats (in addition to the modification of frequency characteristics that modify one note or chord to sound more like another note or chord (e.g., key changes), these modifications can also be used to correct for aberrant tempo, to insert notes or chords that were accidentally omitted, or to delete notes or chords that were accidentally inserted, see par. [0075]), and (iv) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set (The spectrogram or other frequency-based or frequency-derived (e.g. cepstral) representation of the acoustic signal may be obtained with a fast Fourier transform (FFT), linear predictive coding, or other such method (typically by analyzing short windows of the time signal), see par. [0039, 0042]).
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Rudzicz for the benefit of determining a correctness of one or more temporal segments and improving a listener's ability to comprehend speech, see par. [0056].
Claim(s) 21-22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Florencio U.S. PAP 2010/00195812, in view of Rashid U.S. PAP 2019/0231583, in view of Yoshizawa U.S. PAP 2006/0193671, further in view of Rudzicz WO 2013/013319 A1.

Regarding claim 21 Florencio teaches the method of claim 1, further comprising performing a speaker identification algorithm to determine whether a voice is the user's voice, wherein the transforming is performed only for time intervals in which the user is speaking (audio transformation component 120 can intelligently determine or infer which type of audio transform 122 to apply as well as when and/or for whom to apply audio transforms 122, see par. [0055]), wherein the set of modes also includes a third mode (audio transformation component can apply emotion transform, see par. [0040]); and repeatedly sampling fundamental frequency of the user's voice during the transforming, which fundamental frequency changes over time during the transforming (speech processing techniques can relate to preprocessing based on pitch, frequency and so forth, see par. [0070]).
However, Florencio does not teach that, in the third mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user, to comprise a superposition of the user's voice and one or more pitch-shifted versions of the user's voice that are sounded simultaneously with the user's voice, each of the one or more pitch-shifted versions being shifted in pitch, relative to the user's voice, by a frequency interval that occurs between notes of a chord in a chromatic musical scale.
In the same field of endeavor Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200]. 
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133].
It would have been obvious to one of ordinary skill in the art to combine the Florencio invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
However, Florencio in view of Rashid in view of Yoshizawa does not teach wherein the set of modes also includes a fourth mode; and wherein in the fourth mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice, the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, the chord remains constant between each temporally adjoining pair of pseudobeats, and each pseudobeat in the set, except an initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set.
In the same field of endeavor Rudzicz teaches the set of modes also includes a fourth mode; and in the fourth mode the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user (a system for transforming an acoustic signal is provided, the system comprising an acoustic transformation engine operable to apply one or more transformations to the acoustic signal in accordance with one or more transformation rules configured to determine the correctness of each of one or more temporal segments of the acoustic signal, see par. [0010]), to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice (an example of the results of this morphing technique may have three identified formants shifted to their expected frequencies. The indicated black lines labelled F1, F2, F3, and F4 are example formants, which are concentrations of high energy within a frequency band over time and which are indicative of the sound being uttered. The locations of these formants being changed changes the way the utterance sounds, see par. [0059]), the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (iii) the chord remains constant between each temporally adjoining pair of pseudobeats (in addition to the modification of frequency characteristics that modify one note or chord to sound more like another note or chord (e.g., key changes), these modifications can also be used to correct for aberrant tempo, to insert notes or chords that were accidentally omitted, or to delete notes or chords that were accidentally inserted, see par. [0075]), and (iv) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set (The spectrogram or other frequency-based or frequency-derived (e.g. cepstral) representation of the acoustic signal may be obtained with a fast Fourier transform (FFT), linear predictive coding, or other such method (typically by analyzing short windows of the time signal), see par. [0039, 0042]).
It would have been obvious to one of ordinary skill in the art to combine the Florencio invention with the teachings of Rudzicz for the benefit of determining a correctness of one or more temporal segments and improving a listener's ability to comprehend speech, see par. [0056].
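For illustration only, the third-mode limitation of claim 21 (pitch-shifted versions of the user's voice, each offset from the voice by an interval that occurs between notes of a chord in a chromatic scale) can be sketched as follows. This is a minimal sketch and is not taken from any cited reference. It assumes 12-tone equal temperament and, by default, the intervals of a major triad (4 and 7 semitones above the voice); the claim does not name a particular chord, and `harmony_frequencies` is a hypothetical name.

```python
MAJOR_TRIAD_SEMITONES = (4, 7)  # assumed chord intervals; the claim names none


def harmony_frequencies(f0_hz, intervals=MAJOR_TRIAD_SEMITONES):
    """Return the fundamentals of the pitch-shifted copies of the voice.

    Each copy is offset from the voice's fundamental f0_hz by a whole number
    of semitones, so each frequency ratio is 2 ** (n / 12).
    """
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return [f0_hz * 2 ** (n / 12) for n in intervals]
```

For a voice at 220 Hz, the default intervals give copies near 277.18 Hz and 329.63 Hz; the transformed sound in such a sketch would be the sum of the unshifted voice and these shifted copies.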

Regarding claim 22 Florencio teaches the apparatus of claim 11, wherein the apparatus is configured to: perform a speaker identification algorithm to determine whether a voice is the user's voice; and perform the transformation only for time intervals in which the user is speaking (audio transformation component 120 can intelligently determine or infer which type of audio transform 122 to apply as well as when and/or for whom to apply audio transforms 122, see par. [0055]), wherein the apparatus is configured in such a way that: the set of modes also includes a third mode (audio transformation component can apply emotion transform, see par. [0040]); and repeatedly sampling fundamental frequency of the user's voice during the transforming, which fundamental frequency changes over time during the transforming (speech processing techniques can relate to preprocessing based on pitch, frequency and so forth, see par. [0070]).
However, Florencio in view of Rashid does not teach that, in the third mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user, to comprise a superposition of the user's voice and one or more pitch-shifted versions of the user's voice that are sounded simultaneously with the user's voice, each of the one or more pitch-shifted versions being shifted in pitch, relative to the user's voice, by a frequency interval that occurs between notes of a chord in a chromatic musical scale.
In the same field of endeavor Yoshizawa teaches an audio restoration apparatus which restores an audio to be restored having a missing audio part and being included in a mixed audio, see abstract. The audio characteristic modification unit 203 modifies the audio characteristic information S105 generated by the audio characteristic extraction unit 107 so as to generate modified audio characteristic information S201. Here, the audio characteristic modification unit 203 modifies the audio characteristic information S105 so as to generate audio characteristics which are listenable to the user. The audio characteristic information S105 is made up of the speaker's characteristics, the gender-specific characteristics, the voice age, the voice characteristic, the voice tone, the audio volume, the audio quality, the reverberation characteristic and the audio color. For example, the audio characteristic modification unit 203 can modify only the audio characteristic corresponding to the speaker's characteristics in order to highlight the feature of the speaker a little bit. Without modifying the real audio characteristics a lot, it is possible to generate modified restored audio which is listenable and sounds natural. In addition, it can modify the voice tone of the announcement into a polite voice tone. In addition, it modifies a stuttering voice into a clear voice in order to make it possible to generate modified restored audio which is listenable. In addition, it can make the audio volume louder or reduce the reverberation in order to make it possible to generate modified restored audio which is listenable. Since only a part of audio characteristics is modified here, it is possible to generate modified restored audio which sounds natural, see par. [0200]. 
The audio structure analysis unit 104B generates audio structure information S103B of the BGM playing in streets, which is a musical audio to be restored, based on the separated audio information S102B extracted by the mixed audio separation unit 103 and the audio structure knowledge database 105B made up of an audio ontology dictionary, and a musical score dictionary. First, as shown in FIG. 20, it performs frequency analysis of the audio waveform which is an extraction of the components of the BGM playing in streets and which is the separated audio information S102B. Next, it estimates the musical note sequence of the missing part using the analyzed frequency structure and the audio ontology dictionary. In the audio ontology dictionary, rules of chords, modulation, and rhythms of musical notes are stored, see par. [0133].
It would have been obvious to one of ordinary skill in the art to combine the Florencio in view of Rashid invention with the teachings of Yoshizawa for the benefit of generating modified restored audio which sounds natural, see par. [0200].
However, Florencio in view of Rashid in view of Yoshizawa does not teach wherein the set of modes also includes a fourth mode; and wherein in the fourth mode, the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to only the user, to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice, the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, the chord remains constant between each temporally adjoining pair of pseudobeats, and each pseudobeat in the set, except an initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set.
In the same field of endeavor Rudzicz teaches the set of modes also includes a fourth mode; and in the fourth mode the transforming causes the transformed sound, which is outputted by the one or more speakers and is audible to the user (a system for transforming an acoustic signal is provided, the system comprising an acoustic transformation engine operable to apply one or more transformations to the acoustic signal in accordance with one or more transformation rules configured to determine the correctness of each of one or more temporal segments of the acoustic signal, see par. [0010]), to comprise, at each pseudobeat in a set of pseudobeats, a superposition of two or more pitch-shifted versions of the user's voice, which pitch-shifted versions are sounded simultaneously with each other, in such a way that the fundamental frequencies of the respective pitch-shifted versions together form a chord in a chromatic musical scale, which chord has a root note that is the fundamental frequency of one of the pitch-shifted versions and is the nearest note in the scale to the fundamental frequency of the user's voice (an example of the results of this morphing technique may have three identified formants shifted to their expected frequencies. The indicated black lines labelled F1, F2, F3, and F4 are example formants, which are concentrations of high energy within a frequency band over time and which are indicative of the sound being uttered. The locations of these formants being changed changes the way the utterance sounds, see par. [0059]), the chord may but does not necessarily change at each pseudobeat in the set, depending on whether the fundamental frequency of the user's voice as most recently sampled has changed, (iii) the chord remains constant between each temporally adjoining pair of pseudobeats (in addition to the modification of frequency characteristics that modify one note or chord to sound more like another note or chord (e.g., key changes), these modifications can also be used to correct for aberrant tempo, to insert notes or chords that were accidentally omitted, or to delete notes or chords that were accidentally inserted, see par. [0075]), and (iv) each pseudobeat in the set, except the initial pseudobeat of the set, occurs at the earliest time at which a build-up in amplitude of the user's voice occurs after a specified temporal interval has elapsed since the most recent pseudobeat in the set (The spectrogram or other frequency-based or frequency-derived (e.g. cepstral) representation of the acoustic signal may be obtained with a fast Fourier transform (FFT), linear predictive coding, or other such method (typically by analyzing short windows of the time signal), see par. [0039, 0042]).
It would have been obvious to one of ordinary skill in the art to combine the Florencio invention with the teachings of Rudzicz for the benefit of determining a correctness of one or more temporal segments and improving a listener's ability to comprehend speech, see par. [0056].

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez, whose telephone number is (571) 270-3711. The examiner can normally be reached Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL ORTIZ-SANCHEZ/
Primary Examiner, Art Unit 2656