DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/24/2021 has been entered.
This communication is in response to the Amendments and Arguments filed on 05/24/2021.
Claims 1-20 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
	Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 
Response to Arguments
Applicant's arguments filed 05/24/2021 regarding the 35 U.S.C. 101 rejection have been fully considered but they are not persuasive. Applicant presents the same arguments as previously presented in the communication filed 01/08/2021, pages 8-9, which were fully responded to by the Examiner in the previous Office Action mailed 03/22/2021, pages 2-3. As the arguments have not changed, neither has the Examiner's response. Please refer to the previously presented response for details regarding the rejection of claims 1-20.
Applicant’s arguments with respect to claim(s) 1, 8, and 15 have been considered but are moot because the new ground of rejection does not rely solely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. More specifically, the limitations of “generating a set of words for the speaker based on the set of missing phonemes; obtaining additional audio data for the speaker, wherein the additional audio data is the set of words spoken by the speaker;” are taught by newly cited reference, Mok, which teaches the identification of missing phonemes in a TTS database, and recording the missing phonemes, which are then added to the TTS database to be used for synthesis of text. Additionally, the limitation reciting “generating an output audio in the second language .

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding claims 1, 8, and 15, the limitation(s) of “receiving”, “receiving”, “determining”, “partitioning”, “converting”, “converting”, “determining”, “generating”, “obtaining”, and “generating”, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. More specifically, the mental process of a human hearing a first speaker speak in a spoken language, hearing a second speaker speak in 
This judicial exception is not integrated into a practical application because the recitation of “a processor” in claim 1, a “system”, “processor”, and “memory” in claim 8, and a “computer readable storage medium” and “processor” in claim 15 reads on generic computer components, based upon the claim interpretation wherein the structure is interpreted using Fig. 3, [0040], and [0055] of the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the 

	With respect to claims 2, 9, and 16, the claims recite “recording”, which reads on a human timing how long each section takes to be spoken. No additional limitations are present.
	With respect to claims 3, 10, and 17, the claims recite “generating”, which reads on a human speaking the first section of speech in the translated language. No additional limitations are present.

With respect to claims 4, 11, and 18, the claims recite “performing an audio compression”, which reads on a human speaking faster in order to have the translated speech take the same amount of time to say as the original speech. No additional limitations are present.
	
	With respect to claims 5, 12, and 19, the claims recite “performing an audio expansion”, which reads on a human speaking slower in order to have the translated speech take the same amount of time to say as the original speech. No additional limitations are present.


With respect to claims 7 and 14, the claims recite a further definition of segments. No additional limitations are present.
These claims do not integrate the judicial exception into a practical application and fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 8, and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun (US Patent No. 9922641), hereinafter Chun, in view of Mok et al. (U.S. Patent No. 9786267), hereinafter Mok.

(claim 1) A computer-implemented method comprising ((9:56-59) a method performed by a computer program product):
(claim 8) A system comprising ((1:39-41) a system):
a processor communicatively coupled to a memory ((1:39-42), (9:23-32) a system that includes a processor and a memory that are interconnected, i.e. communicatively coupled), the processor configured to:
(claim 15) A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising ((1:54-62) a computer program product comprising computer readable instructions, i.e. program instructions, encoded on a storage device, i.e. computer readable storage medium, that, when executed, cause one or more processors to perform operations, i.e. executable by a processor to cause the processor to perform a method):

receiving, by a processor, audio data associated with a speaker, wherein the audio data is in a first language spoken by the speaker ((1:41-43) a speech synthesis engine includes a processor that is configured to receive input speech data from a speaker, i.e. receiving…audio data associated with a speaker, in a first language, i.e. the audio data is in a first language spoken by the speaker);
receiving, by the processor, generic audio data associated with a generic speaker, wherein the generic audio data comprises a plurality of generic audio segments ((1:57-67), (5:14-27, 60-65), (6:36-42) the instructions executed by a processor, i.e. by the processor, include receiving input speech data, where the speech data is from a plurality of speakers, i.e. receiving…generic audio data associated with a generic speaker, and can be used to obtain a speaker-independent speech model for the second language, and the speech data is produced by a speech recognition engine configured to recognize human speech from audio data captured by a user device, and may include a speech segmentation routine for breaking sounds into sub-parts, i.e. the generic audio data comprises a plurality of generic audio segments), wherein the plurality of generic audio segments comprises a set of generic phonemes ((3:9-22, 43-48), (4:2-10), (6:36-42) the TTS synthesizer system uses phonetic transcriptions, referred to as text-to-phoneme conversion, i.e. phonemes, and the voice characteristics for the audio output are from extracting speaker characteristics from multiple speakers to obtain a universal speech model, such as by breaking the sounds into sub-parts, i.e. plurality of generic audio segments);
determining speaker characteristics associated with the speaker from the audio data ((5:28-32), (6:29-35) the speech data from the first speaker, i.e. associated with the speaker from the audio data, is analyzed to estimate a speaker transform that, when applied to speech parameters of a universal model, produce speaker characteristics of the first speaker, i.e. determining speaker characteristics), wherein the determining the speaker characteristics associated with the speaker from the audio data comprises:
partitioning the audio data associated with the speaker into one or more audio segments ((5:60-65), (6:36-42) speech data is produced by a speech recognition ;
converting the audio data to a source text in the first language ((6:20-29, 36-42, 55-57) the speech recognition engine has a speech segmentation routine to break up sounds into sub-parts, and the engine further recognizes and converts utterances in the audio data, i.e. audio data, into text in a first language, i.e. converting…to a source text in the first language);
converting the source text to a target text, wherein the target text is in a second language ((6:55-61) the translation reads the output text in the first language, such as an English-language text file, i.e. source text, and generates a second text file in a target language, such as a French-language text file, i.e. converting…to a target text…in a second language); and
generating an output audio in the second language for the target text using the one or more audio segments for the speaker, … , and the plurality of generic audio segments ((1:57-67), (5:14-27, 28-32, 60-65), (6:29-42) (8:17-21, 31-35, 44-48), (8:57-9:2) a speaker transform representing the speaker characteristics of the individual who provides the speech in the source language, i.e. one or more audio segments for the speaker, is used to modify the speaker-independent speech model, i.e. plurality of generic audio segments, to obtain a speaker-specific speech model that is used to generate speech data in the second language, i.e. generating…audio in the second .  
While Chun teaches determining speaker characteristics, Chun does not specifically teach identifying missing phonemes for a particular speaker and obtaining the missing audio data, and thus does not teach
determining a set of missing phonemes for the speaker based on the target text;
generating a set of words for the speaker based on the set of missing phonemes;
obtaining additional audio data for the speaker, wherein the additional audio data is the set of words spoken by the speaker; and
the additional audio data
Mok, however, teaches determining a set of missing phonemes for the speaker based on the target text ((9:48-55, 59-67), (10:1-13) the user voice phonemes are stored in a TTS database, and the control unit identifies the availability of an autocomplete function by comparing each phoneme stored in the database with the text, i.e. based on the target text, and where the registration of the TTS database compared to the text is less than 100%, i.e. determining a set of missing phonemes for the speaker);
generating a set of words for the speaker based on the set of missing phonemes ((9:11-39),(10:9-13) the control unit may suggest a required phoneme, i.e. based on the set of missing phonemes, to be recorded, where a user says words that ;
obtaining additional audio data for the speaker, wherein the additional audio data is the set of words spoken by the speaker ((9:11-39), (10:9-13) the control unit may suggest a required phoneme to be recorded, i.e. obtaining additional audio data for the speaker, where a user says words that are recorded and divided into phonemic voice files, i.e. additional audio data is the set of words spoken by the speaker); and
the additional audio data ((10:9-13,20-24) the control unit records the required phoneme, i.e. the additional audio data, so that the percentage of registration in the TTS database becomes 100%, and uses the TTS database, i.e. the one or more audio segments for the speaker, the additional audio data, to perform TTS recording of the pages).
Chun and Mok are analogous art because they are from a similar field of endeavor in providing TTS using the voice characteristics of a specific individual. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speaker characteristic determination taught by Chun with the identification and recording of missing phonemes for a particular user's voice as taught by Mok. The motivation to do so would have been to substitute similar elements to achieve the predictable result of enabling a TTS system to identify when an entire text can be recorded using the phonemes stored for a particular user's voice (Mok (9:59-67)).

Claim(s) 2, 3, 7, 9, 10, 14, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Mok, and further in view of Meng et al. (US Patent No. 9342509), as found in the IDS, hereinafter Meng.

Regarding claims 2, 9, and 16, Chun in view of Mok teaches claims 1, 8, and 15.
While Chun in view of Mok provides segmenting speech into sub-parts, Chun in view of Mok does not specifically teach recording the length of the sub-parts, and thus does not teach
recording a length of time associated with each of the one or more segments.
Meng, however, teaches recording a length of time associated with each of the one or more segments ((4:47-58), (5:8-14) the duration, i.e. length of time, of each speech unit, i.e. each of the one or more segments, is extracted, i.e. recording).
Chun, Mok, and Meng are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the segmentation of speech for further processing taught by Chun, as modified by Mok, with the recording of the duration of the speech units as taught by Meng. The motivation to do so would have been to achieve the predictable result of ensuring that the target speech units have the same prosodic information as the source speech (Meng (5:8-14)).



generating first spoken audio for a first segment from the one or more segments, wherein the first spoken audio is in the second language ((2:42-44), (5:8-15) speech synthesis is performed, i.e. generating first spoken audio, to synthesize the speech units, i.e. first segment, of the translated target speech, i.e. first spoken audio is in the second language).
The motivation to combine is the same as previously presented.

Regarding claims 7 and 14, Chun in view of Mok and Meng teaches claims 2 and 10, and Meng further teaches
wherein the one or more segments comprise at least one of a word, a phrase, and a sentence ((4:17-19) a speech unit, i.e. segment, can be a sentence, phrase, or word).  
The motivation to combine is the same as previously presented.

Claim(s) 4, 5, 11, 12, 18, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Mok, in view of Meng, and further in view of Rossano et al. (U.S. Patent No. 9552807), hereinafter Rossano.

Regarding claims 4, 11, and 18, Chun in view of Mok and Meng teaches claims 3, 10, and 17.
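While Chun in view of Mok and Meng provides extracting the duration of a speech unit, Chun in view of Mok and Meng does not specifically teach using the duration as part of a compression process, and thus does not teach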
performing an audio compression operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment.  
Rossano, however, teaches performing an audio compression operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment ((6:23-34, 51-58), (6:64-7:2), (7:8-22) the translated speech segment, i.e. first spoken audio, is adjusted with the recommendations by the prosody analysis unit to mimic the original spoken voice, such as emphasis, volume, speed, and pitch, which includes timing differences for translations between short duration languages and long duration languages, i.e. match the speaker characteristics, and shrinking the audio track to fit the original timing, i.e. performing an audio compression operation).  
Chun, Mok, Meng, and Rossano are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the extraction of a duration of a speech unit teachings of Chun, as modified by Mok and Meng, with shrinking the audio track to fit the original timing as taught by Rossano. The motivation to do so would have been to achieve a predictable result of enabling the automatic dubbing of a video in a first language into a second 

Regarding claims 5, 12, and 19, Chun in view of Mok and Meng teaches claims 3, 10, and 17.
While Chun in view of Mok and Meng provides extracting the duration of a speech unit, Chun in view of Mok and Meng does not specifically teach using the duration as part of an expansion process, and thus does not teach
performing an audio expansion operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment.  
Rossano, however, teaches performing an audio expansion operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment ((6:23-34, 51-58), (6:64-7:2), (7:8-22) the translated speech segment, i.e. first spoken audio, is adjusted with the recommendations by the prosody analysis unit to mimic the original spoken voice, such as emphasis, volume, speed, and pitch, which includes timing differences for translations between short duration languages and long duration languages, i.e. match the speaker characteristics, and stretching the audio track to fit the original timing, i.e. performing an audio expansion operation).  
Chun, Mok, Meng, and Rossano are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed .

Claim(s) 6, 13, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Mok, and further in view of Miro et al. (US PG Pub No. 2005/0182630), hereinafter Miro.

Regarding claims 6, 13, and 20, Chun in view of Mok teaches claims 1, 8, and 15.
While Chun in view of Mok provides the recognition of speaker characteristics, which include phonemes, Chun in view of Mok does not specifically teach that the characteristics include vocal range, and thus does not teach
wherein the speaker characteristics associated with the speaker comprise phonemes of the speaker and vocal range.  
Miro, however, teaches wherein the speaker characteristics associated with the speaker comprise phonemes of the speaker and vocal range ([0021], [0032:1-6] speaker source parameters, i.e. speaker characteristics, may include vocal range, and the speech synthesizer converts text to phonemes using source parameters, i.e. phonemes).  


Conclusion	
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474.  The examiner can normally be reached on 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571) 272-7799.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for 






/NICOLE A K SCHMIEDER/Examiner, Art Unit 2659      


/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659