DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 01/08/2021.
Claims 1-20 are pending and have been examined.
All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner. 
In the interest of continuity of prosecution, Examiner notes a change of name since the previous Office Action was mailed.
	Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 01/08/2021 regarding the rejection based on 35 U.S.C. § 101 have been fully considered but they are not persuasive. Applicant asserts on page 9 that the parsing of audio data into segments, automatic translation, and generation of an output audio file cannot be performed entirely in the human mind. The Examiner respectfully disagrees with this assertion. The limitation regarding parsing of audio data into segments reads on a human recognizing speech in segments, such as a sentence at a time. The limitations regarding translation read on a human first writing down the speech as text in the spoken language, and then writing down a mental translation of that text into a translated language text, which is a mental process aided by pen and paper. The Examiner notes that, while Applicant argues that an “automatic translation” occurs, the claim language does not . 
Hence, Applicant’s arguments are not persuasive.
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. More specifically, the amendments recite “receiving, by the processor, generic audio data associated with a generic speaker, wherein the generic audio data comprises a plurality of generic audio segments;…generating an output audio in the second language for the target text using the one or more audio segments for the speaker and the plurality of generic audio segments”. Please see the new mappings with respect to these limitations.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 

Regarding claims 1, 8, and 15, the limitation(s) of “receiving”, “receiving”, “determining”, “partitioning”, “converting”, “converting”, and “generating”, as drafted, are processes that, under broadest reasonable interpretation, cover performance of the limitation in the mind but for the recitation of generic computer components. More specifically, the limitations read on the mental process of a human hearing a first speaker speak in a spoken language, hearing a second speaker speak in a spoken language, recognizing specific voice characteristics of the first speaker in segments of speech, such as a sentence at a time, writing down the speech into text of the spoken language, writing down a translation of the spoken language text into a translated language text, and speaking the translated language text aloud in a voice mimicking a combination of voice characteristics of the first and second speakers, such as the intonation of the first speaker and the accent of the second speaker. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the --Mental Processes-- grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application because the recitation of “a processor” in claim 1, a “system”, “processor”, and “memory” in claim 8, and a “computer readable storage medium” and “processor” in claim 15, read on generalized computer components, based upon the claim interpretation wherein the structure is interpreted using Fig. 3, [0040] and [0055] in the specification. Accordingly, these additional elements do not integrate the abstract idea into a practical application.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to receive, determine, partition, convert, and generate amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

	With respect to claims 2, 9, and 16, the claims recite “recording”, which reads on a human timing how long each section takes to be spoken. No additional limitations are present.

	With respect to claims 3, 10, and 17, the claims recite “generating”, which reads on a human speaking the first section of speech in the translated language. No additional limitations are present.

With respect to claims 4, 11, and 18, the claims recite “performing an audio compression”, which reads on a human speaking faster in order to have the translated speech take the same amount of time to say as the original speech. No additional limitations are present.
	
With respect to claims 5, 12, and 19, the claims recite “performing an audio expansion”, which reads on a human speaking slower in order to have the translated speech take the same amount of time to say as the original speech. No additional limitations are present.
	
	With respect to claims 6, 13, and 20, the claims recite a further definition of speaker characteristics. No additional limitations are present.

With respect to claims 7 and 14, the claims recite a further definition of segments. No additional limitations are present.

These dependent claims do not integrate the judicial exception into a practical application, and they fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 1, 8, and 15 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chun (US Patent No. 9922641), hereinafter Chun.

Regarding claims 1, 8, and 15, Chun teaches
(claim 1) A computer-implemented method comprising ((9:56-59) a method performed by a computer program product):
(claim 8) A system comprising ((1:39-41) a system):
a processor communicatively coupled to a memory ((1:39-42), (9:23-32) a system that includes a processor and a memory that are interconnected, i.e. communicatively coupled), the processor configured to:
(claim 15) A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising ((1:54-62) a computer program product comprising computer readable instructions, i.e. program instructions, encoded on a storage device, i.e. computer readable storage medium, that, when executed, cause one or more processors to perform operations, i.e. executable by a processor to cause the processor to perform a method):

receiving, by a processor, audio data associated with a speaker, wherein the audio data is in a first language spoken by the speaker ((1:41-43) a speech synthesis engine includes a processor that is configured to receive input speech data from a speaker, i.e. receiving…audio data associated with a speaker, in a first language, i.e. the audio data is in a first language spoken by the speaker);
receiving, by the processor, generic audio data associated with a generic speaker, wherein the generic audio data comprises a plurality of generic audio segments ((1:57-67), (5:14-27, 60-65), (6:36-42) the instructions executed by a processor, i.e. by the processor, include receiving input speech data, where the speech data is from a plurality of speakers, i.e. receiving…generic audio data associated with a generic speaker, and can be used to obtain a speaker-independent speech model for the second language; the speech data is produced by a speech recognition engine configured to recognize human speech from audio data captured by a user device, and may include a speech segmentation routine for breaking sounds into sub-parts, i.e. the generic audio data comprises a plurality of generic audio segments);
determining speaker characteristics associated with the speaker from the audio data ((5:28-32), (6:29-35) the speech data from the first speaker, i.e. associated with the speaker from the audio data, is analyzed to estimate a speaker transform that, when applied to speech parameters of a universal model, produces speaker characteristics of the first speaker, i.e. determining speaker characteristics), wherein the determining the speaker characteristics associated with the speaker from the audio data comprises:
partitioning the audio data associated with the speaker into one or more audio segments ((5:60-65), (6:36-42) speech data is produced by a speech recognition engine configured to recognize human speech from audio data captured by a user device, i.e. audio data associated with the speaker, and may include a speech segmentation routine for breaking sounds into sub-parts, i.e. partitioning…into one or more audio segments);
converting the one or more audio segments to a source text in the first language ((6:20-29, 36-42, 55-57) the speech recognition engine has a speech segmentation routine to break up sounds into sub-parts, i.e. one or more audio segments, and the engine further recognizes and converts utterances in the audio data into text in a first language, i.e. converting…to a source text in the first language);
converting the source text to a target text, wherein the target text is in a second language ((6:55-61) the translation reads the output text in the first language, such as an English language text file, i.e. source text, and generates a second text file in a target language, such as a French-language text file, i.e. converting…to a target text…in a second language); and
 generating an output audio in the second language for the target text using the one or more audio segments for the speaker and the plurality of generic audio segments ((1:57-67), (5:14-27, 28-32, 60-65), (6:29-42) (8:17-21, 31-35, 44-48), (8:57-9:2) a speaker transform representing the speaker characteristics of the individual who provides the speech in the source language, i.e. one or more audio segments for the speaker, is used to modify the speaker-independent speech model, i.e. plurality of generic audio segments, to obtain a speaker-specific speech model that .  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 2, 3, 7, 9, 10, 14, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Meng et al. (US Patent No. 9342509), as found in the IDS, hereinafter Meng.

Regarding claims 2, 9, and 16, Chun teaches claims 1, 8, and 15.
While Chun provides segmenting speech into sub-parts, Chun does not specifically teach recording the length of the sub-parts, and thus does not teach
recording a length of time associated with each of the one or more segments.
Meng, however, teaches recording a length of time associated with each of the one or more segments ((4:47-58), (5:8-14) the duration, i.e. length of time, of each speech unit, i.e. each of the one or more segments, is extracted, i.e. recording).
Chun and Meng are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the segmenting of speech for further processing of Chun with the recording of the time of the speech units as taught by Meng. The motivation to do so would have been to achieve a predictable result of making sure the target speech units have the same prosodic information as that of the source speech (Meng (5:8-14)).

Regarding claims 3, 10, and 17, Chun in view of Meng teaches claims 2, 9, and 16, and Meng further teaches
generating first spoken audio for a first segment from the one or more segments, wherein the first spoken audio is in the second language ((2:42-44), (5:8-15) speech synthesis is performed, i.e. generating first spoken audio, to synthesize the speech units, i.e. first segment, of the translated target speech, i.e. first spoken audio is in the second language).

Regarding claims 7 and 14, Chun in view of Meng teaches claims 2 and 10, and Meng further teaches
wherein the one or more segments comprise at least one of a word, a phrase, and a sentence ((4:17-19) a speech unit, i.e. segment, can be a sentence, phrase, or word).  

Claim(s) 4, 5, 11, 12, 18, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Meng, and further in view of Rossano et al. (U.S. Patent No. 9552807), hereinafter Rossano.

Regarding claims 4, 11, and 18, Chun in view of Meng teaches claims 3, 10, and 17.
While Chun in view of Meng provides extracting the duration of a speech unit, Chun in view of Meng does not specifically teach using the duration as part of a compression process, and thus does not teach
performing an audio compression operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment.  
Rossano, however, teaches performing an audio compression operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment ((6:23-34, 51-58), (6:64-7:2), (7:8-22) the translated speech segment, i.e. first spoken audio, is adjusted with the recommendations by the prosody analysis unit to mimic the original spoken voice, such as emphasis, volume, speed, and pitch, which includes timing differences for translations between short duration languages and long duration .  
Chun, Meng, and Rossano are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Chun, as modified by Meng, regarding the extraction of a duration of a speech unit with shrinking the audio track to fit the original timing as taught by Rossano. The motivation to do so would have been to achieve a predictable result of enabling the automatic dubbing of a video in a first language into a second language while maintaining the speech timing of the original movie (Rossano (6:64-7:2)).

Regarding claims 5, 12, and 19, Chun in view of Meng teaches claims 3, 10, and 17.
While Chun in view of Meng provides extracting the duration of a speech unit, Chun in view of Meng does not specifically teach using the duration as part of an expansion process, and thus does not teach
performing an audio expansion operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment.  
Rossano, however, teaches performing an audio expansion operation on the first spoken audio to match the speaker characteristics of the first segment in audio data and the length of time associated with the first segment ((6:23-34, 51-58), (6:64-7:2), (7:8-22) the translated speech segment, i.e. first spoken audio, is .  
Chun, Meng, and Rossano are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Chun, as modified by Meng, regarding the extraction of a duration of a speech unit with stretching the audio track to fit the original timing as taught by Rossano. The motivation to do so would have been to achieve a predictable result of enabling the automatic dubbing of a video in a first language into a second language while maintaining the speech timing of the original movie (Rossano (6:64-7:2)).

Claim(s) 6, 13, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chun, in view of Miro et al. (US PG Pub No. 2005/0182630), hereinafter Miro.

Regarding claims 6, 13, and 20, Chun teaches claims 1, 8, and 15.
While Chun provides the recognition of speaker characteristics, Chun does not specifically teach that the characteristics are phonemes and vocal range, and thus does not teach
wherein the speaker characteristics associated with the speaker comprise phonemes of the speaker and vocal range.  
Miro, however, teaches wherein the speaker characteristics associated with the speaker comprise phonemes of the speaker and vocal range ([0021], [0032:1-6] speaker source parameters, i.e. speaker characteristics, may include vocal range, and the speech synthesizer converts text to phonemes using source parameters, i.e. phonemes).  
Chun and Miro are analogous art because they are from a similar field of endeavor in speech-to-speech translation. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the speaker characteristics of Chun with the specific recognition of phonemes and vocal range as characteristics as taught by Miro. The motivation to do so would have been to achieve a predictable result of enabling the generation of synthesized speech in multiple languages with a range of accents (Miro [0024]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Qian et al. (US Patent No. 8594993): Cross-lingual voice transformation allowing the use of target language speech characteristics in combination with the voice characteristics of the first language. 
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER whose telephone number is (571)270-1474.  The examiner can normally be reached on 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/NICOLE A K SCHMIEDER/ Examiner, Art Unit 2659
/Paras D Shah/ Primary Examiner, Art Unit 2659
03/17/2021