Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-7 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Proidl et al (20080195386) in view of Liu (20090037179).

As per claim 1, Proidl et al (20080195386) teaches an automatic dubbing method (para 0019), comprising: 
extracting speeches of a voice from an audio portion of a media content (as extracting the actor's voice from the multimedia content (movie format) – para 0030, 0031);

processing the extracted speeches by utilizing the voice print model to generate replacement speeches; and replacing the extracted speeches of the voice with the generated replacement speeches in the audio portion of the media content (as using the extracted voice parameters of the actor in re-synthesizing the voice in a different language while maintaining the actor's voice quality – para 0033, 0038).
	Although Proidl et al (20080195386) teaches replacing the actor's speech with a foreign-language equivalent that retains the actor's voice quality, as shown above, Proidl et al (20080195386) does not explicitly teach performing the voice replacement at the phoneme level. However, Liu (20090037179) teaches voice conversion to a target voice (para 0004) using phoneme information (para 0036), synthesis of the target voice/language from those phoneme units (para 0040), and conversion to a target audio/video file (para 0080). Therefore, it would have been obvious to one of ordinary skill in the art of voice replacement to modify the technique of Proidl et al (20080195386) with phoneme-based conversion because doing so would advantageously provide a smoother synthesized voice (Liu, para 0036, 0041).
        
As per claim 2, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 1, wherein the obtaining the voice print model further comprises: sampling speeches of a user by using a speech capturing device and creating the voice print model based on the sampled speeches of the user (Proidl et al (20080195386), as capturing the actor's speech and using samples of the extracted speech from a multimedia signal for further training – para 0042); or choosing the voice print model from a predefined set of voice print models; or creating the voice print model based on at least part of the extracted speeches of the voice (Examiner notes that the claim elements of 'or choosing…or creating' are in the alternative, and that the Proidl et al (20080195386) reference meets the claim scope of claim 2 by the current mapping of the first subset of claim elements in claim 2, as shown above).

As per claim 3, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 2, wherein the creating the voice print model based on at least part of the extracted speeches of the voice further comprises: creating the voice print model for the voice based further on at least one of a closed caption, a subtitle, a script, a transcript, and a lyric of the media content (Proidl et al (20080195386), as the voice print model is based on a movie script – para 0011, which can include subtitles or closed captioning – para 0013).

As per claim 4, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 2, wherein the choosing the voice print model from the predefined set of voice print models further comprises: choosing the voice print model from the predefined set of voice prints based on at least one of characteristic of the voice, speaker information of the media content, genre information of the media content, content of at least part of the extracted speeches of the voice (Proidl et al (20080195386), as choosing the voice based on the actor – para 0011, and further in para 0031, in the example of using the voice characteristics of George Clooney).

As per claim 5, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 1, wherein the processing the extracted speeches further comprises: translating the extracted speeches of the voice in a first language to the replacement speeches in a second language by utilizing the voice print model (Proidl et al (20080195386), as using the voice print characteristics to provide a translation from the original language to a second, different language – see para 0031, wherein the original language is English and the second/target language is German).

As per claim 6, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 5, wherein the translating further comprises: generating the translated replacement speeches by further utilizing characteristics of the extracted speeches of the voice, wherein the characteristics includes at least one of a stress, a tonality, a speed, a volume and an inflection of the speeches (Proidl et al (20080195386), as the voice characteristics include pitch, melody, duration, loudness, timbre – para 0012). 

As per claim 7, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 6, wherein the translating further comprises: performing speech-to-text conversion for the extracted speeches of the voice based on at least one of a closed caption, a subtitle, a script, a transcript and a lyric of the media content (Proidl et al (20080195386), as performing a conversion to closed caption/subtitles – para 0011, 0013); and/or performing text-to-text translation for the converted text from the first language to the second language based on at least one of the characteristics of the speeches (Proidl et al (20080195386), as performing the translation from English to German – para 0031), a genre (Proidl et al (20080195386), as knowing the actual script/content so as to preserve the original actor's voice characteristics – para 0038, including stress/intonation – para 0012; examiner notes that the intent in Proidl is to generate an actor's speech in a different language such that the particular intonations and stresses in the speech, including appropriate emotion, are accurately conveyed in the output speech); and generating the translated replacement speeches for the voice by performing text-to-speech conversion for the translated text based on the voice print model and the characteristics of the extracted speeches (Proidl et al (20080195386), as synthesizing the new-language speech/voice – para 0033, 0038).

As per claim 9, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 1, wherein the extracting speeches comprises: detecting the speeches from the audio portion of the media content based on a plurality of audio versions in different languages (Proidl et al (20080195386), as capturing the actor's speech and using samples of the extracted speech from a multimedia signal for further training – para 0042; and generating the voice in a different language – para 0031); or detecting the speeches from the audio portion of the media content based on a plurality of audio channels and positional data obtained from the audio portion; or detecting the speeches from the audio portion of the media content based on predefined speaker locations and a virtual microphone array (Examiner notes that the claim elements of 'or detecting…' are in the alternative, and that the Proidl et al (20080195386) reference meets the claim scope of claim 9 by the current mapping of the first alternative in claim 9, as shown above).

As per claim 10, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 1, wherein the extracting speeches comprises: grouping the speeches to be associated with the voice based on at least one of: voice characteristic of the speeches, audio positional data, detection of visual scene transition, visual recognition of speaker, subtitles, and closed captions content (Proidl et al (20080195386), as the voice print model is based on a movie script – para 0011, which can include subtitles or closed captioning – para 0013).

As per claim 11, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 1, wherein the replacing comprises: muting the speeches of the voice from the audio portion; and adding the replacement speeches in place of the muted speeches in the audio portion (Proidl et al (20080195386), as replacing the original speech with the new speech, to match the scene – para 0038). 

As per claim 12, the combination of Proidl et al (20080195386) in view of Liu (20090037179) teaches the method of claim 11, wherein the muting comprises: muting the speeches of the voice by utilizing the extracted speeches from the audio portion; or muting the speeches of the voice by utilizing a plurality of audio channels obtained from the audio portion based on positional data; or regenerating speeches for the voice based on the voice print model of the voice and positional data, and muting the speeches based on the regenerated speeches (Proidl et al (20080195386), as replacing the original speech with the new speech, to match the scene – para 0038; examiner notes that in the examples of Proidl, namely, replacing actor's speech/voice

Claims 13-20 are apparatus claims that perform the method steps of claims 1-7 and 9-12; as such, claims 13-20 are similar in scope and content to claims 1-7 and 9-12 and are therefore rejected under the same rationale presented against claims 1-7 and 9-12 above.


Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Proidl et al (20080195386) in view of Liu (20090037179), in further view of DelGaldo (20130142341).

As per claim 8, Proidl et al (20080195386) in view of Liu (20090037179) teaches the handling of voices/sound from a multimedia stream (i.e., separating the voices in the movie-format example, as noted above), but does not explicitly teach using virtual microphone arrays to perform sound source separation. DelGaldo (20130142341) teaches using a virtual microphone array (para 0148) to assist in source separation from a complex sound scene (para 0163). Therefore, it would have been obvious to one of ordinary skill in the art of voice extraction to modify the processing of Proidl in view of Liu with virtual microphone arrays performing sound separation, as taught by DelGaldo (20130142341), because doing so would advantageously modify the amplitude/frequency of the desired source so as to extract a more accurate representation of the desired sound source (DelGaldo, para 0148).

Response to Arguments

Applicant's arguments filed 1/20/2021 have been fully considered but they are not persuasive. Regarding applicant's arguments directed to the use of phoneme replacement to transition from a source language/voice to a target language/voice, examiner notes that the Liu reference is relied upon to meet this claimed concept, as set forth in the rejection of claim 1 above.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Please see related art listed on the PTO-892 form, for complete reference information.
Examiner notes the following references, which teach the generic notion in the disclosure of taking multimedia streams (including audio/voice/speech, and more) and using the voice characteristics to translate the audio/voice/speech into a second language:
20180247624, para 0041, 0064, 0098
20140040946, para 0042, 0095, 0103
9094576, col. 9 lines 40-65

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
04/18/2021