DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 30 December 2020 in reference to application 17/138,642.  Claims 1-20 are pending and have been examined.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 10, line 31 refers to a third voice candidate. However, a third voice candidate is generated at both lines 13 and 27. Thus it is unclear to which third voice candidate line 31 refers. Therefore claim 10 is indefinite.

Claims 11-12 depend on and further limit claim 10 and are rejected based on their dependency. In addition, claim 11 refers to the third voice candidate as well, and thus it is unclear which voice candidate is being referenced.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1, 2, 6, and 10-12 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sisman et al. (On the Study of Generative Adversarial Networks for Cross-Lingual Voice Conversion).

Consider claim 1, Sisman teaches a method of cross-lingual voice conversion performed by a machine learning system (abstract), the method comprising: 
receiving, by a voice feature extractor, a first voice audio segment in a first language and a second voice audio segment in a second language (figure 1, cross lingual training, section 2.2 para. 3, training the encoder-decoder using speech from both languages as input, section 3.2, training with cross lingual data); 
extracting, by the voice feature extractor respectively from the first voice audio segment and second voice audio segment, audio features comprising first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features (section 2.2, section 3.2, speaker independent phonetic features in target language, speaker identity features from target voice); 
generating, via a generator of a generative adversarial network (GAN) system from a trained data set, a third voice candidate having the first-voice, speaker-dependent acoustic features and the second-voice, speaker-independent linguistic features, wherein the third voice candidate speaks the second language (section 2.2, section 3.2, transforming context with VAW-GAN or CycleGAN, generating target speech); 
comparing, via one or more discriminators of the GAN system, the third voice candidate with ground truth data comprising the first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features (section 2.2, sections 3.2.1-3.2.3, loss functions comparing generated speech to target voice data); and 
providing results of the comparing step back to the generator for refining the third voice candidate (section 2.2, section 3.2, training the GAN by minimizing the loss functions).
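
For illustration only, the feedback recited in the last two limitations (comparing the candidate against ground truth, then providing the result back to the generator) can be sketched as a single generator-side update against a frozen discriminator. This is a simplified sketch and not a characterization of Sisman's implementation: the names discriminator_score and refine_candidate, the logistic discriminator, and the step size are all hypothetical, and the candidate vector stands in for the generator's parameters.

```python
import math
from typing import List


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def discriminator_score(candidate: List[float], weights: List[float], bias: float) -> float:
    """Hypothetical logistic discriminator: probability that the candidate
    matches the ground-truth features."""
    logit = sum(w * c for w, c in zip(weights, candidate)) + bias
    return sigmoid(logit)


def refine_candidate(candidate: List[float], weights: List[float], bias: float,
                     step: float = 0.1) -> List[float]:
    """One gradient-descent step on the generator loss -log D(candidate).

    For a logistic discriminator the gradient of -log D(c) with respect to c
    is -(1 - D(c)) * w, so descending it moves c along +w.
    Assumption: the candidate is updated directly; a real GAN would update
    the generator's weights instead.
    """
    score = discriminator_score(candidate, weights, bias)
    return [c + step * (1.0 - score) * w for c, w in zip(candidate, weights)]


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.0            # frozen discriminator (hypothetical values)
    candidate = [0.0, 0.0]             # initial third voice candidate features
    before = discriminator_score(candidate, w, b)
    after = discriminator_score(refine_candidate(candidate, w, b), w, b)
    print(before, after)               # the score rises after the refinement step
```

In full GAN training the discriminator is also updated in alternation with the generator; only the generator-side half is shown here.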

Consider claim 2, Sisman teaches the method of claim 1, wherein the speaker-dependent acoustic features include short-term segmental features related to vocal tract characteristics (section 2.2, 3.2, speaker identity vector and characteristics), and the speaker-independent linguistic features comprise supra-segmental features related to acoustic properties over more than one segment (section 2.2, speaker independent phonetic features, section 3.2, linguistic content).

Consider claim 6, Sisman teaches the method of claim 1, wherein the GAN system is a Variational Autoencoding Wasserstein GAN (VAW-GAN) system (section 2) or a Cycle-Consistent GAN (CycleGAN) system (section 3).

Consider claim 10, Sisman teaches a method of training a cycle-consistent generative adversarial network (CycleGAN) system (abstract, Section 3) comprising: 
simultaneously learning a forward mapping function and an inverse mapping function using at least adversarial loss and cycle-consistency loss functions (figure 3, forward and inverse training), the forward mapping function comprising: 
receiving, by a voice feature extractor, a first voice audio segment in a first language (section 3, Figure 3, bilingual training data, source original features); 
extracting, by the voice feature extractor, first-voice, speaker-dependent acoustic features (section 3, Figure 3, bilingual training data, source original features, section feature extraction); 
sending the first-voice, speaker-dependent acoustic features to a first-to-third speaker generator of the CycleGAN system (section 3.1, figure 3, source converted feature generation); 
receiving, by the first-to-third speaker generator, second-voice, speaker-independent linguistic features from the inverse mapping function (section 3.2.2, inverse mapping functions, inform loss function); 
generating, by the first-to-third speaker generator, a third voice candidate using the first-voice, speaker-dependent acoustic features and the second-voice, speaker-independent linguistic features (source converted features, section 3.2, figure 3); and 
determining, by a first discriminator of the CycleGAN system, whether there is a discrepancy between the third voice candidate and the first-voice, speaker-dependent acoustic features (Figure 3, section 3.2, discriminator Dy, and loss functions); and the inverse mapping function comprising: 
receiving, by the feature extractor, a second voice audio segment in a second language (figure 3, section 3.2, target original features, bilingual training data); 
extracting, by the feature extractor, the second-voice, speaker-independent linguistic features (section 3, Figure 3, bilingual training data, source original features, section feature extraction); 
sending the second-voice, speaker-independent linguistic features to a second-to-third voice candidate generator (figure 3, section 3.2, converting source features, generator); 
receiving, by the second-to-third voice candidate generator, first-voice, speaker-dependent acoustic features from the forward mapping function (section 3.2.2, error functions include forward mapping function data); 
generating, by the second-to-third voice candidate generator, a third voice candidate using the second-voice, speaker-independent linguistic features and first-voice, speaker-dependent acoustic features (section 3.2, figure 3, generating target converted features); and 
determining, by a second discriminator, whether there is a discrepancy between the third voice candidate and the second-voice, speaker-independent linguistic features (Figure 3, section 3.2, discriminator Dx, loss function).

Consider claim 11, Sisman teaches the method of claim 10, wherein the forward mapping function, when the first discriminator determines that the third voice candidate and the first-voice, speaker-dependent acoustic features are not consistent, triggers the method to continue by: 
providing first inconsistency information back to the first-to-third voice candidate generator for refining the third voice candidate (section 3.2, loss functions); 
sending the third voice candidate to a third-to-first speaker generator (section 3, figure 3, generator Gy-x); 
generating converted first-voice, speaker-dependent acoustic features (source converted features); and 
sending back the converted first-voice, speaker-dependent acoustic features to the first-to-third voice candidate generator (section 3, figure 3, generator Gx-y); and 
wherein the inverse mapping function, when the second discriminator determines that the third voice candidate and the second-voice, speaker-independent linguistic features are not consistent, triggers the method to continue by: 
providing second inconsistency information back to the second-to-third voice candidate generator for refining the third voice candidate (section 3.2, loss functions); 
sending the third voice candidate to a third-to-second speaker generator (section 3, figure 3, generator Gx-y, right side); 
generating converted second-voice, speaker-independent linguistic features (target converted features, section 3, figure 3); and 
sending back the converted second-voice, speaker-independent linguistic features to the second-to-third voice candidate generator (Section 3, figure 3, generator Gy-x, right side).

Consider claim 12, Sisman teaches the method of claim 10, further comprising employing identity mapping loss for preserving identity-related features of each of the first and second voice audio segments (section 3.2.3, identity mapping loss).
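
For reference, the three loss terms cited against claims 10-12 (adversarial, cycle-consistency, and identity mapping) can be sketched as follows. The sketch follows the standard CycleGAN formulation; whether Sisman weights or normalizes the terms in exactly this way is not established by this action. All function names, the list-of-floats feature representation, and the toy mappings in the example are hypothetical.

```python
import math
from typing import Callable, Iterable, List

Vector = List[float]
Mapping = Callable[[Vector], Vector]   # a generator, e.g. G_x->y or G_y->x
Critic = Callable[[Vector], float]     # a discriminator returning a probability in (0, 1)


def l1(a: Vector, b: Vector) -> float:
    return sum(abs(p - q) for p, q in zip(a, b))


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def adversarial_loss(g_xy: Mapping, d_y: Critic, xs: List[Vector], ys: List[Vector]) -> float:
    """E_y[log D_y(y)] + E_x[log(1 - D_y(G_x->y(x)))]."""
    real = mean(math.log(d_y(y)) for y in ys)
    fake = mean(math.log(1.0 - d_y(g_xy(x))) for x in xs)
    return real + fake


def cycle_consistency_loss(g_xy: Mapping, g_yx: Mapping,
                           xs: List[Vector], ys: List[Vector]) -> float:
    """L1 distance between each input and its round-trip reconstruction."""
    forward = mean(l1(g_yx(g_xy(x)), x) for x in xs)
    backward = mean(l1(g_xy(g_yx(y)), y) for y in ys)
    return forward + backward


def identity_mapping_loss(g_xy: Mapping, g_yx: Mapping,
                          xs: List[Vector], ys: List[Vector]) -> float:
    """Penalizes a generator for altering inputs already in its target domain."""
    return mean(l1(g_xy(y), y) for y in ys) + mean(l1(g_yx(x), x) for x in xs)


if __name__ == "__main__":
    # Toy stand-ins: the forward mapping shifts features up by 1, the inverse shifts down by 1.
    shift_up: Mapping = lambda v: [c + 1.0 for c in v]
    shift_down: Mapping = lambda v: [c - 1.0 for c in v]
    undecided: Critic = lambda v: 0.5
    xs, ys = [[0.0, 1.0]], [[1.0, 2.0]]
    print(cycle_consistency_loss(shift_up, shift_down, xs, ys))
    print(identity_mapping_loss(shift_up, shift_down, xs, ys))
    print(adversarial_loss(shift_up, undecided, xs, ys))
```

Because the toy mappings are exact inverses, the round trip reproduces each input and the cycle-consistency term vanishes, while the identity term stays positive because each mapping still shifts inputs that are already in its target domain.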

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman in view of Zhou et al. (US PAP 2020/0410976).

Consider claim 7, Sisman teaches the method of claim 1, but does not specifically teach wherein the first voice is an original actor voice speaking the first language, and wherein the second voice is a voice actor speaking the second language.
In the same field of voice conversion, Zhou teaches wherein the first voice is an original actor voice speaking the first language, and wherein the second voice is a voice actor speaking the second language (0003, making speech of actor B in one language sound like voice of actor A from a different language, movie dubbing).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform voice conversion in movies in the system of Sisman in order to allow for the improvements of Sisman to be implemented in a useful real world application, providing improved automatically generated dubbed movies.

Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman and Zhou as applied to claim 7 above, and further in view of Ko (US PAP 2002/0009043).

Consider claim 8, Sisman and Zhou teach the method of claim 7 but do not specifically teach being implemented during a movie voice translation enabling the selection of an original version, a dubbed version with the original actor voice, or a dubbed version with the voice actor voice.
In the same field of language support, Ko teaches being implemented during a movie voice translation enabling the selection of an original version, a dubbed version with the original actor voice, or a dubbed version with the voice actor voice (0004, DVD allows selection of multiple language tracks. In combination with Sisman and Zhou, this could include original tracks in different languages and generated tracks).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to allow track selection as taught by Ko in the system of Sisman and Zhou in order to allow the movie to be appreciated in different languages (Ko 0004).

Claim(s) 13, 14, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman in view of Agiomyrgiannakis (US PAP 2015/0127349).

Consider claim 13, Sisman teaches a system of cross-lingual voice conversion performed by a machine learning system (abstract), the system comprising: 
a voice feature extractor configured to:
receive a first voice audio segment in a first language and a second voice audio segment in a second language (figure 1, cross lingual training, section 2.2 para. 3, training the encoder-decoder using speech from both languages as input, section 3.2, training with cross lingual data); 
extract, respectively from the first voice audio segment and second voice audio segment, audio features comprising first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features (section 2.2, section 3.2, speaker independent phonetic features in target language, speaker identity features from target voice); and 
a generative adversarial network (GAN) comprising one or more generators and one or more discriminators (figures 2 and 3), the one or more generators configured to:
receive extracted features, and produce therefrom a third voice candidate having the first-voice, speaker-dependent acoustic features and the second-voice, speaker-independent linguistic features, wherein the third voice candidate speaks the second language (section 2.2, section 3.2, transforming context with VAW-GAN or CycleGAN, generating target speech); 
and the one or more discriminators configured to:
compare the third voice candidate with ground truth data comprising the first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features (section 2.2, sections 3.2.1-3.2.3, loss functions comparing generated speech to target voice data); and 
provide results of the comparing step back to the generator for refining the third voice candidate (section 2.2, section 3.2, training the GAN by minimizing the loss functions).
Sisman does not specifically teach implementing the system as stored in memory of a server computer system and being implemented by at least one processor.
In the same field of cross-lingual voice conversion, Agiomyrgiannakis teaches implementing the system as stored in memory (0076, memory) of a server computer system (0076, server computing system) and being implemented by at least one processor (0083, CPUs).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to use server components as taught by Agiomyrgiannakis in the system of Sisman in order to use well known and widely available computing components to implement the voice conversion system of Sisman.

Consider claim 14, Sisman teaches the system of claim 13, wherein the speaker-dependent acoustic features include short-term segmental features related to vocal tract characteristics (section 2.2, 3.2, speaker identity vector and characteristics), and the speaker-independent linguistic features comprise supra-segmental features related to acoustic properties over more than one segment (section 2.2, speaker independent phonetic features, section 3.2, linguistic content).

Consider claim 16, Sisman teaches the system of claim 13, wherein the GAN system is a Variational Autoencoding Wasserstein GAN (VAW-GAN) system (section 2) or a Cycle-Consistent GAN (CycleGAN) system (section 3).

Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman and Agiomyrgiannakis as applied to claim 13 above, and further in view of GABRYJELSKI (US PAP 2002/0009043).

Consider claim 17, Sisman and Agiomyrgiannakis teach the system of claim 13, but do not specifically teach a database connected to the machine learning system and configured to store selected one or more third voices and comprising a plurality of different trained third voices.
In the same field of voice conversion, GABRYJELSKI teaches a database connected to the machine learning system and configured to store selected one or more third voices and comprising a plurality of different trained third voices (0026, database of different voice models may be stored).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to store different generated models as taught by GABRYJELSKI in the system of Sisman and Agiomyrgiannakis in order to allow a user to select different target voices.

Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman and Agiomyrgiannakis as applied to claim 13 above, and further in view of Zhou et al. (US PAP 2020/0410976).

Consider claim 18, Sisman and Agiomyrgiannakis teach the system of claim 13, but do not specifically teach wherein the first voice is an original actor voice speaking the first language, and wherein the second voice is a voice actor speaking the second language.
In the same field of voice conversion, Zhou teaches wherein the first voice is an original actor voice speaking the first language, and wherein the second voice is a voice actor speaking the second language (0003, making speech of actor B in one language sound like voice of actor A from a different language, movie dubbing).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to perform voice conversion in movies in the system of Sisman in order to allow for the improvements of Sisman to be implemented in a useful real world application, providing improved automatically generated dubbed movies.

Claim(s) 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sisman and Agiomyrgiannakis as applied to claim 13 above, and further in view of Ko (US PAP 2002/0009043).

Consider claim 19, Sisman and Agiomyrgiannakis teach the system of claim 13 but do not specifically teach being implemented during a movie voice translation enabling the selection of an original version, a dubbed version with the original actor voice, or a dubbed version with the voice actor voice.
In the same field of language support, Ko teaches being implemented during a movie voice translation enabling the selection of an original version, a dubbed version with the original actor voice, or a dubbed version with the voice actor voice (0004, DVD allows selection of multiple language tracks. In combination with Sisman and Zhou, this could include original tracks in different languages and generated tracks).
Therefore it would have been obvious to one of ordinary skill in the art at the time of effective filing to allow track selection as taught by Ko in the system of Sisman and Agiomyrgiannakis in order to allow the movie to be appreciated in different languages (Ko 0004).

Allowable Subject Matter
Claims 3-5, 9, 15, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Consider claim 3, the prior art of record does not teach or suggest the limitations of “generating a plurality of third voice candidates, each third voice candidate comprising a different level of first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features” when combined with each and every other limitation of the claim and the base claim. Therefore claim 3 contains allowable subject matter.

Claims 4-5 depend on and further limit claim 3 and therefore contain allowable subject matter as well.

Consider claim 9, the prior art of record does not teach or fairly suggest the limitations of “generating a plurality of third voice candidates, each third voice candidate comprising a different level of first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features; using the plurality of generated third voice candidates in the generation of a plurality of dubbed version audio files comprising different levels of the first-voice, speaker-dependent acoustic features and second-voice, speaker-independent linguistic features” when combined with each and every other limitation of the claim, the base claim, and intervening claims. Therefore claim 9 contains allowable subject matter.

Claim 15 contains similar limitations as claim 3 and therefore contains allowable subject matter as well.

Claim 20 contains similar limitations as claim 9 and therefore contains allowable subject matter as well. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Huffman et al. (US PAP 2018/0342256) teaches using neural networks for voice conversion.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/
Primary Examiner, Art Unit 2655