DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office Action is in response to correspondence filed 11 January 2020 in reference to application 16/740,440.  Claims 1-7 are pending and have been examined.

Claim Objections
Claims 4 and 5 are objected to because of the following informalities: Claims 4 and 5 recite “of any of claims 1” but should read “of claim 1.” Appropriate correction is required.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3 and 5 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Takashi et al. (US PAP 2021/0225383).

Consider claim 1, Takashi teaches a voice morphing apparatus (abstract) comprising:
a neural network architecture to map input audio data to output audio data (0097, neural network voice quality converter), the input audio data comprising a representation of speech from a speaker (0048, source input speech; 0156, input speaker), the neural network architecture including a set of parameters (0097, neural network, which is known to have parameters such as weights, biases, activation functions, etc.), the set of parameters being trained to reduce a speaker identification score from the input audio data to the output audio data and to optimize a speaker intelligibility score for the output audio data (0099-121, training the neural network to minimize loss functions, which include a speaker identification error score at 0104-05 and phoneme scores measuring intelligibility at 0114-16).

Consider claim 2, Takashi teaches the voice morphing apparatus of claim 1 further comprising a noise filter to pre-process the input audio data (0156, signal separation separating input speech from other voices and acoustic sources).

Consider claim 3, Takashi teaches the voice morphing apparatus of claim 2, wherein the noise filter removes a noise component from the input audio data (0156, signal separation separating input speech from other voices and acoustic sources) and the voice morphing apparatus adds the noise component to output audio data from the neural network architecture (0160, adding noise component back to transformed voice signal).

Consider claim 5, Takashi teaches the voice morphing apparatus of any of claims 1, wherein the voice morphing apparatus is configured to output time-series audio waveform data based on the output audio data from the neural network architecture (0159-161, audio output acoustic data generated based on the neural network. Acoustic audio data that can be played out must be converted to time-domain audio).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Takashi in view of Kameoka et al. (US PAP 2020/0395028).

Consider claim 4, Takashi teaches the voice morphing apparatus of any of claim 1, but does not specifically teach wherein the neural network architecture comprises one or more recurrent connections.
In the same field of voice transformation, Kameoka teaches wherein the neural network architecture comprises one or more recurrent connections (0044, 0061, neural network may be a Recurrent Neural Network).

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Takashi in view of Nakashika et al. (US PAP 2019/0051314).

Consider claim 6, Takashi teaches a non-transitory computer-readable storage medium for storing instructions that, when executed by at least one processor (0236, RAM, ROM and processor), cause the at least one processor to: 
load input audio data from a data source (0046, 0052, inputting or generating training data); 
input the input audio data to a voice morphing apparatus, the voice morphing apparatus including a set of trainable parameters (0097, 0102-05, neural network voice conversion, generating output voice based on training data); 
process the input audio data using the voice morphing apparatus to generate morphed audio data (0097, 0102-05, neural network voice conversion, generating output voice based on training data);
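apply a speaker identification system to at least the morphed audio data to output a measure of speaker identification (speaker identification error score at 0104-05);
apply an audio fidelity system to the morphed audio data and the input audio data to output a measure of audio fidelity (phoneme scores measuring intelligibility at 0114-16);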
evaluate an objective function based on the measure of speaker identification and the measure of audio fidelity (0100, loss function); and 
adjust the set of trainable parameters for the voice morphing apparatus based on the objective function (0099-100, training based on the loss function),
wherein the objective function is configured to adjust the set of trainable parameters to optimize the measure of audio fidelity between the morphed audio data and the input audio data and to modify the measure of speaker identification (0099-121, training the neural network to minimize loss functions, which includes speaker identification error score at 0104-05 and phoneme scores measuring intelligibility at 0114-16).
Takashi does not specifically teach training based on a gradient of the objective function.
In the same field of neural network voice transformation, Nakashika teaches training based on a gradient of the objective function (0061, training using gradient method).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use gradient training as taught by Nakashika in the system of Takashi in order to allow the loss function to be minimized and to increase the accuracy of the neural network.
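
For illustration only, the combination discussed above (an objective function built from a measure of speaker identification and a measure of audio fidelity, with trainable parameters adjusted using a gradient of that objective) can be sketched as follows. This is a minimal toy under stated assumptions and is not taken from Takashi, Nakashika, or the application: the gain/offset morphing model, the scoring functions, the sample data, and every name in the block are hypothetical, and plain gradient descent with finite-difference gradients stands in for whichever gradient method the references actually use.

```python
import math

# Hypothetical toy data: a short "audio" frame and an assumed speaker voiceprint.
INPUT_AUDIO = [0.5, 1.0, 1.5, 1.0]
SPEAKER_MEAN = 1.0  # assumption: the speaker is identified by mean signal level


def mean(values):
    return sum(values) / len(values)


def morph(params, audio):
    """Toy morphing apparatus with two trainable parameters: gain and offset."""
    gain, offset = params
    return [gain * sample + offset for sample in audio]


def speaker_id_score(audio):
    """Toy speaker identification measure: 1.0 when the mean matches the voiceprint."""
    return math.exp(-((mean(audio) - SPEAKER_MEAN) ** 2))


def fidelity_loss(morphed, original):
    """Toy fidelity measure (lower is better): MSE between mean-removed signals."""
    m_mean, o_mean = mean(morphed), mean(original)
    return mean([((m - m_mean) - (o - o_mean)) ** 2
                 for m, o in zip(morphed, original)])


def objective(params, audio, fidelity_weight=1.0):
    """Objective to minimize: identification score plus weighted fidelity loss."""
    morphed = morph(params, audio)
    return speaker_id_score(morphed) + fidelity_weight * fidelity_loss(morphed, audio)


def numerical_gradient(params, audio, eps=1e-6):
    """Central finite-difference estimate of the gradient of the objective."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up, audio) - objective(down, audio)) / (2 * eps))
    return grad


def train(params, audio, learning_rate=0.2, steps=200):
    """Adjust the trainable parameters by stepping against the gradient."""
    params = list(params)
    for _ in range(steps):
        grad = numerical_gradient(params, audio)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    start = [0.8, 0.1]  # hypothetical initial gain and offset
    trained = train(start, INPUT_AUDIO)
    print("objective before:", objective(start, INPUT_AUDIO))
    print("objective after: ", objective(trained, INPUT_AUDIO))
```

In this sketch the gradient steps lower the toy identification score while the fidelity term keeps the gain near 1, which mirrors the two competing terms of the objective function in the claim language.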

Consider claim 7, Takashi teaches a method for optimizing training parameters (abstract), the method comprising:
loading input audio data from a data source (0046, 0052, inputting or generating training data); 
inputting the input audio data to a voice morphing apparatus, the voice morphing apparatus including a set of trainable parameters (0097, 0102-05, neural network voice conversion, generating output voice based on training data); 
processing the input audio data using the voice morphing apparatus to generate morphed audio data (0097, 0102-05, neural network voice conversion, generating output voice based on training data);
applying a speaker identification system to at least the morphed audio data to output a measure of speaker identification (speaker identification error score at 0104-05);
applying an audio fidelity system to the morphed audio data and the input audio data to output a measure of audio fidelity (phoneme scores measuring intelligibility at 0114-16); 
evaluating an objective function based on the measure of speaker identification and the measure of audio fidelity (0100, loss function); and 
adjusting the set of trainable parameters for the voice morphing apparatus based on the objective function (0099-100, training based on the loss function),
wherein the objective function is configured to adjust the set of trainable parameters to optimize the measure of audio fidelity between the morphed audio data and the input audio data and to modify the measure of speaker identification (0099-121, training the neural network to minimize loss functions, which includes speaker identification error score at 0104-05 and phoneme scores measuring intelligibility at 0114-16).
Takashi does not specifically teach training based on a gradient of the objective function.
In the same field of neural network voice transformation, Nakashika teaches training based on a gradient of the objective function (0061, training using gradient method).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use gradient training as taught by Nakashika in the system of Takashi in order to allow the loss function to be minimized and to increase the accuracy of the neural network.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Huffman et al. (US PAP 2018/0342256) also teaches neural network based voice conversion.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DOUGLAS C GODBOLD whose telephone number is (571)270-1451. The examiner can normally be reached 6:30am-5pm Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached on (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DOUGLAS GODBOLD
Examiner
Art Unit 2655



/DOUGLAS GODBOLD/           Primary Examiner, Art Unit 2655