DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This Office action is sent in response to Applicant’s communication received on 3/31/2020 for application number 16835434. The Office hereby acknowledges receipt of the following, placed of record in the file: Specification, Abstract, Oath/Declaration and claims.

Claims 1-20 are presented for examination. 

Information Disclosure Statement
The information disclosure statement submitted on 3/31/2020 was filed before the mailing date of the first Office action. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 5-7, 9, 12-14, 16-17 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Huffman (US Pub 20180342256).




Regarding claim 1, Huffman teaches a method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (transform speech, Para 0035, Fig 1), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component in the first speech, Para 0035-0040, wherein the frequency component can be a watermark, Para 0131, 0011); extracting from the first audio the first information of the first source embedded within the first audio (extract frequency component, Para 0035-0040, Fig 8-11); obtaining second information characterizing a third source (target speech, Fig 1); comparing the first information to the second information to obtain comparison results (detect differences in speech, Fig 9); and subject to the comparison results indicating that the first source is the same as the third source, initiating an action (inconsistency message, Fig 9-11; Fig 11 initiates an action for the verification case).
Regarding claim 2, Huffman, as above in claim 1, teaches wherein the first information or the second information is a vector representing a voice in a speakers' space (vector space, Para 0111-0114, 0035).

Regarding claim 5, Huffman, as above in claim 1, teaches wherein the first information is embedded within the first audio as a watermark (watermark, Para 0131).

Regarding claim 6, Huffman, as above in claim 1, teaches modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); obtaining the first information characterizing the first source from speech by the first source; and embedding the first information in the first audio (embedding frequency component, Para 0035-0040, Para 0131).

Regarding claim 7, Huffman teaches a method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (voice to voice conversion, Para 0036-0040), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component of the voices, Para 0035-0040); extracting from the first audio the first information of the first source based on the information embedded within the first audio (extracting the frequency component, Para 0035-0040); and synthesizing, based on the first information, a second audio comprising speech in the likeness of the first source (synthesize and manipulate to get the second audio, Para 0035; Fig 8-11).


Regarding claim 9, Huffman, as above in claim 7, teaches wherein the first information is a vector representing a voice in a speakers' space (vector space, Para 0111-0114, 0035).

Regarding claim 12, Huffman, as above in claim 7, teaches wherein the first information is embedded within the first audio as a watermark (watermark, Para 0131).

Regarding claim 13, Huffman, as above in claim 7, teaches modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); extracting information of the first source from speech by the first source; and embedding the information of the first source within the first audio (embedding frequency component, Para 0035-0040, Para 0131).

Regarding claim 14, arguments analogous to claim 1 are applicable. In addition, Huffman teaches a computer program product comprising: a non-transitory computer readable medium retaining program instructions, which instructions, when read by a processor, cause the processor to perform the method of claim 1 (computer readable medium, Para 0012).


Regarding claim 16, Huffman, as above in claim 14, teaches wherein the processor is further configured to perform: modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); obtaining the first information characterizing the first source from speech by the first source; and embedding the first information in the first audio (embedding frequency component, Para 0035-0040, Para 0131).

Regarding claim 17, Huffman, as above in claim 14, teaches wherein the processor is further configured to perform: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (transformation, Fig 1), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component, Para 0035-0040; wherein the frequency component can be a watermark, Para 0131, 0011); extracting from the first audio the first information of the first source based on the information embedded within the first audio (extracting the frequency component, Para 0035-0040); and synthesizing, based on the first information, a second audio comprising speech in the likeness of the first source (synthesize and manipulate to get the second audio, Para 0035; Fig 8-11).

Regarding claim 20, Huffman teaches a system comprising a unit retaining the non-transitory computer readable medium of claim 14 and the processor (Fig 1, Fig 2, Fig 9).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 3, 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data).

Regarding claim 3, Huffman, as above in claim 2, does not explicitly teach wherein the first information or the second information is an x-vector or an i-vector.
However, Jie teaches wherein the first information or the second information is an x-vector or an i-vector (i-vectors for voice conversion; speaker identity is represented by an i-vector; under III. Proposed: AVM with Augmented I-vectors, B. I-vector Extraction; IV. Experimental Setup, A. I-vector Extractor).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the i-vector concept of Jie in order to make voice conversion more convenient using a low-dimensional model such as i-vectors (Conclusion, Jie).

Regarding claim 10, Huffman, as above in claim 9, does not explicitly teach wherein the first information is an x-vector or an i-vector.
However, Jie teaches wherein the first information is an x-vector or an i-vector (i-vectors for voice conversion; speaker identity is represented by an i-vector; under III. Proposed: AVM with Augmented I-vectors, B. I-vector Extraction; IV. Experimental Setup, A. I-vector Extractor).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the i-vector concept of Jie in order to make voice conversion more convenient using a low-dimensional model such as i-vectors (Conclusion, Jie).

Regarding claim 15, arguments analogous to claim 3 are applicable.

Claims 4, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Huffman (US Pub 20210050025), hereinafter Huffman’025.

Regarding claim 4, Huffman, as above in claim 1, does not explicitly teach wherein the first information is embedded within the first audio using steganography.
However, Huffman’025 teaches wherein the first information is embedded within the first audio using steganography (the system may perform steganography in the spectrogram space as opposed to the audio domain; the system may use a generative-adversarial neural network to help make the watermark ‘hidden’; as opposed to training the ‘watermarked’ signal to look like an ‘unwatermarked’ signal, the adversary is training the ‘watermarked’ signal to look like a signal coming from a target speaker while the Watermark machine learning trains the signal to contain the watermark; the ‘watermarked’ signal looks like a signal from the unwatermarked dataset, but also is in the voice of the target speaker, Para 0125 / Page 9 of the provisional).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Huffman’025, since the audio signal is in the spectrogram domain and the processing can be done in that space as another way of hiding information.


Regarding claim 11, Huffman, as above in claim 7, does not teach wherein the first information is embedded within the first audio using steganography.
However, Huffman’025 teaches wherein the first information is embedded within the first audio using steganography (the system may perform steganography in the spectrogram space as opposed to the audio domain; the system may use a generative-adversarial neural network to help make the watermark ‘hidden’; as opposed to training the ‘watermarked’ signal to look like an ‘unwatermarked’ signal, the adversary is training the ‘watermarked’ signal to look like a signal coming from a target speaker while the Watermark machine learning trains the signal to contain the watermark; the ‘watermarked’ signal looks like a signal from the unwatermarked dataset, but also is in the voice of the target speaker, Para 0125 / Page 9 of the provisional).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Huffman’025, since the audio signal is in the spectrogram domain and the processing can be done in that space as another way of hiding information.


Regarding claim 19, arguments analogous to claim 14 are applicable.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Arik (Neural Voice Cloning with a Few Samples).
Regarding claim 8, Huffman, as above in claim 7, teaches a text representation (could be a text representation, Para 0116) but does not explicitly teach wherein said synthesizing comprising applying text-to-speech to text spoken in the first audio.
However, Arik teaches wherein said synthesizing comprising applying text-to-speech to text spoken in the first audio (voice cloning/synthesis using text to speech, Fig 1).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Arik, since the model gives optimal results using few text-audio pairs (Arik, under section 3, speaker adaptation).

Regarding claim 18, Huffman, as above in claim 17, does not explicitly teach wherein said synthesizing comprises applying text-to-speech to text spoken in the first audio.
However, Arik teaches wherein said synthesizing comprises applying text-to-speech to text spoken in the first audio (voice cloning/synthesis using text to speech, Fig 1).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Arik, since the model gives optimal results using few text-audio pairs (Arik, under section 3, speaker adaptation).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHA MISHRA whose telephone number is (571)272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RICHA MISHRA/
Primary Examiner, Art Unit 2674