DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Claims 5, 7, 12 and 17 are amended. Claims 1-20 are presented for examination.
Response to Arguments
Applicant’s arguments filed on 10/23/2022 have been reviewed. The following are the responses to Applicant’s arguments:
CLAIM REJECTIONS - 35 USC § 102 
Applicant argues “that Huffman fails to teach information characterizing the first source of audio in the converted audio”. However, Huffman teaches in Fig 1 that information characterizing the first source, which is basically the first audio, is transferred (Para 0039, Fig 8-11).

Applicant further argues “Applicant respectfully submits that the extracted "frequency components" in Huffman cannot be contended or relied on as recited first information, at least since Huffman fails to teach embedding the frequency components, which characterize a first source of an audio, in the first audio, wherein the first audio is a conversion of the audio by the first source to a second source”. However, Huffman clearly states that the frequency component is the watermark (Para 0131).
Applicant further argues “Secondly, Applicant respectfully submits that the low frequency sounds of paragraph [0131] of Huffman cannot be interpreted as the recited first information, at least since the low frequency sounds of Huffman do not characterize the first source of the audio. Instead, the low frequency sounds of Huffman are merely used to indicate whether a sound sample is authentic or not. Additionally, in contrast to Claim 1, the low frequency sounds of Huffman are never compared to second information characterizing a third source”. However, Fig 1 and Para 0035-0040 describe the concept of the first information and the source, and Para 0131 states that that information is watermarked for protection.

Applicant submits “that the contention in the Office Action that the "target speech" is allegedly the second information characterizing the third source, is not only erroneous, but also misleading. The term the "target speech" in Huffman refers to the transformed speech, e.g., the speech segment after being converted to be spoken by the target voice, which is in fact equivalent to the converted audio (the audio converted to the second source). (See, inter alia, paragraphs [0005] and [0046]). Such "target speech" does not comprise any information characterizing a source of a voice, as explicitly claimed.” However, that is simply incorrect. Referring to Fig 1, target speech source speech is the speech which is being transformed.
Applicant further submits “Huffman would still fail to teach comparing such information with any other information characterizing other source of voice to determine that it is the same source of voice. The alleged "differences in speech" detected in Fig. 9 of Huffman, are at best, differences between a candidate speech segment (i.e., speech that is supposed to imitate the target, but that is not authentic speech from the target) and timbre data with reference to plurality of voice profiles (see, inter alia, paragraph [0096] and Fig. 9), that is indicative of the speech not being authentic e.g., being synthetic speech”. However, the claimed invention does not differ from the cited portion of the prior art. Refer to the following: The generative neural network 140 uses the augmented voice profile 144 to generate speech data that represents a candidate speech segment 146 (i.e., speech that is supposed to imitate the target 104, but that is not authentic speech from the target 104). The generated candidate speech segment 146 can be said to be in a candidate voice. The speech data that represents the candidate speech segment 146 is evaluated by the discriminative neural network 142, which determines whether it believes the speech data that represents the candidate voice in the candidate speech segment 146 is authentic or synthetic speech (Fig 9).


With respect to Claim 7:
In light of the amendments, new ground(s) of rejection have been given over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data).
With respect to Claim 5:
In light of the amendments, new ground(s) of rejection have been given over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data).

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 7-14 and 17-18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claims 7 and 14 include “whereby enabling performing said synthesizing without relying on the audio of the first source”; however, the specification does not mention this concept. Applicant relied on Para 0019. Para 0019 suggests embedding characteristics of the actual audio source and watermarking the characteristics; however, it is unclear how it suggests “whereby enabling performing said synthesizing without relying on the audio of the first source”.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 6, 14, 16 and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Huffman (US Pub 20180342256).

Regarding claim 1, Huffman teaches a method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (transform speech, Para 0035, Fig 1), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component in the first speech, Para 0035-0040, wherein frequency component can be a watermark, Para 0131, 0011); extracting from the first audio the first information of the first source embedded within the first audio (extract frequency component, Para 0035-0040, Fig 8-11); obtaining second information characterizing a third source (target speech, Fig 1); comparing the first information to the second information to obtain comparison results (detect differences in speech, Fig 9); and subject to the comparison results indicating that the first source is the same as the third source, initiating an action (inconsistency message, Fig 9-Fig 11 (Fig 11 initiates action for verification case)).

Regarding claim 2, Huffman, as above in claim 1, teaches wherein the first information or the second information is a vector representing a voice in a speakers' space (vector space, Para 0111-0114, 0035).

Regarding claim 6, Huffman, as above in claim 1, teaches modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); obtaining the first information characterizing the first source from speech by the first source; and embedding the first information in the first audio (embedding frequency component, Para 0035-0040, Para 0131).


Regarding claim 14, arguments analogous to claim 1 are applicable. In addition, Huffman teaches a computer program product comprising: a non-transitory computer readable medium retaining program instructions, which instructions, when read by a processor, cause the processor to perform the method of claim 1 (computer readable medium, Para 0012).


Regarding claim 16, Huffman, as above in claim 14, teaches wherein the processor is further configured to perform: modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); obtaining the first information characterizing the first source from speech by the first source; and embedding the first information in the first audio (embedding frequency component, Para 0035-0040, Para 0131).


Regarding claim 20, Huffman teaches a system comprising a unit retaining the non-transitory computer readable medium of Claim 14 and the processor (Fig 1, Fig 2, Fig 9).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 3, 5, 7, 9-10, 12-13, 15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data).

Regarding claim 3, Huffman, as above in claim 2, does not explicitly teach wherein the first information or the second information is an x-vector or an i-vector.
However, Jie teaches wherein the first information or the second information is an x-vector or an i-vector (i-vectors for voice conversion, speaker identity is represented by i-vector, Under III. Proposed: AVM with Augmented I-vectors, B. I-vector Extraction; IV. Experimental Setup, A. I-vector Extractor).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the i-vector concept of Jie in order to make the voice conversion more convenient using a low-dimensional model such as i-vectors (Conclusion, Jie).

Regarding claim 5, Huffman, as above in claim 1, teaches wherein the first information is embedded within the first audio as a watermark (watermark, Para 0131), while Huffman does not explicitly teach wherein the watermark is indicative of a characteristic of the first audio source.
Jie teaches wherein the watermark is indicative of a characteristic of the first audio source (embedding using the i-vector, using i-vectors can capture the speaker identity and achieve high quality at the same time, Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the teachings of Jie in order to capture the speaker identity and achieve high quality at the same time (Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).

Regarding claim 7, Huffman teaches a method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (voice to voice conversion, Para 0036-0040), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component of the voices, Para 0035-0040); extracting from the first audio the first information of the first source based on the information embedded within the first audio (extracting the frequency component, Para 0035-0040); and synthesizing, based on the first information, a second audio comprising speech in the likeness of the first source (synthesize and manipulate to get the second audio, Para 0035; Fig 8-11) whereby enabling performing said synthesizing without necessarily relying on the audio of the first source (Fig 1, the voice is based on a number of factors and not necessarily relying only on the audio of the first source).
Huffman does not explicitly teach whereby enabling performing said synthesizing without relying on the audio of the first source.
Jie, in the same field of endeavor, teaches whereby enabling performing said synthesizing without relying on the audio of the first source (synthesizing based on the i-vector, which embeds the characteristics of the first speech instead of the speech itself, Fig 2).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Jie, since using i-vectors can capture the speaker identity and achieve high quality at the same time (Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).



Regarding claim 9, Huffman, as above in claim 7, teaches wherein the first information is a vector representing a voice in a speakers' space (vector space, Para 0111-0114, 0035).


Regarding claim 10, Huffman, as above in claim 9, does not explicitly teach wherein the first information is an x-vector or an i-vector.
However, Jie teaches wherein the first information is an x-vector or an i-vector (i-vectors for voice conversion, speaker identity is represented by i-vector, Under III. Proposed: AVM with Augmented I-vectors, B. I-vector Extraction; IV. Experimental Setup, A. I-vector Extractor).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the i-vector concept of Jie in order to make the voice conversion more convenient using a low-dimensional model such as i-vectors (Conclusion, Jie).

Regarding claim 12, Huffman, as above in claim 7, teaches wherein the first information is embedded within the first audio as a watermark (watermark, Para 0131), while Huffman does not explicitly teach wherein the watermark is indicative of a characteristic of the first audio source.
Jie teaches wherein the watermark is indicative of a characteristic of the first audio source (embedding using the i-vector, using i-vectors can capture the speaker identity and achieve high quality at the same time, Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the teachings of Jie in order to capture the speaker identity and achieve high quality at the same time (Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).

Regarding claim 13, Huffman, as above in claim 7, teaches modifying speech by the first source such that the first audio sounds as if emitted by the second source (transformation, Fig 1, Fig 8-11); extracting information of the first source from speech by the first source; and embedding the information of the first source within the first audio (embedding frequency component, Para 0035-0040, Para 0131).

Regarding claim 17, Huffman, as above in claim 14, teaches wherein the processor is further configured to perform: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source (transformation, Fig 1), wherein the first audio having embedded therein first information characterizing the first source of the audio (frequency component, Para 0035-0040; wherein frequency component can be a watermark, Para 0131, 0011); extracting from the first audio the first information of the first source based on the information embedded within the first audio (extracting the frequency component, Para 0035-0040); and synthesizing, based on the first information, a second audio comprising speech in the likeness of the first source (synthesize and manipulate to get the second audio, Para 0035; Fig 8-11).
Jie, in the same field of endeavor, teaches whereby enabling performing said synthesizing without relying on the audio of the first source (synthesizing based on the i-vector, which embeds the characteristics of the first speech instead of the speech itself, Fig 2).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Jie, since using i-vectors can capture the speaker identity and achieve high quality at the same time (Under C. DBLSTM+AVM+I-Vector vs. DBLSTM+AVM).


Regarding claim 15, arguments analogous to claim 3, are applicable. 

Claims 4 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Huffman (US Pub 20210050025), hereinafter Huffman’025.

Regarding claim 4, Huffman, as above in claim 1, does not explicitly teach wherein the first information is embedded within the first audio using steganography.
However, Huffman’025 teaches wherein the first information is embedded within the first audio using steganography (the system may perform steganography in the spectrogram space as opposed to the audio domain; the system may use a generative-adversarial neural network to help make the watermark ‘hidden’; as opposed to training the ‘watermarked’ signal to look like an ‘unwatermarked’ signal, the adversary is training the ‘watermarked’ signal to look like a signal coming from a target speaker while the Watermark machine learning trains the signal to contain the watermark; the ‘watermarked’ signal looks like a signal from the unwatermarked dataset, but also is in the voice of the target speaker, Para 0125 / Page 9 of the provisional).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Huffman’025, since the audio signal is in the spectrogram domain and processing can be done in the space domain as another way of hiding information.

Regarding claim 19, arguments analogous to claim 4, are applicable. 


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data) and further in view of Huffman (US Pub 20210050025), hereinafter Huffman’025.

Regarding claim 11, Huffman modified by Jie, as above in claim 7, does not teach wherein the first information is embedded within the first audio using steganography.
However, Huffman’025 teaches wherein the first information is embedded within the first audio using steganography (the system may perform steganography in the spectrogram space as opposed to the audio domain; the system may use a generative-adversarial neural network to help make the watermark ‘hidden’; as opposed to training the ‘watermarked’ signal to look like an ‘unwatermarked’ signal, the adversary is training the ‘watermarked’ signal to look like a signal coming from a target speaker while the Watermark machine learning trains the signal to contain the watermark; the ‘watermarked’ signal looks like a signal from the unwatermarked dataset, but also is in the voice of the target speaker, Para 0125 / Page 9 of the provisional).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Huffman’025, since the audio signal is in the spectrogram domain and processing can be done in the space domain as another way of hiding information.


Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Huffman (US Pub 20180342256) and further in view of Jie (On the Use of I-vectors and Average Voice Model for Voice Conversion without Parallel Data) and further in view of Arik (Neural Voice Cloning with a Few Samples).
Regarding claim 8, Huffman, as above in claim 7, teaches a text representation (could be a text representation, Para 0116) but does not explicitly teach wherein said synthesizing comprising applying text-to-speech to text spoken in the first audio.
However, Arik teaches wherein said synthesizing comprising applying text-to-speech to text spoken in the first audio (voice cloning/synthesis using text to speech, Fig 1).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Arik, since the model gives optimal results using few text-audio pairs (Under 3, Speaker Adaptation).

Regarding claim 18, Huffman, as above in claim 17, does not explicitly teach wherein said synthesizing comprises applying text-to-speech to text spoken in the first audio.
However, Arik teaches wherein said synthesizing comprises applying text-to-speech to text spoken in the first audio (voice cloning/synthesis using text to speech, Fig 1).
It would have been obvious, before the effective filing date, having the teachings of Huffman, to further include the concept of Arik, since the model gives optimal results using few text-audio pairs (Under 3, Speaker Adaptation).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHA MISHRA whose telephone number is (571)272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu can be reached on (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RICHA MISHRA/Primary Examiner, Art Unit 2674