DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Claims 1, 5, 7, 10, 14-15, 17 and 19 are amended. Claims 6 and 18 are cancelled. Claims 21-22 are added. Claims 1-5, 7-17 and 19-22 are presented for examination.
Response to Arguments
Applicant’s arguments are persuasive; hence, the Examiner is reopening prosecution. Applicant’s arguments, see Remarks pages 1 and 2, filed on 9/22/2010, with respect to claim 6 have been fully considered and are persuasive. The rejection of claim 6 under 35 U.S.C. 103 as being unpatentable over Piero (WO 2019191251) and further in view of Chae (US 20200058290) has been withdrawn.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
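3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.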

Claims 1-3, 5, 7-8, 10-17, 19 and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Piero (WO 2019191251) and further in view of Aryal (US Pat# 10643600).

Regarding claim 1, Piero teaches a system (Fig 7), comprising: a memory that stores computer executable components; and a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components (Fig 5A, 5B, memory and processor) comprise: a differential component that determines a difference between a first vector that characterizes a first feature of an audio signal and a second vector that characterizes a second feature of a synthesized reference audio signal; and a speech analysis component that determines a condition of the origin of the audio signal based on the difference (extract feature 408, Para 0072; wherein the feature is a condition of an origin; the extracted features ‘V’ may be based on differences in speech properties such as differences in pitch, amplitude, duration, etc. between synthetic speech data and recorded reference speech data. The synthetic speech data ‘D’ and reference speech data ‘R’ (e.g., natural speech) may be aligned in time for facilitating with a feature extraction step or steps. These extracted features ‘V’ may include, but are not limited to, Fundamental Frequency (F0), LF (Liljencrants-Fant model) features representing the source signal (e.g., vocal folds’ behavior), parametric representation of the spectrum (such as Cepstral Coefficients), linguistic features representing the context, linguistic features related to the context, and a difference signal between the recorded reference speech and synthesized speech. In an example, the difference signal that may be modeled is a source signal, and not the parameter space. This difference signal may be modeled in a space of vector quantized excitation vectors that may be built in the training mode.
In an example embodiment, where the system 100 is the parametric text-to-speech synthesis system, the extracted features ‘V’ may particularly include a sequence of excitation vectors, corresponding to the differences between the synthetic speech data ‘D’ (e.g., SPSS) and the recorded reference speech data ‘R’ (e.g., natural speech signal), for the first input text ‘TG’, Para 0048; extract at least one feature based on the speech and the synthetic data, Para 0072, Fig 7; wherein the feature can be pitch, duration, amplitude, etc., Para 0046; a difference is incorporated to match the TTS closely to the user’s voice, Para 0045, 0054, 0072: gap filling model for a particular speaker and improving pitch prediction).
Piero does not explicitly teach wherein the difference correlates to a speech pattern associated with an origin of the audio signal.
However, Aryal, in the same field of endeavor, teaches wherein the difference correlates to a speech pattern associated with an origin of the audio signal (scaling factor associated with the user is stored, Fig 6, Col 6, lines 35-67; the scaling factor represents the speech patterns and it is associated with the user (origin of audio signal)).
Piero has the base concept of finding the difference between the user’s voice and the TTS voice and modifying the system based on that difference. It would have been obvious to one of ordinary skill in the art, before the effective filing date and having the teachings of Piero, to include the teachings of Aryal to correlate/associate the difference of onset times obtained from the user’s voice and the TTS voice with a particular user, in order to personalize the system for that user (Abstract, Aryal).
  
Regarding claim 2, Piero, as above in claim 1, teaches the system further comprising: a synthetic speech component that generates the synthesized reference audio signal, wherein the audio signal and the synthesized reference audio signal express a mutual sentence structure (synthetic speech of the same signal, Fig 7).
Regarding claim 3, Aryal, as above in claim 2, teaches a speech content component that analyzes the audio signal using a machine learning model to determine a sentence structure expressed by the audio signal (sentence (words etc.), Col 2, lines 20-45 and Col 3, lines 45-60, Aryal), and wherein the synthetic speech component generates the synthesized reference audio signal to match the sentence structure (Fig 4, synthetic speech to match the original audio using linguistic features; a linguistic feature vector (x) is computed 430 for each syllable. The linguistic feature vector is constructed based on attributes that characterize the syllable as well as the syllable's context in the sentence, Col 4, lines 38-42).
It would have been obvious, before the effective filing date and having the teachings of Piero, to further include the concept of Aryal to improve the technology of synthesized voices.

Regarding claim 5, Piero, as above in claim 1, teaches a feature component that extracts a first vector from the audio signal that characterizes the first feature and extracts a second vector from the synthesized reference audio signal that characterizes the second feature (comparison based on the feature, Para 0075-0076).

Regarding claim 7, Piero modified by Aryal, as above in claim 1, teaches a classification component that generates a machine learning model that classifies the speech pattern to determine the condition (GBRT for the association, Col 6, lines 60-67, Aryal).
Regarding claim 8, Piero modified by Aryal, as above in claim 7, teaches wherein the machine learning model utilizes a neural network model (neural network, Col 5, lines 65-67, Aryal).



Regarding claim 10, arguments analogous to claim 1 are applicable. In addition, Piero teaches a computer-implemented method to perform the steps of claim 1 (Abstract).
Regarding claim 14, Piero modified by Aryal, as above in claim 10, teaches a neural network model that classifies the speech pattern to determine the condition (difference in the pitch, shifting, amplitude, etc., associated with the user, Para 0045, Piero; scaling associated with the user is stored, Fig 6, Aryal).

Regarding claim 15, arguments analogous to claim 1 are applicable. In addition, Piero teaches a computer program product for characterizing speech, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform the functions of claim 1 (Abstract, Fig 5A-5B).
Regarding claim 21, Aryal, as above in claim 15, teaches wherein the condition comprises a member selected from a group consisting of an identity, an emotional state, an accent, an age, and a health status (personalized system: identity of the speaker associated with the scale, Fig 6).
Regarding claim 22, Piero modified by Aryal, as above in claim 1, teaches wherein the origin is a human speaking (associated with the speaker (recorded voice), Fig 4, Fig 6, Aryal; user recorded voice, Para 0043, Piero).

Regarding claim 11, arguments analogous to claim 2 are applicable.
Regarding claim 12, arguments analogous to claim 3 are applicable.

Regarding claim 13, arguments analogous to claim 5 are applicable.

Regarding claim 16, arguments analogous to claim 2 are applicable.
Regarding claim 17, arguments analogous to claim 5 are applicable.

Regarding claim 19, arguments analogous to claim 7 are applicable.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Piero (WO 2019191251) and further in view of Aryal (US Pat# 10643600) and further in view of Mahyar (US Pat# 10930263).


Regarding claim 4, Piero, as above in claim 1, does not explicitly teach wherein the synthesized reference audio signal is comprised within a plurality of synthesized reference audio signals generated by the synthetic speech component, and wherein the plurality of synthesized reference audio signals express the mutual sentence structure.
However, Mahyar teaches wherein the synthesized reference audio signal is comprised within a plurality of synthesized reference audio signals generated by the synthetic speech component, and wherein the plurality of synthesized reference audio signals express the mutual sentence structure (Following training, to synthesize multiple predicted audio waveforms (each one also referred to as “predicted audio”) for the speaker 105 speaking in the target language, each of these neural networks is provided a representation of text input (e.g., the movie dialogue “What does it mean to be Samurai? . . . to master the way of the sword” as illustrated in text 140), and each neural network predicts as outputs an audio waveform (or parameters for generating an audio waveform) corresponding to the text input being spoken by the speaker 105 in the target language, Col 2, lines 25-50).

It would have been obvious, before the effective filing date and having the teachings of Piero, to further include the concept of Mahyar to improve the technology of synthesized voices (Col 1, lines 45-67, Mahyar).



Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Piero (WO 2019191251) and further in view of Aryal (US Pat# 10643600) and further in view of Sabrina (Vocal caricatures reveal signatures of speaker identity).

Regarding claim 9, Piero, as above in claim 1, teaches wherein the condition comprises a member selected from a group consisting of (amplitude, pitch, duration); however, Piero does not explicitly mention an identity, an emotional state, an accent, an age, and a health status.
However, Sabrina teaches that pitch/amplitude/duration determine an identity, an emotional state, an accent, an age, and a health status (pitch determines the identity of the speaker, under Acoustic spaces of similarity and identity, Left Col, Page 3; Although studies that focused on prosodic aspects were inconclusive some temporal properties as pitch f0(t), sound intensity I(t) and duration D(t) have been shown to be cues for differentiating voices, under Data Analysis, Right Col, Page 6).

It would have been obvious, before the effective filing date and having the teachings of Piero, to further include the concepts of Sabrina to determine the identity of the speaker in an improved way (Abstract, Sabrina).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Piero (WO 2019191251) and further in view of Aryal (US Pat# 10643600) and further in view of Macconnell (US Pub: 20200211540).

Regarding claim 20, Piero modified by Aryal, as above in claim 19, mentions remote machines (Para 0061, Piero; storage devices, Col 7, lines 45-50, Aryal). Piero modified by Aryal does not explicitly teach wherein the processor utilizes a cloud computing environment to generate the machine learning model.
However, Macconnell teaches wherein the processor utilizes a cloud computing environment to generate the machine learning model (machine learning for synthesizing in a cloud based environment, Para 0061-0062).
It would have been obvious, having the teachings of Piero and Aryal, to further include the concept of Macconnell to have easily accessible storage.
Conclusion


THIS IS A 2ND ACTION NON-FINAL. Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHA MISHRA whose telephone number is (571)272-5357. The examiner can normally be reached M-T 7AM - 5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu, can be reached at (571)272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RICHA MISHRA/Primary Examiner, Art Unit 2674