Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 9/11/2019.  These drawings are accepted.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-11, 14-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al (US Publication No.: 20180254046) in view of Leng et al (US Publication No.: 20200053118).
Claim 1, Khoury et al discloses
	a computation device (Paragraph 67);

	receiving audio content (Fig. 1, 2, label voice source is received by label 100,200.);
	determining a representation of the audio content by performing a transformation on the audio  content (Fig. 1, label feature extraction, Fig. 2, label first DNN);
	analyzing the representation using a predetermined neural network (Fig. 2, label second DNN. Paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined.); and
	classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Fig. 1,2, label genuine or spoofed and 230,130,220,120 classifies the voice source as genuine or spoofed.), wherein the fake audio content is, at least in part, computer-generated (Paragraph 51 discloses spoofing call can be from a computer microphone and other devices.).
Khoury et al discloses classifying the voice source as genuine or spoofed (Fig. 1,2, label genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
Leng et al discloses a service provider that classifies a voice input as being genuine audio data or replay audio data (spoofed) (Paragraph 40). Depending on the classification, the system will selectively perform a remedial action of rejecting the received request to access account data (in the scenario the system determines the audio data as replay audio data) or providing access to account data (in the scenario the system determines the audio data as genuine) (Paragraph 40).
It would have been obvious to one skilled in the art before the effective filing date of the application to modify Khoury et al by performing an action according to the classification of the audio data as disclosed by Leng et al so as to prevent fraudulent users from accessing the device or private user information, hence increasing security of the system or device capable of processing voice commands.
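For illustration only, the combination described above (a genuine/spoofed classification followed by a selectively performed action) can be sketched as follows. This is a minimal sketch and not the implementation of either reference: the names `Action` and `remedial_action` are hypothetical, and the choice to reject any label other than "genuine" is an assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names chosen for this sketch.
    GRANT_ACCESS = "grant_access"
    REJECT_REQUEST = "reject_request"


def remedial_action(classification: str) -> Action:
    """Map a classification label to an action.

    Assumption: only the label "genuine" grants access; every other
    label (including unexpected ones) results in rejection.
    """
    if classification == "genuine":
        return Action.GRANT_ACCESS
    return Action.REJECT_REQUEST


if __name__ == "__main__":
    for label in ("genuine", "spoofed"):
        print(label, "->", remedial_action(label).value)
```

In this sketch the rejection branch corresponds to the remedial action, and the access branch corresponds to the case in which the audio is determined to be genuine.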
Claim 2, Khoury et al discloses the transformation comprises a Fourier transform or a discrete Fourier transform (Fig. 6, label STFT); and wherein the representation comprises a spectrogram, a phasegram or both (Fig. 6, label power spectrum).
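For illustration only, a short-time Fourier transform that yields a power spectrum (spectrogram) and a phase representation (phasegram) can be sketched as follows. This is a minimal sketch using NumPy, not the processing of Fig. 6: the function name `power_spectrogram`, the Hann window, the frame length, the hop size, and the test tone are all assumptions made for the example.

```python
import numpy as np


def power_spectrogram(signal, frame_len=256, hop=128):
    """Return (power, phase) arrays of shape (n_frames, frame_len // 2 + 1).

    Assumed parameters: Hann window, 256-sample frames, 128-sample hop.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Discrete Fourier transform of each frame (real input).
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum) ** 2, np.angle(spectrum)


# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
power, phase = power_spectrogram(tone)
peak_bin = int(power.mean(axis=0).argmax())
print(power.shape, peak_bin * sample_rate / 256)
```

The squared magnitude gives the spectrogram and the angle gives the phasegram; for the assumed tone, the strongest frequency bin falls at 1 kHz.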
Claim 3, Khoury et al discloses the transformation comprises a cosine transformation of a power spectrum of the audio content (Fig. 6, label power spectrum, DCT); and wherein the representation comprises mel-frequency cepstral coefficients (Paragraph 56 discloses the first DNN may be configured via MFCCs.).
Claim 5, Khoury et al discloses the transformation comprises a signal processing transformation (Fig. 6 shows the transformation includes signal processing transformation.).
Claim 6, Khoury et al discloses the audio content is allegedly associated with a given individual (Paragraph 34 discloses spoofing occurs when a fraudulent or malicious communication sent from an unknown source disguised as a known source. For example, a fraudulent speaker or call may imitate or replay a known caller’s voice. Paragraph 36 discloses the automatic speech verification apparatus or system determined whether the input voice or speech is a genuine speaker or fraudulent speaker. Such paragraph indicates the audio content is allegedly associated with a given 
Claim 7, Khoury et al discloses the audio content has an associated context (Paragraph 38 discloses audio sample includes associated context such as frequency, frequency range, dynamic power range, reverberation, noise levels in particular frequency ranges and the like.) and the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context (Fig. 2, label 220 as the predetermined neural network with model selected from a plurality of models. Paragraph 37 discloses one or more enrollment models may be generated for each user and selected to train 120 to discriminate one or more spoofing conditions from a genuine access. Paragraph 39 discloses a selected enrollment model is used for a specific user with an audio having limited number of channels and low level audio qualities. Paragraph 41 discloses label 220 corresponds to label 120.).
Claim 8, Khoury et al discloses the predetermined neural network was trained using synthetic audio content corresponding to different attack vectors used to generate 
Claim 9, Khoury et al discloses the output comprises a probability (Paragraph 47 discloses a likelihood score for each class: replay attack, voice conversion, speech synthesis, and an absence thereof.) and the classification is further based at least in part on a threshold (Paragraph 48 discloses comparing the likelihood score to a threshold in order to determine classification of genuine or spoofed.).
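For illustration only, comparing a per-class likelihood score to a threshold can be sketched as follows. This is a minimal sketch, not the scoring used by the reference: the class names in `CLASSES`, the use of a softmax to obtain the scores, the default threshold of 0.5, and the input values are all assumptions made for the example.

```python
import math

# Hypothetical class order: "genuine" stands for the absence of any attack.
CLASSES = ("genuine", "replay_attack", "voice_conversion", "speech_synthesis")


def softmax(logits):
    """Convert raw network outputs into scores that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Label the audio "genuine" if its genuine score meets the threshold."""
    probs = dict(zip(CLASSES, softmax(logits)))
    label = "genuine" if probs["genuine"] >= threshold else "spoofed"
    return label, probs


if __name__ == "__main__":
    label, probs = classify([3.0, 0.1, 0.2, 0.1])
    print(label, round(probs["genuine"], 3))
```

A per-individual threshold, as recited in claim 10, would correspond in this sketch to passing a different `threshold` value for each enrolled speaker.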
Claim 10, Khoury et al discloses wherein the audio content is allegedly associated with a given individual (Paragraph 39 discloses a genuine speaker uses a limited number of channels having specific low level audio qualities. An enrollment model with these qualities is used to determine whether the audio content is genuine. This indicates the audio content is associated with a given individual.) and the threshold corresponds to the given individual (Paragraph 36 discloses the binary classifier may compare the resulting classification with a predetermined threshold score or another classification from previously stored low level features for a voice model corresponding to an authorized user.).
Claim 11, Khoury et al discloses wherein the predetermined neural network comprises multiple convolutional blocks, arranged sequentially, followed by a softmax layer (Fig. 7, Paragraph 63 discloses a plurality of convolutional layers 720. Paragraph 47 discloses a softmax layer by disclosing likelihood score for different categories such as replay attack, voice conversion, speech synthesis and lack thereof.).
Claim 14, Khoury et al discloses wherein the classification is performed using a classifier or a regression model (paragraph 41 discloses the binary classifier classifies the audio data to genuine or spoofed.) that was trained using a supervised learning 
Claim 15, Khoury et al discloses wherein the classification is performed using a classifier or a regression model (Paragraph 41 discloses the binary classifier classifies the audio data as genuine or spoofed.) that was trained using additional audio content that was classified as being fake or real audio content (Paragraph 39 discloses a training data set pertaining to a genuine speaker’s audio qualities.) using an unsupervised learning technique (Paragraph 39 discloses the training dataset is captured and stored according to the qualities of a genuine speaker and distinguishing a voice input as spoofed or genuine according to such qualities or attributes of a genuine speaker. Since the classification is based on shared attributes of a genuine speaker and the voice input, such training dataset is generated using an unsupervised learning technique.).
Claim 16, Khoury et al discloses wherein the remedial action comprises one of: providing a warning associated with the audio content; 
providing a recommendation associated with the audio content; or 
filtering at least a portion of the audio content (Paragraphs 40 and 54 disclose rejecting the input data or audio content when a replay is determined. Such rejection is an indication of a warning associated with the audio content.).
Claim 17, Khoury et al discloses

determining a representation of the audio content by performing a transformation on the audio content (Fig. 1, label feature extraction, Fig. 2, label first DNN);
analyzing the representation using a predetermined neural network (Fig. 2, label second DNN. Paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined.); and
classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Fig. 1,2, label genuine or spoofed and 230,130,220,120 classifies the voice source as genuine or spoofed.), wherein the fake audio content is, at least in part, computer generated (Paragraph 51 discloses spoofing call can be from a computer microphone and other devices.).
Khoury et al discloses classifying the voice source as genuine or spoofed (Fig. 1,2, label genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
Leng et al discloses a service provider that classifies a voice input as being genuine audio data or replay audio data (spoofed) (Paragraph 40). Depending on the classification, the system will selectively perform a remedial action of rejecting the received request to access account data (in the scenario the system determines the audio data as replay audio data) or providing access to account data (in the scenario the system determines the audio data as genuine). (paragraph 40)
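It would have been obvious to one skilled in the art before the effective filing date of the application to modify Khoury et al by performing an action according to the classification of the audio data as disclosed by Leng et al so as to prevent fraudulent users from accessing the device or private user information, hence increasing security of the system or device capable of processing voice commands.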
Claim 19, Khoury et al discloses 
by a computer system (paragraph 67-68, Fig. 1,2,7,6):
receiving the audio content (Fig. 1, 2, label voice source is received by label 100,200.);
determining a representation of the audio content by performing a transformation on the audio content (Fig. 1, label feature extraction, Fig. 2, label first DNN);
analyzing the representation using a predetermined neural network (Fig. 2, label second DNN. Paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined.); and
classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Fig. 1,2, label genuine or spoofed and 230,130,220,120 classifies the voice source as genuine or spoofed.), wherein the fake audio content is, at least in part, computer generated (Paragraph 51 discloses spoofing call can be from a computer microphone and other devices.).
Khoury et al discloses classifying the voice source as genuine or spoofed (Fig. 1,2, label genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
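Leng et al discloses a service provider that classifies a voice input as being genuine audio data or replay audio data (spoofed) (Paragraph 40). Depending on the classification, the system will selectively perform a remedial action of rejecting the received request to access account data (in the scenario the system determines the audio data as replay audio data) or providing access to account data (in the scenario the system determines the audio data as genuine) (Paragraph 40).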
It would have been obvious to one skilled in the art before the effective filing date of the application to modify Khoury et al by performing an action according to the classification of the audio data as disclosed by Leng et al so as to prevent fraudulent users from accessing the device or private user information, hence increasing security of the system or device capable of processing voice commands.

Claims 12, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al (US Publication No.: 20180254046) in view of Leng et al (US Publication No.: 20200053118), further in view of Boyadjiev et al (US Publication No.: 20200035247).
Claim 12, Khoury et al discloses 
wherein the predetermined neural network comprises convolutional blocks, arranged sequentially (Fig. 7, label 720, 730), 
wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, label 720, paragraph 63 discloses the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates a given convolution operation is performed.), followed by a softmax layer (Fig. 7, Paragraph 63 discloses a plurality of convolutional layers 720. Paragraph 47 discloses a softmax layer by disclosing likelihood score for different categories such as replay attack, voice conversion, speech synthesis and lack thereof.),

	Boyadjiev et al discloses a convolutional neural network with convolutional layers, normalization layers, pooling layers and fully connected layers (paragraph 46). It would have been obvious to one skilled in the art before the effective filing date of the application to modify the neural network of Khoury et al, which has convolutional layers and pooling layers, to include normalization layers as disclosed by Boyadjiev et al so as to perform convolution-based neural network processing and determine whether an input voice is spoofed or genuine, hence increasing security of the user device.
Claim 18, Khoury et al discloses wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, label 720, paragraph 63 discloses the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates a given convolution operation is performed.), followed by a softmax layer (Fig. 7, Paragraph 63 discloses a plurality of convolutional layers 720. Paragraph 47 discloses a softmax layer by disclosing likelihood score for different categories such as replay attack, voice conversion, speech synthesis and lack thereof.),
wherein the predetermined neural network comprises convolutional blocks, arranged sequentially, followed by a softmax layer (Fig. 7, Paragraph 63 discloses a plurality of convolutional layers 720. Paragraph 47 discloses a softmax layer by 
wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, label 720, paragraph 63 discloses the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates a given convolution operation is performed.), wherein the given convolution operation corresponds to a given frequency range (Fig. 2 shows the second DNN is provided with features from the first DNN. The features are generated using STFT, which indicates the operation of the second DNN is performed within the given frequency range produced by performing the STFT. Paragraph 63 discloses the second DNN includes multiple convolutional layers.), but fails to disclose that the multiple convolutional layers comprise a normalization operation.
Boyadjiev et al discloses a convolutional neural network with convolutional layers, normalization layers, pooling layers and fully connected layers (paragraph 46). It would have been obvious to one skilled in the art before the effective filing date of the application to modify the neural network of Khoury et al, which has convolutional layers and pooling layers, to include normalization layers as disclosed by Boyadjiev et al so as to perform convolution-based neural network processing and determine whether an input voice is spoofed or genuine, hence increasing security of the user device.
Claim 20, Khoury et al discloses wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, label 720, paragraph 63 discloses the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates a given convolution operation is performed.), followed by a softmax layer (Fig. 7, Paragraph 63 discloses a plurality of convolutional 
wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, label 720, paragraph 63 discloses the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates a given convolution operation is performed.), wherein the given convolution operation corresponds to a given frequency range (Fig. 2 shows the second DNN is provided with features from the first DNN. The features are generated using STFT, which indicates the operation of the second DNN is performed within the given frequency range produced by performing the STFT. Paragraph 63 discloses the second DNN includes multiple convolutional layers.), but fails to disclose that the multiple convolutional layers comprise a normalization operation.
	Boyadjiev et al discloses a convolutional neural network with convolutional layers, normalization layers, pooling layers and fully connected layers (paragraph 46). It would have been obvious to one skilled in the art before the effective filing date of the application to modify the neural network of Khoury et al, which has convolutional layers and pooling layers, to include normalization layers as disclosed by Boyadjiev et al so as to perform convolution-based neural network processing and determine whether an input voice is spoofed or genuine, hence increasing security of the user device.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al (US Publication No.: 20180254046) in view of Leng et al (US Publication No.: 20200053118), further in view of Rastrow et al (US Patent No.: 10032463).

Rastrow et al discloses a speech recognition system comprising an acoustic model (Fig. 1, label 110,112). Col. 7, lines 36-53 discloses “using the acoustic model 112 to generate hypotheses regarding the words in the utterance. For example, the acoustic model 112 may generate probabilities that individual feature vectors or frames correspond to particular words or subword units (e.g. phonemes, triphones, n-grams, etc.).” It would have been obvious to one skilled in the art before the effective filing date of the application for the representation or feature vectors of Khoury et al to correspond to particular words or subword units or word embeddings as disclosed by Rastrow et al so as to understand and decipher the user’s utterance, hence improving speech recognition.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al (US Publication No.: 20180254046) in view of Leng et al (US Publication No.: 20200053118), further in view of Huffman et al (US Patent No.: 10861476).
Claim 13, Khoury et al discloses the predetermined neural network (Fig. 2, label second DNN), but fails to disclose the second DNN as a generative adversarial network (GAN).
Huffman et al discloses a neural network to distinguish authentic from fake speech based on an input speech or voice. Such neural network is a generative adversarial network (Fig. 2, label 116,140,142, Fig. 9,11, Col. 16, lines 40-67,Col. 17, 55-67). It would be obvious to one skilled in the art before the effective filing date of the application to substitute one well known element of a neural network as disclosed by 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571) 272-6044. The examiner can normally be reached on 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LINDA WONG/Primary Examiner, Art Unit 2656