Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings were received on 9/30/2021.  These drawings are accepted.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al. (US Publication No. 20180254046) in view of Leng et al. (US Publication No. 20200053118) and further in view of Izadi et al. (US Publication No. 20210365777).
	Claim 1, Khoury et al. discloses
	a computation device (paragraph 67);
	memory configured to store program instructions (paragraphs 67 and 68), wherein, when executed by the computation device, the program instructions cause the computer system to perform one or more operations (paragraphs 67-68) comprising:
	receiving audio content (Figs. 1 and 2: the voice source is received by elements 100 and 200),
wherein the audio content is allegedly associated with a given individual (paragraph 36 discloses “A voice source 10 (e.g., a person or, typically fraudulently, a recording of a person) …”);
	analyzing information associated with the audio content using a predetermined neural network (Fig. 2, second DNN; paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined),
	wherein the audio content or the given individual has an associated context (paragraph 36 discloses that acoustic features are extracted from the voice of the given individual; the acoustic features constitute a context of the voice of the given individual); and
	classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Figs. 1 and 2, genuine or spoofed; elements 120, 130, 220 and 230 classify the voice source as genuine or spoofed), wherein the fake audio content is, at least in part, computer generated (paragraph 51 discloses that a spoofed call can originate from a computer microphone and other devices).
	Khoury et al. discloses classifying the voice source as genuine or spoofed (Figs. 1 and 2, genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
	Leng et al. discloses a service provider that classifies a voice input as either genuine audio data or replay (spoofed) audio data (paragraph 40). Depending on the classification, the system selectively performs a remedial action, rejecting the received request to access account data when the audio data is determined to be replay audio data, and otherwise providing access to the account data when the audio data is determined to be genuine (paragraph 40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khoury et al. to perform an action according to the classification of the audio data, as disclosed by Leng et al., in order to prevent fraudulent users from accessing the device or private user information, thereby increasing the security of a system or device capable of processing voice commands.
	The combination of Khoury et al. and Leng et al. discloses the predetermined neural network (Khoury et al., Fig. 2, second DNN), but fails to disclose wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights.
	Izadi et al. discloses wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights (paragraphs 28 and 55 disclose selecting a particular set of conditional layer weights from a set of infinitely many possible conditional layer weights). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the predetermined neural network of Khoury et al., as modified by Leng et al., by selecting the weights of the neural network as disclosed by Izadi et al., in order to optimize the neural network and improve speech recognition, which is part of determining whether the voice of the user or given individual is authentic.
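For illustration only, the claim 1 operations as mapped above (an associated context, selection from a set of predetermined weights, an output probability, a fake/real classification, and a selective remedial action) can be sketched in Python. This is a hypothetical toy, not the method of Khoury et al., Leng et al. or Izadi et al.: the context labels, weight values, threshold and function names are assumptions, and a single logistic unit stands in for the predetermined neural network.

```python
import math

# Hypothetical predetermined (weights, bias) pairs keyed by context. The
# context labels and numbers are illustrative assumptions, not values from
# any cited reference.
PREDETERMINED_WEIGHTS = {
    "telephone": ([0.8, -0.5, 1.2], -0.3),
    "studio": ([1.1, -0.2, 0.4], 0.1),
}


def select_weights(context):
    """Select (weights, bias) from the predetermined set based on the context."""
    return PREDETERMINED_WEIGHTS[context]


def fake_probability(features, context):
    """A single logistic unit stands in for the predetermined neural network."""
    weights, bias = select_weights(context)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, context, threshold=0.5):
    """Classify the audio content as 'fake' or 'real' from the network output."""
    return "fake" if fake_probability(features, context) >= threshold else "real"


def handle(features, context):
    """Selectively perform a remedial action based on the classification."""
    if classify(features, context) == "fake":
        return "reject access request"  # remedial action
    return "grant access"


print(handle([1.0, 0.0, 1.0], "telephone"))  # prints "reject access request"
```

Keying the stored weights by context keeps the network structure fixed while letting the context choose which predetermined parameters are applied, which corresponds to the second alternative of the claim language.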
	Claim 2, Khoury et al. discloses that the information comprises a representation of the audio content (Fig. 1, feature extraction; Fig. 2, first DNN) and that the operations comprise determining the representation of the audio content by performing a transformation on the audio content (the feature extraction of Fig. 1 and the first DNN of Fig. 2 perform a transformation on the audio content to generate the representation of the audio content).
Claim 3, Khoury et al. discloses that the transformation comprises a Fourier transform or a discrete Fourier transform (Fig. 6, STFT), and that the representation comprises a spectrogram, a phasegram, or both (Fig. 6, power spectrum).
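As a hedged illustration of the claim 3 terms, a spectrogram and a phasegram can be read off the per-frame discrete Fourier transform as the magnitude and the phase of each frequency bin. The sketch below assumes a rectangular window and non-overlapping frames for simplicity, and its function names are hypothetical; it is not asserted to be the STFT configuration of Khoury et al. Fig. 6.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectrogram_and_phasegram(signal, frame_len, hop):
    """Frame the signal (rectangular window, no padding), take the DFT of each
    frame, and keep the non-negative frequency bins 0..frame_len // 2.
    Returns (magnitudes, phases), one list per frame."""
    magnitudes, phases = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        bins = dft(signal[start:start + frame_len])[: frame_len // 2 + 1]
        magnitudes.append([abs(b) for b in bins])
        phases.append([cmath.phase(b) for b in bins])
    return magnitudes, phases


# Usage: a cosine completing 2 cycles per 8-sample frame peaks in bin 2.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
mags, phs = spectrogram_and_phasegram(tone, frame_len=8, hop=8)
print(len(mags), [round(m, 6) for m in mags[0]])  # prints: 2 [0.0, 0.0, 4.0, 0.0, 0.0]
```

A production front end would use an FFT and a tapered window; the naive DFT is kept here only so the transform is visible.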
Claim 4, Khoury et al. discloses that the transformation comprises a cosine transformation of a power spectrum of the audio content (Fig. 6, power spectrum, DCT), and that the representation comprises mel-frequency cepstral coefficients (paragraph 56 discloses that the first DNN may be configured via MFCCs).
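For reference, one common formulation of mel-frequency cepstral coefficients applies a mel-spaced triangular filterbank to the power spectrum, takes the logarithm, and then applies a discrete cosine transform (DCT-II). The Python sketch below follows that formulation; the HTK-style mel formula, the filter and coefficient counts, and the function names are assumptions, not details taken from Khoury et al.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters evenly spaced on the mel scale. Bin k of the power
    spectrum is taken to correspond to k * sample_rate / (2 * (n_bins - 1)) Hz."""
    max_mel = hz_to_mel(sample_rate / 2.0)
    mel_points = [max_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [mel_to_hz(m) * 2 * (n_bins - 1) / sample_rate for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        row = []
        for k in range(n_bins):
            if left <= k <= center:
                row.append((k - left) / (center - left))  # rising edge
            elif center < k <= right:
                row.append((right - k) / (right - center))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def dct_ii(x):
    """Unnormalized DCT-II: X_k = sum_i x_i * cos(pi * k * (2i + 1) / (2n))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def mfcc(power_spectrum, sample_rate, n_filters=8, n_coeffs=4):
    """Power spectrum -> mel filterbank energies -> log -> DCT-II."""
    fb = mel_filterbank(n_filters, len(power_spectrum), sample_rate)
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in fb]
    log_energies = [math.log(e + 1e-10) for e in energies]  # epsilon avoids log(0)
    return dct_ii(log_energies)[:n_coeffs]


ps = [1.0 + k for k in range(33)]  # hypothetical 33-bin power spectrum
coeffs = mfcc(ps, sample_rate=16000)
print(len(coeffs))  # prints 4
```

Because the DCT follows a logarithm, scaling the power spectrum by a constant shifts only the zeroth coefficient, which is one reason MFCCs are relatively insensitive to overall gain.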
Claim 6, Khoury et al. discloses that the transformation comprises a signal processing transformation (Fig. 6 shows that the transformation includes a signal processing transformation).
Claim 7, Khoury et al. discloses that the audio content is allegedly associated with a given individual (paragraph 34 discloses that spoofing occurs when a fraudulent or malicious communication is sent from an unknown source disguised as a known source; for example, a fraudulent speaker or call may imitate or replay a known caller's voice. Paragraph 36 discloses that the automatic speech verification apparatus or system determines whether the input voice or speech is from a genuine speaker or a fraudulent speaker. These paragraphs indicate that the audio content is allegedly associated with a given speaker or user.) and that the analysis further uses a predetermined representation of audio content associated with the given individual based at least in part on historical audio content of the given individual (paragraph 37 discloses that one or more enrollment models may be generated for each authorized user at speaker enrollment time and stored in an enrollment database 140. Paragraph 41 discloses that the second deep neural network determines a likelihood that the voice sample includes a spoofing condition. Paragraphs 48 and 58 disclose that the binary classifier determines whether the input voice or speech signal is genuine or fake based on a comparison between the output of the second neural network shown in Fig. 2 and the likelihood score of the previously captured enrollment sample associated with the genuine user whom the voice sample is intended to match.)
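To illustrate the claim 7 concept of a predetermined representation derived from historical audio content, the sketch below averages an individual's historical embeddings into an enrollment representation and compares a new embedding against it. The cosine-similarity comparison, the 0.7 cutoff and all names are hypothetical choices for illustration; they are not asserted to be the likelihood-score comparison of Khoury et al. paragraphs 48 and 58.

```python
import math


def enrollment_model(historical_embeddings):
    """Predetermined representation: element-wise mean of an individual's
    historical embeddings (hypothetical)."""
    n = len(historical_embeddings)
    return [sum(col) / n for col in zip(*historical_embeddings)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matches_individual(embedding, enrolled, min_similarity=0.7):
    """True if the new embedding is close enough to the enrolled representation."""
    return cosine_similarity(embedding, enrolled) >= min_similarity


enrolled = enrollment_model([[1.0, 0.0], [1.0, 0.2]])  # -> [1.0, 0.1]
print(matches_individual([1.0, 0.1], enrolled), matches_individual([0.0, 1.0], enrolled))
# prints: True False
```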
Claim 8, Khoury et al. discloses that the audio content has an associated context (paragraph 38 discloses that the audio sample has associated context such as frequency, frequency range, dynamic power range, reverberation, noise levels in particular frequency ranges, and the like) and that the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context (Fig. 2 shows element 220 as the predetermined neural network, with a model selected from a plurality of models. Paragraph 37 discloses that one or more enrollment models may be generated for each user and selected to train element 120 to discriminate one or more spoofing conditions from a genuine access. Paragraph 39 discloses that a selected enrollment model is used for a specific user whose audio has a limited number of channels and low-level audio qualities. Paragraph 41 discloses that element 220 corresponds to element 120.).
Claim 9, Khoury et al. discloses that the predetermined neural network was trained using synthetic audio content corresponding to different attack vectors used to generate fake audio content (paragraph 37 discloses training the neural network with known spoofed and known clean models).
Claim 10, Khoury et al. discloses that the output comprises a probability (paragraph 47 discloses a likelihood score for each class: replay attack, voice conversion, speech synthesis, and the absence thereof) and that the classification is further based at least in part on a threshold (paragraph 48 discloses comparing the likelihood score to a threshold in order to classify the audio as genuine or spoofed).
Claim 11, Khoury et al. discloses wherein the audio content is allegedly associated with a given individual (paragraph 39 discloses that a genuine speaker uses a limited number of channels having specific low-level audio qualities, and that an enrollment model with these qualities is used to determine whether the audio content is genuine; this indicates that the audio content is associated with a given individual) and the threshold corresponds to the given individual (paragraph 36 discloses that the binary classifier may compare the resulting classification with a predetermined threshold score or with another classification from previously stored low-level features for a voice model corresponding to an authorized user).
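The threshold-based classification of claims 10 and 11 can be sketched as follows, assuming a softmax over the four categories listed for Khoury et al. paragraph 47 and a hypothetical table of per-individual thresholds with a default for unknown individuals. Taking the probability of fake content as one minus the probability of the genuine class is an illustrative assumption, as are the function and variable names.

```python
import math

# Class order mirrors the categories listed for Khoury et al. paragraph 47.
CLASSES = ("replay", "voice_conversion", "speech_synthesis", "genuine")


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, individual_id, thresholds, default_threshold=0.5):
    """Return ('fake' or 'real', p_fake), where p_fake = 1 - P(genuine) and the
    threshold is looked up per individual (hypothetical table)."""
    probs = dict(zip(CLASSES, softmax(logits)))
    p_fake = 1.0 - probs["genuine"]
    threshold = thresholds.get(individual_id, default_threshold)
    return ("fake" if p_fake >= threshold else "real"), p_fake


thresholds = {"alice": 0.8}  # hypothetical per-individual threshold
print(classify([0.0, 0.0, 0.0, 0.0], "alice", thresholds)[0])  # real (0.75 < 0.8)
print(classify([0.0, 0.0, 0.0, 0.0], "bob", thresholds)[0])    # fake (0.75 >= 0.5)
```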
Claim 12, Khoury et al. discloses wherein the predetermined neural network comprises multiple convolutional blocks, arranged sequentially, followed by a softmax layer (Fig. 7; paragraph 63 discloses a plurality of convolutional layers 720, and paragraph 47 discloses a softmax layer by disclosing a likelihood score for each of several categories, such as replay attack, voice conversion, speech synthesis, and the absence thereof).
Claim 15, Khoury et al. discloses wherein the classification is performed using a classifier or a regression model (paragraph 41 discloses that the binary classifier classifies the audio data as genuine or spoofed) that was trained using a supervised learning technique (paragraphs 36-38 disclose training the binary classifier based on previous interactions, such as previous interactions with the genuine speaker, and paragraph 41 discloses that the binary classifier is trained using the predetermined likelihood; this indicates that the training is supervised) and a training dataset with additional audio content (paragraph 37 discloses a training dataset, such as subsequent genuine interactions with the corresponding genuine speaker, that is used to update the enrollment models).
Claim 16, Khoury et al. discloses wherein the classification is performed using a classifier or a regression model (paragraph 41 discloses that the binary classifier classifies the audio data as genuine or spoofed) that was trained using additional audio content that was classified as being fake or real audio content (paragraph 39 discloses a training dataset pertaining to a genuine speaker's audio qualities) using an unsupervised learning technique (paragraph 39 discloses that the training dataset is captured and stored according to the qualities of a genuine speaker, and that a voice input is distinguished as spoofed or genuine according to those qualities or attributes of the genuine speaker. Since the classification is based on attributes shared by the genuine speaker and the voice input, the training dataset is generated using an unsupervised learning technique.).
Claim 17, Khoury et al. discloses wherein the remedial action comprises one of: providing a warning associated with the audio content;
providing a recommendation associated with the audio content; or
filtering at least a portion of the audio content (paragraphs 40 and 54 disclose rejecting the input data or audio content when a replay is determined; such a rejection is an indication of a warning associated with the audio content).
Claim 18, Khoury et al. discloses
	receiving audio content (Figs. 1 and 2: the voice source is received by elements 100 and 200),
wherein the audio content is allegedly associated with a given individual (paragraph 36 discloses “A voice source 10 (e.g., a person or, typically fraudulently, a recording of a person) …”);
	analyzing information associated with the audio content using a predetermined neural network (Fig. 2, second DNN; paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined),
	wherein the audio content or the given individual has an associated context (paragraph 36 discloses that acoustic features are extracted from the voice of the given individual; the acoustic features constitute a context of the voice of the given individual); and
	classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Figs. 1 and 2, genuine or spoofed; elements 120, 130, 220 and 230 classify the voice source as genuine or spoofed), wherein the fake audio content is, at least in part, computer generated (paragraph 51 discloses that a spoofed call can originate from a computer microphone and other devices).
	Khoury et al. discloses classifying the voice source as genuine or spoofed (Figs. 1 and 2, genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
	Leng et al. discloses a service provider that classifies a voice input as either genuine audio data or replay (spoofed) audio data (paragraph 40). Depending on the classification, the system selectively performs a remedial action, rejecting the received request to access account data when the audio data is determined to be replay audio data, and otherwise providing access to the account data when the audio data is determined to be genuine (paragraph 40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khoury et al. to perform an action according to the classification of the audio data, as disclosed by Leng et al., in order to prevent fraudulent users from accessing the device or private user information, thereby increasing the security of a system or device capable of processing voice commands.
	The combination of Khoury et al. and Leng et al. discloses the predetermined neural network (Khoury et al., Fig. 2, second DNN), but fails to disclose wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights.
	Izadi et al. discloses wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights (paragraphs 28 and 55 disclose selecting a particular set of conditional layer weights from a set of infinitely many possible conditional layer weights). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the predetermined neural network of Khoury et al., as modified by Leng et al., by selecting the weights of the neural network as disclosed by Izadi et al., in order to optimize the neural network and improve speech recognition, which is part of determining whether the voice of the user or given individual is authentic.
Claim 19, Khoury et al. discloses
	by a computer system (paragraphs 67 and 68);
	receiving audio content (Figs. 1 and 2: the voice source is received by elements 100 and 200),
wherein the audio content is allegedly associated with a given individual (paragraph 36 discloses “A voice source 10 (e.g., a person or, typically fraudulently, a recording of a person) …”);
	analyzing information associated with the audio content using a predetermined neural network (Fig. 2, second DNN; paragraphs 36 and 41 disclose that the second DNN is trained via an enrollment database 140, which indicates that the neural network is predetermined),
	wherein the audio content or the given individual has an associated context (paragraph 36 discloses that acoustic features are extracted from the voice of the given individual; the acoustic features constitute a context of the voice of the given individual); and
	classifying, based at least in part on an output of the predetermined neural network, the audio content as being fake or real (Figs. 1 and 2, genuine or spoofed; elements 120, 130, 220 and 230 classify the voice source as genuine or spoofed), wherein the fake audio content is, at least in part, computer generated (paragraph 51 discloses that a spoofed call can originate from a computer microphone and other devices).
	Khoury et al. discloses classifying the voice source as genuine or spoofed (Figs. 1 and 2, genuine, spoofed), but fails to disclose selectively performing a remedial action based at least in part on the classification.
	Leng et al. discloses a service provider that classifies a voice input as either genuine audio data or replay (spoofed) audio data (paragraph 40). Depending on the classification, the system selectively performs a remedial action, rejecting the received request to access account data when the audio data is determined to be replay audio data, and otherwise providing access to the account data when the audio data is determined to be genuine (paragraph 40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Khoury et al. to perform an action according to the classification of the audio data, as disclosed by Leng et al., in order to prevent fraudulent users from accessing the device or private user information, thereby increasing the security of a system or device capable of processing voice commands.
	The combination of Khoury et al. and Leng et al. discloses the predetermined neural network (Khoury et al., Fig. 2, second DNN), but fails to disclose wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights.
	Izadi et al. discloses wherein the predetermined neural network is selected from a set of predetermined neural networks based at least in part on the context, or weights associated with the predetermined neural network are selected from a set of predetermined weights (paragraphs 28 and 55 disclose selecting a particular set of conditional layer weights from a set of infinitely many possible conditional layer weights). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the predetermined neural network of Khoury et al., as modified by Leng et al., by selecting the weights of the neural network as disclosed by Izadi et al., in order to optimize the neural network and improve speech recognition, which is part of determining whether the voice of the user or given individual is authentic.

Claims 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al. (US Publication No. 20180254046) in view of Leng et al. (US Publication No. 20200053118), further in view of Izadi et al. (US Publication No. 20210365777), and further in view of Boyadjiev et al. (US Publication No. 20200035247).
Claim 13, Khoury et al. discloses
wherein the predetermined neural network comprises convolutional blocks, arranged sequentially (Fig. 7, elements 720, 730),
wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, element 720; paragraph 63 discloses that the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates that a given convolution operation is performed), followed by a softmax layer (Fig. 7; paragraph 63 discloses a plurality of convolutional layers 720, and paragraph 47 discloses a softmax layer by disclosing a likelihood score for each of several categories, such as replay attack, voice conversion, speech synthesis, and the absence thereof),
wherein the given convolution operation corresponds to a given frequency range (Fig. 2 shows that the second DNN is provided with features from the first DNN; the features are generated using an STFT, which indicates that the operation of the second DNN is performed within a given frequency range produced by the STFT, and paragraph 63 discloses that the second DNN includes multiple convolutional layers), but fails to disclose that the convolutional layers comprise a normalization operation.
	Boyadjiev et al. discloses a convolutional neural network with convolutional layers, normalization layers, pooling layers, and fully connected layers (paragraph 46). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network of Khoury et al., having convolutional layers and pooling layers, to include normalization layers as disclosed by Boyadjiev et al., in order to perform convolutional neural network processing to determine whether an input voice is spoofed or genuine, thereby increasing the security of the user device.
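For illustration, the claimed arrangement of sequential convolution blocks (a convolution operation, a normalization operation and a max pool operation) followed by a softmax layer can be sketched on a single spectral frame in Python. Each output of the 1-D convolution depends only on a local run of frequency bins, which is one way to read "corresponds to a given frequency range". The per-frame zero-mean, unit-variance normalization, the dense output layer, and all names and values are assumptions; this is not the architecture of Khoury et al. Fig. 7 or of Boyadjiev et al.

```python
import math


def conv1d(x, kernel):
    """'Valid' 1-D convolution over frequency bins, implemented as
    cross-correlation (as most deep-learning libraries do)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization of the block's outputs."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def conv_block(x, kernel):
    """One block: convolution, then normalization, then max pooling."""
    return max_pool(normalize(conv1d(x, kernel)))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(spectrum_frame, kernels, output_weights):
    """Sequential conv blocks, then a dense layer and a softmax over classes."""
    h = spectrum_frame
    for kernel in kernels:
        h = conv_block(h, kernel)
    logits = [sum(w * v for w, v in zip(row, h)) for row in output_weights]
    return softmax(logits)


# Usage with hypothetical values: 16 bins -> 14 -> 7 -> 5 -> 2 features -> 4 classes.
frame = [float(i % 5) for i in range(16)]
probs = forward(frame, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]],
                [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0], [0.0, 0.0]])
print(len(probs))  # prints 4
```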
Claim 20, Khoury et al. discloses wherein a given convolution block comprises a given convolution operation and a max pool operation (Fig. 7, element 720; paragraph 63 discloses that the plurality of convolutional layers includes at least one max pooling layer and convolutional layers, which indicates that a given convolution operation is performed), followed by a softmax layer (Fig. 7; paragraph 63 discloses a plurality of convolutional layers 720, and paragraph 47 discloses a softmax layer by disclosing a likelihood score for each of several categories, such as replay attack, voice conversion, speech synthesis, and the absence thereof),
wherein the given convolution operation corresponds to a given frequency range (Fig. 2 shows that the second DNN is provided with features from the first DNN; the features are generated using an STFT, which indicates that the operation of the second DNN is performed within a given frequency range produced by the STFT, and paragraph 63 discloses that the second DNN includes multiple convolutional layers), but fails to disclose that the convolutional layers comprise a normalization operation.
	Boyadjiev et al. discloses a convolutional neural network with convolutional layers, normalization layers, pooling layers, and fully connected layers (paragraph 46). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the neural network of Khoury et al., having convolutional layers and pooling layers, to include normalization layers as disclosed by Boyadjiev et al., in order to perform convolutional neural network processing to determine whether an input voice is spoofed or genuine, thereby increasing the security of the user device.

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al. (US Publication No. 20180254046) in view of Leng et al. (US Publication No. 20200053118), further in view of Izadi et al. (US Publication No. 20210365777), and further in view of Rastrow et al. (US Patent No. 10032463).
Claim 5, Khoury et al. discloses that the transformation comprises a neural network (Fig. 2, first DNN), but fails to disclose that the representation comprises a word embedding or a sense embedding of words in the audio content.
Rastrow et al. discloses a speech recognition system comprising an acoustic model (Fig. 1, elements 110, 112). Col. 7, lines 36-53 disclose “using the acoustic model 112 to generate hypotheses regarding the words in the utterance. For example, the acoustic model 112 may generate probabilities that individual feature vectors or frames correspond to particular words or subword units (e.g. phonemes, triphones, n-grams, etc.).” It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the representation or feature vectors of Khoury et al. to correspond to particular words or subword units, or word embeddings, as disclosed by Rastrow et al., in order to understand and decipher the user's utterance, thereby improving speech recognition.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Khoury et al. (US Publication No. 20180254046) in view of Leng et al. (US Publication No. 20200053118), further in view of Izadi et al. (US Publication No. 20210365777), and further in view of Huffman et al. (US Patent No. 10861476).
Claim 14, Khoury et al. discloses the predetermined neural network (Fig. 2, second DNN), but fails to disclose that the second DNN is a generative adversarial network (GAN).
Huffman et al. discloses a neural network that distinguishes authentic speech from fake speech based on an input speech or voice, the neural network being a generative adversarial network (Fig. 2, elements 116, 140, 142; Figs. 9 and 11; col. 16, lines 40-67; col. 17, lines 55-67). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to substitute the known neural network of Khoury et al. with the known GAN of Huffman et al. to yield the predictable result of determining whether a voice is fake or authentic, thereby improving the security of a user device enabled with speech recognition.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11158329. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of this application are broader than the claims of the patent and are therefore anticipated by the claims of the patent.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LINDA WONG whose telephone number is (571)272-6044. The examiner can normally be reached 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached on (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LINDA WONG/Primary Examiner, Art Unit 2655