DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 1/22/2021 and 8/17/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Status of Claims
Claims 1-20 are pending in this application.  

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim 11 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021).
As per claim 11, Lesso discloses:
A computer-implemented method for spoofing countermeasures, the method comprising: 
obtaining, by a computer, a plurality of training audio signals including one or more clean audio signals and one or more simulated audio signals (Pg 16 ln 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio.");
training, by the computer, a neural network architecture to extract a spoofprint embedding from an audio signal and classify the audio signal, the neural network architecture trained by applying the neural network architecture on a plurality of features of the plurality of training audio signals (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine."); 
extracting, by the computer, an inbound spoofprint for the inbound speaker by applying the neural network architecture on the plurality of features of an inbound audio signal (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine."); and 
generating, by the computer, a classification for the inbound audio signal based upon applying the neural network architecture on the inbound spoofprint (Pg 11 ln 15 - 20 "The method may comprise comparing a similarity score with a first threshold to determine whether the signal contains speech of an enrolled user, and comparing the similarity score with a second, lower, threshold to determine whether the signal contains speech.").
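As a minimal illustrative sketch of the two-threshold comparison quoted from Lesso above: the function name, the returned labels, and the use of a greater-than-or-equal comparison are assumptions made for illustration and are not taken from the reference.

```python
def classify_by_thresholds(similarity_score, enrolled_threshold, speech_threshold):
    """Label a signal using two thresholds, the second lower than the first.

    Hypothetical helper: the label strings and the >= comparison are
    illustrative assumptions, not language from the cited reference.
    """
    if speech_threshold >= enrolled_threshold:
        raise ValueError("speech_threshold must be lower than enrolled_threshold")
    # First (higher) threshold: does the signal contain an enrolled user's speech?
    if similarity_score >= enrolled_threshold:
        return "enrolled_user_speech"
    # Second (lower) threshold: does the signal contain speech at all?
    if similarity_score >= speech_threshold:
        return "speech"
    return "no_speech"
```

Under these assumptions, a score between the two thresholds is treated as speech that is not attributed to the enrolled user.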


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 8-10, 14-15 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021) in view of Lesso (U.S. Patent Application Publication 2019/0228778).
As per claim 1, Lesso (WO) discloses:
A computer-implemented method for spoofing countermeasures, the method comprising: 
generating, by a computer, an enrollee spoofprint for an enrollee based upon a first set of one or more features extracted from one or more enrollee audio signals for the enrollee, wherein the first set of one or more features includes one or more types of spoofing artifacts of the enrollee (Pg 16 In 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio". Since spoofing artifacts are defined as “specific aspects of how the speaker speaks” (i.e. audio characteristics) and spoofprints can use the same features as voiceprints, Lesso teaches this limitation); 
applying, by the computer, a neural network architecture to an inbound audio signal, the neural network architecture trained to detect spoofing artifacts occurring in an audio signal (Pg 9 In 25 - 30 "Preferably, the first processor, or the device performing the first voice biometric process on the audio signal, is configured to perform a spoof detection process on the audio signal, to identify if the audio signal is the result of a replay attack,"); 
generating, by the computer, an inbound spoofprint for an inbound speaker by applying the neural network architecture to the inbound audio signal for the inbound speaker (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine."); and 
generating, by the computer, a spoof likelihood score for the inbound audio signal (Page 9, lines 29-30 and Page 10, lines 1-8).
Lesso (WO) fails to disclose but Lesso (US) discloses:
generating, by the computer, a spoof likelihood score for the inbound audio signal based upon one or more similarities between the inbound spoofprint and the enrollee spoofprint (Figure 12, items 204-208 and Paragraphs [0325-0327] – the speaker validation result is used as part of the anti-spoofing protection. Since the spoofprint and voiceprint can be identical, the limitations are met.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the method of Lesso (WO) with the spoof detection based on voiceprint/spoofprint of Lesso (US) because it is a case of simple substitution of one known element for another to obtain predictable results.
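For illustration only, a score "based upon one or more similarities" between two fixed-length embeddings is sketched below using cosine similarity. Neither Lesso reference is asserted to use cosine similarity; the metric, the function name, and the list-of-floats representation of a spoofprint are assumptions.

```python
import math


def cosine_similarity(inbound_print, enrollee_print):
    """Cosine similarity between two equal-length embedding vectors.

    Hypothetical sketch: cosine similarity is an assumed metric chosen
    for illustration; the cited references do not mandate it.
    """
    if len(inbound_print) != len(enrollee_print):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(inbound_print, enrollee_print))
    norm_inbound = math.sqrt(sum(x * x for x in inbound_print))
    norm_enrollee = math.sqrt(sum(y * y for y in enrollee_print))
    if norm_inbound == 0.0 or norm_enrollee == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_inbound * norm_enrollee)
```

With this metric, identical directions score 1 and orthogonal embeddings score 0; whether a high score indicates a genuine or a spoofed signal depends on how the embeddings were trained and is not fixed by the sketch.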

As per claim 2, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 1 above. Lesso (WO) in the combination further discloses:
extracting, by the computer, a plurality of features from a plurality of training audio signals, the plurality of training audio signals comprising one or more simulated audio signals and one or more clean audio signals (Pg 16 In 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio.") ; and 
training, by the computer, the neural network architecture to detect speech by applying the neural network architecture to the plurality of features (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine.").

As per claim 8, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 1 above. Lesso (WO) in the combination further discloses:
generating, by the computer, an enrollee voiceprint for the enrollee by applying the neural network architecture to a second set of one or more features extracted from the one or more enrollee audio signals for the enrollee, wherein the second set of one or more features includes one or more voice characteristics of the enrollee (Pg 1 In 25 - 30 "if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process.") ; 
generating, by the computer, an inbound voiceprint for the inbound speaker by applying the neural network architecture to the second set of one or more features extracted from the inbound audio signal (Pg 39 In 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech.") ; and 
generating, by the computer, a voice similarity score for the inbound audio signal based upon one or more similarities between the inbound voiceprint and the enrollee voiceprint (Pg 39 In 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech."); and 
generating, by the computer, a combined similarity score based upon the voice similarity score and the spoof likelihood score (Pg 18 In 4 -6 "Preferably, the method comprises the step of fusing the speaker ID score from the primary biometrics scoring with the second speaker ID score of the secondary biometrics scoring to provide a speaker authentication result.").
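As a hedged sketch of the score fusion quoted from Lesso (WO) above, two scores can be combined by a weighted sum. The weighted-sum rule, the default weight, and the assumption that both scores are oriented so that higher values favor the same outcome are illustrative choices, not taken from the reference.

```python
def fuse_scores(first_score, second_score, first_weight=0.5):
    """Fuse two scores into one by a convex weighted sum.

    Hypothetical helper: the weighted-sum rule and the 0.5 default
    are assumptions made for illustration only.
    """
    if not 0.0 <= first_weight <= 1.0:
        raise ValueError("first_weight must lie in [0, 1]")
    return first_weight * first_score + (1.0 - first_weight) * second_score
```

Setting `first_weight` to 1.0 or 0.0 reduces the fused result to one of the input scores, which makes the contribution of each stage easy to inspect.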

As per claim 9, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 1 above. Lesso (WO) in the combination further discloses:
generating, by the computer, an enrollee combined embedding based upon the enrollee spoofprint and an enrollee voiceprint (Pg 17 ln 25 - 30 "The use of such a two-stage biometrics scoring system allows for the primary biometrics scoring to be a relatively low-power and/or always-on solution, while the secondary biometrics scoring may be a relatively high-power and/or occasionally triggered solution, or a solution power-gated by the primary biometrics scoring.", Pg 39 ln 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech.", and Pg 16 ln 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio.");
generating, by the computer, an inbound combined embedding based upon the inbound spoofprint and an inbound voiceprint (Pg 17 ln 25 - 30 "The use of such a two-stage biometrics scoring system allows for the primary biometrics scoring to be a relatively low-power and/or always-on solution, while the secondary biometrics scoring may be a relatively high-power and/or occasionally triggered solution, or a solution power-gated by the primary biometrics scoring.", Pg 39 ln 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech.", and Pg 16 ln 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio."); and
generating, by the computer, a similarity score for the inbound audio signal based upon a similarity between the enrollee combined embedding and the inbound combined embedding (Pg 18 In 14 - 20 "By selecting the particular biometrics techniques to provide such performance, and/or by tuning the primary and secondary biometrics scoring systems to this effect, accordingly the eventual fusion of the primary and secondary scores results in a robust speaker recognition approach having combined low FAR and FRR scores.").

As per claim 10, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 1 above. Lesso (WO) in the combination further discloses:
the neural network architecture comprises one or more layers of one or more embedding extractors, including at least one of a spoofprint embedding extractor and a voiceprint embedding extractor (Pg 3 In 10- 12 "As another example, the first and second voice biometric processes might both use Deep Neural Nets, with the second process using more weights.").

As per claim 14, Lesso (WO) discloses:
A system comprising: 
a non-transitory machine readable memory; and 
a computer comprising a processor (Pg 11 ln 5 - 10 "According to another aspect of the present invention, there is provided a non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method according to the first aspect.") configured to:
generate an enrollee spoofprint for an enrollee based upon a first set of one or more features extracted from one or more enrollee audio signals for the enrollee, wherein the first set of one or more features includes one or more types of audio characteristics of the enrollee (Pg 16 In 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio"); 
store the enrollee spoofprint into the memory (Pg 79 In 25 - 30 "based on the identified sound classification, scoring the received audio against a stored template of the acoustic classes produced by enrolled speakers to identify a speaker for the received audio from the enrolled speakers") ; 
apply a neural network architecture to an inbound audio signal, the neural network architecture trained to detect spoofing artifacts occurring in an audio signal (Pg 9 In 25 - 30 "Preferably, the first processor, or the device performing the first voice biometric process on the audio signal, is configured to perform a spoof detection process on the audio signal, to identify if the audio signal is the result of a replay attack,"); 
generate an inbound spoofprint for an inbound speaker by applying the neural network architecture to an inbound audio signal for the inbound speaker (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine."); and 
Lesso (WO) fails to disclose but Lesso (US) discloses:
generating, by the computer, a spoof likelihood score for the inbound audio signal based upon one or more similarities between the inbound spoofprint and the enrollee spoofprint (Figure 12, items 204-208 and Paragraphs [0325-0327] – the speaker validation result is used as part of the anti-spoofing protection. Since the spoofprint and voiceprint can be identical, the limitations are met.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to modify the method of Lesso (WO) with the spoof detection based on voiceprint/spoofprint of Lesso (US) because it is a case of simple substitution of one known element for another to obtain predictable results.

As per claim 15, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 14 above. Lesso (WO) in the combination further discloses:
extract a plurality of features from a plurality of training audio signals, the plurality of training audio signals comprising one or more simulated audio signals and one or more clean audio signals (Pg 16 In 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio"); and 
training, by the computer, the neural network architecture to detect speech by applying the neural network architecture to the plurality of features (Pg 2 In 15 - 25 "In some embodiments, the first voice biometric process is selected from the following: a process based on analysing a long-term spectrum of the speech; a method using a Gaussian Mixture Model; a method using Mel Frequency Cepstral Coefficients; a method using Principal Component Analysis; a Joint Factor Analysis process; a Tied Mixture of Factor Analyzers process; a method using machine learning techniques such as Deep Neural Nets (DNNs) or Convolutional Neural Nets (CNNs); and a method using a Support Vector Machine.").

As per claim 19, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 14 above. Lesso (WO) in the combination further discloses:
generate an enrollee voiceprint for the enrollee by applying the neural network architecture to a second set of one or more features extracted from the one or more enrollee audio signals for the enrollee, wherein the second set of one or more features includes one or more voice characteristics of the enrollee (Pg 1 In 25 - 30 "if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker, wherein the second voice biometric process is selected to be more discriminative than the first voice biometric process."); 
generate an inbound voiceprint for the inbound speaker by applying the neural network architecture to the second set of one or more features extracted from the inbound audio signal (Pg 39 In 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech."); and 
generate a voice similarity score for the inbound audio signal based upon one or more similarities between the inbound voiceprint and the enrollee voiceprint (Pg 39 In 20 - 25 "A voice biometric process typically compares features extracted from the received speech against a voiceprint that is made of features extracted from the enrolled user’s speech."); and 
generate a combined similarity score based upon the voice similarity score and the spoof likelihood score (Pg 18 In 4 -6 "Preferably, the method comprises the step of fusing the speaker ID score from the primary biometrics scoring with the second speaker ID score of the secondary biometrics scoring to provide a speaker authentication result.").

As per claim 20, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 14 above. Lesso (WO) in the combination further discloses:
the neural network architecture comprises one or more layers of one or more embedding extractors, including at least one of a spoofprint embedding extractor and a voiceprint embedding extractor (Pg 3 In 10 - 12 "As another example, the first and second voice biometric processes might both use Deep Neural Nets, with the second process using more weights.").

Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021) and Lesso (U.S. Patent Application Publication 2019/0228778) in view of Vitaladevuni et al. (U.S. Patent 9,704,478, listed in IDS dated 8/17/2021).
As per claims 3 and 16, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claims 2 and 15 above. The combination fails to disclose but Vitaladevuni et al. in the same field of endeavor teaches:
generating, by the computer, the one or more simulated audio signals by executing one or more data augmentation operations (Col 2 In 48 - 52 "Aspects of the present disclosure relate to masking off portions of audio signals that are output through a presentation device, such as a speaker. This masking can improve ASR performance on input that may include the output (e.g., input that may include an acoustic echo of the output).").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the augmented audio signal simulation as taught by Vitaladevuni et al. with the methods and system disclosed by the combination of Lesso (WO) and Lesso (US) to improve the robustness of the detection algorithm (see Vitaladevuni et al. Col 2 ln 48 - 52).

Claims 4-5 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021) and Lesso (U.S. Patent Application Publication 2019/0228778) in view of Alzantot et al. (Non-patent literature “Deep Residual Neural Networks for Audio Spoofing Detection”, listed in IDS dated 8/17/2021).
As per claim 4, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 2 above. The combination fails to disclose but Alzantot et al. in the same field of endeavor teaches:
during a training phase: 
executing, by the computer, a loss function of the neural network architecture for the spoof likelihood score outputted by the neural network architecture, the loss function instructing the computer to update one or more hyperparameters of one or more layers of the neural network architecture based on maximizing inter-class variance and minimizing intra-class variance (Pg 1080 Section 3.2 "All models are trained by minimizing a weighted cross entropy loss function where the ratio of between weights assigned to genuine and spoofed examples are 9:1, in order to mitigate the imbalance in the training data distribution. The cost function is minimized using Adam optimizer [29] with learning rate = 5·10^-5 for 200 epochs with batch size = 32. After each epoch we save the model parameters, and we finally use the parameters with the best performance on the validation dataset.").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the loss function optimization as taught by Alzantot et al. with the method disclosed by the combination of Lesso (WO) and Lesso (US) to improve the robustness of the detection algorithm (see Alzantot et al. Pg 1080 Section 3.2).
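The weighted cross entropy loss quoted from Alzantot et al. can be sketched as follows for a two-class (genuine versus spoofed) problem. The 9:1 weight ratio comes from the quoted passage; the function name, the label encoding (1 for genuine, 0 for spoofed), the probability clipping, and the normalization by the sum of the weights are assumptions made for illustration.

```python
import math


def weighted_cross_entropy(probs_genuine, labels,
                           genuine_weight=9.0, spoofed_weight=1.0):
    """Class-weighted binary cross entropy.

    probs_genuine: predicted probability that each example is genuine.
    labels: 1 for genuine, 0 for spoofed (assumed encoding).
    Normalizing by the total weight is an illustrative assumption.
    """
    if len(probs_genuine) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    eps = 1e-12  # clip probabilities to avoid log(0)
    total_loss = 0.0
    total_weight = 0.0
    for prob, label in zip(probs_genuine, labels):
        prob = min(max(prob, eps), 1.0 - eps)
        if label == 1:
            total_loss += -genuine_weight * math.log(prob)
            total_weight += genuine_weight
        else:
            total_loss += -spoofed_weight * math.log(1.0 - prob)
            total_weight += spoofed_weight
    return total_loss / total_weight
```

With these weights, an error on a genuine example contributes nine times as much to the loss as an equally confident error on a spoofed example, which is how the quoted passage describes mitigating class imbalance.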

As per claim 5, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 1 above.  The combination fails to disclose but Alzantot et al. in the same field of endeavor teaches:
applying, by the computer, the neural network architecture to the first set of one or more features extracted from the one or more enrollee audio signals to generate a feature vector corresponding to the enrollee spoofprint (Pg 1078 Section 1 "We consider different feature extraction algorithms to convert the input (raw time-domain speech waveform) into a 2D feature representation. That 2D feature representation is fed as an input into our convolutional model.").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the feature extraction and convolutional model as taught by Alzantot et al. with the method disclosed by the combination of Lesso (WO) and Lesso (US) to improve the accuracy of the detection algorithm (see Alzantot et al. Pg 1078 Section 1).

As per claim 17, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 15 above.  The combination fails to disclose but Alzantot et al. in the same field of endeavor teaches:
during a training phase: 
execute a loss function of the neural network architecture for the spoof likelihood score outputted by the neural network architecture, the loss function instructing the computer to update hyperparameters of the neural network architecture based on maximizing inter-class variance and minimizing intra-class variance (Pg 1080 Section 3.2 "All models are trained by minimizing a weighted cross entropy loss function where the ratio of between weights assigned to genuine and spoofed examples are 9:1, in order to mitigate the imbalance in the training data distribution. The cost function is minimized using Adam optimizer [29] with learning rate = 5·10^-5 for 200 epochs with batch size = 32. After each epoch we save the model parameters, and we finally use the parameters with the best performance on the validation dataset.").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the loss function optimization as taught by Alzantot et al. with the system disclosed by the combination of Lesso (WO) and Lesso (US) to improve the robustness of the detection algorithm (see Alzantot et al. Pg 1080 Section 3.2).

As per claim 18, the combination of Lesso (WO) and Lesso (US) discloses all of the limitations of claim 15 above.  The combination fails to disclose but Alzantot et al. in the same field of endeavor teaches:
apply the neural network architecture to the first set of one or more features extracted from the one or more enrollee audio signals to generate a feature vector corresponding to the enrollee spoofprint (Pg 1078 Section 1 "We consider different feature extraction algorithms to convert the input (raw time-domain speech waveform) into a 2D feature representation. That 2D feature representation is fed as an input into our convolutional model.").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the feature extraction and convolutional model as taught by Alzantot et al. with the system disclosed by the combination of Lesso (WO) and Lesso (US) to improve the accuracy of the detection algorithm (see Alzantot et al. Pg 1078 Section 1).

Claims 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021), Lesso (U.S. Patent Application Publication 2019/0228778) and Alzantot et al. (Non-patent literature “Deep Residual Neural Networks for Audio Spoofing Detection”, listed in IDS dated 8/17/2021) in view of Vitaladevuni et al. (U.S. Patent 9,704,478, listed in IDS dated 8/17/2021).
As per claim 6, the combination of Lesso (WO), Lesso (US) and Alzantot et al. discloses all of the limitations of claim 5 above. The combination fails to disclose but Vitaladevuni et al. in the same field of endeavor teaches:
during an enrollment phase, generating, by the computer, one or more simulated enrollee audio signals by executing one or more data augmentation operations on the one or more enrollee audio signals (Col 2 In 48 – 52 "Aspects of the present disclosure relate to masking off portions of audio signals that are output through a presentation device, such as a speaker. This masking can improve ASR performance on input that may include the output (e.g., input that may include an acoustic echo of the output)." ) .
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the augmented audio signal simulation as taught by Vitaladevuni et al. with the method disclosed by Lesso (WO) as modified by Lesso (US) and Alzantot et al. to improve the robustness of the detection algorithm (see Vitaladevuni et al. Col 2 ln 48 - 52).

As per claim 7, the combination of Lesso (WO), Lesso (US), Alzantot et al. and Vitaladevuni et al. discloses all of the limitations of claim 6 above. Vitaladevuni et al. in the combination further discloses:
the one or more data augmentation operations includes a frequency masking data augmentation operation (Col 8 In 30 - 35 "Acoustic models can be trained for use with the particular frequency masks identified above. For example, some number n of frequency masks may be defined based on some measurement of ASR quality, user-perceived audio output quality, automated analysis of SNR, or the like.").
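A frequency masking data augmentation operation, in the general sense recited in the claim, can be sketched as blanking a contiguous band of frequency bins in a spectrogram. This is a generic illustration and is not asserted to be the masking scheme of Vitaladevuni et al.; the function name, the list-of-rows spectrogram layout (one row per frequency bin), and the zero fill value are assumptions.

```python
def apply_frequency_mask(spectrogram, start_bin, width, fill_value=0.0):
    """Return a copy of the spectrogram with a band of frequency bins masked.

    spectrogram: list of rows, one row per frequency bin, one column
    per time frame (assumed layout). The input is left unmodified.
    """
    num_bins = len(spectrogram)
    if start_bin < 0 or width < 0 or start_bin + width > num_bins:
        raise ValueError("mask must lie within the frequency axis")
    masked = [row[:] for row in spectrogram]  # copy each frequency row
    for freq_bin in range(start_bin, start_bin + width):
        masked[freq_bin] = [fill_value] * len(masked[freq_bin])
    return masked
```

Applying the function with different `start_bin` and `width` values to the same audio yields several simulated variants of one signal.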

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021) in view of Vitaladevuni et al. (U.S. Patent 9,704,478, listed in IDS dated 8/17/2021).
As per claim 12, Lesso discloses all of the limitations of claim 11 above. Lesso fails to disclose but Vitaladevuni et al. in the same field of endeavor teaches:
generating, by the computer, the one or more simulated audio signals by executing one or more data augmentation operations (Col 2 In 48 - 52 "Aspects of the present disclosure relate to masking off portions of audio signals that are output through a presentation device, such as a speaker. This masking can improve ASR performance on input that may include the output (e.g., input that may include an acoustic echo of the output).").
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the augmented audio signal simulation as taught by Vitaladevuni et al. with the method disclosed by Lesso to improve the robustness of the detection algorithm (see Vitaladevuni et al. Col 2 ln 48 - 52).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Lesso (International Patent Application Publication WO 2019/145708, listed in IDS dated 8/17/2021) in view of Alzantot et al. (Non-patent literature “Deep Residual Neural Networks for Audio Spoofing Detection”, listed in IDS dated 8/17/2021).
As per claim 13, Lesso discloses all of the limitations of claim 11 above. Lesso further discloses:
extracting, by the computer, a training spoofprint for a corresponding training audio signal by applying an embedding extractor of the neural network architecture on the corresponding training audio signal (Pg 16 In 25 - 30 "performing a feature extraction on the received audio, and wherein said step of performing a speaker recognition process is performed on the feature extract version of the received audio");
Lesso fails to disclose but Alzantot et al. in the same field of endeavor teaches:
executing, by the computer, a loss function of the neural network architecture according to the training spoofprint outputted by the embedding extractor for the corresponding training audio signal, the loss function instructing the computer to update one or more hyperparameters of one or more layers of the neural network architecture, the one or more hyperparameters updated based on maximizing inter-class variance and minimizing intra-class variance (Pg 1080 Section 3.2 "All models are trained by minimizing a weighted cross entropy loss function where the ratio of between weights assigned to genuine and spoofed examples are 9:1, in order to mitigate the imbalance in the training data distribution. The cost function is minimized using Adam optimizer [29] with learning rate = 5·10^-5 for 200 epochs with batch size = 32. After each epoch we save the model parameters, and we finally use the parameters with the best performance on the validation dataset.").
It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to combine the loss function optimization as taught by Alzantot et al. with the method disclosed by Lesso to improve the robustness of the detection algorithm (see Alzantot et al. Pg 1080 Section 3.2).

Response to Arguments
Applicant’s arguments, see pages 7-9, filed 8/9/2022, with respect to the rejection(s) of claim(s) 1-10 and 14-20 under 35 U.S.C. 102 and 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of Lesso (U.S. Patent Application Publication 2019/0228778).
Since spoofprints can be the same as voiceprints, claims 11-13 maintain their current rejections.

Examiner Notes
The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant.  Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well.  It is respectfully requested that, in preparing responses, the Applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.
Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so:
“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.”
Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at (571) 272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EDWIN S LELAND III/
Primary Examiner, Art Unit 2677