DETAILED ACTION
1.	This communication is in response to the Application filed on 6/10/2020. Claims 1-20 are pending and have been examined.
Allowable Subject Matter
2.	Claims 2-4, 11-13, 15-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. However, for claims 2, 11, 15, the applicant is further requested to clarify the two ‘loss signals’ because they are not sufficiently defined/explained in the Specification, even with equations 1 and 6. 
Claim Rejections - 35 USC § 103
3.	Claims 1, 5-10, 14, 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tewfik, et al. (US 20080215333; hereinafter TEWFIK) in view of Schonherr, et al. (arXiv, 2018; hereinafter Schonherr).
As per claim 1, TEWFIK (Title: Embedding Data in Audio and Detecting Embedded Data in Audio) discloses “A computer-implemented method for [ speech recognition ], the method comprising: 
sampling an audio input signal to generate a time-domain sampled input signal; converting the time-domain sampled input signal to a frequency-domain input signal (TEWFIK, [0072], the audio data is segmented into blocks, specifically audio segments of 512 samples each. Each audio segment (block) is weighted with a Hanning window. Consecutive blocks overlap by fifty percent. In step 38, a fast Fourier transform (FFT) is used to convert the segments to the frequency domain);
generating perceptual weights in response to frequency components of critical bands of the frequency-domain input signal; creating a time-domain adversary signal in response to the perceptual weights (TEWFIK, [0080], watermarking a sound according 40, the masking threshold is approximated with a 10th order all-pole filter, M(w), using a least squares criterion, which is part of the MPEG Audio Psychoacoustic Model 1. Note that this is the perceptual masking of the watermark, as represented by PN-sequence 35, in the frequency domain. Thus, the PN-sequence is filtered with the approximate masking filter, M(w), in order to ensure that the spectrum of the watermark is below the masking threshold (i.e., so that it cannot be heard or perceived by the human ear)); and
combining the time-domain adversary signal with the audio input signal to create a combined audio signal, [ wherein a speech processing of the combined audio signal outputs a different result from speech processing of the audio input signal ] (TEWFIK, [0075], The resulting scaled masked watermark is in step 50 added to the audio signal as each segment thereof has been weighted with a Hanning window in step 36; [0078], The signals resulting from steps 66 and 68 are added together in step 70, to which the audio signal itself is added in step 72 to generate the audio signal including a watermark).”
TEWFIK does not expressly disclose “speech recognition .. wherein a speech processing of the combined audio signal outputs a different result from speech processing of the audio input signal ..” However, the limitation is taught by Schonherr (Title: Adversarial Attacks Against Automatic Speech Recognition Systems via Psychoacoustic Hiding).
In the same field of endeavor, Schonherr teaches: [Title] and [Abstract] “.. adversarial examples based on psychoacoustic hiding. Our attack exploits the characteristics of DNN-based ASR systems .. to embed an arbitrary audio input with a malicious voice command that is then transcribed by the ASR system <read on different result>, with the audio signal remaining barely distinguishable from the original signal.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Schonherr in the system taught by TEWFIK to serve the purpose of attacking/testing a speech recognizer with barely perceptible adversarial audio.
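For illustration only, the front-end steps cited from TEWFIK [0072] for claim 1 (512-sample blocks, Hanning weighting, fifty percent overlap, transform to the frequency domain) can be sketched as follows. This is a minimal sketch and not code from either reference: the function names are hypothetical, and a naive DFT stands in for the FFT, which computes the same values faster.

```python
import cmath
import math

def hann_window(n):
    """Hann (Hanning) window of length n, symmetric form."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, block_size=512):
    """Split samples into blocks that overlap by fifty percent."""
    hop = block_size // 2
    return [samples[s:s + block_size]
            for s in range(0, len(samples) - block_size + 1, hop)]

def dft(block):
    """Naive discrete Fourier transform (assumption: stands in for an FFT)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def blocks_to_spectra(samples, block_size=512):
    """Window each overlapping block, then convert it to the frequency domain."""
    window = hann_window(block_size)
    return [dft([x * w for x, w in zip(block, window)])
            for block in frame_signal(samples, block_size)]
```

Each returned spectrum holds one complex value per frequency bin; a perceptual weighting step of the kind recited in the claim would operate on these bins.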
As per claim 5 (dependent on claim 1), TEWFIK in view of Schonherr further discloses “wherein the time-domain sampled input signal is converted to the frequency-domain input signal via a symmetrical algorithm including a Fast Fourier Transform, a Discrete Fourier Transform, or a symmetrical filter bank (TEWFIK, [0072], a fast Fourier transform (FFT) is used to convert the segments <read on time-domain samples> to the frequency domain <using a DFT/FFT to convert a time-domain audio signal for transform-domain processing is basic common knowledge; why is this considered a potentially patentable limitation?>).”
As per claim 6 (dependent on claim 1), TEWFIK in view of Schonherr further discloses “wherein sampling the audio input signal is via an analog to digital A/D converter, and outputting the time-domain adversary is directly to a digital to analog D/A converter (TEWFIK, [0010], Typically, sound data are subject to signal processing operations such as filtering, resampling .. audio-to-digital and subsequent digital-to-audio conversion, etc. <ADC and DAC>).”
As per claim 7 (dependent on claim 1), TEWFIK in view of Schonherr further discloses “identifying the critical bands via a psychoacoustic model of a human ear (TEWFIK, [0027], The human auditory system can be modeled by a set of 26 bandpass filters with bandwidths that increase with increasing frequency. The 26 bands are known as the critical bands. The critical bands are defined around a center frequency in which the noise bandwidth is increased until there is just a noticeable difference in the tone at the center frequency. Thus, if a faint tone lies in the critical band of a louder tone, the faint tone will not be perceptible <critical bands are basic common knowledge about the human hearing system in perceptual audio processing; why is this considered a potentially patentable limitation?>).”
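For illustration only, grouping frequency bins into critical bands can be sketched as below. This is a sketch under stated assumptions, not the model of TEWFIK [0027]: it uses the widely cited Zwicker and Terhardt approximation of the Bark scale, which does not reproduce the 26-band division described there exactly, and the function names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt formula; an assumption)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def group_bins_by_band(num_bins, sample_rate):
    """Map each non-negative-frequency bin index to an integer critical-band index."""
    bands = {}
    for k in range(num_bins // 2 + 1):
        freq = k * sample_rate / num_bins
        bands.setdefault(int(hz_to_bark(freq)), []).append(k)
    return bands
```

Because the bands widen with increasing frequency, the low bands collect only a few bins while the high bands collect many, consistent with the increasing bandwidths noted in the cited paragraph.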
As per claim 8 (dependent on claim 7), TEWFIK in view of Schonherr further discloses “wherein the psychoacoustic model is an MPEG psychoacoustic model or an AAC psychoacoustic model (TEWFIK, [0072], To actually generate the watermark, a masking threshold of the signal is first calculated using the MPEG Audio Psychoacoustic Model 1).”
Claims 9, 14 (similar in scope to claim 1) are rejected under the same rationale as applied above for claim 1.  
Claims 10, 18 (similar in scope to claim 7) are rejected under the same rationale as applied above for claim 7.  
Claim 19 (similar in scope to claim 8) is rejected under the same rationale as applied above for claim 8. 
Claim 20 (similar in scope to claim 1) is rejected under the same rationale as applied above for claim 1. 
Conclusion
4.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to FENG-TZER TZENG whose telephone number is (571)272-4609. The examiner can normally be reached on M-F (8:00-5:30). The fax phone number where this application or proceeding is assigned is 571-273-4609.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir (SPE) can be reached on (571)272-7799.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/FENG-TZER TZENG/	4/2/2021

Primary Examiner, Art Unit 2659