DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 1/20/2022 have been fully considered but they are not persuasive.  Regarding claim 1, the examiner respectfully disagrees with the applicant’s statement that “Silver does not teach or suggest causing an autonomous vehicle to undertake a safety related action in response to determining the magnitude satisfies a decision threshold”.  In the combination of Wingate and Silver, Silver makes obvious estimating the relative velocity of an emergency vehicle (i.e., the estimated direction of motion) with respect to, at least, a speed away from or towards the microphone array (see Silver, ¶ 0044 and 0060 and figure 7).  Silver then makes obvious selecting which actions to take based on the estimated direction of motion, such as determining the appropriate action according to whether the emergency vehicle is approaching from behind or from a side street, or is getting further away in front of the vehicle (see Silver, ¶ 0063 and 0075-0077).  Silver teaches that the decision to take an appropriate action involves comparisons to decision thresholds, or likelihood thresholds, such that no action is taken when the minimum likelihood threshold is met (see Silver, ¶ 0075-0077).
Regarding claim 8, see the preceding remarks with respect to claim 1.  The combination of Wingate and Silver makes obvious a non-transitory computer readable storage medium with the claimed features including the amended portions with respect to causing an autonomous vehicle to undertake a safety related action (see Silver, ¶ 0044, 0060, 0063, and 0075-0077).
Regarding claim 15, see the preceding remarks with respect to claim 1.  The combination of Wingate and Silver makes obvious a system with the claimed features including the amended portions 
Therefore, claims 1-2, 4-6, 8-9, 11-13, 15-20, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wingate and Silver.  Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wingate and Silver further in view of LeBlanc.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-2, 4-6, 8-9, 11-13, 15-20, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Wingate et al., US 2016/0071526 A1 (previously cited and hereafter Wingate), in view of Silver et al., US 2019/0355251 A1 (previously cited in an IDS received 5/13/21 and hereafter Silver).
Regarding claim 1, Wingate discloses a system for acoustic source tracking and selection (see Wingate, abstract).  Herein, Wingate teaches fixed and adaptive beamforming techniques with a microphone array to filter out noise from directions that are different than the desired audio signal (see Wingate, ¶ 0046-0047, 0055-0056, 0082-0083, and 0086, figure 1A, unit 110, and figure 4, unit 410).  Then, Wingate teaches audio source separation to remove portions identified as noise or boost portions of the audio input identified as a desired audio signal, where source separation is performed on output 
Silver teaches a system for detecting and responding to sirens (see Silver, abstract).  Silver teaches the use of a microphone array where beamforming is used to focus on locations that are most relevant to detecting sounds from emergency vehicles and to ignore interfering sounds from other locations (see Silver, ¶ 0039 and 0073, and figures 3A-3D, units 152a-d).  The beams are formed in multiple directions to improve the SNR and improve detection of the siren direction, range, and/or velocity (see Silver, ¶ 0074).  Importantly, Silver teaches that the siren’s velocity is estimated from Doppler shifts of the siren frequency (see Silver, ¶ 0044).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify Wingate with the teachings of Silver to use beamforming in multiple directions to improve the tracking of an acoustic source, such as tracking the audio source’s direction, range, and/or velocity (see Silver, ¶ 0074).  Therefore, the combination makes obvious:
“A processor-implemented method for audio-based detection and tracking of an acoustic source, the method comprising: 
performing, by a processor-based system, beamforming on a plurality of acoustic signal spectra to generate a first beam signal spectrum and a second beam signal spectrum, the acoustic signal spectra 

“detecting, by a deep neural network (DNN) classifier, an acoustic event associated with the acoustic source, in at least one of the first beam signal spectrum and the second beam signal spectrum;” where Wingate teaches that source separation is performed in time-frequency bins, where a DNN is used in a nonnegative tensor factorization (NTF) with a Neural Net (NN) redux method; the NTF with NN redux method is used to identify specific acoustic sources in the audio spectrum for further extraction with mask data (see Wingate, ¶ 0134-0135, 0146-0147, 0150, 0211-0213, 0220-0221, 0231, 0234, 0306-0307, 0311-0312, and 0318, figure 9, steps 930, 932, 934, and 936, and figure 12);

“performing, by the processor-based system, pattern extraction in response to the detection, the pattern comprising identified time and frequency bins of the plurality of acoustic signal spectra, the bins associated with the acoustic event;” where Wingate teaches performing extraction based on the signal spectrum using calculated mask functions (see Wingate, ¶ 0231-0236, 0297-0300, and 0317, and figure 12, steps 1230, 1240, and 1250);

“estimating, by the processor-based system, a direction of motion of the acoustic source relative to the array of microphones, the estimation based on a Doppler frequency shift of the acoustic event;” where Wingate teaches updating the acoustic source tracking according to relative motion or movement, and Silver makes obvious updating the acoustic source tracking according to Doppler shifts of a siren frequency, such that the siren from an emergency vehicle is tracked using better estimates of range and velocity (see Wingate, ¶ 0113-0114 and figure 6, steps 660-670, in view of Silver, ¶ 0044 and 0073-0074);

“comparing a magnitude of the estimated direction of motion to a decision threshold; and” where Silver makes obvious that the estimated direction of motion, such as the estimated relative velocity (e.g., at least speed away from or towards the microphone array on the vehicle), is used to determine which actions to take, such as determining the appropriate action according to whether the emergency vehicle is approaching from behind or from a side street, or is getting further away in front of the vehicle, where these actions clearly involve a comparison to decision thresholds (see Silver, ¶ 0044, 0060, 0063, and 0075-0077); and

“in response to determining the magnitude satisfies the decision threshold, causing an autonomous vehicle to undertake a safety related action” where Silver makes obvious that when it is determined to take an action, the actions include a safety related action, such as pulling over, slowing down, etc. (see Silver, ¶ 0076).
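For illustration only, the sequence mapped above (estimating a relative velocity from a Doppler shift, comparing its magnitude to a decision threshold, and selecting a safety related action) can be sketched as follows.  The sketch is not taken from Wingate, Silver, or the instant specification.  The function names, the stationary-microphone Doppler relation, the 343 m/s speed of sound, the 2.0 m/s threshold, and the action labels are all assumptions chosen for the example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air (about 20 C)


def radial_velocity(observed_hz: float, emitted_hz: float,
                    c: float = SPEED_OF_SOUND_M_S) -> float:
    """Radial velocity of a moving source heard by a stationary microphone.

    Positive means the source is approaching. Derived from
    f_obs = f_src * c / (c - v), which gives v = c * (1 - f_src / f_obs).
    """
    if observed_hz <= 0 or emitted_hz <= 0:
        raise ValueError("frequencies must be positive")
    return c * (1.0 - emitted_hz / observed_hz)


def choose_action(velocity_m_s: float, threshold_m_s: float = 2.0) -> str:
    """Hypothetical decision rule: act only if |v| reaches the threshold."""
    if abs(velocity_m_s) < threshold_m_s:
        return "no_action"
    # Sign carries the direction of motion: approaching vs. receding.
    return "pull_over" if velocity_m_s > 0 else "continue"
```

In this sketch a siren with an assumed nominal tone of 1000 Hz that is observed at 1030 Hz corresponds to an approach speed of roughly 10 m/s, which exceeds the assumed threshold.  A moving microphone array (e.g., one mounted on a moving vehicle) would require an additional term for the vehicle's own velocity, which is omitted here.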

Regarding claim 2, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “method of claim 1, further comprising: applying a Generalized Cross Correlation 
Regarding claim 4, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “method of claim 1, wherein the pattern extraction comprises comparing one or more of the plurality of acoustic signal spectra to a predetermined spectrum associated with the expected pattern, and identifying time and frequency bins that match, based on the comparison, to within a threshold value” because Wingate teaches pattern extraction with calculated mask data in the time-frequency data, and the output (i.e., extracted) audio comprises the data that is greater than a certain threshold (see Wingate, ¶ 0211-0212, 0220-0221, 0231, and 0234 and figure 9, steps 930, 932, and 934).
Regarding claim 5, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “method of claim 1, wherein the pattern extraction comprises applying a neural network to one or more of the plurality of acoustic signal spectra, the neural network trained to generate scores for time and frequency bins of the acoustic signal spectra that indicate a probability of matching to an acoustic event of interest” because Wingate teaches a DNN to indicate probabilities that certain spectra belong to one or more audio sources (see Wingate, ¶ 0211-0213, 0220-0221, 0231-0236, 0259-0262, 0297-0300, 0306-0307, 0311-0312, and 0317-0318, figure 9, steps 930, 932, 934, and 936, and figure 12).
Regarding claim 6, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “method of claim 1, wherein the acoustic source is an emergency vehicle and the acoustic event is a siren” because Silver makes obvious the detection of emergency vehicles so that an 
Regarding claim 8, see the preceding rejection with respect to claim 1 above.  The combination of Wingate and Silver makes obvious the method of claim 1, and for the same reasons makes obvious the instant claim.  Specifically, the combination makes obvious:
“At least one non-transitory computer readable storage medium comprising instructions encoded thereon that, when executed, cause one or more processors to at least: 
perform beamforming on a plurality of acoustic signal spectra to generate a first beam signal spectrum and a second beam signal spectrum, the acoustic signal spectra generated from acoustic signals received from an array of microphones;” (see Wingate, ¶ 0046-0047, 0055-0056, 0082-0083, and 0086, figure 1A, unit 110, and figure 4, unit 410, in view of Silver, ¶ 0073-0074);

“detect, by a deep neural network (DNN) classifier, an acoustic event associated with the acoustic source, in at least one of the first beam signal spectrum and the second beam signal spectrum;” (see Wingate, ¶ 0134-0135, 0146-0147, 0150, 0211-0213, 0220-0221, 0231, 0234, 0306-0307, 0311-0312, and 0318, figure 9, steps 930, 932, 934, and 936, and figure 12);

“perform pattern extraction in response to the detection, the pattern comprising identified time and frequency bins of the plurality of acoustic signal spectra, the bins associated with the acoustic event;” (see Wingate, ¶ 0231-0236, 0297-0300, and 0317, and figure 12, steps 1230, 1240, and 1250);

“estimate a direction of motion of the acoustic source relative to the array of microphones, the estimation based on a Doppler frequency shift of the acoustic event, the Doppler frequency shift calculated from the time and frequency bins of the extracted pattern;” (see Wingate, ¶ 0113-0114 and figure 6, steps 660-670, in view of Silver, ¶ 0044);

“compare a magnitude of the estimated direction of motion to a decision threshold; and” where Silver makes obvious that the estimated direction of motion, such as the estimated relative velocity (e.g., at least speed away from or towards the microphone array on the vehicle), is used to determine which actions to take, such as determining the appropriate action according to whether the emergency vehicle is approaching from behind or from a side street, or is getting further away in front of the vehicle, where these actions clearly involve a comparison to decision thresholds (see Silver, ¶ 0044, 0060, 0063, and 0075-0077); and

“in response to determining the magnitude satisfies the decision threshold, cause an autonomous vehicle to undertake a safety related action.” where Silver makes obvious that when it is determined to take an action, the actions include a safety related action, such as pulling over, slowing down, etc. (see Silver, ¶ 0076).
Regarding claim 9, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “computer readable storage medium of claim 8, wherein the instructions cause the one or more processors to: apply a Generalized Cross Correlation Phase Transform to the plurality of acoustic signal spectra to generate an angular spectrum; and estimate a direction of the acoustic source relative to the array of microphones based on detection of a peak in the angular spectrum” because Silver teaches that the Generalized Cross Correlation Phase Transform algorithm is used for estimating a direction, and it is obvious that the direction estimate is based on a peak in the angular spectrum (see Silver, ¶ 0045).
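For illustration only, a peak-based direction estimate using a Generalized Cross Correlation Phase Transform can be sketched for a single pair of microphones as follows.  The sketch is not taken from Silver or the instant specification.  The helper names, the naive DFT, the far-field two-microphone geometry, and the 343 m/s speed of sound are assumptions made for the example, and the sketch searches over time lags rather than building a full angular spectrum.

```python
import cmath
import math


def _dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(values)
    sign = 1 if inverse else -1
    scale = n if inverse else 1
    return [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * m / n)
            for k, v in enumerate(values)) / scale
        for m in range(n)
    ]


def gcc_phat_delay(sig, ref, max_shift):
    """Delay of `sig` relative to `ref`, in samples, via GCC-PHAT."""
    n = len(sig) + len(ref)  # zero-pad so circular correlation acts as linear
    sig_f = _dft(list(sig) + [0.0] * (n - len(sig)))
    ref_f = _dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [s * r.conjugate() for s, r in zip(sig_f, ref_f)]
    # PHAT weighting: discard magnitude, keep only the cross-spectrum phase.
    whitened = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in _dft(whitened, inverse=True)]
    # The lag with the largest correlation value is the delay estimate.
    lags = range(-max_shift, max_shift + 1)
    return max(lags, key=lambda lag: corr[lag % n])


def delay_to_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m, c=343.0):
    """Far-field arrival angle from broadside for a two-microphone pair."""
    ratio = c * delay_samples / (sample_rate_hz * mic_spacing_m)
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))
```

In this sketch the location of the correlation peak gives the inter-microphone delay, and the delay is then converted to an angle under the stated far-field assumption.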
Regarding claim 11, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “computer readable storage medium of claim 8, wherein the instructions cause the one or more processors to compare one or more of the plurality of acoustic signal spectra to a predetermined spectrum associated with the expected pattern, and identify time and frequency bins that match, based on the comparison, to within a threshold value when performing the pattern extraction process” because Wingate teaches pattern extraction with calculated mask data in the time-frequency data, and the output (i.e., extracted) audio comprises the data that is greater than a certain threshold (see Wingate, ¶ 0211-0212, 0220-0221, 0231, and 0234 and figure 9, steps 930, 932, and 934).
Regarding claim 12, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “computer readable storage medium of claim 8, wherein the instructions cause the one or more processors to apply a neural network to one or more of the plurality of acoustic signal spectra, the neural network trained to generate scores for time and frequency bins of the acoustic signal spectra that indicate a probability of matching to an acoustic event of interest, when performing the pattern extraction process” because Wingate teaches a DNN to indicate probabilities that certain spectra belong to one or more audio sources (see Wingate, ¶ 0211-0213, 0220-0221, 0231-0236, 0259-
Regarding claim 13, see the preceding rejection with respect to claim 8 above.  The combination makes obvious the “computer readable storage medium of claim 8, wherein the acoustic source is an emergency vehicle and the acoustic event is a siren” because Silver makes obvious the detection of emergency vehicles so that an autonomous vehicle gains awareness and reacts to detected emergency vehicles (see Silver, ¶ 0018, 0041, and 0045).
Regarding claim 15, see the preceding rejection with respect to claim 1 above.  The combination of Wingate and Silver makes obvious the method of claim 1, and for the same reasons makes obvious the instant claim.  Specifically, the combination makes obvious:
“A system for audio-based detection and tracking of an acoustic source, the system comprising: 
a beamforming circuit to perform beamforming on a plurality of acoustic signal spectra to generate a first beam signal spectrum and a second beam signal spectrum, the acoustic signal spectra generated from acoustic signals received from an array of microphones;” (see Wingate, ¶ 0046-0047, 0055-0056, 0082-0083, and 0086, figure 1A, unit 110, and figure 4, unit 410, in view of Silver, ¶ 0073-0074);

“a deep neural network (DNN) classifier to detect an acoustic event associated with the acoustic source, in at least one of the first beam signal spectrum and the second beam signal spectrum;” (see Wingate, ¶ 0134-0135, 0146-0147, 0150, 0211-0213, 0220-0221, 0231, 0234, 0306-0307, 0311-0312, and 0318, figure 9, steps 930, 932, 934, and 936, and figure 12);

“a pattern extraction circuit to perform pattern extraction in response to the detection, the pattern comprising identified time and frequency bins of the plurality of acoustic signal spectra, the bins associated with the acoustic event; and” (see Wingate, ¶ 0231-0236, 0297-0300, and 0317, and figure 12, steps 1230, 1240, and 1250);

“a movement direction estimation circuit to estimate a direction of motion of the acoustic source relative to the array of microphones, the estimation based on a Doppler frequency shift of the acoustic event, the Doppler frequency shift calculated from the time and frequency bins of the extracted pattern; and” (see Wingate, ¶ 0113-0114 and figure 6, steps 660-670, in view of Silver, ¶ 0044); and

“a thresholding circuit to:
compare a magnitude of the estimated direction of motion to a decision threshold; and” where “thresholding circuit” is interpreted in view of the original specification as programmable circuitry such as computer processors executing instructions (see Instant, pp. 20-21, ¶ 0076), such that Silver makes obvious software run on processors (see Silver, ¶ 0024-0025) where the estimated direction of motion, such as the estimated relative velocity (e.g., at least speed away or towards the microphone array on 

“in response to determining the magnitude satisfies the decision threshold, cause an autonomous vehicle to undertake a safety related action.” where Silver makes obvious that when it is determined to take an action, the actions include a safety related action, such as pulling over, slowing down, etc. (see Silver, ¶ 0076).

Regarding claim 16, see the preceding rejection with respect to claim 15 above.  The combination makes obvious the “system of claim 15, further comprising a direction of arrival estimation circuit to: apply a Generalized Cross Correlation Phase Transform to the plurality of acoustic signal spectra to generate an angular spectrum; and estimate a direction of the acoustic source relative to the array of microphones based on detection of a peak in the angular spectrum” because Silver teaches that the Generalized Cross Correlation Phase Transform algorithm is used for estimating a direction, and it is obvious that the direction estimate is based on a peak in the angular spectrum (see Silver, ¶ 0045). 
Regarding claim 18, see the preceding rejection with respect to claim 15 above.  The combination makes obvious the “system of claim 15, wherein the pattern extraction circuit is further to compare one or more of the plurality of acoustic signal spectra to a predetermined spectrum associated with the expected pattern, and identify time and frequency bins that match, based on the comparison, to within a threshold value” because Wingate teaches pattern extraction with calculated mask data in the time-frequency data, and the output (i.e., extracted) audio comprises the data that is greater than a certain threshold (see Wingate, ¶ 0211-0212, 0220-0221, 0231, and 0234 and figure 9, steps 930, 932, and 934).
Regarding claim 19, see the preceding rejection with respect to claim 15 above.  The combination makes obvious the “system of claim 15, wherein the pattern extraction circuit further comprises a neural network for application to one or more of the plurality of acoustic signal spectra, the neural network trained to generate scores for time and frequency bins of the acoustic signal spectra that 
Regarding claim 20, see the preceding rejection with respect to claim 15 above.  The combination makes obvious the “system of claim 15, wherein the acoustic source is an emergency vehicle and the acoustic event is a siren” because Silver makes obvious the detection of emergency vehicles so that an autonomous vehicle gains awareness and reacts to detected emergency vehicles (see Silver, ¶ 0018, 0041, and 0045).
Regarding claim 22, see the preceding rejection with respect to claim 1 above.  The combination makes obvious the “method of claim 1, further including calculating the Doppler frequency shift based on the time and frequency bins of the extracted pattern, a known frequency of the acoustic event, and a velocity of the autonomous vehicle” where Wingate teaches the output of time and frequency bins of extracted patterns and Silver makes obvious that an extracted pattern is matched to known siren sounds with which the classifier has been trained, and calculating the Doppler frequency shift of siren frequencies (see Wingate, ¶ 0134-0135, 0146-0147, 0150, and 0203-0206, in view of Silver, ¶ 0041-0047, 0062, and 0078).
Regarding claim 23, see the preceding rejection with respect to claim 15 above.  The combination makes obvious the “system of claim 15, wherein the safety related action including at least one of pulling over the autonomous vehicle on to a shoulder or stop the autonomous vehicle” where Silver makes obvious that when it is determined to take an action, the actions include a safety related action, such as pulling over, slowing down, stopping, etc. (see Silver, ¶ 0076).

Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Wingate and Silver as applied to claims 1, 8, and 15 above, and further in view of LeBlanc et al., US 2008/0189100 A1 (previously cited and hereafter LeBlanc).
Regarding claim 7, see the preceding rejection with respect to claim 1 above.  The combination of Wingate and Silver makes obvious the method of claim 1, where Silver teaches that beamforming helps reduce wind noise which interferes with detection, but the combination does not appear to teach a high-pass filter for this purpose.
LeBlanc teaches a method and system for improving speech quality that is distorted (see LeBlanc, abstract).  Specifically, LeBlanc teaches a system that compensates for detected wind noise, where a wind noise detection above a certain threshold activates a high-pass filter to reduce the wind noise before processing (see LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify the combination of Wingate and Silver with the teachings of LeBlanc for the purpose of improving the detection and tracking of an audio source in the presence of wind noise.  Therefore, the combination of Wingate, Silver, and LeBlanc makes obvious the “method of claim 1, further comprising applying a high-pass filter to the acoustic signals to reduce wind noise” (see Silver, ¶ 0073, in view of LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).
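For illustration only, a threshold-activated high-pass filter of the general kind discussed above can be sketched as follows.  The sketch is not taken from LeBlanc, Silver, or the instant specification.  The first-order filter design, the 16 kHz sample rate, the 200 Hz cutoff, and the names `high_pass` and `condition` are assumptions made for the example, and the way the wind level is measured is outside the sketch.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order (RC-style) high-pass filter over a list of samples."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    # y[n] = alpha * (y[n-1] + x[n] - x[n-1]) attenuates slow variations.
    for prev_x, x in zip(samples, samples[1:]):
        out.append(alpha * (out[-1] + x - prev_x))
    return out


def condition(samples, wind_level, wind_threshold,
              sample_rate_hz=16000.0, cutoff_hz=200.0):
    """Apply the filter only when the wind estimate exceeds the threshold."""
    if wind_level > wind_threshold:
        return high_pass(samples, sample_rate_hz, cutoff_hz)
    return list(samples)
```

In this sketch low-frequency content, where wind noise is typically concentrated, is attenuated, while content well above the assumed cutoff passes largely unchanged.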
Regarding claim 14, see the preceding rejection with respect to claim 8 above.  The combination of Wingate and Silver makes obvious the computer readable storage medium of claim 8, but does not appear to teach the high-pass filter.
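As stated above with respect to claim 7, LeBlanc teaches a system that compensates for detected wind noise, where a wind noise detection above a certain threshold activates a high-pass filter to reduce the wind noise before processing (see LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify the combination of Wingate and Silver with the teachings of LeBlanc for the purpose of improving the detection and tracking of an audio source in the presence of wind noise.  Therefore, the combination of Wingate, Silver, and LeBlanc makes obvious the “computer readable storage medium of claim 8, wherein the instructions cause the one or more processors to apply a high-pass filter to the acoustic signals to reduce wind noise” (see Silver, ¶ 0073, in view of LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).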
Regarding claim 21, see the preceding rejection with respect to claim 15 above.  The combination of Wingate and Silver makes obvious the system of claim 15, but does not appear to teach the high-pass filter.
As stated above with respect to claim 7, LeBlanc teaches a system that compensates for detected wind noise, where a wind noise detection above a certain threshold activates a high-pass filter to reduce the wind noise before processing (see LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).  It would have been obvious to one of ordinary skill in the art at the time of the effective filing date to modify the combination of Wingate and Silver with the teachings of LeBlanc for the purpose of improving the detection and tracking of an audio source in the presence of wind noise.  Therefore, the combination of Wingate, Silver, and LeBlanc makes obvious the “system of claim 15, further comprising a signal conditioning circuit to apply a high-pass filter to the acoustic signals to reduce wind noise” (see Silver, ¶ 0073, in view of LeBlanc, ¶ 0024 and 0026-0027, and figure 4, units 400 and 403-404).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniel R Sellers whose telephone number is (571)272-7528. The examiner can normally be reached Mon - Fri 10:00-4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan S Tsang, can be reached on (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit 





/Daniel R Sellers/
Examiner, Art Unit 2653