Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.


Status of Claims
In the amendment filed on July 22, 2021, claims 1, 6, 11, 17 and 19 were amended, claim 3 was cancelled and new claim 21 was added. Therefore, claims 1, 2 and 4-21 are pending for examination.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-9, 16, 17 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Shabtai et al. (US 2019029416 A1) in view of Mitsufuji (US 20180047407 A1), Woodruff et al. (US 20190208317 A1) and Markhovsky et al. (US 20190285722 A1).
In regards to claim 1, Shabtai teaches a system with one or more processors and one or more computer readable media storing instructions executable by the one or more processors that, when executed, cause the system to perform operations (Paragraph 22). Furthermore, Shabtai teaches receiving audio data from a pair of audio sensors (22, 24, 26, 28) (such as microphones) associated with a vehicle (Paragraph 19). Shabtai teaches that the vehicle is exposed to external sound signals at the zero vector and at selected points of XY rotation from the zero vector that are consistent with the desired angular resolution around 360 degrees of rotation, and data is captured from each of the microphones 22, 24, 26 and 28 for each point in the XY rotation for each external sound signal, corresponding to the DOA (264). At each point of the XY rotation, the external sound signal includes an incident acoustic signal that includes a frequency of interest, e.g., a frequency that is associated with an audible siren that is generated by an emergency vehicle. The frequency of interest may be defined in terms of a minimum/maximum base frequency of the siren signal, which can be matched to known frequency spectrum of sounds generated by specific emergency vehicle sirens with compensation for Doppler-effect and other frequency distortions (Paragraph 38), illustrating determining, based at least in part on a portion of the audio data, angular spectrum data and determining, based at least in part on the angular spectrum data, a feature associated with the audio data.
Shabtai further teaches inputting the feature into a pre-training routine that is synonymous with the machine learned model in that a direction of arrival value associated with the audio data is received (Paragraph 38), as well as determining, based at least in part on the audio data, an occurrence of sound associated with an emergency vehicle. Lastly, Shabtai teaches determining, based on the DoA value, a direction of the emergency vehicle relative to the vehicle (Paragraphs 29-31).
Shabtai fails to teach determining, based at least in part on the angular spectrum data, at least a peak value of the angular spectrum data and inputting into a machine learned model the at least one of the distribution of the angular spectrum data or the peak value of the angular spectrum data. Mitsufuji, on the other hand, teaches that sound is collected from two sound sources using the linear microphone array 11 and that the sound collection signal obtained as a result of the sound collection is subjected to spatial frequency analysis. It is assumed that, as a result of the analysis, in a spatial spectrum (angular spectrum) of the sound collection signal, as indicated with an arrow Q11, a spectral peak indicated with lines L11 to L13 is observed (Paragraph 41). Furthermore, Mitsufuji mentions a learning mechanism specifically directed towards the sound source spatial frequency analysis (Paragraph 88). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Mitsufuji’s teaching with Shabtai’s teaching in order to separate and analyze received sound signals to identify the source and characteristics of the sound and ultimately identify the location of the source.
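For illustration only, the notion of an angular (spatial) spectrum and its peak value can be sketched as follows. This is a minimal delay-and-sum example for a uniform linear array under a narrowband, far-field assumption; it is not taken from Shabtai or Mitsufuji, and the function names, the half-wavelength spacing and the 1-degree angle grid are all hypothetical choices made for the sketch.

```python
import cmath
import math


def angular_spectrum(phasors, spacing_wavelengths, angles_deg):
    """Delay-and-sum power at each candidate angle for a uniform linear array.

    phasors: one complex narrowband sample per microphone (assumed input).
    spacing_wavelengths: microphone spacing in wavelengths (assumed known).
    """
    spectrum = []
    for angle in angles_deg:
        phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(angle))
        # Undo the per-microphone phase expected for this angle, then sum.
        steered = sum(x * cmath.exp(1j * phase_step * m) for m, x in enumerate(phasors))
        spectrum.append(abs(steered) ** 2)
    return spectrum


def peak_feature(spectrum, angles_deg):
    """Return (angle at the peak, peak value) of the angular spectrum."""
    idx = max(range(len(spectrum)), key=spectrum.__getitem__)
    return angles_deg[idx], spectrum[idx]


# Usage: simulate a plane wave arriving from 30 degrees at 4 microphones.
angles = list(range(-90, 91))
true_phase = 2 * math.pi * 0.5 * math.sin(math.radians(30))
simulated = [cmath.exp(-1j * true_phase * m) for m in range(4)]
peak_angle, peak_value = peak_feature(angular_spectrum(simulated, 0.5, angles), angles)
```

In this sketch the peak angle and peak value are the kind of quantities that could be passed to a learned model as features; how any cited reference actually computes or uses them is not asserted here.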
Shabtai modified fails to teach determining, by the machine learned model and based at least in part on the at least one of the distribution of the angular spectrum data, the peak value of the angular spectrum data, or the energy value associated with the audio data, a direction of arrival (DoA) value associated with the audio data. Woodruff, on the other hand, teaches a machine learning model that determines a direction of arrival (DoA) value associated with audio data (Claim 1). Furthermore, the DoA value(s) may be used to classify the captured environmental sounds, which may be attention-seeking sounds (e.g., ringers, horns, alarms, sirens, etc.) (Paragraph 15); sirens are commonly associated with emergency vehicles.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Woodruff’s teaching of using a machine learning model to determine DoA, particularly for sounds such as sirens, with Shabtai modified’s teaching in order to effectively determine the direction of arrival of an emergency vehicle via its sirens by way of the learning model.
Furthermore, Shabtai fails to teach the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. Markhovsky, on the other hand, teaches a split architecture for determining the location of a wireless device in a heterogeneous wireless communications environment, wherein the precision localization methods employ a two-step location process, whereby the first step entails calculation of one or more observables (observation results): TOA, TDOA, TOF, AOA/DOA, Received Signal Phase, and metrics associated with these results (SNR, std. deviation, confidence, etc.). During the second step, the observation results and their metrics are utilized to determine the wireless device (target) position/navigation (Paragraph 832). Furthermore, the accuracy of the two-step process' observables (TOA, TDOA, TOF, etc.) may be uniquely enhanced by multipath mitigation using advanced spectrum estimation (super resolution) algorithms. Similarly, AOA/DOA unique enhancements/adaptations combine the aforementioned super resolution estimates of the time difference of the ranging signal, i.e. TDOA, as received at each antenna and the AOA/DOA technique that compares the phase difference of the ranging signal collected by each antenna (Paragraph 834), i.e. the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Markhovsky’s teaching with Shabtai modified’s teaching in order to accurately locate and track a mobile device, in this case applicable to detecting, locating and tracking a mobile emergency vehicle.
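For illustration only, the general association between a TDoA value and a DoA value for a single sensor pair can be sketched with the standard far-field relation sin(theta) = c * tau / d. This sketch is not drawn from Markhovsky or any other cited reference; the function name, the assumed speed of sound (343 m/s) and the sensor spacing are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


def doa_from_tdoa(tdoa_s, sensor_spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Far-field direction of arrival, in radians from broadside, for one sensor pair.

    tdoa_s: time difference of arrival between the two sensors, in seconds.
    sensor_spacing_m: distance between the two sensors, in meters.
    """
    ratio = speed_m_s * tdoa_s / sensor_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Usage: a delay equal to half the maximum possible delay for a 0.5 m pair.
angle_deg = math.degrees(doa_from_tdoa(0.25 / SPEED_OF_SOUND_M_S, 0.5))
```

A zero delay maps to broadside (0 radians), and a delay at or beyond the physical maximum d / c maps to endfire (plus or minus pi/2) because of the clamp.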
In regards to claim 6, Shabtai teaches a method performed with one or more processors and one or more computer readable media storing instructions executable by the one or more processors that, when executed, cause the system to perform operations (Paragraph 22). Furthermore, Shabtai teaches receiving audio data from a pair of audio sensors (22, 24, 26, 28) (such as microphones) associated with a vehicle (Paragraph 19). Shabtai teaches that the vehicle is exposed to external sound signals at the zero vector and at selected points of XY rotation from the zero vector that are consistent with the desired angular resolution around 360 degrees of rotation, and data is captured from each of the microphones 22, 24, 26 and 28 for each point in the XY rotation for each external sound signal, corresponding to the DOA (264). At each point of the XY rotation, the external sound signal includes an incident acoustic signal that includes a frequency of interest, e.g., a frequency that is associated with an audible siren that is generated by an emergency vehicle. The frequency of interest may be defined in terms of a minimum/maximum base frequency of the siren signal, which can be matched to known frequency spectrum of sounds generated by specific emergency vehicle sirens with compensation for Doppler-effect and other frequency distortions (Paragraph 38), illustrating determining, based at least in part on a portion of the audio data, angular spectrum data and determining, based at least in part on the angular spectrum data, a feature associated with the audio data.
Shabtai further teaches inputting the feature into a pre-training routine that is synonymous with the machine learned model in that a direction of arrival value associated with the audio data is received (Paragraph 38), as well as determining, based at least in part on the audio data, an occurrence of sound associated with an emergency vehicle. Lastly, Shabtai teaches determining, based on the DoA value, a direction of the emergency vehicle relative to the vehicle (Paragraphs 29-31).
Shabtai fails to teach determining, based at least in part on the angular spectrum data, at least a peak value of the angular spectrum data and inputting into a machine learned model the at least one of the distribution of the angular spectrum data or the peak value of the angular spectrum data. Mitsufuji, on the other hand, teaches that sound is collected from two sound sources using the linear microphone array 11 and that the sound collection signal obtained as a result of the sound collection is subjected to spatial frequency analysis. It is assumed that, as a result of the analysis, in a spatial spectrum (angular spectrum) of the sound collection signal, as indicated with an arrow Q11, a spectral peak indicated with lines L11 to L13 is observed (Paragraph 41). Furthermore, Mitsufuji mentions a learning mechanism specifically directed towards the sound source spatial frequency analysis (Paragraph 88). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Mitsufuji’s teaching with Shabtai’s teaching in order to separate and analyze received sound signals to identify the source and characteristics of the sound and ultimately identify the location of the source.
Shabtai modified fails to teach determining, by the machine learned model and based at least in part on the at least one of the distribution of the angular spectrum data, the peak value of the angular spectrum data, or the energy value associated with the audio data, a direction of arrival (DoA) value associated with the audio data. Woodruff, on the other hand, teaches a machine learning model that determines a direction of arrival (DoA) value associated with audio data (Claim 1). Furthermore, the DoA value(s) may be used to classify the captured environmental sounds, which may be attention-seeking sounds (e.g., ringers, horns, alarms, sirens, etc.) (Paragraph 15); sirens are commonly associated with emergency vehicles.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Woodruff’s teaching of using a machine learning model to determine DoA, particularly for sounds such as sirens, with Shabtai modified’s teaching in order to effectively determine the direction of arrival of an emergency vehicle via its sirens by way of the learning model.
Furthermore, Shabtai fails to teach the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. Markhovsky, on the other hand, teaches a split architecture for determining the location of a wireless device in a heterogeneous wireless communications environment, wherein the precision localization methods employ a two-step location process, whereby the first step entails calculation of one or more observables (observation results): TOA, TDOA, TOF, AOA/DOA, Received Signal Phase, and metrics associated with these results (SNR, std. deviation, confidence, etc.). During the second step, the observation results and their metrics are utilized to determine the wireless device (target) position/navigation (Paragraph 832). Furthermore, the accuracy of the two-step process' observables (TOA, TDOA, TOF, etc.) may be uniquely enhanced by multipath mitigation using advanced spectrum estimation (super resolution) algorithms. Similarly, AOA/DOA unique enhancements/adaptations combine the aforementioned super resolution estimates of the time difference of the ranging signal, i.e. TDOA, as received at each antenna and the AOA/DOA technique that compares the phase difference of the ranging signal collected by each antenna (Paragraph 834), i.e. the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Markhovsky’s teaching with Shabtai modified’s teaching in order to accurately locate and track a mobile device, in this case applicable to detecting, locating and tracking a mobile emergency vehicle.

In regards to claim 7, Shabtai modified teaches audio sensors associated with the vehicle, including the microphone array 20 disposed on an exterior portion of a roof of the vehicle 10 with a center point 21, with the microphones 22, 24, 26 and 28 each having a predefined XY location relative to the center point 21 and the vehicle 10. The subject vehicle is exposed to external sound signals at the zero vector and at selected points of XY rotation from the zero vector that are consistent with the desired angular resolution around 360 degrees of rotation, and data is captured from each of the microphones 22, 24, 26 and 28 for each point in the XY rotation for each external sound signal, corresponding to the DOA (264) (Paragraph 38), hence illustrating at least two audio sensors associated with a front, left, back or right side of the vehicle with respect to a direction of travel of the vehicle.
In regards to claim 8, Shabtai modified teaches determining that an audio event is an emergency event and controlling the (autonomous) vehicle further based in part on the audio event comprising the emergency event, wherein controlling the vehicle comprises at least one of stopping the vehicle or changing the vehicle to another lane, as appropriate based on the proximity and the direction of arrival of the emergency vehicle relative to the subject vehicle (Paragraphs 28, 31).
In regards to claim 9, Shabtai modified teaches that an acoustic decision identifying an angle associated with the direction of arrival (DOA) can be achieved by employing one of the following options: determining the MUSIC spectrum employing the RTF-based steering vector, determining the MUSIC spectrum employing the free-field based steering vector, determining the MUSIC spectrum employing the combined steering vector, or determining a fusion of the RTF-based and free-field based MUSIC spectrum (Paragraph 75).
In regards to claim 16, Shabtai modified teaches that the audio data is discretized into a plurality of audio frames, wherein determining the occurrence of the sound comprises inputting at least a portion of the audio data into a classifier and receiving, from the classifier, a classification of the sound, the classification comprising one or more emergency vehicle(s) (which would include an ambulance siren class, a police siren class, or a fire truck siren class) (Paragraphs 9, 30, 35, 62).
In regards to claim 17, Shabtai teaches a system with one or more processors and one or more computer readable media storing instructions executable by the one or more processors that, when executed, cause the system to perform operations (Paragraph 22). Furthermore, Shabtai teaches receiving audio data from a pair of audio sensors (22, 24, 26, 28) (such as microphones) associated with a vehicle (Paragraph 19). Shabtai teaches that the vehicle is exposed to external sound signals at the zero vector and at selected points of XY rotation from the zero vector that are consistent with the desired angular resolution around 360 degrees of rotation, and data is captured from each of the microphones 22, 24, 26 and 28 for each point in the XY rotation for each external sound signal, corresponding to the DOA (264). At each point of the XY rotation, the external sound signal includes an incident acoustic signal that includes a frequency of interest, e.g., a frequency that is associated with an audible siren that is generated by an emergency vehicle. The frequency of interest may be defined in terms of a minimum/maximum base frequency of the siren signal, which can be matched to known frequency spectrum of sounds generated by specific emergency vehicle sirens with compensation for Doppler-effect and other frequency distortions (Paragraph 38), illustrating determining, based at least in part on a portion of the audio data, angular spectrum data and determining, based at least in part on the angular spectrum data, a feature associated with the audio data.
Shabtai further teaches inputting the feature into a pre-training routine that is synonymous with the machine learned model in that a direction of arrival value associated with the audio data is received (Paragraph 38), as well as determining, based at least in part on the audio data, an occurrence of sound associated with an emergency vehicle. Lastly, Shabtai teaches determining, based on the DoA value, a direction of the emergency vehicle relative to the vehicle (Paragraphs 29-31).
Shabtai modified fails to teach determining, by the machine learned model and based at least in part on the at least one of the distribution of the angular spectrum data, the peak value of the angular spectrum data, or the energy value associated with the audio data, a direction of arrival (DoA) value associated with the audio data. Woodruff, on the other hand, teaches a machine learning model that determines a direction of arrival (DoA) value associated with audio data (Claim 1). Furthermore, the DoA value(s) may be used to classify the captured environmental sounds, which may be attention-seeking sounds (e.g., ringers, horns, alarms, sirens, etc.) (Paragraph 15); sirens are commonly associated with emergency vehicles.
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Woodruff’s teaching of using a machine learning model to determine DoA, particularly for sounds such as sirens, with Shabtai modified’s teaching in order to effectively determine the direction of arrival of an emergency vehicle via its sirens by way of the learning model.
Furthermore, Shabtai fails to teach the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. Markhovsky, on the other hand, teaches a split architecture for determining the location of a wireless device in a heterogeneous wireless communications environment, wherein the precision localization methods employ a two-step location process, whereby the first step entails calculation of one or more observables (observation results): TOA, TDOA, TOF, AOA/DOA, Received Signal Phase, and metrics associated with these results (SNR, std. deviation, confidence, etc.). During the second step, the observation results and their metrics are utilized to determine the wireless device (target) position/navigation (Paragraph 832). Furthermore, the accuracy of the two-step process' observables (TOA, TDOA, TOF, etc.) may be uniquely enhanced by multipath mitigation using advanced spectrum estimation (super resolution) algorithms. Similarly, AOA/DOA unique enhancements/adaptations combine the aforementioned super resolution estimates of the time difference of the ranging signal, i.e. TDOA, as received at each antenna and the AOA/DOA technique that compares the phase difference of the ranging signal collected by each antenna (Paragraph 834), i.e. the direction of arrival (DoA) value being associated with a time difference of arrival (TDoA) value of a peak of the angular spectrum data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Markhovsky’s teaching with Shabtai modified’s teaching in order to accurately locate and track a mobile device, in this case applicable to detecting, locating and tracking a mobile emergency vehicle.
In regards to claim 21, Shabtai modified via Woodruff teaches the machine learned model being a neural network (Claims 1, 10).

Claims 2, 10, 11, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shabtai et al. (US 2019029416 A1) in view of Mitsufuji (US 20180047407 A1), Woodruff et al. (US 20190208317 A1) and Markhovsky et al. (US 20190285722 A1) as applied above to claims 1, 6 and 17, and further in view of Hu et al. (US 20120310646 A1).
In regards to claim 2, Shabtai modified fails to specifically discuss the audio data comprising a plurality of audio frames captured over time. Hu, on the other hand, teaches that a sound pick-up array (frames) 10 continuously receives the sound source signal of each sound frame to generate audio signals. A frequency-domain converter 14 receives the audio signals and transforms the audio signals into frequency signals. Then, a spatial feature spotting device 16 receives the frequency signals to obtain a space-frequency spectrum and calculate the angular estimation value thereof. Next, a spatial evaluator 18 receives the space-frequency spectrum to define and output at least one spatial eigenparameter. At the same time, a speech feature spotting and evaluation device 20 receives the angular estimation value and the frequency signals to perform spotting and evaluation and output a Bhattacharyya distance. The spotting can be undertaken with an LPC (Linear Predictive Coding) method or an MFCC (Mel-scale Frequency Cepstral Coefficient) method. In addition to receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, the spatial evaluator 18 can also simultaneously receive the space-frequency spectrum and the angular estimation value to define and output at least two spatial eigenparameters, wherein the spatial eigenparameter defined by the space-frequency spectrum is an angular estimation quantity, and wherein the spatial eigenparameter defined by the angular estimation value is an angular estimation variance.
Then, a detection device 22 receives the spatial eigenparameters and the Bhattacharyya distance and compares them with corresponding thresholds to determine the correctness of the key phrase (Paragraph 28). One of ordinary skill in the art may then apply this method for the same purpose, namely to determine the angular spectrum data by determining an angular spectrum associated with each audio frame, to further determine features associated with an angular spectrum of the plurality of angular spectra, and to further identify, by way of a pre-training/learning machine, the value of the audio data in the angular spectra. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Hu’s teaching with Shabtai modified’s teaching in order to effectively identify the audio data in the midst of a plurality of audio information.
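For illustration only, the general idea of treating audio data as a plurality of frames captured over time, computing a per-frame value, and comparing that value with a threshold can be sketched as follows. The per-frame mean energy used here is a simple stand-in chosen for the sketch; it is not Hu's spatial eigenparameter or Bhattacharyya distance, and the function names, frame length, hop size and threshold are hypothetical.

```python
def split_into_frames(samples, frame_len, hop):
    """Split a sample sequence into consecutive frames of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def frame_energy(frame):
    """Mean squared amplitude of one frame (stand-in per-frame feature)."""
    return sum(x * x for x in frame) / len(frame)


def frames_above_threshold(samples, frame_len, hop, threshold):
    """Flag each frame whose stand-in feature exceeds the threshold."""
    return [frame_energy(f) > threshold
            for f in split_into_frames(samples, frame_len, hop)]


# Usage: three non-overlapping frames, only the middle one contains sound.
flags = frames_above_threshold([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0], 4, 4, 0.5)
```

In a per-frame angular analysis, the stand-in feature would be replaced by whatever spectrum or parameter is computed for each frame; samples left over after the last full frame are dropped in this sketch.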
In regards to claim 10, Shabtai modified teaches determining the direction of arrival value associated with a received audio signal based on one or more parameters (Paragraph 31). Shabtai, however, fails to specifically discuss the audio data comprising a plurality of audio frames captured over time. Hu, on the other hand, teaches that a sound pick-up array (frames) 10 continuously receives the sound source signal of each sound frame to generate audio signals. A frequency-domain converter 14 receives the audio signals and transforms the audio signals into frequency signals. Then, a spatial feature spotting device 16 receives the frequency signals to obtain a space-frequency spectrum and calculate the angular estimation value thereof. Next, a spatial evaluator 18 receives the space-frequency spectrum to define and output at least one spatial eigenparameter. At the same time, a speech feature spotting and evaluation device 20 receives the angular estimation value and the frequency signals to perform spotting and evaluation and output a Bhattacharyya distance. The spotting can be undertaken with an LPC (Linear Predictive Coding) method or an MFCC (Mel-scale Frequency Cepstral Coefficient) method. In addition to receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, the spatial evaluator 18 can also simultaneously receive the space-frequency spectrum and the angular estimation value to define and output at least two spatial eigenparameters, wherein the spatial eigenparameter defined by the space-frequency spectrum is an angular estimation quantity, and wherein the spatial eigenparameter defined by the angular estimation value is an angular estimation variance.
Then, a detection device 22 receives the spatial eigenparameters and the Bhattacharyya distance and compares them with corresponding thresholds to determine the correctness of the key phrase (Paragraph 28). One of ordinary skill in the art may then apply this method for the same purpose, namely to determine the angular spectrum of the various microphone sensors of the vehicle and to further determine one or more parameters associated with the audio frame. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Hu’s teaching with Shabtai modified’s teaching in order to effectively identify the audio data in the midst of a plurality of audio information.
In regards to claim 11, Shabtai modified teaches one or more parameters comprise at least a frequency spectrum (Paragraphs 30, 34, 37).  Furthermore, Mitsufuji teaches one or more parameters comprising at least a frequency spectrum (Paragraph 45).
In regards to claim 18, Shabtai fails to specifically discuss the audio data comprising a plurality of audio frames captured over time. Hu, on the other hand, teaches that a sound pick-up array (frames) 10 continuously receives the sound source signal of each sound frame to generate audio signals. A frequency-domain converter 14 receives the audio signals and transforms the audio signals into frequency signals. Then, a spatial feature spotting device 16 receives the frequency signals to obtain a space-frequency spectrum and calculate the angular estimation value thereof. Next, a spatial evaluator 18 receives the space-frequency spectrum to define and output at least one spatial eigenparameter. At the same time, a speech feature spotting and evaluation device 20 receives the angular estimation value and the frequency signals to perform spotting and evaluation and output a Bhattacharyya distance. The spotting can be undertaken with an LPC (Linear Predictive Coding) method or an MFCC (Mel-scale Frequency Cepstral Coefficient) method. In addition to receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, the spatial evaluator 18 can also simultaneously receive the space-frequency spectrum and the angular estimation value to define and output at least two spatial eigenparameters, wherein the spatial eigenparameter defined by the space-frequency spectrum is an angular estimation quantity, and wherein the spatial eigenparameter defined by the angular estimation value is an angular estimation variance.
Then, a detection device 22 receives the spatial eigenparameters and the Bhattacharyya distance and compares them with corresponding thresholds to determine the correctness of the key phrase (Paragraph 28). One of ordinary skill in the art may then apply this method for the same purpose, namely to determine the angular spectrum data by determining an angular spectrum associated with each audio frame, to further determine features associated with an angular spectrum of the plurality of angular spectra, and to further identify, by way of a pre-training/learning machine, the value of the audio data in the angular spectra. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Hu’s teaching with Shabtai modified’s teaching in order to effectively identify the audio data in the midst of a plurality of audio information.
In regards to claim 19, Shabtai modified teaches one or more parameters comprise at least a frequency spectrum (Paragraphs 30, 34, 37).  Mitsufuji also teaches parameters comprising at least a frequency spectrum (Paragraph 45).

Claims 4, 5, 12-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shabtai et al. (US 2019029416 A1) in view of Mitsufuji (US 20180047407 A1), Woodruff et al. (US 20190208317 A1) and Markhovsky et al. (US 20190285722 A1) as applied above to claims 1, 6 and 17, and further in view of Nakamura et al. (US 20170092284 A1).
In regards to claim 4, Shabtai teaches that the audio data is discretized into a plurality of audio frames, wherein determining the occurrence of the sound comprises inputting at least a portion of the audio data into a classifier and receiving, from the classifier, a classification of the sound, the classification comprising one or more emergency vehicle(s) (which would include an ambulance siren class, a police siren class, or a fire truck siren class) (Paragraphs 9, 30, 35, 62).
Shabtai fails to teach determining a start time frame indicating an on-set of the sound and determining an end time frame indicating an off-set of the sound. Nakamura, on the other hand, teaches formulas representing a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to combine Nakamura’s teaching with Shabtai’s teaching in order to effectively track the direction of the traveling vehicle.
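For illustration only, determining a start frame (on-set) and an end frame (off-set) from per-frame detections can be sketched as follows. The sketch assumes that some classifier has already produced one boolean flag per audio frame; it does not reproduce Nakamura's formulas, and the function name and input format are hypothetical.

```python
def find_sound_events(frame_flags):
    """Return (start_frame, end_frame) index pairs for each run of active frames.

    frame_flags: one boolean per audio frame, True where the sound was detected
    (assumed to come from an upstream classifier).
    """
    events = []
    start = None
    for index, active in enumerate(frame_flags):
        if active and start is None:
            start = index  # on-set: first active frame of a run
        elif not active and start is not None:
            events.append((start, index - 1))  # off-set: last active frame
            start = None
    if start is not None:
        # The sound was still active when the data ended.
        events.append((start, len(frame_flags) - 1))
    return events


# Usage: one event spanning frames 1-2 and one single-frame event at frame 4.
events = find_sound_events([False, True, True, False, True])
```

Each returned pair delimits one event that starts at the on-set frame and ends at the off-set frame, both inclusive.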
In regards to claim 5, Shabtai modified via Nakamura teaches determining an event that starts at the on-set of the sound and ends at the off-set of the sound; that is, the voice/sound processing apparatus may further include an event detection unit configured to detect an event which changes an acoustic environment, wherein the spectrum normalization unit may use an average spectrum after the event is detected as the average spectrum acquired until the present time, and wherein formulas represent a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66). Furthermore, Nakamura teaches that the sound source tracking unit 102 determines whether there is a current sound source direction of a sound source determined during an utterance which is detected at a current frame within a predetermined range from a previous sound source direction detected from a predetermined number of frames (for example, 3 to 5 frames) before to an immediately previous frame. The sound source tracking unit 102 determines that a sound source determined to be related to the current sound source direction is a sound source continuing from the previous frame, and forms a sound source direction row for each sound source by causing the previous sound source direction to follow the current sound source direction (sound source tracking) (Paragraph 59). Nakamura elaborates that the sound source tracking unit 102 determines a sound source related to a sound source direction determined to be outside of a predetermined range from any previous sound source direction as a new sound source. Thus, it is specified whether the current sound source direction is a sound source direction of a sound source related to any of the sound source direction rows (Paragraphs 60, 61).
In other words, if the sound source falls within the range of the predetermined audio frames, it is identified as a stored event, whereas if the tracked sound frames are found outside of the predetermined threshold (below or above), the tracked sound is not a recognized (previously grouped/familiar) event and is classified as a new event (removed for identified events).
In regards to claim 12, Shabtai teaches that audio data is discretized into a plurality of audio frames, wherein determining the occurrence of the sound comprises: inputting at least a portion of the audio data into a classifier; and receiving, from the classifier, a classification of the sound, the classification comprising one or more emergency vehicle(s) (which would include an ambulance siren class, a police siren class, or a fire truck siren class) (Paragraphs 9, 30, 35, 62).
Shabtai fails to teach determining a start time frame indicating an on-set of the sound and determining an end time frame indicating an off-set of the sound.  Nakamura, on the other hand, teaches formulas representing a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66).  It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Nakamura's teaching with Shabtai's teaching in order to effectively track the direction of the traveling vehicle.
In regards to claim 13, Shabtai modified via Nakamura teaches determining an event that starts at the on-set of the sound and ends at the off-set of the sound; that is, the voice/sound processing apparatus may further include an event detection unit configured to detect an event which changes an acoustic environment, wherein the spectrum normalization unit may use an average spectrum after the event is detected as the average spectrum acquired until the present time, together with formulas representing a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66).  Furthermore, Nakamura teaches that the sound source tracking unit 102 determines whether there is a current sound source direction of a sound source determined during an utterance which is detected at a current frame within a predetermined range from a previous sound source direction detected from a predetermined number of frames (for example, 3 to 5 frames) before to an immediately previous frame. The sound source tracking unit 102 determines that a sound source determined to be related to the current sound source direction is a sound source continuing from the previous frame, and forms a sound source direction row for each sound source by causing the previous sound source direction to follow the current sound source direction (sound source tracking) (Paragraph 59).  Nakamura elaborates that the sound source tracking unit 102 determines a sound source related to a sound source direction determined to be outside of a predetermined range from any previous sound source direction as a new sound source. Thus, it is specified whether the current sound source direction is a sound source direction of a sound source related to any of the sound source direction rows (Paragraphs 60, 61).
In other words, if the sound source falls within the predetermined range of audio frames, it is identified as a stored event, whereas if the tracked sound frames fall outside of the predetermined threshold (below or above), the tracked sound is not a previously identified event (it is removed/isolated from the identified events) and is classified as a new event.
In regards to claim 14, Shabtai modified via Nakamura teaches determining an event that starts at the on-set of the sound and ends at the off-set of the sound; that is, the voice/sound processing apparatus may further include an event detection unit configured to detect an event which changes an acoustic environment, wherein the spectrum normalization unit may use an average spectrum after the event is detected as the average spectrum acquired until the present time, together with formulas representing a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66).  Furthermore, Nakamura elaborates that the sound source tracking unit 102 determines whether there is a current sound source direction of a sound source determined during an utterance which is detected at a current frame within a predetermined range from a previous sound source direction detected from a predetermined number of frames (for example, 3 to 5 frames) before to an immediately previous frame. The sound source tracking unit 102 determines that a sound source determined to be related to the current sound source direction is a sound source continuing from the previous frame, and forms a sound source direction row for each sound source by causing the previous sound source direction to follow the current sound source direction (sound source tracking) (Paragraph 59).  Nakamura elaborates that the sound source tracking unit 102 determines a sound source related to a sound source direction determined to be outside of a predetermined range from any previous sound source direction as a new sound source. Thus, it is specified whether the current sound source direction is a sound source direction of a sound source related to any of the sound source direction rows (Paragraphs 60, 61).
In other words, if the sound source falls within the predetermined range of audio frames, it is identified as a stored event, whereas if the tracked sound frames fall outside of the predetermined threshold (below or above), the tracked sound is not a previously identified event (it is removed/isolated from the identified events) and is classified as a new event.
Using Nakamura's teaching that the sound source tracking unit 102 determines whether there is a current sound source direction of a sound source determined during an utterance which is detected at a current frame within a predetermined range from a previous sound source direction detected from a predetermined number of frames (for example, 3 to 5 frames) before to an immediately previous frame, determines that a sound source determined to be related to the current sound source direction is a sound source continuing from the previous frame, and forms a sound source direction row for each sound source by causing the previous sound source direction to follow the current sound source direction (sound source tracking) (Paragraph 59), as well as the teaching that the sound source tracking unit 102 determines a sound source related to a sound source direction determined to be outside of a predetermined range from any previous sound source direction as a new sound source (Paragraphs 60, 61), one of ordinary skill in the art may then analyze multiple audio events and further determine whether the audio frame counts of the detected audio signal(s) match a given threshold of a categorized event, and otherwise group, remove, or classify the audio frames outside the categorized event(s) as a new classification.
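For illustration only, the tracking behavior described in Nakamura Paragraphs 59-61 (a current direction within a predetermined range of a direction seen in the last few frames continues that source; otherwise a new source is declared) can be sketched as follows. This is a simplified sketch, not Nakamura's implementation; the names track_sources, max_gap_frames, and max_angle_deg, and the default values, are assumptions introduced here.

```python
from typing import List, Tuple

Detection = Tuple[int, float]  # (frame index, direction of arrival in degrees)


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def track_sources(
    detections: List[Detection],
    max_gap_frames: int = 5,      # assumed look-back window (e.g. 3 to 5 frames)
    max_angle_deg: float = 10.0,  # assumed "predetermined range" of direction
) -> List[List[Detection]]:
    """Group detections (sorted by frame) into per-source direction rows."""
    tracks: List[List[Detection]] = []
    for frame, direction in detections:
        best = None
        for track in tracks:
            last_frame, last_dir = track[-1]
            gap = frame - last_frame
            diff = angular_difference(direction, last_dir)
            if 1 <= gap <= max_gap_frames and diff <= max_angle_deg:
                # Prefer the existing row whose last direction is closest.
                if best is None or diff < angular_difference(direction, best[-1][1]):
                    best = track
        if best is None:
            tracks.append([(frame, direction)])  # outside every range: new source
        else:
            best.append((frame, direction))      # continuing source
    return tracks
```

Under these assumptions, a detection that is too far in direction, or too many frames removed, from every existing row starts a new row, which corresponds to the "new sound source" determination discussed above.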
In regards to claim 15, Shabtai as modified teaches determining, based at least in part on the Doppler frequency shifting, at least one audio event approaching the vehicle (Paragraph 38).
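For illustration only, the relationship between Doppler frequency shifting and an approaching sound source can be sketched with the textbook moving-source relation f_obs = f_emit * c / (c - v), where v is the source's speed toward the listener. This is a simplified sketch that assumes a stationary listener and a known emitted siren frequency; it is not taken from Shabtai, and the function names and tolerance value are assumptions introduced here.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 C


def radial_speed_from_doppler(
    observed_hz: float, emitted_hz: float, c: float = SPEED_OF_SOUND_M_S
) -> float:
    """Source speed toward a stationary listener, in m/s.

    Inverts f_obs = f_emit * c / (c - v). A positive result means the
    source is approaching; a negative result means it is receding.
    """
    return c * (1.0 - emitted_hz / observed_hz)


def is_approaching(
    observed_hz: float, emitted_hz: float, tolerance_hz: float = 1.0
) -> bool:
    """True when the observed frequency is shifted upward beyond a tolerance."""
    return observed_hz - emitted_hz > tolerance_hz
```

An upward shift of the observed siren frequency relative to its nominal frequency indicates an approaching source under these assumptions; a moving listener would require the more general Doppler relation.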
In regards to claim 20, Shabtai teaches that audio data is discretized into a plurality of audio frames, wherein determining the occurrence of the sound comprises: inputting at least a portion of the audio data into a classifier; and receiving, from the classifier, a classification of the sound, the classification comprising one or more emergency vehicle(s) (which would include an ambulance siren class, a police siren class, or a fire truck siren class) (Paragraphs 9, 30, 35, 62).
Shabtai fails to teach determining a start time frame indicating an on-set of the sound and determining an end time frame indicating an off-set of the sound.  Nakamura, on the other hand, teaches formulas representing a frame number at a start time point of the previous utterance k-1 and a frame number at an end time point of the previous utterance k-1, respectively, pertaining to the audio data of the traveling vehicle (Paragraphs 65-66).  It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Nakamura's teaching with Shabtai's teaching in order to effectively track the direction of the traveling vehicle.

Response to Arguments
Examiner acknowledges applicant's amendments, which have been addressed above under the new grounds of rejection.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY D AFRIFA-KYEI whose telephone number is (571)270-7826. The examiner can normally be reached Monday-Friday 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, HAI PHAN, can be reached at (571)272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANTHONY D AFRIFA-KYEI/Examiner, Art Unit 2685                                

/HAI PHAN/Supervisory Patent Examiner, Art Unit 2685