DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office action is in response to the application filed on 8/5/2020.
Claim(s) 1-20 is/are pending in this Office Action.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119 (a)-(d).
The certified copy has been filed in parent Application No. ROA202000318, filed 6/4/2020.
Information Disclosure Statement
Applicant’s information disclosure statement(s) (IDS) submitted on 8/5/2020 is/are being considered by the examiner. 	
Specification



Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided.
The abstract of the disclosure is objected to because the abstract is not clear and concise. The abstract is one long run-on sentence, and the examiner finds this format to be unclear.  Correction is required.  See MPEP § 608.01(b).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6, 12, 14-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 6, it is unclear whether the “safety-related events” are the “safety-related events” recited in claim 1 or different events. For the purposes of examination, the examiner is interpreting “safety-related events” in claim 6 to be “the safety-related events”.
Regarding claim 12, it is unclear whether the “safety-related events” are the “safety-related events” recited in claim 8 or different events. For the purposes of examination, the examiner is interpreting “safety-related events” in claim 12 to be “the safety-related events”.
Regarding claim 14, the limitation of the preamble, “…comprising a first plurality sensors”, is unclear. For the purposes of examination, the examiner is interpreting the limitation to be “…comprising a first plurality of sensors”.
Claim 16 recites the limitation “the first plurality of sound sensors” in line 2. There is insufficient antecedent basis for this limitation in the claim. For the purposes of examination, the examiner is interpreting the limitation to be “the first plurality of sensors”.
Regarding claim 20, it is unclear whether the “safety-related events” are the “safety-related events” recited in claim 14 or different events. For the purposes of examination, the examiner is interpreting “safety-related events” in claim 20 to be “the safety-related events”.
Claims 15, 17-19 are rejected due to their dependency on a rejected base claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

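(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.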




Claim(s) 1-14, 17-20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Leenayongwut et al. (US 2020/0241551 A1), hereafter referred to as Leenayongwut.
Regarding claim 1, Leenayongwut teaches a computer-implemented method for generating a sound-enhanced sensing envelope, comprising: 
collecting, by a plurality of sensors (“image sensors 1301”, Fig. 13) and one or more passive sound sensors (“microphone array 1302”, Fig. 13) of a vehicle (“autonomous vehicle 100”, Fig. 1, “FIG. 13 is a block diagram of a sound classification and sound source localization system 1300 for AV 100”, para. 0131), sensor data signals (“pairs of image frames captured by image sensors 1301 and sounds captured by microphone array 1302”, para. 0136) characterizing an exterior environment (“environment 190”, Fig. 1) of the vehicle; 
processing one or more direct sensor data signals (“pairs of image frames captured by image sensors”, para. 0136) which are collected from the plurality of sensors to generate a sensing envelope (“field of view (FOV)”, para. 0054) around the vehicle using direct sensing data signals (“VBSL module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301 and sounds captured by microphone array 1302 to localize sound sources in a vision scene. The sound and vision modalities are processed, respectively, in separate sound and visual neural networks”, para. 0136, “sound sources within a field of view (FOV) of an image sensor of the autonomous vehicle are localized in a visual scene generated by the perception module”, para. 0054); 
processing one or more indirect sensor data signals (“sounds captured by microphone array 1302 to localize sound sources in a vision scene”, para. 0136) which are collected from the one or more passive sound sensors to generate a sound-enhanced sensing envelope (“sound sources within a field of view (FOV) of an image sensor…”, para. 0054) around the vehicle using indirect sensor data signals (see para. 0136 citation above in the previous “processing” step, “The addition of sound source information allows the planning module to make more informed prediction of the dynamic state of the ambulance then could otherwise be determined from the image data itself”, para. 0143, see also “FIG. 17 is a flow diagram of a process 1700 of using classified and localized objects to operate an AV, according to an embodiment”, para. 0147); and 
using the sound-enhanced sensing envelope to evaluate advanced driver assistance system commands (“perform an action”, para. 0151, see also “generate a route or trajectory”, para. 0054) for the vehicle with respect to safety-related events (“avoid collision”, para. 0145, see also “crashes”, para. 0074) identified by the indirect sensor data signals (“Process 1700 continues by causing, by the processing-circuit, the AV to perform an action based on the classified sound and the determined location of the sound source in the ambient sound environment (1704)”, para. 0151, “The planning module uses the visual scene and/or static digital map, together with the current location of the autonomous vehicle and other information (e.g., traffic information, passenger preferences), to generate a route or trajectory of the autonomous vehicle in the ambient sound environment”, para. 0054, “FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations”, para. 0144, “, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).

Regarding claim 2, Leenayongwut further teaches where collecting sensor data signals comprises passively capturing sound waves generated by other traffic participants using a microphone array affixed to the vehicle (“DOA estimator 1308 estimates spatial information (e.g., distance and direction) for multiple sound sources in the ambient sound environment. A spatial feature for each sound source is extracted from multi-channel observations captured by a plurality of spatially-distributed microphones in the microphone array 1302…the microphone array 1302 is mounted on the outside of AV 100 and includes an array of linear spaced microphones (e.g., 8 microphones)”, para. 0137).

Regarding claim 3, Leenayongwut further teaches where collecting sensor data signals comprises capturing sensor data signals with direct sensors selected from a group consisting of a camera, lidar detector, and radar detector (“System 1300 includes image sensors 1301 (e.g., LiDAR, RADAR, cameras)”, para. 0131, “sensors 121 also include sensors for sensing or measuring properties of the AV's environment. For example, monocular or stereo video cameras 122 in the visible light, infrared or thermal (or both) spectra, LiDAR 123, RADAR”, para. 0078).

Regarding claim 4, Leenayongwut further teaches where collecting sensor data signals comprises capturing sensor data signals with sensors selected from a group consisting of a camera, lidar detector, radar detector, and an ultrasound detector (“System 1300 includes image sensors 1301 (e.g., LiDAR, RADAR, cameras)”, para. 0131, “sensors 121 also include sensors for sensing or measuring properties of the AV's environment. For example, monocular or stereo video cameras 122 in the visible light, infrared or thermal (or both) spectra, LiDAR 123, RADAR, ultrasonic sensors”, para. 0078).

Regarding claim 5, Leenayongwut further teaches where a first processor (“vision-based sound localizer (VBSL) 1307”, Fig. 13) processes the one or more direct sensor data signals to generate the sensing envelope (“module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301”, para. 0136), and where a second processor (“sound classifier 1306”, Fig. 13) processes the one or more indirect sensor data signals to generate the sound-enhanced sensing envelope (“sound classifier 1306 classifies a sound captured in the ambient sound environment by microphone array 1302 by computing a frequency spectrum of the sound”, para. 0132).

Regarding claim 6, Leenayongwut further teaches where the sound-enhanced sensing envelope is used to evaluate a defensive maneuver command (“avoid collision”, see para. 0145 citation in the rejection to claim 1, see also “avoiding critical situations”, para. 0074) for the vehicle with respect to safety-related events identified by the indirect sensor data signals (“Referring to FIG. 1, an AV system 120 operates the AV 100 along a trajectory 198 through an environment 190 to a destination 199 (sometimes referred to as a final location) while avoiding objects (e.g., natural obstructions 191, vehicles 193, pedestrians 192, cyclists, and other obstacles) and obeying rules of the road (e.g., rules of operation or driving preferences)”, para. 0075, see also “if the planning module 1304 knows the siren is on, then the AV 100 can make an appropriate maneuver like a safe stop maneuver and pull to the side of the road”, para. 0143).

Regarding claim 7, Leenayongwut further teaches where the sound-enhanced sensing envelope is used to evaluate a free-space checking command (“compare the trajectory with a trajectory for the AV 100 to avoid collision”, para. 0145) for the vehicle with respect to safety-related events identified by the indirect sensor data signals (“FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations, according to an embodiment. In situations where a sound source is associated with a direction and distance but is not in within the FOV of image sensors of the AV 100, the planning module 1304 uses the directions and distances of the sound sources to localize the sound sources in the static digital map 1605. In the example shown, the emergency vehicle 1604 is represented by marker 1606 and the AV 100 is represented by marker 1607 in static map 1605. Accordingly, even though the sound source has not been detected by the perception module 1303, the directions and distances computed using beamformer system 1500 are provided to the planning module 1304”, para. 0144, “The locations of the sound sources in digital map 1605 are used by the planning module 1306 to predict the trajectory of the emergency vehicle 1604 and also the change in dynamic states of other vehicles or pedestrians in response to the siren. For example, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).

Regarding claim 8, Leenayongwut teaches: 
a plurality of sensors (“image sensors 1301”, Fig. 13) configured to collect first sensor data signals (“pairs of image frames captured by image sensors”, para. 0136) from respective portions of an environment of a vehicle (“autonomous vehicle 100”, Fig. 1, “FIG. 13 is a block diagram of a sound classification and sound source localization system 1300 for AV 100”, para. 0131); 
one or more passive sound sensors (“microphone array 1302”, Fig. 13) configured to collect second sensor data signals (“sounds captured by microphone array 1302 to localize sound sources in a vision scene”, para. 0136) characterizing an exterior environment (“…in a vision scene”, para. 0136) of the vehicle; 
one or more processors (“processors 146”, Fig. 1, see also “processor 304”, Fig. 3) and data storage storing instructions (“machine instructions”, para. 0079, see also “instructions”, para. 0090) that, when executed by the one or more processors, cause the system to perform operations (“the AV system 120 includes a data storage unit 142 and memory 144 for storing machine instructions associated with computer processors 146 or data collected by sensors 121”, para. 0079) comprising: 
processing the first sensor data signals which are collected from the plurality of sensors to generate a sensing envelope (“field of view (FOV)”, para. 0054) around the vehicle (“VBSL module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301 and sounds captured by microphone array 1302 to localize sound sources in a vision scene. The sound and vision modalities are processed, respectively, in separate sound and visual neural networks”, para. 0136, “sound sources within a field of view (FOV) of an image sensor of the autonomous vehicle are localized in a visual scene generated by the perception module”, para. 0054); 
processing the second sensor data signals which are collected from the one or more passive sound sensors to generate a sound-enhanced sensing envelope (“sound sources within a field of view (FOV) of an image sensor…”, para. 0054) around the vehicle (see para. 0136 citation above in the previous “processing” step, “The addition of sound source information allows the planning module to make more informed prediction of the dynamic state of the ambulance then could otherwise be determined from the image data itself”, para. 0143, see also “FIG. 17 is a flow diagram of a process 1700 of using classified and localized objects to operate an AV, according to an embodiment”, para. 0147); and 
using the sound-enhanced sensing envelope to evaluate advanced driver assistance system commands (“perform an action”, para. 0151, see also “generate a route or trajectory”, para. 0054) for the vehicle with respect to safety-related events (“avoid collision”, para. 0145, see also “crashes”, para. 0074) identified by the second sensor data signals (“Process 1700 continues by causing, by the processing-circuit, the AV to perform an action based on the classified sound and the determined location of the sound source in the ambient sound environment (1704)”, para. 0151, “The planning module uses the visual scene and/or static digital map, together with the current location of the autonomous vehicle and other information (e.g., traffic information, passenger preferences), to generate a route or trajectory of the autonomous vehicle in the ambient sound environment”, para. 0054, “FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations”, para. 0144, “, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).

Regarding claim 9, Leenayongwut further teaches where the one or more passive sound sensors comprise a microphone array affixed to the vehicle for capturing sound waves generated by other traffic participants (“DOA estimator 1308 estimates spatial information (e.g., distance and direction) for multiple sound sources in the ambient sound environment. A spatial feature for each sound source is extracted from multi-channel observations captured by a plurality of spatially-distributed microphones in the microphone array 1302…the microphone array 1302 is mounted on the outside of AV 100 and includes an array of linear spaced microphones (e.g., 8 microphones)”, para. 0137).

Regarding claim 10, Leenayongwut further teaches where the plurality of sensors comprises one or more direct sensors selected from a group consisting of a camera, lidar detector, radar detector, and an ultrasound detector (“System 1300 includes image sensors 1301 (e.g., LiDAR, RADAR, cameras)”, para. 0131, “sensors 121 also include sensors for sensing or measuring properties of the AV's environment. For example, monocular or stereo video cameras 122 in the visible light, infrared or thermal (or both) spectra, LiDAR 123, RADAR, ultrasonic sensors”, para. 0078).

Regarding claim 11, Leenayongwut further teaches where a first processor (“vision-based sound localizer (VBSL) 1307”, Fig. 13) processes the first sensor data signals to generate the sensing envelope (“module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301”, para. 0136), and where a second processor (“sound classifier 1306”, Fig. 13) processes the second sensor data signals to generate the sound-enhanced sensing envelope (“sound classifier 1306 classifies a sound captured in the ambient sound environment by microphone array 1302 by computing a frequency spectrum of the sound”, para. 0132).

Regarding claim 12, Leenayongwut further teaches where the sound-enhanced sensing envelope is used to evaluate a defensive maneuver command (“avoid collision”, see para. 0145 citation in the rejection to claim 8, see also “avoiding critical situations”, para. 0074) for the vehicle with respect to safety-related events identified by the second sensor data signals (“Referring to FIG. 1, an AV system 120 operates the AV 100 along a trajectory 198 through an environment 190 to a destination 199 (sometimes referred to as a final location) while avoiding objects (e.g., natural obstructions 191, vehicles 193, pedestrians 192, cyclists, and other obstacles) and obeying rules of the road (e.g., rules of operation or driving preferences)”, para. 0075, see also “if the planning module 1304 knows the siren is on, then the AV 100 can make an appropriate maneuver like a safe stop maneuver and pull to the side of the road”, para. 0143).

Regarding claim 13, Leenayongwut further teaches where the sound-enhanced sensing envelope is used to evaluate a free-space checking command (“compare the trajectory with a trajectory for the AV 100 to avoid collision”, para. 0145) for the vehicle with respect to safety-related events identified by the second sensor data signals (“FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations, according to an embodiment. In situations where a sound source is associated with a direction and distance but is not in within the FOV of image sensors of the AV 100, the planning module 1304 uses the directions and distances of the sound sources to localize the sound sources in the static digital map 1605. In the example shown, the emergency vehicle 1604 is represented by marker 1606 and the AV 100 is represented by marker 1607 in static map 1605. Accordingly, even though the sound source has not been detected by the perception module 1303, the directions and distances computed using beamformer system 1500 are provided to the planning module 1304”, para. 0144, “The locations of the sound sources in digital map 1605 are used by the planning module 1306 to predict the trajectory of the emergency vehicle 1604 and also the change in dynamic states of other vehicles or pedestrians in response to the siren. For example, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).

Regarding claim 14, Leenayongwut teaches an apparatus for operating an advanced driver assistance system (ADAS) on a vehicle (“autonomous vehicle 100”, Fig. 1, “FIG. 13 is a block diagram of a sound classification and sound source localization system 1300 for AV 100”, para. 0131) comprising a first plurality of sensors (“image sensors 1301”, Fig. 13) and a second plurality of sound sensors (“microphone array 1302”, Fig. 13) that are arrayed around the vehicle to collect sensor data signals (“sounds captured by microphone array 1302 to localize sound sources in a vision scene”, para. 0136) characterizing an exterior environment (“…in a vision scene”, para. 0136) of the vehicle, the apparatus comprising: 
one or more electronic control units (ECUs) connected to receive a first set of primary sensor data signals (“pairs of image frames captured by image sensors”, para. 0136) from the first plurality of sensors which characterize a surrounding environment (“environment 190”, Fig. 1) of the vehicle and to receive a second set of augmenting sensor data signals (“sounds captured by microphone array 1302 to localize sound sources in a vision scene”, para. 0136) from the second plurality of sensors which characterize a surrounding audio environment (“a vision scene with bounding boxes, confidence scores and labels that are enhanced with sound information, as described in reference to FIG. 16A”, para. 0136) of the vehicle, where the one or more ECUs are configured to generate sound-enhanced advanced driver assistance system commands (“perform an action”, para. 0151, see also “generate a route or trajectory”, para. 0054) by using the second set of augmenting sensor data signals to augment a sensing envelope (“sound sources within a field of view (FOV) of an image sensor…”, para. 0054) around the vehicle which is computed from the first set of primary sensor data signals and used to identify safety-related events (“avoid collision”, para. 0145, see also “crashes”, para. 0074) in proximity to the vehicle (“VBSL module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301 and sounds captured by microphone array 1302 to localize sound sources in a vision scene. The sound and vision modalities are processed, respectively, in separate sound and visual neural networks”, para. 0136, “sound sources within a field of view (FOV) of an image sensor of the autonomous vehicle are localized in a visual scene generated by the perception module”, para. 0054, “The addition of sound source information allows the planning module to make more informed prediction of the dynamic state of the ambulance then could otherwise be determined from the image data itself”, para. 0143, see also “FIG. 17 is a flow diagram of a process 1700 of using classified and localized objects to operate an AV, according to an embodiment”, para. 0147, “Process 1700 continues by causing, by the processing-circuit, the AV to perform an action based on the classified sound and the determined location of the sound source in the ambient sound environment (1704)”, para. 0151, “The planning module uses the visual scene and/or static digital map, together with the current location of the autonomous vehicle and other information (e.g., traffic information, passenger preferences), to generate a route or trajectory of the autonomous vehicle in the ambient sound environment”, para. 0054, “FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations”, para. 0144, “, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).

Regarding claim 17, Leenayongwut further teaches wherein the one or more ECUs are configured to detect, classify, and provide notification of hazardous road conditions by passively capturing sound signal from the driving environment using the second plurality of sound sensors to augment the sensing envelope around the vehicle which is used to identify safety-related events (“sound classifier 1306 classifies a sound captured in the ambient sound environment by microphone array 1302 by computing a frequency spectrum of the sound”, para. 0132, “FIG. 17 is a flow diagram of a process 1700 of using classified and localized objects to operate an AV, according to an embodiment. Process 1700 can be implemented using, for example, sound source classification/localization system 1300”, para. 0147, “Process 1700 begins by capturing, using a plurality of microphones coupled to an autonomous vehicle (AV), an ambient sound environment in which the AV is operating (1701).”, para. 0148, “Process 1700 continues by classifying, based on the captured ambient sound environment, a sound created by a sound source in the ambient sound environment (1702)”, para. 0149, “Process 1700 continues by causing, by the processing-circuit, the AV to perform an action based on the classified sound and the determined location of the sound source in the ambient sound environment (1704)”, para. 0151). 

Regarding claim 18, Leenayongwut further teaches wherein the one or more ECUs are configured to use the second set of augmenting sensor data signals to improve vehicle sensing capabilities by creating a multi-dimensional sensing map (“static map with sound source locations and vision scene”, para. 0141, see Fig. 16A-16B) around the vehicle which augments the sensing envelope around the vehicle (“Planning module 1304 uses the directions and distances to determine the locations of the sound sources on a static map retrieved from map database 1305. Planning module also receives a vision scene with labeled bounding boxes that have been enhanced with sound information. Planning module 1304 uses the static map with sound source locations and vision scene to plan a route or trajectory through the environment as described in reference to FIGS. 9 and 10”, para. 0141).

Regarding claim 19, Leenayongwut further teaches where a first processor (“vision-based sound localizer (VBSL) 1307”, Fig. 13) processes the first set of primary sensor data signals to generate the sensing envelope (“module 1307 is implemented using a unified end-to-end deep CNN that uses pairs of image frames captured by image sensors 1301”, para. 0136), and where a second processor (“sound classifier 1306”, Fig. 13) processes the second set of augmenting sensor data signals to generate a sound-enhanced sensing envelope (“sound classifier 1306 classifies a sound captured in the ambient sound environment by microphone array 1302 by computing a frequency spectrum of the sound”, para. 0132). 

Regarding claim 20, Leenayongwut further teaches wherein the one or more ECUs are configured to generate the sound-enhanced advanced driver assistance system commands by using the sound-enhanced sensing envelope to evaluate a defensive maneuver command (“avoid collision”, see para. 0145 citation in the rejection to claim 14, see also “avoiding critical situations”, para. 0074) (“Referring to FIG. 1, an AV system 120 operates the AV 100 along a trajectory 198 through an environment 190 to a destination 199 (sometimes referred to as a final location) while avoiding objects (e.g., natural obstructions 191, vehicles 193, pedestrians 192, cyclists, and other obstacles) and obeying rules of the road (e.g., rules of operation or driving preferences)”, para. 0075) or a free-space checking command (“compare the trajectory with a trajectory for the AV 100 to avoid collision”, para. 0145) for the vehicle with respect to safety-related events identified by the second set of augmenting sensor data signals (“FIG. 16B illustrates augmenting a static digital map 1605 with sound source locations, according to an embodiment. In situations where a sound source is associated with a direction and distance but is not in within the FOV of image sensors of the AV 100, the planning module 1304 uses the directions and distances of the sound sources to localize the sound sources in the static digital map 1605. In the example shown, the emergency vehicle 1604 is represented by marker 1606 and the AV 100 is represented by marker 1607 in static map 1605. Accordingly, even though the sound source has not been detected by the perception module 1303, the directions and distances computed using beamformer system 1500 are provided to the planning module 1304”, para. 0144, “The locations of the sound sources in digital map 1605 are used by the planning module 1306 to predict the trajectory of the emergency vehicle 1604 and also the change in dynamic states of other vehicles or pedestrians in response to the siren. For example, the planning module 1304 can use the position, speed and direction of the emergency vehicle 1604 to generate a likely trajectory of the emergency vehicle 1604 in the environment, and compare the trajectory with a trajectory for the AV 100 to avoid collision with the emergency vehicle 1604 or any other static or dynamic object in the environment”, para. 0145).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leenayongwut et al. (US 2020/0241551 A1).
Regarding claim 15, Leenayongwut further teaches wherein the one or more ECUs are configured to generate sound-enhanced advanced driver assistance system commands by:
computing a sound-enhanced sensing envelope (“sound sources within a field of view (FOV) of an image sensor…”, para. 0054) around the vehicle from the second set of augmenting sensor data signals (“After integrating (correlating) the information from a sound context vector and activations of the visual neural network, an attention mechanism localizes the sound source in the vision scene. The result is a vision scene with bounding boxes, confidence scores and labels that are enhanced with sound information, as described in reference to FIG. 16A”, para. 0136). 
Leenayongwut does not explicitly teach generating advanced driver assistance system commands from the first set of primary sensor data signals; and
using the sound-enhanced sensing envelope to evaluate the advanced driver assistance system commands for the vehicle with respect to safety-related events identified by the sound-enhanced sensing envelope. Instead, Leenayongwut teaches using both sets of sensor data signals to generate the advanced driver assistance system commands (Fig. 13).  
However, Leenayongwut teaches, “The addition of sound source information allows the planning module to make more informed prediction of the dynamic state of the ambulance then could otherwise be determined from the image data itself. For example, without sound source information the planning module 1304 is limited to speed and heading data for the emergency vehicle. However, the dynamic state of the emergency vehicle can potentially change dramatically if the emergency vehicle is responding to a call. For example, the emergency vehicle can run the traffic light at intersection 1601 or suddenly accelerate or turn. Additionally, any other vehicles at the intersection 1601 (not shown) would likely respond to the siren of the emergency vehicle and suddenly stop or pull to the side of the road to let the emergency vehicle pass. Accordingly, if the planning module 1304 knows the siren is on, then the AV 100 can make an appropriate maneuver like a safe stop maneuver and pull to the side of the road” (para. 0143). 
In para. 0143, Leenayongwut describes generating advanced driver assistance commands using only the first set of primary sensor data signals (“…without sound source information”, para. 0143). Leenayongwut further teaches improving on the advanced driver assistance commands generated only with the image sensors (using only “speed and heading data for the emergency vehicle”, para. 0143). Thus, given these teachings of Leenayongwut, the claimed invention of claim 15 would have been obvious to one of ordinary skill in the art before the effective filing date. The motivation for making such a modification would be to allow the “image sensors 1301” to predict the state of objects surrounding the vehicle without the “microphone array 1302” prior to combining their signals as seen in Fig. 13. This would save processing power by using only high-resolution image data to plan a trajectory for the AV. Further support for obviousness can be found in para. 0003, “The autonomous vehicles driven today use sophisticated planning algorithms to generate routes and trajectories through environments with many static and dynamic objects. These planning algorithms require detailed information about the environment. Although onboard sensors, such as LiDAR, RADAR and cameras provide high-resolution image data, there is an increasing need for additional information about the environment for planning and other functions of autonomous vehicles”.

Claim(s) 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leenayongwut et al. (US 2020/0241551 A1) in view of Shumard et al. (US 2021/0136487 A1), hereafter referred to as Shumard.
Regarding claim 16, Leenayongwut further teaches wherein the apparatus operates in a multi-modal sensor network (“System 1300 includes image sensors 1301 (e.g., LiDAR, RADAR, cameras), microphone array 1302”, para. 0131) that combines the second plurality of sound sensors with the first plurality of sensors (see “vision scene”, Fig. 13, see also “…pairs of image frames captured by image sensors 1301 and sounds captured by microphone array 1302 to localize sound sources in a vision scene”, para. 0136 citation in the rejection to claim 14) comprising a plurality of complementary sensors (“A spatial feature for each sound source is extracted from multi-channel observations captured by a plurality of spatially-distributed microphones in the microphone array 1302, and a peak search in the spatial feature is used to compute the DOA estimate for the sound source. In an embodiment, the microphone array 1302 is mounted on the outside of AV 100 and includes an array of linear spaced microphones (e.g., 8 microphones)”, para. 0137).
Leenayongwut does not explicitly teach wherein the “microphone array 1302” comprises a plurality of orthogonal sensors, but instead teaches “an array of linear spaced microphones (e.g., 8 microphones)” (para. 0137).
However, Shumard teaches a proximity microphone, comprising:
a first plurality of sensors (“sub-array 812 comprising four microphone elements 822a, 822b, 822c, and 822d”, para. 0069, Fig. 9) comprising a plurality of orthogonal and complementary sensors (“Referring now to FIG. 9, shown is a top view of a microphone cluster or sub-array 812 comprising four microphone elements 822a, 822b, 822c, and 822d (collectively referred to as microphone elements 822) arranged in a “cross-pattern.” This pattern may be achieved by arranging the microphone elements 822 at right angles relative to a center 816 of the sub-array 812, so that microphone elements 822a and 822c are horizontally aligned along a first plane of the microphone grille 820 and microphone elements 822b and 822d are horizontally aligned along a second plane of the microphone grille 820 that is perpendicular to the first plane”, para. 0069).
All the components are known in Leenayongwut and Shumard. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the invention of Leenayongwut with the teachings of Shumard such that the “microphone array 1302” (Fig. 13) of Leenayongwut is “arranged in a “cross-pattern”” (para. 0069), as taught by Shumard. The motivation for doing so would be that “The cross-pattern arrangement may improve a working distance of the sub-array 812 by placing the microphone elements 822 in closer proximity to each other, for example, as compared to the arrangement shown in FIG. 2A” (para. 0069), as taught by Shumard.
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure:  See Notice of References Cited.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMELIA VORCE whose telephone number is (313) 446-4917.  The examiner can normally be reached on Monday-Friday, 9AM-6PM, Mountain Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christian Chace, can be reached at (571) 272-4190.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	/A.V./               Examiner, Art Unit 3665
	/CHRISTIAN CHACE/               Supervisory Patent Examiner, Art Unit 3665