Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Drawings
The drawings submitted on 12/24/2020 have been considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “dialog manager,” “automotive assistant,” “isolator,” “reasoning stage,” and “speech daemon” in claims 1-18.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4, 10-12, and 14-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Mohammad et al. (US 2018/0190282 A1).

Regarding Claim 1, Mohammad et al. teach: An apparatus comprising an automotive assistant (24) that is configured to receive first and second microphone signals (22) from first and second microphones (16) disposed in corresponding first and second acoustic zones of a vehicle (10) ([0036] As further shown in the example of FIG. 1, the vehicle 10 may include an interface device 22, multiple zone microphones 28, and one or more functional units 26. In some examples, interface device may include one or more microphones that are configured to capture audio data of spoken commands provided by occupants of vehicle 10. For instance, the interface device 22 may overlap with the zone microphones 28, to the extent that the interface device 22 includes a driver zone microphone or pilot zone microphone or other microphones positioned in various in-cabin vehicle zones of vehicle 10.), wherein said first and second microphone signals carry first (spoken command in one of the in-cabin vehicle zones 30-36) and second utterances (a passenger participating in a phone call in another one of the in-cabin vehicle zones 30-36) from corresponding first and second passengers (12) of said vehicle ([0042] The zone microphones 28 of the vehicle 10 may represent a microphone array, with at least one microphone positioned in each in-cabin vehicle zone of a cabin of the vehicle 10, where each in-cabin vehicle zone represents an area that typically seats or otherwise accommodates a single occupant. Each of the zone microphones 28 may represent a data-input component or a combination of data-input components configured to capture audio data or a combination of audio data and directional information (such as an EigenMike® microphone).
[0054] As one example, the processing circuitry may control the functional unit(s) 26 using spoken commands received at the zone microphone 28A of the driver zone 30 at all times, in addition to spoken commands received at any of zone microphones 28 that is positioned in an active zone of in-cabin vehicle zones 32-36. [0065] FIG. 2E illustrates an example of the vehicle 10, in which the processing circuitry 12 may process voice commands from multiple in-cabin vehicle zones simultaneously. In the example of FIG. 2E, each of zone microphones 28A and 28C may receive voice commands 29 and 31 simultaneously or concurrently. The processing circuitry 12 may simultaneously use spoken command inputs received from the in-cabin vehicle zone 30 (via the zone microphone 28A) and the in-cabin vehicle zone 34 (via the zone microphone 28C) to control respective functional units in the in-cabin vehicle zone 30 and the in-cabin vehicle zone 34.), wherein said automotive assistant comprises a dialog manager (42) (functional unit 26, i.e., infotainment system of vehicle 10) that is configured to initiate a dialog (auditory responses) with said first passenger based on said first utterance and to advance said dialog based on said second utterance (identifying audio matching the second utterance as noise) ([0067] In some instances, in which the vehicle 10 is equipped with parametric speakers or with multiple loudspeakers, the processing circuitry 12 may implement one or more of the above-described techniques to render localized playback of auditory responses to the spoken command within the respective one of the in-cabin vehicle zones 30-36, while leaving the rest of the cabin of the vehicle 10 uninterrupted. [0068] In some examples, the processing circuitry 12 may remove background audio data that originated from a phone call (e.g., a passenger speaking on a cellular telephone) that occurs in any of the in-cabin vehicle zones 30-36 that is not, at present, the selected in-cabin vehicle zone.
In this way, the processing circuitry 12 may implement the background audio data-removal aspects of this disclosure to process voice commands received from a selected zone of the in-cabin vehicle zones 30-36, without substantive interference or audio garbling caused by a passenger participating in a phone call in another one of the in-cabin vehicle zones 30-36. In some examples, the processing circuitry 12 may remove background audio data that corresponds with multiple phone calls occurring in two or more of the in-cabin vehicle zones 30-36, outside of the selected in-cabin vehicle zone of the in-cabin vehicle zones 30-36. [0085] In some examples, the processing circuitry 12 is further configured to identify respective voice information associated with audio input received from respective microphones of the in-cabin vehicle zones other than the selected in-cabin vehicle zone, to determine that any portion of the audio input that is received from a respective microphone of the selected in-cabin vehicle zone and is associated with the identified voice information received from the respective microphones of the in-cabin vehicle zones other than the selected in-cabin vehicle zone comprises noise with respect to the selected in-cabin vehicle zone, to apply, based on the determination, noise cancellation to the portion of the audio input that comprises the noise with respect to the selected in-cabin vehicle zone to obtain noise-cancelled audio input associated with the selected in-cabin vehicle zone, and to control the functional unit using the noise-cancelled audio input associated with the selected in-cabin vehicle zone. In various examples, the functional unit may include one or more of a climate control system of the vehicle, an entertainment system of the vehicle, an integrated wireless phone link system, or an integrated emergency notification system. 
[0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 44. For instance, the processing circuitry 12 may implement a learning algorithm with respect to audio data received from the inactive microphone 48, and thereby form voice recognition heuristics with respect to the inactive microphone 48. If the processing circuitry 12 detects audio data received at the active microphone 46 that matches, or substantially matches the voice data associated with the inactive microphone 48, the processing circuitry 12 may identify the audio data matching the voice data of the inactive microphone 48 as noise with respect to the active microphone 46.).

Regarding Claim 2, Mohammad et al. teach: The apparatus of claim 1, further comprising a reasoning stage (40) that is configured to infer relevance of said second utterance to said dialog (identifying audio matching the second utterance as noise) (See rejection of claim 1).

Regarding Claim 3, Mohammad et al. teach: The apparatus of claim 1, wherein said dialog is a first dialog (voice command), wherein said apparatus further comprises a reasoning stage (40) that is configured to infer whether said second utterance (identifying audio matching the second utterance as noise) is intended to initiate a second dialog that differs from said first dialog (See rejection of claim 1 and [0065] FIG. 2E illustrates an example of the vehicle 10, in which the processing circuitry 12 may process voice commands from multiple in-cabin vehicle zones simultaneously. [0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 44. For instance, the processing circuitry 12 may implement a learning algorithm with respect to audio data received from the inactive microphone 48, and thereby form voice recognition heuristics with respect to the inactive microphone 48.).

Regarding Claim 4, Mohammad et al. teach: The apparatus of claim 1, wherein said dialog is a first dialog and wherein said dialog manager (42) is configured to manage said first dialog and a second dialog that is being carried out concurrently with said first dialog (See rejection of claim 1 and [0065] FIG. 2E illustrates an example of the vehicle 10, in which the processing circuitry 12 may process voice commands from multiple in-cabin vehicle zones simultaneously. [0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 44. For instance, the processing circuitry 12 may implement a learning algorithm with respect to audio data received from the inactive microphone 48, and thereby form voice recognition heuristics with respect to the inactive microphone 48. If the processing circuitry 12 detects audio data received at the active microphone 46 that matches, or substantially matches the voice data associated with the inactive microphone 48, the processing circuitry 12 may identify the audio data matching the voice data of the inactive microphone 48 as noise with respect to the active microphone 46.).

Regarding Claim 10, Mohammad et al. teach: The apparatus of claim 1, wherein said automotive assistant is connected to loudspeakers (18) disposed in different acoustic zones respectively (See Figs. 2A-2E), wherein said dialog manager is configured to advance said dialog by providing a distribution signal (48) that causes a loudspeaker signal to be provided to a proper subset of said loudspeakers (See rejection of claim 1 and [0066] As shown in FIGS. 2A-2E, each of in-cabin vehicle zones 30-36 includes at least one loudspeaker. In some examples, each of the in-cabin vehicle zones 30-36 of the vehicle 10 may include an array of loudspeakers along with the regular speakers in the respective in-cabin vehicle zone. Pre-filtering may include additional signal processing to cancel out any out-of-zone high frequency audio content. [0067] In some examples, with multiple loudspeakers positioned in one or more of the in-cabin vehicle zones 30-36, the processing circuitry 12 may implement various techniques of this disclosure to render or otherwise provide auditory responses in a localized fashion within the respective one of the in-cabin vehicle zones 30-36, while enabling passengers in the rest of the cabin of the vehicle 10 to consume uninterrupted audio and/or video data from the infotainment system of the vehicle 10. As one example, the processing circuitry 12 may perform noise masking, by creating a diffused sound field without a detectable sound source. As another example, the processing circuitry 12 may focus the sound in a localized fashion within one of the in-vehicle zones 30-36, or towards any particular (e.g., predetermined) direction. As another example, the processing circuitry 12 may send multiple sound beams in different directions within the cabin of the vehicle 10.
In some instances, in which the vehicle 10 is equipped with parametric speakers or with multiple loudspeakers, the processing circuitry 12 may implement one or more of the above-described techniques to render localized playback of auditory responses to the spoken command within the respective one of the in-cabin vehicle zones 30-36, while leaving the rest of the cabin of the vehicle 10 uninterrupted.).

Regarding Claim 11, Mohammad et al. teach: The apparatus of claim 1, wherein said automotive assistant is configured to be pre-set to a state in which an utterance from one of said first and second microphones is ignored ([0051] As such, in one example, the vehicle 10 may represent a vehicle comprising an interface device 22 configured to receive a spoken command to identify an in-cabin vehicle zone of two or more in-cabin vehicle zones of the vehicle 10 and to receive background audio data concurrently with a portion of the spoken command. [0053] For instance, to separate the background audio data from the spoken command, the processing circuitry 12 may linearly remove the background audio data from the spoken command. For instance, the processing circuitry 12 may implement beamforming to determine the directionality of audio data received from various speakers positioned within the cabin of the vehicle 10, and may leverage the directionality information to identify the background audio data when received concurrently with the spoken command. [0065] In the example of FIG. 2E, each of zone microphones 28A and 28C may receive voice commands 29 and 31 simultaneously or concurrently. The processing circuitry 12 may simultaneously use spoken command inputs received from the in-cabin vehicle zone 30 (via the zone microphone 28A) and the in-cabin vehicle zone 34 (via the zone microphone 28C) to control respective functional units in the in-cabin vehicle zone 30 and the in-cabin vehicle zone 34. [0066] Pre-filtering may include additional signal processing to cancel out any out-of-zone high frequency audio content. [0068] In some examples, the processing circuitry 12 may remove background audio data that originated from a phone call (e.g., a passenger speaking on a cellular telephone) that occurs in any of the in-cabin vehicle zones 30-36 that is not, at present, the selected in-cabin vehicle zone. 
In this way, the processing circuitry 12 may implement the background audio data-removal aspects of this disclosure to process voice commands received from a selected zone of the in-cabin vehicle zones 30-36, without substantive interference or audio garbling caused by a passenger participating in a phone call in another one of the in-cabin vehicle zones 30-36. In some examples, the processing circuitry 12 may remove background audio data that corresponds with multiple phone calls occurring in two or more of the in-cabin vehicle zones 30-36, outside of the selected in-cabin vehicle zone of the in-cabin vehicle zones 30-36. [0070] In the example of FIG. 3, occupant 46 has voice control over the functional unit(s) 26, based on an “active” status of a microphone 42 positioned in the in-cabin vehicle zone associated with the passenger 46. The active status of the microphone 42 is denoted by an adjacent asterisk in FIG. 3, and as such, the microphone 42 is referred to hereinafter as an active microphone 42. By contrast, the microphone 44 that is positioned in the in-cabin zone in which the occupant 48 is seated is currently inactive, and is referred to herein as an inactive microphone 48. [0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 48. If the processing circuitry 12 detects audio data received at the active microphone 46 that matches, or substantially matches the voice data associated with the inactive microphone 48, the processing circuitry 12 may identify the audio data matching the voice data of the inactive microphone 48 as noise with respect to the active microphone 46. 
In turn, the processing circuitry 12 may suppress the identified noise in the audio data received from the active microphone 46, thereby filtering out noise, and using voice commands received from the active microphone 46 to control the functional unit(s) 26.).

Regarding Claim 12, Mohammad et al. teach: The apparatus of claim 1, wherein said automotive assistant is connected to loudspeakers (18), each of which is disposed in a different acoustic zone of said vehicle and wherein said apparatus further comprises a distributor (46) (the processing circuitry 12) that distributes a loudspeaker signal to selected ones of said loudspeakers (See rejection of claim 10).

Regarding Claim 14, Mohammad et al. teach: The apparatus of claim 1, wherein each of said acoustic zones corresponds to a seat in said vehicle (See rejection of claim 1 and [0042] The zone microphones 28 of the vehicle 10 may represent a microphone array, with at least one microphone positioned in each in-cabin vehicle zone of a cabin of the vehicle 10, where each in-cabin vehicle zone represents an area that typically seats or otherwise accommodates a single occupant.).

Regarding Claim 15, Mohammad et al. teach: A method comprising, based on a first utterance from a first zone, establishing a first dialog, receiving a second utterance from a second zone, using a reasoning stage, determining a property of said second utterance, wherein said property is selected from the group consisting of the property of advancing said dialog and the property of not advancing said dialog (See rejection of claim 1).

Regarding Claim 16, Mohammad et al. teach: The method of claim 15, wherein determining said property comprises determining that said second utterance has the property of not advancing said dialog, said method further comprising ignoring said second utterance (See rejection of claim 11 and [0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 48. If the processing circuitry 12 detects audio data received at the active microphone 46 that matches, or substantially matches the voice data associated with the inactive microphone 48, the processing circuitry 12 may identify the audio data matching the voice data of the inactive microphone 48 as noise with respect to the active microphone 46. In turn, the processing circuitry 12 may suppress the identified noise in the audio data received from the active microphone 46, thereby filtering out noise, and using voice commands received from the active microphone 46 to control the functional unit(s) 26.).

Regarding Claim 17, Mohammad et al. teach: The method of claim 15, wherein determining said property comprises determining that said second utterance has the property of advancing said dialog, said method further comprising advancing said dialog based on said second utterance ([0065] FIG. 2E illustrates an example of the vehicle 10, in which the processing circuitry 12 may process voice commands from multiple in-cabin vehicle zones simultaneously. In the example of FIG. 2E, each of zone microphones 28A and 28C may receive voice commands 29 and 31 simultaneously or concurrently. The processing circuitry 12 may simultaneously use spoken command inputs received from the in-cabin vehicle zone 30 (via the zone microphone 28A) and the in-cabin vehicle zone 34 (via the zone microphone 28C) to control respective functional units in the in-cabin vehicle zone 30 and the in-cabin vehicle zone 34.).



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Mohammad et al.
Mohammad et al. teach: [0053] For instance, to separate the background audio data from the spoken command, the processing circuitry 12 may linearly remove the background audio data from the spoken command. For instance, the processing circuitry 12 may implement beamforming to determine the directionality of audio data received from various speakers positioned within the cabin of the vehicle 10, and may leverage the directionality information to identify the background audio data when received concurrently with the spoken command. [0065] In the example of FIG. 2E, each of zone microphones 28A and 28C may receive voice commands 29 and 31 simultaneously or concurrently. The processing circuitry 12 may simultaneously use spoken command inputs received from the in-cabin vehicle zone 30 (via the zone microphone 28A) and the in-cabin vehicle zone 34 (via the zone microphone 28C) to control respective functional units in the in-cabin vehicle zone 30 and the in-cabin vehicle zone 34.
Mohammad et al. do not specifically teach: The method of claim 15, wherein determining said property comprises determining that said second utterance has the property of not advancing said dialog, said method further comprising determining that said second utterance is an attempt to initiate a new dialog and starting said new dialog based on said second utterance.
However, “determining that said second utterance has the property of not advancing said dialog, said method further comprising determining that said second utterance is an attempt to initiate a new dialog and starting said new dialog based on said second utterance” would have been obvious based on the above teaching of Mohammad et al., specifically [0053] and [0065]. During concurrent or simultaneous processing of spoken commands 29 and 31 (the first utterance and the second utterance) received via microphones 28A and 28C from in-cabin vehicle zones 30 and 34, respectively, microphone 28A will receive both voice command 29 from zone 30 and voice command 31 from zone 34. Similarly, microphone 28C will receive both voice command 29 from zone 30 and voice command 31 from zone 34. Therefore, when processing the zone-specific voice command received by each zone's microphone, the processing circuitry 12 would treat the other zone's voice command as background audio data and would thus remove that background audio data from the zone-specific spoken command.
Therefore, the limitation “determining that said second utterance (voice command 31 from zone 34) has the property (noise) of not advancing said dialog” would have been obvious when processing the first utterance (voice command 29 from zone 30), in which case the second utterance would be treated as noise. Similarly, the limitation “determining that said second utterance is an attempt to initiate a new dialog and starting said new dialog based on said second utterance” would have been obvious when processing the second utterance (voice command 31 from zone 34), in which case the first utterance (voice command 29 from zone 30) would be treated as noise.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammad et al. to include the teaching of “determining that said second utterance has the property of not advancing said dialog, said method further comprising determining that said second utterance is an attempt to initiate a new dialog and starting said new dialog based on said second utterance” in order to control respective functional units in multiple in-cabin vehicle zones simultaneously by receiving spoken command inputs from each in-cabin vehicle zone simultaneously.

Claims 5-7 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Mohammad et al. in view of Premont et al. (US 2019/0073999 A1).

Regarding Claim 5, Mohammad et al. teach selecting a particular zone of the in-cabin vehicle zones 30-36 and invoking the speech recognition engine 33 to process data from a spoken command received via the zone microphones 28 (See Mohammad et al. [0056]-[0058] and [0063]).
Mohammad et al. do not teach: The apparatus of claim 1, further comprising a first speech daemon (30), and a second speech daemon, wherein said first speech daemon is configured to monitor a first acoustic signal (28), which is derived from said first microphone signal, wherein said second speech daemon is configured to monitor a second acoustic signal, which is derived from said second microphone signal, wherein said second speech daemon is configured to extract, from said second acoustic signal, information relevant to determining whether said second utterance is intended to advance said dialog.
Premont et al. teach: a first speech daemon (Fig. 2, any one of ASR engines 220a-220d), and a second speech daemon (Fig. 2, any other one of ASR engines 220a-220d), wherein said first speech daemon is configured to monitor a first acoustic signal (28), which is derived from said first microphone signal, wherein said second speech daemon is configured to monitor a second acoustic signal, which is derived from said second microphone signal, wherein said second speech daemon is configured to extract, from said second acoustic signal, information relevant to determining whether said second utterance is intended to advance said dialog ([0040] As discussed above, once a wake-up word has been detected within an acoustic zone, it may be desirable to target sounds produced from that acoustic zone when performing subsequent speech recognition. In the example of FIG. 2, the arbitration unit 230 and the channel selection unit 240 work together to ensure that, once a wake-up word has been detected in an acoustic zone, subsequent sounds produced from that acoustic zone will be provided to the primary ASR 250, which performs speech recognition of sound produced from the spatial preprocessor selected by the channel selection unit. The primary ASR engine 250 may include an ASR language and acoustic model for recognizing a wide range of speech commands.
[0041] As a non-limiting example of this process, a user within acoustic zone 3 may utter the wake-up word “Hello Tetra,” which is captured by the three microphones of illustrative system 200. The four spatial preprocessors 215a-215d determine which sounds were produced by each of the four acoustic zones; in this example, it would be expected that the sound identified by spatial preprocessor 215c corresponding to acoustic zone 3 would contain the majority of the sound of the user uttering the wake-up word captured by the microphones. When the four ASR engines 220a-220d analyze their respective acoustic signals, ASR engine 220c will most likely produce the highest confidence that the wake-up word “Hello Tetra” was uttered within its associated acoustic zone. Accordingly, arbitration unit 230 will instruct channel selection unit 240 to provide the output of spatial preprocessor 215c to the primary ASR engine 250 to recognize subsequent voice commands from the user.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammad et al. to include the teaching of Premont et al. above in order to select the channel/microphone of the specific acoustic zone that produced the wake-up word and to provide subsequent sounds produced from that acoustic zone to the primary ASR.

Regarding Claim 6: The apparatus of claim 1, further comprising plural natural-language processors (36) that are configured to execute while said dialog manager is managing said dialog, wherein each of said natural-language processors is configured to receive an acoustic signal derived from one of said microphones and to extract, from said acoustic signal, information indicative of relevance of said acoustic signal to said dialog (See the Premont et al. teaching cited in the rejection of claim 5 and, further, Premont et al. [0021] According to some embodiments, once a wake-up word has been detected as produced from within a given acoustic zone, a system may then preferentially target that zone for subsequent automatic speech recognition. For instance, the system may identify sound produced from that zone within sound signals received by one or more microphones and perform speech recognition on the identified sound. Such an approach may allow for more accurate speech detection in a noisy environment, such as those environments discussed above, since the system may perform speech recognition on sounds produced from the selected acoustic zone whilst excluding sounds produced from other acoustic zones. [0022] For instance, returning to the motor vehicle example described above, the techniques described herein may allow a passenger in a rear seat to speak a wake-up word that is recognized by one or more microphones within the vehicle even though other sounds may be present in the environment, since an acoustic zone may be defined that includes the passenger's seat and sounds produced from within that acoustic zone may be targeted for wake-up word recognition. Moreover, once the wake-up word has been recognized as being produced from the rear seat passenger's acoustic zone, subsequent sounds produced from that acoustic zone may be used as input for automatic speech recognition.
In this manner, a passenger's speech, including both a wake-up word and subsequent speech commands, may be recognized in an environment that includes other sound sources, even other sources of speech. As discussed above, using a separate speech recognizer or separate instances of a speech recognizer to recognize the speech content of the acoustic signal arising from each of the acoustic zones in particular provides more robust speech recognition in such an environment. [0023] According to some embodiments, one or more hardware sensors may aid in detection of a wake-up word. Since a user must be present in an acoustic zone in order to produce speech (including a wake-up word) from within that acoustic zone, hardware sensors may be used to determine whether, in fact, a user is present in that acoustic zone. Such sensors may include any one or combination of motion sensors (e.g., to determine if any users are present in a room), pressure sensors (e.g., to determine if a user is sitting in a seat, such as a car seat), cameras to provide optical data, sensors to detect when a seat belt is engaged and/or any other suitable sensor(s) that facilitates determining whether there is a user located within an acoustic zone. If it is determined that no user is present in a particular acoustic zone, that acoustic zone need not be further considered during wake-up word detection and/or subsequent speech recognition by the system. [0044] According to some embodiments, one or more actions taken in response to recognition of speech commands subsequent to detection of a wake-up word in an acoustic zone may be based upon an identification of a user who uttered the wake-up word. ASR engine 250 (or another component of unit 201) may perform biometrics of a voice that uttered the detected wake-up word to identify the speaker. The response of the system to subsequent voice commands may then be determined based on knowledge of the user's name, preferences, etc.).

Regarding Claim 7: The apparatus of claim 1, further comprising speech daemons (30), wherein each of said speech daemons is configured to monitor an acoustic signal derived from one of said microphones, wherein each of said speech daemons comprises a wake-word detector (32) (ASR engines 220a-220d), a natural-language processor (36) (preprocessors 215a-215d), and an automatic speech-recognizer (34) (primary ASR engine 250) (See the Premont et al. teaching cited in the rejection of claim 5).

Regarding Claim 13: The apparatus of claim 1, further comprising a first speech daemon and a remote natural-language processor (37) (Fig. 2, primary ASR engine 250), wherein said first speech daemon is configured to monitor an acoustic signal (wake-up word) that is derived from said first microphone signal and to communicate a request to said remote natural-language processor for interpretation of a command in said first utterance (See the rejection of claim 5 and Premont et al. [0041] As a non-limiting example of this process, a user within acoustic zone 3 may utter the wake-up word “Hello Tetra,” which is captured by the three microphones of illustrative system 200. The four spatial preprocessors 215a-215d determine which sounds were produced by each of the four acoustic zones; in this example, it would be expected that the sound identified by spatial preprocessor 215c corresponding to acoustic zone 3 would contain the majority of the sound of the user uttering the wake-up word captured by the microphones. When the four ASR engines 220a-220d analyze their respective acoustic signals, ASR engine 220c will most likely produce the highest confidence that the wake-up word “Hello Tetra” was uttered within its associated acoustic zone. Accordingly, arbitration unit 230 will instruct channel selection unit 240 to provide the output of spatial preprocessor 215c to the primary ASR engine 250 to recognize subsequent voice commands from the user.).

11.	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Mohammad et al. in view of Ramprashad et al. (US 2018/0033447 A1).

Regarding Claim 8, Mohammad et al. teach: The apparatus of claim 1, further comprising an isolator (26) (processing circuitry 12) configured to receive a microphone signal from said first microphone, said microphone signal including a superposition of first and second utterances (audio data received at the active microphone that matches, or substantially matches, the voice data associated with the inactive microphone, i.e., a phone call (e.g., a passenger speaking on a cellular telephone) that occurs in any of the in-cabin vehicle zones 30-36 that is not, at present, the selected in-cabin vehicle zone), said first utterance being an utterance from an intrinsic speaker (audio input received from the zone microphone of the selected in-cabin vehicle zone) for said first microphone and said second utterance being an utterance from an extrinsic speaker (audio input received from the zone microphones of the in-cabin vehicle zones other than the selected in-cabin vehicle zone, i.e., audio garbling caused by a passenger participating in a phone call in another one of the in-cabin vehicle zones) for said microphone ([0058] At various portions of this disclosure, the implementation described above may be described as the processing circuitry being configured to “simultaneously” or “concurrently” control the functional unit(s) 26 using the respective zone microphones 28 of multiple zones of the in-cabin vehicle zones 30-36. It will be appreciated that the use of the terms “simultaneous” or “concurrent” is not limited to scenarios in which the spoken commands from multiple zones overlap in time, but also include scenarios in which the processing circuitry 12 receives the spoken commands from the multiple zones during a discrete, fixed, window of time. [0065] FIG. 2E illustrates an example of the vehicle 10, in which the processing circuitry 12 may process voice commands from multiple in-cabin vehicle zones simultaneously. In the example of FIG. 2E, each of zone microphones 28A and 28C may receive voice commands 29 and 31 simultaneously or concurrently. The processing circuitry 12 may simultaneously use spoken command inputs received from the in-cabin vehicle zone 30 (via the zone microphone 28A) and the in-cabin vehicle zone 34 (via the zone microphone 28C) to control respective functional units in the in-cabin vehicle zone 30 and the in-cabin vehicle zone 34. [0068] In some examples, the processing circuitry 12 may remove background audio data that originated from a phone call (e.g., a passenger speaking on a cellular telephone) that occurs in any of the in-cabin vehicle zones 30-36 that is not, at present, the selected in-cabin vehicle zone. In this way, the processing circuitry 12 may implement the background audio data-removal aspects of this disclosure to process voice commands received from a selected zone of the in-cabin vehicle zones 30-36, without substantive interference or audio garbling caused by a passenger participating in a phone call in another one of the in-cabin vehicle zones 30-36. [0071] In some examples, the processing circuitry 12 may apply noise cancellation at the active microphone 46, to dampen or suppress any unwanted voice commands that may be detected from the occupant 44. For instance, the processing circuitry 12 may implement a learning algorithm with respect to audio data received from the inactive microphone 48, and thereby form voice recognition heuristics with respect to the inactive microphone 48. If the processing circuitry 12 detects audio data received at the active microphone 46 that matches, or substantially matches the voice data associated with the inactive microphone 48, the processing circuitry 12 may identify the audio data matching the voice data of the inactive microphone 48 as noise with respect to the active microphone 46.).
Mohammad et al. do not teach: wherein said isolator is configured to output an acoustic signal corresponding to said microphone signal, wherein a first power ratio is a ratio of power in said first utterance relative to power in said second utterance in said microphone signal, wherein a second power ratio is a ratio of power in said first utterance to power in said second utterance in said acoustic signal, wherein said second power ratio and said first power ratio indicate that said second utterance has been suppressed in said acoustic signal.
Ramprashad et al. teach: wherein said isolator is configured to output an acoustic signal corresponding to said microphone signal, wherein a first power ratio (initial or instantaneous ratio of the two beams in the time domain, i.e., ratio of desired speech to undesired speech in the time domain) is a ratio of power in said first utterance (desired speech or voice) relative to power in said second utterance (undesired speech or noise) in said microphone signal, wherein a second power ratio (ratio of the two beams in the frequency domain, i.e., ratio of desired speech to undesired speech in the spectral or frequency domain) is a ratio of power in said first utterance to power in said second utterance in said acoustic signal, wherein said second power ratio and said first power ratio indicate that said second utterance has been suppressed in said acoustic signal ([0005] Thus, when desired speech is active it is expected that there is to be an energy or power spectrum difference between the two channels in line with the SNR difference. The parameters of a Voice Activity Detector (VAD) or of a noise estimator, where the latter could be part of a noise suppressor, can therefore be adjusted, based on the voice separation value. [0037] Beam analyzers 150 and 155 may each analyze the received microphone signals to determine which of the microphone signals will produce a beam that captures a desired source (such as a local voice) and an undesired source (such as ambient noise), respectively. The determination may be based on a variety of factors. [0044] In some embodiments, a noise estimator may first be used to process the noise beam (the noise dominant input) and the voice beam (the voice dominant input) to compute the respective noise components, and the respective strengths of these noise components are used to determine instantaneous and average ratios over the time interval. The instantaneous ratios may be computed directly in the discrete time domain on a frame by frame basis.
Alternatively, the instantaneous ratios may be computed in the discrete time domain at different points in time in each audio frame. In other embodiments, the strengths of the voice and noise beams are computed as power spectra in the spectral or frequency domain, or they may be computed as energy spectra. This may be based on having first transformed the primary and secondary sound pick up channels on a frame by frame basis into the frequency domain (also referred to as spectral domain.) [0049] According to some embodiments, to determine whether there is sufficient voice-separation between two beams, ratios are considered between a strength of a voice beam (a desired signal or an acoustic pickup beam dominated primarily by a primary talker's voice) and a strength of a noise beam (an undesired signal, or an acoustic pickup beam dominated primarily by noise). For example, initially, ratios are obtained between a strength of the noise beam and a strength of the voice beam. In embodiments in which the correction factor for noise matching has been determined, these ratios may be adjusted by applying the correction factor for noise-matching. In such embodiments, these adjusted ratios are compared to set thresholds for voice-separation in order to determine whether there is sufficient voice-separation between the two beams. In some embodiments, the adjusted ratios are used to obtain instantaneous and average ratios over a time interval (e.g., a digital audio time frame), and the instantaneous and average ratios are compared to the set thresholds to determine whether there is sufficient voice-separation. The instantaneous ratios may be computed directly in the discrete time domain on a frame by frame basis. Alternatively, the instantaneous ratios may be computed in the discrete time domain at different points in time in each audio frame. In other embodiments, the strengths of the voice and noise beams are computed as power spectra in the spectral or frequency domain. 
This may be based on having first transformed the primary and secondary sound pick up channels on a frame by frame basis into the frequency domain (also referred to as spectral domain.) [0061] In the example of FIG. 3, power spectrums of the voice and noise beams are estimated by power spectrum calculators 112 and 114, respectively, using the frequency domain representations generated by the time to frequency converters 108 and 110. These power spectrums are then used as input to the power spectrum estimators 116 and 118 to drive estimation of an undesired noise signal and a desired voice signal. The power spectrums of the undesired and desired signals are provided as input to the signal-to-noise estimator 122 and may also be used as input to the VAD 120. A suppression gain calculator 126 receives as input signal-to-noise ratios calculated by signal-to-noise estimator 122 and information calculated by VAD 120 in order to calculate a set of suppression gains. The suppression gains are applied at 124 to the frequency domain representation generated by time to frequency convertor 110 and the suppressed output is converted back to the time domain by frequency to time convertor 128 in order to generate a noise-reduced voice output.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammad et al. to include the teaching of Ramprashad et al. above in order to generate a noise-reduced voice output.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mohammad et al. in view of Premont et al. (US 2019/0073999 A1), and further in view of Cohen (US 2017/0133036 A1).

Regarding Claim 9, Mohammad et al. teach: The apparatus of claim 1, wherein said automotive assistant is further configured to receive signals from one or more cameras and receive an acoustic signal derived from one of said microphones and information indicative of relevance of said acoustic signal to said dialog (See the rejection of claim 6 and [0041] In examples where the vehicle 10 includes the autonomous control system 24, the autonomous control system 24 may include various sensors and units, such as a global positioning system (GPS) unit, one or more accelerometer units, one or more gyroscope units, one or more compass units, one or more radar units, one or more LiDaR (which refers to a Light Detection and Ranging) units, one or more cameras, one or more sensors for measuring various aspects of the vehicle 10 (such as a steering wheel torque sensor, steering wheel grip sensor, one or more pedal sensors, tire sensors, tire pressure sensors), and any other type of sensor or unit that may assist in autonomous operation of vehicle 10. In this respect, the autonomous control system 24 may control operation of the vehicle 10 allowing the occupant to participate in tasks unrelated to the operation of the vehicle 10.).
Mohammad et al. do not disclose: wherein said automotive assistant is further configured to receive first and second camera signals from first and second cameras (20), disposed in said first and second acoustic zones respectively, wherein said automotive assistant is configured to determine relevance of said second utterance to said dialog based at least in part on information provided by said second camera.
Premont et al. teach: wherein said automotive assistant is further configured to receive camera signals from cameras (cameras to provide optical data), wherein said automotive assistant is configured to determine relevance of said second utterance (an environment that includes other sound sources, even other sources of speech) to said dialog based at least in part on information provided by said cameras ([0021] Such an approach may allow for more accurate speech detection in a noisy environment, such as those environments discussed above, since the system may perform speech recognition on sounds produced from the selected acoustic zone whilst excluding sounds produced from other acoustic zones. [0022] For instance, returning to the motor vehicle example described above, the techniques described herein may allow a passenger in a rear seat to speak a wake-up word that is recognized by one or more microphones within the vehicle even though other sounds may be present in the environment, since an acoustic zone may be defined that includes the passenger's seat and sounds produced from within that acoustic zone may be targeted for wake-up word recognition. Moreover, once the wake-up word has been recognized as being produced from the rear seat passenger's acoustic zone, subsequent sounds produced from that acoustic zone may be used as input for automatic speech recognition. In this manner, a passenger's speech, including both a wake-up word and subsequent speech commands, may be recognized in an environment that includes other sound sources, even other sources of speech. As discussed above, using a separate speech recognizer or separate instances of a speech recognizer to recognize the speech content of the acoustic signal arising from each of the acoustic zones in particular provides more robust speech recognition in such an environment. [0023] According to some embodiments, one or more hardware sensors may aid in detection of a wake-up word. Since a user must be present in an acoustic zone in order to produce speech (including a wake-up word) from within that acoustic zone, hardware sensors may be used to determine whether, in fact, a user is present in that acoustic zone. Such sensors may include any one or combination of motion sensors (e.g., to determine if any users are present in a room), pressure sensors (e.g., to determine if a user is sitting in a seat, such as a car seat), cameras to provide optical data, sensors to detect when a seat belt is engaged and/or any other suitable sensor(s) that facilitates determining whether there is a user located within an acoustic zone. If it is determined that no user is present in a particular acoustic zone, that acoustic zone need not be further considered during wake-up word detection and/or subsequent speech recognition by the system.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammad et al. to include the teaching of Premont et al. above in order to select the channel/microphone of the specific acoustic zone that produced the wake-up word and to provide subsequent sounds produced from that acoustic zone to the primary ASR.
Mohammad et al. in view of Premont et al. do not teach: wherein said automotive assistant is further configured to receive first and second camera signals from first and second cameras (20) disposed in said first and second acoustic zones respectively, wherein said automotive assistant is configured to determine relevance of said second utterance to said dialog based at least in part on information provided by said second camera.
Cohen et al. teach: first and second cameras (20) disposed in said first and second acoustic zones (position, i.e., the first or second physical position of the speaker determined based on the physical positions of the plurality of microphones) respectively, wherein said automotive assistant is configured to determine relevance of said second utterance (background noise) to said dialog based at least in part on information provided by said second camera ([0003] In a particular embodiment, a method provides receiving audio captured by the plurality of microphones at a location and receiving video captured of a scene that includes the plurality of microphones captured by a first camera at a first camera position. The method further provides identifying the plurality of microphones in the scene and determining physical positions of the plurality of microphones at the location relative to the first camera position. The method then provides adjusting the audio based on the physical positions of the plurality of microphones. [0004] In some embodiments, the method further provides identifying a speaker in the audio, determining a first physical position of the speaker based on the physical positions of the plurality of microphones, and adjusting a video camera to feature the first physical position. [0012] In some embodiments, the method provides receiving second video captured of a second scene that includes the plurality of microphones captured by a second camera at a second camera position, identifying the plurality of microphones in the second scene, determining second physical positions of the plurality of microphones at the location relative to the second camera position, and adjusting the audio based on the second physical positions of the plurality of microphones. [0030] The positions of microphones relative to one another are what allows audio enhancement algorithms to perform audio enhancement. Therefore, method 200 further provides audio management system 101 adjusting the audio based on the physical positions microphones 102 and 103 (205). For example, audio management system 101 may adjust the audio so that the voice of a speaker more pronounced relative to other sound (e.g. background noise). Other ways in which audio can be adjusted with knowledge of microphone positioning may also be used. Regardless of the way in which the audio is enhanced, audio management system 101, using an image captured by camera 104, is able to determine the positions of microphones 102 and 103 on its own. After adjustment, the audio may be stored, played back immediately, transferred to another system, transferred as an audio media stream or as an audio component of a video media stream, or used for some other purpose—including combinations thereof.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Mohammad et al. in view of Premont et al. to include the teaching of Cohen et al. above in order to enhance/adjust the audio with knowledge of microphone positioning so that the voice of a speaker is more pronounced relative to other sound (e.g., background noise).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Mizumoto et al. (US 2016/0064000 A1) teach: A sound source-separating device includes a sound-collecting part, an imaging part, a sound signal-evaluating part, an image signal-evaluating part, a selection part that selects whether to estimate a sound source direction based on the first sound signal or the first image signal, a person position-estimating part that estimates a sound source direction using the first image signal, and a sound source direction-estimating part that estimates a sound source direction.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2656