DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 10/20/2020. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5-7, 10-11, 14-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Holm (US 20210065712 A1) in view of Khoe et al. (US 20140035823 A1).


As to independent claim 1, Holm teaches:
1. A method implemented by a computing device of a vehicle (see ¶ [0015]: “In accordance with various aspects of the invention, one or more of the audio interface, the image interface, the speech processing module and the speaker preprocessing module may be located within the vehicle, e.g., may include part of a local embedded system.”), the method comprising:
obtaining, using a sensor operating in association with the computing device, image data associated with a passenger in the vehicle (see ¶ [0015] citation as in preamble, above and [0042]: “…In one example, image data obtained from within a vehicle, such as from a driver and/or passenger camera [i.e., sensor], is processed to identify a person [i.e., identification of passenger] and to determine a feature vector that numerically represents certain characteristics of the person [i.e. characteristics of the passenger based on identification].”);
obtaining an identification of the passenger based on the image data (see ¶ [0015] citation as in preamble and [0042] as in previous limitation, above.);
retrieving one or more voice characteristics of the passenger based on the identification (see ¶ [0015] citation as in preamble and [0042] as in previous limitation, above and further [0042]: “…In one example, image data obtained from within a vehicle, such as from a driver and/or passenger camera, is processed to identify a person and to determine a feature vector that numerically represents certain characteristics of the person. These characteristics [i.e., characteristics obtained from feature vector determined by the image data from previous sentence] include audio characteristics, e.g., a numerical representation of expected variance within audio data for an acoustic model. In another example, image data obtained from within a vehicle, such as from a driver and/or passenger camera, is processed to determine a feature vector that numerically represents certain visual characteristics of the person, e.g., characteristics associated with an utterance by the person. In one case, the visual characteristics may be associated with a mouth area of the person, e.g., represent lip position and/or movement.”);
selecting, by the computing device, an operating condition for an additional sensor of the vehicle based on the one or more voice characteristics of the passenger (see ¶ [0015] citation as in preamble, [0042] citations as in previous limitations, above and [0045-0047, 0068, and 0070]: “[0045] … The speech processing module [i.e., associated with both the sensor and additional sensor] 130 is configured to process the image data 145 and the audio data 155 to generate a set of linguistic features 160 that are useable to parse an utterance of the person 102. [0046] A person (such as person 102) may use the configuration of FIGS. 1A and 1B to issue voice commands while operating the motor vehicle. For example, the person 102 may speak within the interior, e.g., generate an utterance, in order to control the motor vehicle or obtain information. An utterance in this context is associated with a vocal sound produced by the person and the utterance represents linguistic information such as speech. For example and in accordance with one aspect of the invention, an utterance includes speech that emanates from a larynx of the person 102. The utterance includes a voice command, e.g., a spoken request from a user. The voice command includes, for example, any one or any combination of: a request to perform an action (e.g., “Play music”, “Turn on air conditioning”, “Activate cruise control”);… [0047] …In accordance with some embodiments, the audio data 155 is captured from one audio capture device; in accordance with other embodiments, the audio data 155 is captured from multiple audio capture devices, e.g., there may be multiple microphones [i.e., additional sensor] at different positions within the interior 100. [0068] In accordance with various aspects and embodiments, the vehicle includes multiple image capture devices and multiple audio capture devices. As such, the speaker preprocessing module 320 provides further functionality to determine an appropriate facial area from one or more captured images. In accordance with one embodiment, audio data from a plurality of audio capture devices may be processed to determine a closest audio capture device associated with the utterance. […] In this manner, the speech processing apparatus 300 of FIG. 3 may be used to identify a speaker from a plurality of people within a vehicle and configure the speech processing module 330 to the specific characteristics of that speaker. [0070] In the example of FIG. 3, the vector generator 372 may compute an i-vector for one or more utterances. In a case, where there are no speaker feature vectors stored within data store 374, an i-vector may be computed by the vector generator 372 based on one or more frames of audio data for an utterance 355.”
Here, the additional sensor is interpreted as the closest audio capture device (i.e., microphone).); and
detecting, by the additional sensor that operates in the operating condition, voice data specific to the passenger (see ¶ [0045-0047, 0068, and 0070] citations as in limitation above.).

The Examiner notes that although Holm teaches selecting the closest microphone based on an utterance, it is not clear that this selection is based on a voice characteristic, as claimed in the instant application. For the purposes of compact prosecution, the Examiner relies on the Khoe et al. reference, as discussed below.
Khoe et al. teaches:
selecting, by the computing device, an operating condition for an additional sensor of the vehicle based on the one or more voice characteristics of the passenger (see ¶ [0086]:  “ In some embodiments, voice capture module 910 can capture the user's voice at set intervals. The rate at which voice can be captured may be determined based on the type of language [i.e., voice characteristics of user (e.g., passenger)] that is being spoken. For example, the rate at which Spanish is captured may be at a faster rate compared to Dutch. As the amount of time people pause in between conversation (i.e., the duration of the gap in between words and/or sentences) generally differs from one language speaker to another, voice capture module 910 can intake voice in designated intervals for different languages. In some embodiments, the capture rate can be set at a default rate corresponding to the default language set to the device. The capture rate [i.e., operating condition] can be adjusted in accordance with the type of language [i.e., voice characteristics of user (e.g., passenger)] being analyzed. While in some embodiments, a voice capture module is used to capture dictated language from the user in set intervals, some embodiments allow the user's voice to be captured and analyzed in real-time.”); and
detecting, by the additional sensor that operates in the operating condition, voice data specific to the passenger (see ¶ [0086] citation above.).
Holm and Khoe et al. are both considered to be analogous to the claimed invention because they are in the same field of endeavor, namely speech processing. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm to incorporate the teachings of Khoe et al. of selecting, by the computing device, an operating condition for an additional sensor based on the one or more voice characteristics of the passenger and detecting, by the additional sensor that operates in the operating condition, voice data specific to the passenger, which provides the benefit of an accurate textual representation of an audio input ([0008] of Khoe et al.).

As to independent claim 10, Holm in combination with Khoe et al. teaches the limitations as in claim 1.
Holm further teaches:
10. A vehicle comprising:
a sensor (see ¶ [0007]: “In accordance with one aspect, an apparatus for a vehicle includes an audio interface configured to receive audio data from within the vehicle, an image interface configured to receive image data [i.e., sensor] from within the vehicle, and a speech processing module [i.e., additional sensor] configured to parse an utterance of the person based on the audio data and the image data.”);
an additional sensor (see ¶ [0007] citation as in limitation above.);
a processor  (see ¶ [0022]: “According to other aspects of the invention, a non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to receive audio data from an audio capture device, receive a speaker feature vector, the speaker feature vector being obtained based on image data from an image capture device, the image data featuring a facial area of a user, and parse the utterance using a speech processing module,”) configured to: [perform the limitations as in claim 1, above].
As to independent claim 19, Holm in combination with Khoe et al. teaches the limitations as in claim 1.
Holm further teaches:
19. A method comprising:
[the limitations of claim 1, above.]
obtaining additional image data associated with an additional passenger in the vehicle (see ¶ [0015] citation as in preamble of claim 1, [0042, 0045-0047, and 0068] citations as in limitations of claim 1, above and ¶ [0044 and 0051]: “[0042]…In one example, image data obtained from within a vehicle, such as from a driver and/or passenger camera, is processed to identify a person [i.e., identification of passenger] and to determine a feature vector that numerically represents certain characteristics of the person [i.e. characteristics of the passenger based on identification]. [0044] …Although a driver seat context is shown in FIG. 1A, a similar configuration may be provided for one or more passenger seats (e.g. both front and rear) [i.e., additional passenger]… [0051] Although FIG. 1A shows an example where the person 102 is a driver of a motor vehicle, in other applications, one or more image capture devices and audio capture devices may be arranged to capture image data featuring a person that is not controlling the motor vehicle, such as a passenger. For example, a motor vehicle may have a plurality of image capture devices arranged to capture image data relating to people present in one or more passenger seats of the vehicle (e.g., at different locations within the vehicle such as front and back). ”);
obtaining additional identification of the additional passenger based on the additional image data (see ¶ [0015] citation as in preamble of claim 1, [0042, 0045-0047, and 0068] citations as in limitations of claim 1, above and ¶ [0044 and 0051] citations as in limitation above.);
retrieving one or more additional voice characteristics of the additional passenger based on the identification of the additional passenger (see ¶ [0015] citation as in preamble of claim 1, [0042, 0045-0047, and 0068] citations as in limitations of claim 1, above and ¶ [0044 and 0051] citations as in limitation above and specifically [0042]: “…In one example, image data obtained from within a vehicle, such as from a driver and/or passenger camera, is processed to identify a person and to determine a feature vector that numerically represents certain characteristics of the person. These characteristics [i.e., characteristics obtained from feature vector determined by the image data from previous sentence] include audio characteristics, e.g., a numerical representation of expected variance within audio data for an acoustic model. In another example, image data obtained from within a vehicle, such as from a driver and/or passenger camera, is processed to determine a feature vector that numerically represents certain visual characteristics of the person, e.g., characteristics associated with an utterance by the person. In one case, the visual characteristics may be associated with a mouth area of the person, e.g., represent lip position and/or movement.”);
selecting, by the computing device, a different operating condition for the additional sensor of the vehicle based on the one or more voice characteristics of the additional passenger (see ¶ [0015] citation as in preamble of claim 1, [0042, 0045-0047, and 0068] citations as in limitations of claim 1, above and ¶ [0044 and 0051] citations as in limitation above and specifically [0045-0047, 0068, and 0070]: “[0045] … The speech processing module [i.e., additional sensor] 130 is configured to process the image data 145 and the audio data 155 to generate a set of linguistic features 160 that are useable to parse an utterance of the person 102. [0046] A person (such as person 102) may use the configuration of FIGS. 1A and 1B to issue voice commands while operating the motor vehicle. For example, the person 102 may speak within the interior, e.g., generate an utterance, in order to control the motor vehicle or obtain information. […]The voice command includes, for example, any one or any combination of: a request to perform an action (e.g., “Play music”, “Turn on air conditioning”, “Activate cruise control”);… [0047] …In accordance with some embodiments, the audio data 155 is captured from one audio capture device; in accordance with other embodiments, the audio data 155 is captured from multiple audio capture devices, e.g., there may be multiple microphones [i.e., additional sensor] at different positions within the interior 100. [0068] In accordance with various aspects and embodiments, the vehicle includes multiple image capture devices and multiple audio capture devices. As such, the speaker preprocessing module 320 provides further functionality to determine an appropriate facial area from one or more captured images. In accordance with one embodiment, audio data from a plurality of audio capture devices may be processed to determine a closest audio capture device associated with the utterance. […] In this manner, the speech processing apparatus 300 of FIG. 3 may be used to identify a speaker from a plurality of people within a vehicle and configure the speech processing module 330 to the specific characteristics of that speaker. [0070] In the example of FIG. 3, the vector generator 372 may compute an i-vector for one or more utterances. In a case, where there are no speaker feature vectors stored within data store 374, an i-vector may be computed by the vector generator 372 based on one or more frames of audio data for an utterance 355.”
Here, the different operating condition is interpreted as the change to a different audio capture device (i.e., microphone) that is closest to the additional (i.e., different) passenger.); and
detecting, by the additional sensor that operates in the different operating condition, additional voice data specific to the additional passenger (see ¶ [0045-0047, 0068, and 0070] citations as in limitation above.).

The Examiner notes that although Holm teaches selecting the closest microphone based on an utterance, it is not clear that this selection is based on a voice characteristic, as claimed in the instant application. For the purposes of compact prosecution, the Examiner relies on the Khoe et al. reference, as discussed below.
Khoe et al. teaches:
selecting, by the computing device, a different operating condition for the additional sensor of the vehicle based on the one or more voice characteristics of the additional passenger  (see ¶ [0086]:  “ In some embodiments, voice capture module 910 can capture the user's voice at set intervals. The rate at which voice can be captured may be determined based on the type of language [i.e., voice characteristics of user (e.g., passenger)] that is being spoken. For example, the rate at which Spanish is captured may be at a faster rate compared to Dutch. As the amount of time people pause in between conversation (i.e., the duration of the gap in between words and/or sentences) generally differs from one language speaker to another, voice capture module 910 can intake voice in designated intervals for different languages. In some embodiments, the capture rate can be set at a default rate corresponding to the default language set to the device. The capture rate [i.e., operating condition] can be adjusted in accordance with the type of language [i.e., voice characteristics of user (e.g., passenger)] being analyzed. While in some embodiments, a voice capture module is used to capture dictated language from the user in set intervals, some embodiments allow the user's voice to be captured and analyzed in real-time.”); and
detecting, by the additional sensor that operates in the different operating condition, additional voice data specific to the additional passenger (see ¶ [0086] citation above.).
Holm and Khoe et al. are both considered to be analogous to the claimed invention because they are in the same field of endeavor, namely speech processing. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm to incorporate the teachings of Khoe et al. of selecting, by the computing device, a different operating condition for an additional sensor based on the one or more voice characteristics of the passenger and detecting, by the additional sensor that operates in the different operating condition, voice data specific to the passenger, which provides the benefit of an accurate textual representation of an audio input ([0008] of Khoe et al.).

Regarding claims 2 and 11, Holm in combination with Khoe et al. teaches:
2 and 11. The method of claim 1, further comprising:
[the last 5 limitations as in claim 19].
Regarding claims 5 and 14, Holm in combination with Khoe et al. teaches all of the limitations as in claims 1 and 10.
Holm further teaches:
5 and 14. The method of claim 1, wherein the sensor is a camera and the additional sensor is a microphone (see ¶ [0042 and 0047]: “[0042] … image data obtained from within a vehicle, such as from a driver and/or passenger camera,… [0047] … audio data 155 is captured from multiple audio capture devices, e.g., there may be multiple microphones [i.e., additional sensor]”).

Regarding claims 7 and 16, Holm in combination with Khoe et al. teaches all of the limitations as in claims 2 and 11.
Holm further teaches:
7 and 16. The method of claim 2, wherein the one or more voice characteristics of the passenger and the one or more additional voice characteristics of the additional passenger are stored locally in a database included in the vehicle (see ¶ [0015, 0064, and 0087]: “[0015] …For example, in accordance with some embodiments, the apparatus may be locally implemented within the vehicle and a further copy of at least one component of the apparatus may be implemented on a remote server device, such that certain functions are performed remotely, e.g., as well as or instead of local processing. Remote server devices may have enhanced processing resources that improve accuracy. [0064] In the example above, the use of the data store 374 to save a speaker feature vector 325 reduces run-time computational demands for an in-vehicle system. For example, the data store 374 includes a local data storage device within the vehicle and, as such, a speaker feature vector 325 is retrieved for a particular user from the data store 374 rather than being computed by the vector generator 372. [0087] …In FIG. 6, the speech processing module 600 includes a database of acoustic model configurations 632, an acoustic model selector 634 and an acoustic model instance 636. The database of acoustic model configurations 632 stores a number of parameters to configure an acoustic model. In this example, the acoustic model instance 636 includes a general acoustic model that is instantiated (e.g., configured or calibrated) using a particular set of parameter values from the database of acoustic model configurations 632. For example, the database of acoustic model configurations 636 stores a plurality of acoustic model configurations. Each acoustic model configuration is associated with a different user, including one or more default acoustic model configurations that are used if a user is not detected or a user is detected but not specifically recognized.”), and
wherein the one or more voice characteristics of the passenger and the one or more additional voice characteristics of the additional passenger are included as part of a profile accessible by the computing device/processor (see ¶ [0012-0013]: “[0012]…The static speaker feature vector may be stored within a memory of the vehicle. This again can improve speech processing capabilities within resource-constrained vehicle computing systems. [0013] In accordance with one embodiment, the apparatus includes memory configured to store one or more user profiles. In this case, the speaker preprocessing module is configured to perform facial recognition on the image data to identify a user profile within the memory associated with the person within the vehicle, compute a speaker feature vector for the person, store the speaker feature vector in the memory, and associate the stored speaker feature vector with the identified user profile.”).

Claims 3, 8-9, 12, 17-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Holm (US 20210065712 A1) in combination with Khoe et al. (US 20140035823 A1) as applied to claims 1 and 10 above, and further in view of Huang et al. (US 20130185072 A1). 

Regarding claims 3, 12, and 20, Holm in combination with Khoe et al. teaches all of the limitations as in claims 2, 11, and 19, above.
However, Holm in combination with Khoe et al. does not explicitly teach, but Huang et al. does teach:
3, 12, and 20. The method of claim 2, wherein the one or more voice characteristics of the passenger and the one or more additional voice characteristics of the additional passenger include pitch, tone, or speech rate (see ¶ [0050 and 0059]: “[0050] … The in-vehicle system 102 determines a characteristic of the command and/or voice, e.g., voice type, frequency, speed, prosody, intelligibility, tone, pitch, size, direction, volume, and/or duration. [0059] In an embodiment the external voice recognition module 420 may include a user profile for the vehicle. In alternate embodiments there may be multiple user profiles associated with the vehicle where each user profile assists the external voice recognition module 420 perform voice recognition of the voice command. For example, a profile may include information such as a user's accent, pronunciation, sentence structure, speech training data, information about the tendencies of a user,…”).
Holm in combination with Khoe et al. and Huang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor, namely voice/speech recognition/analysis. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm in combination with Khoe et al. to incorporate the teachings of Huang et al. wherein the one or more voice characteristics of the passenger and the one or more additional voice characteristics of the additional passenger include pitch, tone, or speech rate, which provides the benefit of determining a characteristic of the command and/or voice ([0050] of Huang et al.).

Regarding claims 8 and 17, Holm in combination with Khoe et al. teaches all of the limitations as in claims 2 and 11.
Holm further teaches:
8 and 17. The method of claim 2, 
wherein the one or more voice characteristics of the passenger and the one or more additional voice characteristics of the additional passenger are included as part of a profile accessible by the computing device/processor (see ¶ [0012-0013] citations as in claims 7 and 16, above.).


However, Holm in combination with Khoe et al. does not explicitly teach, but Huang et al. does teach:
wherein the one or more voice characteristics of the passenger and one or more additional voice characteristics of the additional passenger are stored in a server external to the vehicle (see ¶ [0047 and 0059]: “[0047] …the results of the voice recognition analysis can be stored 510 in the vehicle in a voice profile (and optionally also sent to the server 110 for storing in a voice profile located off the vehicle) associated with the user or the driver of the vehicle. Thereafter, the command can be executed 534 or the information requested can be presented to the user. [0059] In an embodiment the external voice recognition module 420 may include a user profile for the vehicle. In alternate embodiments there may be multiple user profiles associated with the vehicle where each user profile assists the external voice recognition module 420 perform voice recognition of the voice command. For example, a profile may include information such as a user's accent, pronunciation, sentence structure, speech training data, information about the tendencies of a user,…”)
Holm in combination with Khoe et al. and Huang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor, namely voice/speech recognition/analysis. Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have substituted the storage of voice characteristics in a local database as taught by Holm in combination with Khoe et al. with the use of a server for storage of the voice characteristics off the vehicle as taught by Huang et al. in order to yield the predictable result of providing the user with the information remotely (see KSR v. Teleflex).
Regarding claims 9 and 18, Holm in combination with Khoe et al. and Huang et al. teaches all of the limitations as in claims 8 and 17, above.
Holm further teaches:
9 and 18. The method of claim 8, further comprising retrieving, prior to selecting the operation condition for the additional sensor, the one or more voice characteristics of the passenger from the server external to the vehicle (see ¶ [0045-0047, 0068, 0070, 0110, 0113, and 0114]: “[0045] … The speech processing module [i.e., associated with both the sensor and additional sensor] 130 is configured to process the image data 145 and the audio data 155 to generate a set of linguistic features 160 that are useable to parse an utterance of the person 102. [0046] A person (such as person 102) may use the configuration of FIGS. 1A and 1B to issue voice commands while operating the motor vehicle. For example, the person 102 may speak within the interior, e.g., generate an utterance, in order to control the motor vehicle or obtain information. An utterance in this context is associated with a vocal sound produced by the person and the utterance represents linguistic information such as speech. For example and in accordance with one aspect of the invention, an utterance includes speech that emanates from a larynx of the person 102. The utterance includes a voice command, e.g., a spoken request from a user. The voice command includes, for example, any one or any combination of: a request to perform an action (e.g., “Play music”, “Turn on air conditioning”, “Activate cruise control”);… [0047] …In accordance with some embodiments, the audio data 155 is captured from one audio capture device; in accordance with other embodiments, the audio data 155 is captured from multiple audio capture devices, e.g., there may be multiple microphones [i.e., additional sensor] at different positions within the interior 100. [0068] In accordance with various aspects and embodiments, the vehicle includes multiple image capture devices and multiple audio capture devices. As such, the speaker preprocessing module 320 provides further functionality to determine an appropriate facial area from one or more captured images. In accordance with one embodiment, audio data from a plurality of audio capture devices may be processed to determine a closest audio capture device associated with the utterance. […] In this manner, the speech processing apparatus 300 of FIG. 3 may be used to identify a speaker from a plurality of people within a vehicle and configure the speech processing module 330 to the specific characteristics of that speaker. [0070] In the example of FIG. 3, the vector generator 372 may compute an i-vector for one or more utterances. In a case, where there are no speaker feature vectors stored within data store 374, an i-vector may be computed by the vector generator 372 based on one or more frames of audio data for an utterance 355. [0110] […]; if a connection to the remote server device is available then one or more of the audio data, image data and speaker feature vector may be transmitted to the remote server device for parsing a captured utterance. [0113] In a case where a speech processing module is remote from the vehicle [i.e., server external to the vehicle], a local speech processing apparatus includes a transceiver to transmit data derived from one or more of audio data, image data and the speaker feature vector [i.e., associated with the voice characteristics as discussed, above] to the speech processing module and to receive control data [i.e., associated with the utterance (e.g., to control the motor vehicle)] from the parsing of the utterance. [0114] …In one case, the utterance data 442 may be used by the control unit 1010 of automobile 1005 and used to implement a voice command. In one case, the utterance parser 436 may be located within a remote server device and utterance parsing [i.e., associated with the feature vector and hence the voice characteristics] may involve identifying an appropriate service to execute the voice command [i.e., operation condition happening after retrieval/processing of utterance associated with the voice characteristics as discussed, above] from the output of the speech processing module.”
Here, the operating condition is interpreted as the selection of closest audio capture device (i.e., microphone) to the utterance (e.g., passenger).).

The Examiner notes that although Holm teaches selecting the closest microphone based on an utterance, it is not clear that this selection is based on a voice characteristic, as claimed in the instant application. For the purposes of compact prosecution, the Examiner relies on the Khoe et al. reference, as discussed below.
Khoe et al. teaches:
selecting the operation condition for the additional sensor (see ¶ [0086]: “In some embodiments, voice capture module 910 can capture the user's voice at set intervals. The rate at which voice can be captured may be determined based on the type of language [i.e., voice characteristics of user (e.g., passenger)] that is being spoken. For example, the rate at which Spanish is captured may be at a faster rate compared to Dutch. As the amount of time people pause in between conversation (i.e., the duration of the gap in between words and/or sentences) generally differs from one language speaker to another, voice capture module 910 can intake voice in designated intervals for different languages. In some embodiments, the capture rate can be set at a default rate corresponding to the default language set to the device. The capture rate [i.e., operating condition] can be adjusted in accordance with the type of language [i.e., voice characteristics of user (e.g., passenger)] being analyzed. While in some embodiments, a voice capture module is used to capture dictated language from the user in set intervals, some embodiments allow the user's voice to be captured and analyzed in real-time.”).
Holm and Khoe et al. are both considered to be analogous to the claimed invention because they are in the same field of endeavor, namely speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm to incorporate the teachings of Khoe et al. of selecting the operation condition for the additional sensor, which provides the benefit of an accurate textual representation of an audio input ([0008] of Khoe et al.).

Claims 4, 6, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Holm (US 20210065712 A1) in combination with Khoe et al. (US 20140035823 A1) as applied to claims 1 and 10 above, and further in view of Penilla et al. (US 20160104486 A1). 

Regarding claims 4 and 13, Holm in combination with Khoe et al. teaches all of the limitations as in claims 2 and 11, above.
However, Holm in combination with Khoe et al. does not explicitly teach, but Penilla et al. does teach:
4 and 13. The method of claim 2, 
wherein the different operating condition of the additional sensor [as taught by Holm in claims 1 and 10] 
includes a microphone setting corresponding to a speech rate value associated with the additional passenger (see ¶ [0136]: “In the embodiments described herein, mood and emotions can be used to modify the response provided by a vehicle, e.g., in response to a voice driven input. In one embodiment, sensors can be used to detect changes in the autonomic nervous system that are exposed by a user's speech [i.e., passenger]…Vocal parameters and prosody features such as pitch variables and speech rate may be analyzed through pattern recognition, e.g., using one or more microphones of a vehicle [i.e., microphone setting corresponding to speech rate value].”).
Holm in combination with Khoe et al., and Penilla et al., are considered to be analogous to the claimed invention because they are in the same field of endeavor, namely speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm in combination with Khoe et al. to incorporate the teachings of Penilla et al. of including a microphone setting corresponding to a speech rate value associated with the additional passenger, which provides the benefit of producing a system capable of recognizing affect based on extracted features of speech ([0136] of Penilla et al.).

Regarding claims 6 and 15, Holm in combination with Khoe et al. teaches all of the limitations as in claims 1 and 10, above.
However, Holm in combination with Khoe et al. does not explicitly teach, but Penilla et al. does teach:
6 and 15. The method of claim 1, wherein the operating condition of the additional sensor includes a microphone setting corresponding to a speech rate value associated with the passenger (see ¶ [0136] citation as in claims 4 and 13 above).
Holm in combination with Khoe et al., and Penilla et al., are considered to be analogous to the claimed invention because they are in the same field of endeavor, namely speech processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Holm in combination with Khoe et al. to incorporate the teachings of Penilla et al. of including a microphone setting corresponding to a speech rate value associated with the passenger, which provides the benefit of producing a system capable of recognizing affect based on extracted features of speech ([0136] of Penilla et al.).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659



/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659
/Paras D Shah/Primary Examiner, Art Unit 2659
09/23/2022