DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 1/31/2022. Claims 1-19 are pending in the application and have been examined.

Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

Response to Amendment
The response filed on 1/31/2022 has been accepted and considered in this Office Action. Claims 1-19 have been examined. Applicant’s amendment to the title of the invention has been noted and overcomes the objections to the specification. Applicant’s amendments to claims 5, 9, 14 and 19 overcome the claim rejections under 35 U.S.C. 112(b) previously set forth in the Non-Final Office Action mailed 11/23/2021. The amendments to claims 1, 10, 15, 9, 14 and 19, which recite the detection of the change in style of the utterance for the processing of the target device, overcome the 35 U.S.C. 101 rejections previously set forth in the Non-Final Office Action mailed 11/23/2021. Dependent claims 7, 12 and 17 overcome the 35 U.S.C. 101 rejections previously set forth in the Non-Final Office Action mailed 11/23/2021 based on their dependency on amended claims 1, 10 and 15, respectively.

Response to Arguments
Applicant's arguments filed 1/31/2022 have been fully considered as follows:
Applicant’s arguments with respect to claim(s) 1, 7-10, 12-15, 17-19 have been considered but 


Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of pre-AIA 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 7, 8 and 9 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Baughman et al. (U.S. Patent Application Publication 2012/0290297).
Regarding claim 1, SaganeGowda teaches a method for controlling a device according to a user's calling, the method comprising: receiving a user utterance collected by a plurality of electronic devices (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the plurality of electronic devices (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information); and controlling an operation of the device that is a target of the user's calling, based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508). However, SaganeGowda fails to teach analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; determining a target device among the plurality of electronic devices, which is located farther away from the user than one or more other devices among the plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user.
However, Baughman teaches analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (Baughman [0083] illustrates that the stimulus is also provided to the cross-correlation calculation block 710. At 706, the stimulus interacts with the handset and room, and at 708 a signal is received from the microphone of the putative speaker's audio device. This is also supplied to block 710. The delay of the correlation peak(s) is/are extracted and compared in block 712 and the correlation profiles are compared in block 714; this process is interpreted as analyzing the style of the user utterance. Baughman [0059] teaches the speech processing related techniques using neural networks); determining a target device among the plurality of electronic devices, which is located farther away from the user than one or more other devices among the plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user (Baughman, [0085, 0090, 0091] and FIG. 9 teach employing the Lombard speech response for the speaker, and that the examining of the received stimulus includes at least the Lombard analysis; the signal purportedly emanating from the live speaker within the remote environment includes a signal representative of purported live speech of the live speaker within the remote environment; and the influence of the unpredictable audio stimulus and comparing the extracted features with speaker-independent live and non-live models trained with Lombard and non-Lombard speech to determine the likelihood ratio).
SaganeGowda and Baughman are both considered to be analogous to the claimed invention because both relate to speech recognition techniques to verify the speaker intent. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the extracting prosodic information from the utterance, classifying accordingly and then using analysis and classification of the influence of the unpredictable audio stimulus teachings of Baughman to improve authentication (see Baughman [0008]).
Regarding claim 7, SaganeGowda in view of Baughman teaches the method of claim 1. SaganeGowda further teaches wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device, and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 8, SaganeGowda in view of  Baughman teaches the method of claim 1. SaganeGowda further teaches wherein the analyzing (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through the  (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 9, SaganeGowda in view of Baughman teaches the method of claim 8. Baughman further teaches wherein the determining the target device the change in the style of the user utterance being greater than a predetermined amount (Baughman, [0085, 0090, 0091] and FIG. 9 teach employing the Lombard speech response for the speaker, and that the examining of the received stimulus includes at least the Lombard analysis; the signal purportedly emanating from the live speaker within the remote environment includes a signal representative of purported live speech of the live speaker within the remote environment; and the influence of the unpredictable audio stimulus and comparing the extracted features with speaker-independent live and non-live models trained with Lombard and non-Lombard speech to determine the likelihood ratio; the likelihood ratio is interpreted as the change of style of the user utterance compared to a predetermined amount).
SaganeGowda and Baughman are both considered to be analogous to the claimed invention because both relate to speech recognition techniques to verify the speaker intent. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the extracting prosodic information from the utterance, classifying accordingly and then using analysis and classification of the influence of the unpredictable audio stimulus teachings of Baughman to improve authentication (see Baughman [0008]).
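For illustration only, the likelihood-ratio comparison relied upon above for claim 9 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Baughman: the one-dimensional Gaussian models, the names live_model and non_live_model, the pitch values, and the threshold are all hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a one-dimensional Gaussian (assumed model form)."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - ((x - mean) ** 2) / (2.0 * std * std)


def log_likelihood_ratio(features, live_model, non_live_model):
    """Sum of per-feature log-likelihood differences between two models.

    Each model is a hypothetical (mean, std) pair; a positive result means
    the features are better explained by live_model.
    """
    total = 0.0
    for x in features:
        total += gaussian_log_pdf(x, *live_model) - gaussian_log_pdf(x, *non_live_model)
    return total


def exceeds_predetermined_amount(features, live_model, non_live_model, threshold=0.0):
    """Compare the log-likelihood ratio to a hypothetical predetermined amount."""
    return log_likelihood_ratio(features, live_model, non_live_model) > threshold


# Hypothetical pitch values in Hz; models are illustrative only.
live_model = (220.0, 20.0)
non_live_model = (180.0, 20.0)
raised_pitch = [215.0, 225.0, 210.0]
normal_pitch = [185.0, 190.0]
```

Under these assumed models, exceeds_predetermined_amount(raised_pitch, live_model, non_live_model) is true and the same call with normal_pitch is false, which mirrors the interpretation of the likelihood ratio as a change of style measured against a predetermined amount.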
Claim 2 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Baughman et al. (U.S. Patent Application Publication 2012/0290297) further in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 2, SaganeGowda in view of Baughman teach the method of claim 1, but fail to teach wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted at least one speech feature with a speech feature of the . However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg col 5, lines 24-43 teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted at least one speech feature with a speech feature of the (Shriberg col 11, line 61 to col 12, line 4 teaches that in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with the pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Baughman and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art see Shriberg, col 1, lines 16-18).
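For illustration only, the score combination by linear logistic regression cited from Shriberg for claim 2 can be sketched as follows. This is a minimal sketch under stated assumptions and not Shriberg's implementation: the function name combine_scores, the weights, the bias, and the decision threshold are hypothetical.

```python
import math


def combine_scores(prosodic_score, lexical_score, weights=(1.0, 1.0), bias=0.0):
    """Combine two evidence scores with a logistic (sigmoid) function.

    Returns a value in (0, 1); here it is read as the hypothetical
    probability that the speech is computer directed.
    """
    z = weights[0] * prosodic_score + weights[1] * lexical_score + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_computer_directed(prosodic_score, lexical_score, threshold=0.5):
    """Classify the speaking style by comparing the combined score to a threshold."""
    return combine_scores(prosodic_score, lexical_score) >= threshold
```

With the assumed unit weights, two neutral scores of 0.0 give a combined value of exactly 0.5, and positive evidence from either source moves the value toward 1.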
Claims 3, 4, and 6 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Baughman et al. (U.S. Patent Application Publication 2012/0290297) further in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Marxer, R., Barker, J., Alghamdi, N., & Maddock, S. (2018) “The impact of the Lombard effect on audio and visual speech recognition systems.” Speech Communication, 100, 58–68.
Regarding claim 3, SaganeGowda in view of Baughman and Shriberg teach the method of claim 2, but fail to teach wherein the extracting the at least one user speech feature . However, Marxer teaches wherein the extracting the at least one user speech feature (Marxer pg. 59 col 2 lines 17-20 teaches “In the temporal domain the main effect is an increase in vowel duration leading to an overall reduction in speech rate. This effect has been observed to have a linguistic dependency: the vowel lengthening is greater in content words than in function words”, which describes the Lombard effect on vowel duration).
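SaganeGowda, Baughman, Shriberg and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Baughman and Shriberg on extracting at least one user speech feature and then using the vowel lengthening teachings of Marxer to examine potential for the Lombard effect to improve speech recognition performance (see Marxer, pg. 59 col 1, lines 34-37).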
Regarding claim 4, SaganeGowda in view of Baughman and Shriberg teach the method of claim 2, however fails to teach wherein the extracting the at least one user speech feature .  However Marxer teaches wherein the extracting the at least one user speech feature (Marxer pg. 59 col 2 lines 9-14 teaches “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
SaganeGowda, Baughman, Shriberg and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Baughman and Shriberg on extracting at least one user speech feature and then using the vowel lengthening teachings of Marxer to examine potential for the Lombard effect to improve speech recognition performance (see Marxer, pg. 59 col 1, lines 34-37).
Regarding claim 6, SaganeGowda in view of Baughman and Shriberg teach the method of claim 2, however fails to teach wherein the extracting the at least one user speech feature . However Marxer teaches wherein the extracting the at least one user speech feature ( Marxer, pg. 59 col 2 lines 9-14 teaches “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
SaganeGowda, Baughman, Shriberg and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Baughman and Shriberg on extracting at least one user speech feature and then using the vowel lengthening teachings of Marxer to examine potential for the Lombard effect to improve speech recognition performance (see Marxer, pg.59 col 1, lines 34-37).
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Baughman et al. (U.S. Patent Application Publication 2012/0290297) further in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Kanno, “Lombard Speech Recognition Based on Voiced Sound Detection and Application to the Fabric Inspection System in Factories.” Systems and Computers in Japan 34.7 (2003): 10–23.
Regarding claim 5, SaganeGowda in view of Baughman and Shriberg teach the method of claim 2, but fail to teach wherein the comparing of the speech feature comprises determining whether a harmonic structure a relative to a harmonic structure in a speech signal of the pre-stored average utterance style. However, Kanno teaches wherein the comparing of the speech feature comprises determining whether a harmonic structure a relative to a harmonic structure in a speech signal of the pre-stored average utterance style (Kanno, pg. 12, col. lines 6-15 teaches “The pitch-type low-band LPC analysis method is a narrow-band LPC analysis method which focuses on the low-band where the effects of noise are minimal compared to the high-band so as to be able to efficiently extract noise-contaminated voiced sound in a noisy factory. In this method, analysis is performed with the spectrum peaks for the pitch frequency and harmonics resulting from the glottal source oscillation taken to represent one all-pole model, then voiced sound is detected from the degree of the conformity”).
SaganeGowda, Baughman, Shriberg and Kanno are all considered to be analogous to the claimed invention because they relate to speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Baughman and Shriberg on extracting at least one user speech feature and then using the Lombard speech recognition based on voiced sound detection using pitch frequency and harmonics analysis teachings of Kanno to  Kanno, pg. 11).
Claims 10, 12, 13, 14, 15, 17, 18 and 19 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Foerster et al. (U.S. Patent 9,424,841).
Regarding claim 10, SaganeGowda teaches the method for determining a response of a first device to a user's calling, the method comprising: receiving a user utterance (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the first device (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information); in response to the determining the first device as the target device, responding to the user utterance by executing an operation of the first device (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508). However, SaganeGowda fails to teach analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; determining the first device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance. However, Foerster teaches analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (Foerster, col. 7 lines 3-4, the computing device determines a loudness score for the audio data (230), interpreted as analyzing the style of the user utterance; Foerster, col. 4 lines 18-21 describes that the hotworder may use classifying windows to process these audio features, such as by using a support vector machine or a neural network); determining the first device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user (Foerster, col. 6 lines 38-43, col. 7 lines 17-20, lines 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) of the audio data received by the computing device may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device based on the proximity of the user to the device with the users’ interaction with the voice enabled system teachings of Foerster to improve response of a voice enabled environment (see Foerster, col 1, lines 47-68).
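For illustration only, the loudness comparison attributed to Foerster above (a loudness score determined for the audio data and compared to a threshold) can be sketched as follows. This is a minimal sketch under stated assumptions and not Foerster's implementation: the use of an RMS level in decibels as the loudness score, the function names, and the threshold value of -30 dB are hypothetical.

```python
import math


def loudness_score(samples):
    """Return the RMS level in dB relative to full scale.

    Assumes samples are floats in the range [-1.0, 1.0].
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # The floor avoids log10(0) for digital silence.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def should_process(samples, threshold_db=-30.0):
    """Process the audio only if its loudness score meets the threshold."""
    return loudness_score(samples) >= threshold_db


# Hypothetical audio frames: a nearby speaker and a distant one.
near_audio = [0.5, -0.5] * 100
far_audio = [0.001, -0.001] * 100
```

Under these assumptions, should_process(near_audio) is true and should_process(far_audio) is false, which reflects the reading that loudness at the receiving device may serve as a proxy for the distance between the device and the source of the audio.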
Regarding claim 12, SaganeGowda in view of Foerster teaches the method of claim 10.  SaganeGowda further teaches wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the  ( SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information(SaganeGowda [0010]teaches based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 13, SaganeGowda in view of Foerster teaches the method of claim 10.  SaganeGowda further teaches wherein the analyzing of the style of the user utterance comprises: selecting utterance data of the user collected by the first device (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through the  (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 14, SaganeGowda in view of Foerster teaches the method of claim 13. Foerster further teaches wherein the determining the first device as the target device the change in the style of the user utterance being greater than a predetermined amount (Foerster, col. 6 lines 38-43, col. 7 lines 17-20, lines 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) of the audio data received by the computing device may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device based on the proximity of the user to the device with the users’ interaction with the voice enabled system teachings of Foerster to improve response of a voice enabled environment (see Foerster, col 1, lines 47-68).
Regarding claim 15, SaganeGowda teaches a device configured to determine a response to a user's calling, the device comprising: at least one processor; and a memory connected to the at least one processor, the memory storing a pre-stored average utterance style for a user, wherein the processor is configured to (SaganeGowda teaches such a device as indicated in [0025], [0070] and Fig. 1): receive a user utterance (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources), measure a distance between the user and the first device (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information), in response to the determining the device as the target device, responding to the user utterance by executing an operation of the device (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508). However, SaganeGowda fails to teach analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network, determine the device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance. However, Foerster teaches analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (Foerster, col. 7 lines 3-4, the computing device determines a loudness score for the audio data (230), interpreted as analyzing the style of the user utterance), determine the device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance (Foerster, col. 6 lines 38-43, col. 7 lines 17-20, lines 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) of the audio data received by the computing device may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device based on the proximity of the user to the device with the users’ interaction with the voice enabled system teachings of Foerster to improve response of a voice enabled environment (see Foerster, col 1, lines 47-68).
Regarding claim 17, SaganeGowda in view of Foerster teach the device of claim 15. SaganeGowda further teaches wherein the processor is further configured to collect distance information on a distance between the user and each of a plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like), and estimate (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device, and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 18, SaganeGowda in view of Foerster teach the device of claim 15. SaganeGowda further teaches wherein the processor is further configured to select (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information), and extract  (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 19, SaganeGowda in view of Foerster teach the device of claim 18. Foerster further teaches wherein the processor is further configured to determine that the the change in the style of the user utterance being greater than a predetermined amount (Foerster, col. 6 lines 38-43, col. 7 lines 17-20, lines 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) of the audio data received by the computing device may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device based on the proximity of the user to the device with the users’ interaction with the voice enabled system teachings of Foerster to improve response of a voice enabled environment (see Foerster, col 1, lines 47-68).
Claims 11 and 16 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Foerster et al. (U.S. Patent 9,424,841) further in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 11, SaganeGowda in view of Foerster teaches the method of claim 10, but fails to teach wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted speech feature with a speech feature of the . However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg col. 5, lines 24-43 teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection.
According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted speech feature with a speech feature of the (Shriberg col. 11, line 61 to col. 12, line 4 teaches that in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with the pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Foerster and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Foerster on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying it accordingly, and then using the utterance based on the classification, to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
Regarding claim 16, SaganeGowda in view of Foerster teaches the device of claim 15, but fails to teach wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparison of the extracted speech feature with a speech feature of the . However, Shriberg teaches wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg col. 5, lines 24-43 teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection.
According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparison of the extracted speech feature with a speech feature of the (Shriberg col. 11, line 61 to col. 12, line 4 teaches that in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with the pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Foerster and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Foerster on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying it accordingly, and then using the utterance based on the classification, to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hassani (US Patent Application Publication 2017/0125038) discusses generating and associating Lombard effect speech database on neutral speech for speech recognition training (see Hassani, Fig. 5, Fig. 6).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571) 272-3916. The examiner can normally be reached Monday - Friday, 2:00 pm - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M. Mehta, can be reached at (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656