DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  The Request for Continued Examination filed on 06/27/2022, based on Applicant's submission filed on 05/31/2022, has been entered. Claims 1-19 are pending in the application and have been examined.
 
Notice of Pre-AIA  or AIA  Status
The present application is being examined under the pre-AIA  first to invent provisions. 

Response to Amendment
The response filed on 05/31/2022 has been accepted and considered in this Office Action. Claims 1-19 have been examined. The amendments to claims 1, 10, and 15, which recite determining the device to perform the operation based on paragraphs [00200]-[00202] of Applicant's Specification, have been noted and examined.

Response to Arguments
Applicant's arguments filed 05/31/2022 have been fully considered as follows:
Applicant’s arguments with respect to amended claim 1 on page 10 state that
“SaganeGowda merely discloses that the devices are enabled whenever the user with within a generic specified distance or radius (e.g., each device is enabled for response if the user is within a default 20 foot distance)...”
	
The Examiner respectfully disagrees. SaganeGowda teaches that “Operation 508 illustrates determining whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences” (SaganeGowda, [0074]). That is, after the distance between the user and the location of the device is measured, it is determined whether the device can be enabled to perform an operation corresponding to the user utterance. Therefore, SaganeGowda teaches a device that can perform an operation corresponding to the user utterance, and the rejections of claims 1, 10, and 15 under 35 U.S.C. 103 are sustained and updated accordingly.
Applicant’s further arguments with respect to amended claim 1 on pg. 10 state that
“Baughman is merely directed to a user voice authentication method, which is NOT relevant to the embodied invention...”

Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Regarding the rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are argued for at least the same reasons presented in the Remarks filed 05/31/2022 with respect to independent claims 1, 10, and 15, the Examiner respectfully directs Applicant to the reasons provided above in response to the arguments directed to claims 1, 10, and 15. For at least those reasons, the Examiner respectfully disagrees; Applicant's arguments have been fully considered but are not persuasive.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of pre-AIA  35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 7, 8, and 9 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Kim et al. (U.S. Patent 11,315,553).
Regarding claim 1, SaganeGowda teaches a method for controlling a device according to a user's calling, the method comprising: receiving a user utterance collected by a plurality of electronic devices (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the plurality of electronic devices (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on the availability of sensor and computing status information); and controlling an operation of the device that is a target of the user's calling, based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in operation 508). However, SaganeGowda fails to teach extracting at least one electronic device that can perform an operation corresponding to the user utterance among the plurality of electronic devices; analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; and determining a target device among the at least one electronic device.
However, Kim teaches extracting at least one electronic device that can perform an operation corresponding to the user utterance among the plurality of electronic devices (see Kim, col. 11, lines 8-20: “For this purpose, the first electronic device 10 may also transmit information on the target function (or target application) using the voice recognition result to the second electronic device 20 together with the voice recognition result corresponding to the voice. The second electronic device 20 may determine whether the second electronic device 20 supports information about a target function, and if it is determined that the second electronic device 20 does not support the function, the second electronic device 20 may ignore the voice data regarding the voice and the voice recognition result corresponding to the voice without using them as data for training”; interpreted as extracting a device that can perform the operation corresponding to the user utterance); analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (see Kim, col. 15, lines 42-52: “In this case, since the voice recognition result provided by the first electronic device 10 to the second electronic device 20 corresponds to voices from time t1 to time t6, there may be difference with the voices which the second electronic device 20 recognize from time t2 to time t7. Accordingly, the first electronic device 10 may transmit the time information related to the voice corresponding to the voice recognition result and the time information on which the wake-up command is transmitted to the second electronic device 20 together with the voice recognition result.” See also Kim, col. 11, lines 22-37, for the speech recognition neural network. The processing at the second device of the voice recognition result and timing information received from the first device is interpreted as analyzing the style of the user utterance); and determining a target device among the at least one electronic device (see Kim, col. 12, line 64 to col. 13, line 6: “Accordingly, in general, the second electronic device 20 or the third electronic device 30 located far from the user 1 has a problem that it is difficult to accurately recognize the voice uttered by the user 1, but according to the disclosure, the voice recognition model may be trained using the accurate voice recognition result together with the voice of the user 1 at the far distance, so that the second electronic device 20 or the third electronic device 30 may provide an accurate voice recognition result even for the voice uttered by the user 1 at a far distance.” The second or third electronic device (interpreted as the target device), located farther away from the user, is trained based on the change in style of the user utterance and the distance from the user to recognize the command, which is interpreted as determining the target device).
SaganeGowda and Kim are both considered to be analogous to the claimed invention because both relate to speech recognition techniques for verifying speaker intent. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the teachings of Kim on extracting prosodic information from the utterance, classifying it accordingly, and then training the speech recognition model to account for the actual utterance environment (see Kim, col. 2, lines 18-32).
Regarding claim 7, SaganeGowda in view of Kim teaches the method of claim 1. SaganeGowda further teaches wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches that an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device, and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 8, SaganeGowda in view of Kim teaches the method of claim 1. SaganeGowda further teaches wherein the analyzing of the style of the user utterance comprises: selecting utterance data of the user (SaganeGowda [0011] teaches that the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through the speech recognition neural network (SaganeGowda [0012] teaches how, by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 9, SaganeGowda in view of Kim teaches the method of claim 8. Kim further teaches wherein the determining the target device comprises determining that a device located at a farthest distance from the user is being called in response to the change in the style of the user utterance being greater than a predetermined amount (see Kim, col. 12, line 64 to col. 13, line 6: “Accordingly, in general, the second electronic device 20 or the third electronic device 30 located far from the user 1 has a problem that it is difficult to accurately recognize the voice uttered by the user 1, but according to the disclosure, the voice recognition model may be trained using the accurate voice recognition result together with the voice of the user 1 at the far distance, so that the second electronic device 20 or the third electronic device 30 may provide an accurate voice recognition result even for the voice uttered by the user 1 at a far distance.” See also Kim, col. 16, lines 50-62, where the processor determines the distance of the device from the user and the specific time of the voice uttered by the user to train the second device. The second or third electronic device (interpreted as the target device), located farther away from the user, is trained based on the change in style of the user utterance and the distance from the user to recognize the command, which is interpreted as determining the target device; the processor's training of the device based on the distance of the device from the user is interpreted as comparing the change in style of the user utterance to a predetermined amount).
Claim 2 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Kim et al. (U.S. Patent 11,315,553), further in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 2, SaganeGowda in view of Kim teaches the method of claim 1, but fails to teach wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted at least one speech feature with a speech feature of the pre-stored average utterance style for the user. However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg, col. 5, lines 24-43, teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted at least one speech feature with a speech feature of the pre-stored average utterance style for the user (Shriberg, col. 11, line 61 to col. 12, line 4, teaches that in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with a pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Kim, and Shriberg are considered to be analogous to the claimed invention because they relate to systems that interpret user intent and engage in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Kim on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying the utterance accordingly, and then using the utterance based on the classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
Claims 3, 4, and 6 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Kim et al. (U.S. Patent 11,315,553), further in view of Shriberg et al. (U.S. Patent 10,529,321), and further in view of Marxer, R., Barker, J., Alghamdi, N., & Maddock, S. (2018), “The impact of the Lombard effect on audio and visual speech recognition systems,” Speech Communication, 100, 58–68.
Regarding claim 3, SaganeGowda in view of Kim and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises determining whether the utterance time of the vowel section in the user utterance is greater than or equal to a preset time. However, Marxer teaches wherein the extracting the at least one user speech feature comprises determining whether the utterance time of the vowel section in the user utterance is greater than or equal to a preset time (Marxer, pg. 59, col. 2, lines 17-20, teaches: “In the temporal domain the main effect is an increase in vowel duration leading to an overall reduction in speech rate. This effect has been observed to have a linguistic dependency: the vowel lengthening is greater in content words than in function words”; i.e., Marxer teaches the Lombard effect on vowel duration).
SaganeGowda, Kim, Shriberg, and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Kim, and Shriberg on extracting at least one user speech feature with the vowel lengthening teachings of Marxer, in order to examine the potential for the Lombard effect to improve speech recognition performance (see Marxer, pg. 59, col. 1, lines 34-37).
Regarding claim 4, SaganeGowda in view of Kim and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises: deriving a base frequency from a speech signal of the user utterance; and extracting a rising section from the derived base frequency. However, Marxer teaches wherein the extracting the at least one user speech feature comprises: deriving a base frequency from a speech signal of the user utterance; and extracting a rising section from the derived base frequency (Marxer, pg. 59, col. 2, lines 9-14, teaches: “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
Regarding claim 6, SaganeGowda in view of Kim and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises: deriving a spectrum from the speech signal of the user utterance; and extracting a slope reduction section from the derived spectrum. However, Marxer teaches wherein the extracting the at least one user speech feature comprises: deriving a spectrum from the speech signal of the user utterance; and extracting a slope reduction section from the derived spectrum (Marxer, pg. 59, col. 2, lines 9-14, teaches: “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Kim et al. (U.S. Patent 11,315,553), further in view of Shriberg et al. (U.S. Patent 10,529,321), and further in view of Kanno, Sukeyasu, and Tetsuo Funada, “Lombard Speech Recognition Based on Voiced Sound Detection and Application to the Fabric Inspection System in Factories,” Systems and Computers in Japan 34.7 (2003): 10–23.
Regarding claim 5, SaganeGowda in view of Kim and Shriberg teaches the method of claim 2, but fails to teach wherein the comparing of the speech feature comprises determining whether a harmonic structure in a speech signal of the user utterance has increased relative to a harmonic structure in a speech signal of the pre-stored average utterance style. However, Kanno teaches wherein the comparing of the speech feature comprises determining whether a harmonic structure in a speech signal of the user utterance has increased relative to a harmonic structure in a speech signal of the pre-stored average utterance style (Kanno, pg. 12, col. lines 6-15, teaches: “The pitch-type low-band LPC analysis method is a narrow-band LPC analysis method which focuses on the low-band where the effects of noise are minimal compared to the high-band so as to be able to efficiently extract noise-contaminated voiced sound in a noisy factory. In this method, analysis is performed with the spectrum peaks for the pitch frequency and harmonics resulting from the glottal source oscillation taken to represent one all-pole model, then voiced sound is detected from the degree of the conformity”).
SaganeGowda, Kim, Shriberg, and Kanno are all considered to be analogous to the claimed invention because they relate to speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Kim, and Shriberg on extracting at least one user speech feature with the teachings of Kanno on Lombard speech recognition based on voiced sound detection using pitch frequency and harmonics analysis, in order to examine the potential for the Lombard effect to improve speech recognition performance (see Kanno, pg. 11).
Claims 10, 12, 13, 14, 15, 17, 18, and 19 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Foerster et al. (U.S. Patent 9,424,841).
Regarding claim 10, SaganeGowda teaches a method for determining a response of a first device to a user's calling, the method comprising: receiving a user utterance (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the first device (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on the availability of sensor and computing status information); determining whether the first device can perform an operation corresponding to the user utterance (see SaganeGowda, [0074]: “Operation 508 illustrates determining whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences”; interpreted as determining whether the device can perform an operation corresponding to the user utterance); and in response to the determining the first device as the target device, responding to the user utterance by executing an operation of the first device (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in operation 508).
However, SaganeGowda fails to teach analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; and determining, when the first device can perform the operation corresponding to the user utterance, the first device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance. However, Foerster teaches analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (Foerster, col. 7, lines 3-4: “The computing device determines a loudness score for the audio data (230)”; interpreted as analyzing the style of the user utterance. Foerster, col. 4, lines 18-21, describes that the hotworder may use classifying windows to process these audio features, such as by using a support vector machine or a neural network); and determining, when the first device can perform the operation corresponding to the user utterance, the first device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance (Foerster, col. 6, lines 38-43; col. 7, lines 17-20 and 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness of the audio data received by the computing device (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data (interpreted as the device being determined to be able to perform the operation corresponding to the user utterance, as recited earlier in the claim)).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device based on the proximity of the user to the device with the teachings of Foerster on the user's interaction with the voice enabled system, in order to improve the response of a voice enabled environment (see Foerster, col. 1, lines 47-68).
Regarding claim 12, SaganeGowda in view of Foerster teaches the method of claim 10. SaganeGowda further teaches wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches that an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device, and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 13, SaganeGowda in view of Foerster teaches the method of claim 10. SaganeGowda further teaches wherein the analyzing of the style of the user utterance comprises: selecting utterance data of the user collected by the first device (SaganeGowda [0011] teaches that the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through the speech recognition neural network (SaganeGowda [0012] teaches how, by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 14, SaganeGowda in view of Foerster teaches the method of claim 13. Foerster further teaches wherein the determining the first device as the target device comprises determining that the first device is a device located at a farthest distance from the user in response to the change in the style of the user utterance being greater than a predetermined amount (Foerster, col. 6, lines 38-43; col. 7, lines 17-20 and 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness of the audio data received by the computing device (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data).
Regarding claim 15, SaganeGowda teaches a device configured to determine a response to a user's calling, the device comprising: at least one processor; and a memory connected to the at least one processor, the memory storing a pre-stored average utterance style for a user, wherein the processor is configured to (SaganeGowda teaches such a device as indicated in [0025], [0070], and FIG. 1): receive a user utterance (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources), measure a distance between the user and the device (SaganeGowda [0073] teaches that in operation 506 the likely proximity of the user relative to a location of the voice activated device is determined based in part on the availability of sensor and computing status information), determine whether the first device can perform an operation corresponding to the user utterance (see SaganeGowda, [0074]: “Operation 508 illustrates determining whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences”; interpreted as the device being enabled to perform an operation corresponding to the user utterance), and in response to determining the device as the target device, respond to the user utterance by executing an operation of the device (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in operation 508).
However, SaganeGowda fails to teach: analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; and determine, when the device can perform the operation corresponding to the user utterance, the device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance. However, Foerster teaches: analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (Foerster, col. 7, lines 3-4: “The computing device determines a loudness score for the audio data (230)”; interpreted as analyzing the style of the user utterance); and determine, when the device can perform the operation corresponding to the user utterance, the device as a target device, which is located farther away from the user than one or more other devices among a plurality of electronic devices, based on a change in style of the user utterance being different than a pre-stored average utterance style of the user and the measured distance (Foerster, col. 6, lines 38-43; col. 7, lines 17-20 and 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness of the audio data received by the computing device (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and based on the determination, the computing device processes the audio data (interpreted as the device being determined to be able to perform the operation corresponding to the user utterance, as recited earlier in the claim)).
SaganeGowda and Foerster are both considered to be analogous to the claimed invention because both relate to activating speech-enabled devices. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to control the device based on the proximity of the user to the device with the teachings of Foerster on user interaction with a voice-enabled system, in order to improve the responsiveness of a voice-enabled environment (see Foerster, col. 1, lines 47-68).
Regarding claim 17, SaganeGowda in view of Foerster teach the device of claim 15. SaganeGowda further teaches wherein the processor is further configured to collect distance information on a distance between the user and each of a plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches that an interconnected network may be coupled to one or more user computing devices as well as to data sources such as sensors and computing devices located throughout a home or office environment, at least one of the devices being a voice activated device; the sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like), and estimate a distance between the user and each of the plurality of electronic devices based on the collected distance information, when the distance is measured (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may determine a user's position and status relative to the voice activated device, and that the distance may correlate to an expected proximity of a user when the user intends to activate the voice activated device).
Regarding claim 18, SaganeGowda in view of Foerster teach the device of claim 15. SaganeGowda further teaches wherein the processor is further configured to select utterance data of the user collected by the device (SaganeGowda [0011] teaches that the voice activated device may respond to voice commands only when a user is inferred to be within a specified location based on available presence information), and extract a user speech feature from the selected utterance data through the speech recognition neural network (SaganeGowda [0012] teaches that, by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available at close physical proximity).
Regarding claim 19, SaganeGowda in view of Foerster teach the device of claim 18. Foerster further teaches wherein the processor is further configured to determine that the device is a device at a farthest distance from the user in response to the change in the style of the user utterance being greater than a predetermined amount (Foerster, col. 6, lines 38-43; col. 7, lines 17-20 and 27-47: the utterance is analyzed for the hotword likelihood (interpreted as the pre-stored average utterance style of the user). The loudness (interpreted as the change in style of the user utterance being different than the pre-stored average utterance style) of the audio data received by the computing device may reflect a distance between the computing device and the source of the audio (interpreted as the measured distance). The loudness score is determined and compared to a threshold, and, based on that comparison, the computing device processes the audio data).
Claims 11 and 16 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Foerster et al. (U.S. Patent 9,424,841), further in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 11, SaganeGowda in view of Foerster teach the method of claim 10, but fail to teach wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted speech feature with a speech feature of the . However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg, col. 5, lines 24-43, teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and the relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization; such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model these features and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score or as a new feature to be fed into second-level classifiers); and comparing the extracted speech feature with a speech feature of the (Shriberg, col. 11, line 61 to col. 12, line 4, teaches that in operation 650 the speaking style is classified as human directed or computer directed (interpreted as comparing with the pre-stored utterance style) by combining available sources of evidence (acoustic-prosodic and/or lexical) using linear logistic regression or some other combination scheme. A score may be calculated, as described above, and used in determining whether the speech is computer directed or human directed).
SaganeGowda, Foerster and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Foerster on analyzing the user utterance to control the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying the utterance accordingly, and then using the utterance based on that classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
Regarding claim 16, SaganeGowda in view of Foerster teach the device of claim 15, but fail to teach wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparison of the extracted speech feature with a speech feature of the . However, Shriberg teaches wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg, col. 5, lines 24-43, teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and the relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization; such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model these features and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score or as a new feature to be fed into second-level classifiers); and comparison of the extracted speech feature with a speech feature of the r (Shriberg, col. 11, line 61 to col. 12, line 4, teaches that in operation 650 the speaking style is classified as human directed or computer directed (interpreted as comparing with the pre-stored utterance style) by combining available sources of evidence (acoustic-prosodic and/or lexical) using linear logistic regression or some other combination scheme. A score may be calculated, as described above, and used in determining whether the speech is computer directed or human directed).
SaganeGowda, Foerster and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Foerster on analyzing the user utterance to control the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying the utterance accordingly, and then using the utterance based on that classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hassani (US Patent Application Publication 2017/0125038) discusses generating a Lombard effect speech database and associating it with neutral speech for speech recognition training (see Hassani, Fig. 5, Fig. 6).
Lee (US Patent 9,779,734) discusses a speech recognition method that improves the recognition accuracy of commands used to control target devices regardless of the distances between the target devices and the speaker (see Lee, col. 5, line 60 to col. 6, line 5).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NANDINI SUBRAMANI/            Examiner, Art Unit 2656                                                                                                                                                                                            
/BHAVESH M MEHTA/            Supervisory Patent Examiner, Art Unit 2656