DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 09/02/2022. Claims 1-19 are pending in the application and have been examined.
	
Notice of Pre-AIA or AIA Status
The present application is being examined under the pre-AIA first to invent provisions.

Response to Amendment
The response filed on 09/02/2022 has been correspondingly accepted and considered in this Office Action. Claims 1-19 have been examined. Amendments to claims 1, 9, 10 and 14-15 have been noted.

Response to Arguments
Applicant's arguments filed 09/02/2022 have been fully considered as follows:
Applicant’s arguments with respect to amended claim 1 on page 12 state that
“In contrast to the embodied invention, Kim is not concerned with selecting an individual target device from among a plurality of devices able to respond to a user utterance, which matches the user's intention, even if the intended device is located farther away from the user than other devices that are also able to respond to the user utterance. 
In even further contrast, Kim does not select a different individual device from among a plurality of devices based on whether or not a user utterance matches a pre-stored average utterance. Kim remains completely silent with regards to these features. 
...”
	Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
With respect to the art rejections of the remaining independent and dependent claims under 35 U.S.C. § 103, to the extent those claims are discussed or argued on at least the same rationale presented in the Remarks filed 09/02/2022, Examiner respectfully notes as follows. For completeness, should independent claims 10 and 15 be traversed for reasons similar to those given for independent claim 1, and should the dependent claims be traversed for reasons similar to those given for independent claims 1, 10 and 15, Examiner respectfully directs Applicant to the reasons provided above in the response directed to claims 1, 10 and 15. For at least those same reasons, Examiner respectfully disagrees; Applicant's arguments have been fully considered but are not persuasive.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1, 7-10, 12-15 and 17-19 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Meyers et al. (U.S. Patent Application Publication 2017/0083285).
Regarding claim 1, SaganeGowda teaches a method for controlling a device according to a user's calling, the method comprising: receiving a user utterance collected by a plurality of electronic devices (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the plurality of electronic devices (SaganeGowda [0073] teaches that, in operation 506, the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information); and controlling an operation of the device that is a target of the user's calling, based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508). However, SaganeGowda fails to teach extracting a first electronic device and a second electronic device configured to; analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; in response to the style of the user utterance corresponding to a pre-stored average utterance style of the user and the first electronic device being located a shorter distance away from the user than the second electronic device, selecting the first electronic device as an individual target device and controlling an operation of the first electronic device based on the user utterance while the second electronic device does not respond to the user utterance; and in response to the style of the user utterance being different than the pre-stored average utterance style of the user and the second electronic device being located a farther distance away from the user than the first electronic device, selecting the second electronic device as the individual target device and controlling an operation of the second electronic device based on the user utterance while the first electronic device does not respond to the user utterance.
However, Meyers teaches extracting a first electronic device and a second electronic device configured to (see Meyers, [0020-23] Two devices 102 may be located near enough to each other so that both of the devices 102 may detect an utterance of the user 104.  Sound corresponding to a spoken user request 106 is received by each of the devices 102. The device 102 may detect the wakeword and interpret subsequent user speech as being directed to the device 102. The speech interface device 102 may begin providing an audio signal to a speech recognition system for detecting and responding to subsequent user utterances);  analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (see Meyers, [0029] The metadata 110 may comprise various information that can be used to determine or infer the proximity of the user 104 relative to the respective device 102 and more generally that can be used to determine which of the devices 102 a speech response or other action should be directed to. Proximity in this environment may correspond to either or both of physical proximity and temporal proximity. 
The metadata 110 may include other information such as the signal energy of the audio signal 108 and/or a level of voice presence in the audio signal 108 as detected by the speech interface device 102; see Meyers [0060-0062] analyses the utterance based on the audio signal and metadata); in response to the style of the user utterance corresponding to a pre-stored average utterance style of the user and the first electronic device being located a shorter distance away from the user than the second electronic device, selecting the first electronic device as an individual target device and controlling an operation of the first electronic device based on the user utterance while the second electronic device does not respond to the user utterance (see Meyers, [0066] If the action 210 determines that the first and second audio signals 108(a) and 108(b) do represent the same user utterance, an action 212 is performed of arbitrating between the corresponding devices 102(a) and 102(b) to determine which of the devices will provide a response to the single user utterance that was detected and provided by both of the devices 102(a) and 102(b). The action 212 may comprise comparing attributes indicated by the metadata 110 for each of the audio signals 108. The device whose audio signal 108 has the strongest set of attributes is selected as the winner of the arbitration. See Meyers [0067] If the first device 102(a) wins the arbitration, an action 214 is performed of processing and responding to the first audio signal 108(a), including producing an appropriate response by the first device 102(a) to the user command represented by the first audio signal 108(a). An action 216 comprises canceling the processing of the second audio signal 108(b) and canceling any response that might otherwise have been provided based on the second audio signal 108(b), including any response that might have otherwise been given by the device 102(b). 
See Meyers, [0069] The arbitration action 212 may comprise determining which of the devices 102(a) and 102(b) is physically or acoustically nearest the user 104 and selecting the nearest device to provide a response to the user request 106); and in response to the style of the user utterance being different than the pre-stored average utterance style of the user and the second electronic device being located a farther distance away from the user than the first electronic device, selecting the second electronic device as the individual target device and controlling an operation of the second electronic device based on the user utterance while the first electronic device does not respond to the user utterance (see Meyers, [0068] If the second device 102(b) wins the arbitration, an action 218 is performed of processing and responding to the second audio signal 108(b), including producing an appropriate response by the second device 102(b) to the user command represented by the second audio signal 108(b). An action 220 comprises canceling the processing of the first audio signal 108(a) and canceling any response that might otherwise have been provided based on the first audio signal 108(a), including any response that might have otherwise been given by the first device 102(a)).
SaganeGowda and Meyers are both considered to be analogous to the claimed invention because both relate to speech recognition techniques to verify the speaker intent. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the device selection teachings of Meyers in order to select the one of the devices that is to respond to the user command (see Meyers, [0016]).
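For illustration only, and not as a characterization of any cited reference or of Applicant's disclosure, the two conditional branches recited in claim 1 may be sketched as follows. The names Device, distance_m and select_target are hypothetical, and the style comparison result is assumed to be supplied as a boolean by a separate analysis step.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    distance_m: float  # hypothetical: measured distance between user and device


def select_target(first: Device, second: Device, style_matches_average: bool) -> Device:
    """Return the individual target device under the two branches of claim 1.

    Assumes `first` is located a shorter distance from the user than `second`.
    """
    if first.distance_m >= second.distance_m:
        raise ValueError("first device must be nearer to the user than second")
    # Style corresponds to the pre-stored average style: nearer device responds.
    # Style differs from the pre-stored average style: farther device responds.
    return first if style_matches_average else second


near = Device("speaker", 1.0)
far = Device("tv", 4.0)
print(select_target(near, far, True).name)
print(select_target(near, far, False).name)
```

In either branch only the returned device acts on the utterance; the other device does not respond.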
Regarding claim 7, SaganeGowda in view of Meyers teaches the method of claim 1. SaganeGowda further teaches wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches that an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches that, based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device, and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 8, SaganeGowda in view of Meyers teaches the method of claim 1. SaganeGowda further teaches wherein the analyzing a closest distance from the user, among utterance data of the user collected by the plurality of electronic devices (SaganeGowda [0011] teaches that the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through the speech recognition neural network (SaganeGowda [0012] teaches how, by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 9, SaganeGowda in view of Meyers teaches the method of claim 8. Meyers further teaches (see Meyers, [0076] In some cases, the action 208 may comprise determining that the user request 106 relates to an activity that is currently being performed by one of the devices 102 and selecting the same device 102 to respond to the request 106. For example, the first device 102(a) may be playing music and the user request may comprise a “stop” command. The user request can be interpreted as relating to current activity of the first device 102(a), and the first device 102(a) is therefore selected as the device that should respond to the “stop” request; the first device may be farther from the user ).
Regarding claim 10, SaganeGowda teaches the method for determining a response of a first device to a user's calling, the method comprising: receiving a user utterance (SaganeGowda [0071] and FIG. 5 teaches an example operation for controlling a voice activated feature of a voice activated device, operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the first device (SaganeGowda [0073] teaches in operation 506 the likely proximity of the user relative to a location of the voice activated device, is determined based in part on availability of sensor and computing status information); determining whether the first device can perform an operation corresponding to the user utterance (see SaganeGowda, [0074], Operation 508 illustrates determining whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences; interpreted as the device can perform an operation corresponding to the user utterance).  However, SaganeGowda fails to teach analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network; and in response to the style of the user utterance being different than a pre-stored average utterance style of the user and the first electronic device being located a farther distance away from the user than one or more other electronic devices able to respond to the user utterance, selecting the first electronic device as an individual target device and controlling an operation of the first electronic device based on the user utterance while the one or more other electronic devices do not respond to the user utterance. 
However, Meyers teaches analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (see Meyers, [0020-23] Two devices 102 may be located near enough to each other so that both of the devices 102 may detect an utterance of the user 104.  Sound corresponding to a spoken user request 106 is received by each of the devices 102. The device 102 may detect the wakeword and interpret subsequent user speech as being directed to the device 102. The speech interface device 102 may begin providing an audio signal to a speech recognition system for detecting and responding to subsequent user utterances); and in response to the style of the user utterance being different than a pre-stored average utterance style of the user and the first electronic device being located a farther distance away from the user than one or more other electronic devices able to respond to the user utterance, selecting the first electronic device as an individual target device and controlling an operation of the first electronic device based on the user utterance while the one or more other electronic devices do not respond to the user utterance (see Meyers, [0076] In some cases, the action 208 may comprise determining that the user request 106 relates to an activity that is currently being performed by one of the devices 102 and selecting the same device 102 to respond to the request 106. For example, the first device 102(a) may be playing music and the user request may comprise a “stop” command. The user request can be interpreted as relating to current activity of the first device 102(a), and the first device 102(a) is therefore selected as the device that should respond to the “stop” request; the first device may be farther from the user ).
SaganeGowda and Meyers are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the device selection teachings of Meyers in order to select the one of the devices that is to respond to the user command (see Meyers, [0016]).
Regarding claim 12, the claim is directed to a method claim for a response of a first device corresponding to the method claim presented in claim 7 and is rejected under the same grounds stated above regarding claim 7.
Regarding claim 13, the claim is directed to a method claim for a response of a first device corresponding to the method claim presented in claim 8 and is rejected under the same grounds stated above regarding claim 8.
Regarding claim 14, the claim is directed to a method claim for a response of a first device corresponding to the method claim presented in claim 9 and is rejected under the same grounds stated above regarding claim 9.
Regarding claim 15, SaganeGowda teaches a device configured to determine a response to a user's calling, the device comprising: at least one processor; and a memory connected to the at least one processor, the memory storing a pre-stored average utterance style for a user, wherein the processor is configured to (SaganeGowda teaches such a device as indicated in [0025], [0070] and Fig. 1): receive a user utterance (SaganeGowda [0071] and FIG. 5 teach an example operation for controlling a voice activated feature of a voice activated device; operation 502 illustrates receiving, by a voice activated device from one or more data sources), measure a distance between the user and the device (SaganeGowda [0073] teaches that, in operation 506, the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information), determine whether the first device can perform an operation corresponding to the user utterance (see SaganeGowda, [0074], Operation 508 illustrates determining whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences; interpreted as the device being enabled to perform an operation corresponding to the user utterance).
However, SaganeGowda fails to teach analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network, and in response to the style of the user utterance being different than a pre-stored average utterance style of the user and the device being located a farther distance away from the user than one or more other devices able to respond to the user utterance, select the device as an individual target device and control an operation of the device based on the user utterance while the one or more other devices do not respond to the user utterance. 
However, Meyers teaches analyze a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (see Meyers, [0020-23] Two devices 102 may be located near enough to each other so that both of the devices 102 may detect an utterance of the user 104.  Sound corresponding to a spoken user request 106 is received by each of the devices 102. The device 102 may detect the wakeword and interpret subsequent user speech as being directed to the device 102. The speech interface device 102 may begin providing an audio signal to a speech recognition system for detecting and responding to subsequent user utterances), and in response to the style of the user utterance being different than a pre-stored average utterance style of the user and the device being located a farther distance away from the user than one or more other devices able to respond to the user utterance, select the device as an individual target device and control an operation of the device based on the user utterance while the one or more other devices do not respond to the user utterance (see Meyers, [0066] The action 212 may comprise comparing attributes indicated by the metadata 110 for each of the audio signals 108. The device whose audio signal 108 has the strongest set of attributes is selected as the winner of the arbitration. See Meyers, [0068] If the second device 102(b) wins the arbitration, an action 218 is performed of processing and responding to the second audio signal 108(b), including producing an appropriate response by the second device 102(b) to the user command represented by the second audio signal 108(b); the second device may be located farther distance away from the user).
SaganeGowda and Meyers are both considered to be analogous to the claimed invention because both relate to activating speech enabled devices. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the device selection teachings of Meyers in order to select the one of the devices that is to respond to the user command (see Meyers, [0016]).
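For illustration only, the arbitration described in the cited passages of Meyers ([0066] and [0068]), in which attributes indicated by the metadata are compared and processing for the losing device is canceled, may be sketched as follows. The function name arbitrate, the attribute keys, and in particular the scoring of the "strongest set of attributes" as a plain sum are assumptions made for this sketch and are not taken from Meyers.

```python
def arbitrate(metadata_by_device):
    """Return (winner, canceled) given {device_id: {attribute_name: value}}."""
    # Assumption: the "strongest set of attributes" is scored as a simple sum.
    scores = {dev: sum(attrs.values()) for dev, attrs in metadata_by_device.items()}
    winner = max(scores, key=scores.get)
    # Processing and any response for every other device is canceled.
    canceled = sorted(dev for dev in scores if dev != winner)
    return winner, canceled


winner, canceled = arbitrate({
    "102(a)": {"signal_energy": 0.4, "voice_presence": 0.5},
    "102(b)": {"signal_energy": 0.7, "voice_presence": 0.6},
})
print(winner, canceled)
```

Under these assumed values the second device has the higher score, so it responds and the first device's processing is canceled.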
Regarding claim 17, the claim is directed to a device claim for a response of a device corresponding to the method claim presented in claim 12 and is rejected under the same grounds stated above regarding claim 12.
Regarding claim 18, the claim is directed to a device claim for a response of a device corresponding to the method claim presented in claim 13 and is rejected under the same grounds stated above regarding claim 13.
Regarding claim 19, the claim is directed to a device claim for a response of a device corresponding to the method claim presented in claim 14 and is rejected under the same grounds stated above regarding claim 14.
Claims 2, 11 and 16 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Meyers et al. (U.S. Patent Application Publication 2017/0083285) further in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 2, SaganeGowda in view of Meyers teaches the method of claim 1, but fails to teach wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted at least one speech feature with a speech feature of the pre-stored average utterance style for the user.
However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting, through a weight calculation neural network, at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation ( see Shriberg col 5, lines 24-43 teaches  prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers);  and comparing the extracted at least one speech feature with a speech feature of the pre-stored average utterance style for the user ( see Shriberg col 11, lines 61-col 12 lines 4  teaches in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with pre stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Meyers and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Meyers on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying the utterance accordingly, and then using the utterance based on the classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
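For illustration only, the score combination described in the cited passage of Shriberg (col. 11, line 61 to col. 12, line 4), in which acoustic-prosodic and lexical evidence are combined by linear logistic regression to classify speech as computer directed or human directed, may be sketched as follows. The function names, weights, bias and threshold are hypothetical placeholders; Shriberg does not disclose these particular values.

```python
import math


def combine_scores(acoustic_score, lexical_score,
                   w_acoustic=1.0, w_lexical=1.0, bias=0.0):
    """Linear logistic combination of two evidence scores (weights are assumed)."""
    z = w_acoustic * acoustic_score + w_lexical * lexical_score + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(acoustic_score, lexical_score, threshold=0.5):
    """Label the speaking style from the combined score (threshold is assumed)."""
    p = combine_scores(acoustic_score, lexical_score)
    return "computer-directed" if p >= threshold else "human-directed"


print(classify(2.0, 1.0))
print(classify(-2.0, -1.0))
```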
Regarding claim 11, SaganeGowda in view of Meyers teach the method of claim 10, but fail to teach wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted speech feature with a speech feature of the . 
 However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation( see Shriberg col 5, lines 24-43 teaches  prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted speech feature with a speech feature of the (Shriberg col 11, lines 61-col 12 lines 4  teaches in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with pre stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda, Meyers and Shriberg are considered to be analogous to the claimed invention because they relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Meyers on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying the utterance accordingly, and then using the utterance based on the classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
Regarding claim 16, the claim is directed to a device claim for a response of a device corresponding to the method claim presented in claim 11 and is rejected under the same grounds stated above regarding claim 11.
Claims 3, 4, and 6 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Meyers et al. (U.S. Patent Application Publication 2017/0083285) further in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Marxer, Barker, J., Alghamdi, N., & Maddock, S. (2018) “The impact of the Lombard effect on audio and visual speech recognition systems.” Speech Communication, 100, 58–68.
Regarding claim 3, SaganeGowda in view of Meyers and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises determining whether the utterance time of the vowel section in the user utterance is greater than or equal to a preset time.
However, Marxer teaches wherein the extracting the at least one user speech feature comprises determining whether the utterance time of the vowel section in the user utterance is greater than or equal to a preset time (Marxer pg. 59 col 2 lines 17-20 teaches “In the temporal domain the main effect is an increase in vowel duration leading to an overall reduction in speech rate. This effect has been observed to have a linguistic dependency: the vowel lengthening is greater in content words than in function words”, teaches the Lombard effects on vowel duration).
SaganeGowda, Meyers, Shriberg and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Meyers and Shriberg on extracting at least one user speech feature and then using the vowel lengthening teachings of Marxer to examine potential for the Lombard effect to improve speech recognition performance (see Marxer, pg.59 col 1, lines 34-37).
Regarding claim 4, SaganeGowda in view of Meyers and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises: deriving a base frequency from a speech signal of the user utterance; and extracting a rising section from the derived base frequency.
However, Marxer teaches wherein the extracting the at least one user speech feature comprises: deriving a base frequency from a speech signal of the user utterance; and extracting a rising section from the derived base frequency (Marxer pg. 59 col 2 lines 9-14 teaches “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
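For illustration only, the claim 4 limitation of extracting a rising section from a derived base frequency may be sketched as follows. The sketch assumes the base (fundamental) frequency contour has already been derived as one value per frame, with 0 marking unvoiced frames; the function name rising_sections and the min_frames parameter are hypothetical and are not taken from Marxer or from the claims.

```python
def rising_sections(f0, min_frames=3):
    """Return inclusive (start, end) frame indices of strictly rising voiced runs."""
    sections = []
    start = None
    for i in range(1, len(f0)):
        # A step is rising only if the previous frame is voiced and f0 increases.
        rising = f0[i - 1] > 0 and f0[i] > f0[i - 1]
        if rising:
            if start is None:
                start = i - 1
        else:
            if start is not None and (i - 1) - start + 1 >= min_frames:
                sections.append((start, i - 1))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and (len(f0) - 1) - start + 1 >= min_frames:
        sections.append((start, len(f0) - 1))
    return sections


contour = [0, 100, 105, 112, 110, 0, 120, 125, 131, 140]
print(rising_sections(contour))  # [(1, 3), (6, 9)]
```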
Regarding claim 6, SaganeGowda in view of Meyers and Shriberg teaches the method of claim 2, but fails to teach wherein the extracting the at least one user speech feature comprises: deriving a spectrum from the speech signal of the user utterance; and extracting a slope reduction section from the derived spectrum.
However, Marxer teaches wherein the extracting the at least one user speech feature comprises: deriving a spectrum from the speech signal of the user utterance; and extracting a slope reduction section from the derived spectrum (Marxer, pg. 59 col 2 lines 9-14 teaches “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Meyers et al. (U.S. Patent Application Publication 2017/0083285), further in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Kanno, Sukeyasu, and Tetsuo Funada. “Lombard Speech Recognition Based on Voiced Sound Detection and Application to the Fabric Inspection System in Factories.” Systems and Computers in Japan 34.7 (2003): 10–23.
Regarding claim 5, SaganeGowda in view of Meyers and Shriberg teaches the method of claim 2, but fails to teach wherein the comparing of the speech feature comprises determining whether a harmonic structure in a speech signal of the user utterance has increased relative to a harmonic structure in a speech signal of the pre-stored average utterance style.
However, Kanno teaches wherein the comparing of the speech feature comprises determining whether a harmonic structure in a speech signal of the user utterance has increased relative to a harmonic structure in a speech signal of the pre-stored average utterance style (Kanno, pg. 12, col. lines 6-15 teaches “The pitch-type low-band LPC analysis method is a narrow-band LPC analysis method which focuses on the low-band where the effects of noise are minimal compared to the high-band so as to be able to efficiently extract noise-contaminated voiced sound in a noisy factory. In this method, analysis is performed with the spectrum peaks for the pitch frequency and harmonics resulting from the glottal source oscillation taken to represent one all-pole model, then voiced sound is detected from the degree of the conformity”).
SaganeGowda, Meyers, Shriberg and Kanno are all considered to be analogous to the claimed invention because they relate to speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda, Meyers and Shriberg on extracting at least one user speech feature and then using the Lombard speech recognition based on voiced sound detection using pitch frequency and harmonics analysis teachings of Kanno to examine potential for the Lombard effect to improve speech recognition performance (see Kanno, pg. 11).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hassani (US Patent Application Publication 2017/0125038) discusses generating and associating Lombard effect speech database on neutral speech for speech recognition training (see Hassani, Fig. 5, Fig. 6).
Lee (US Patent 9,779,734) discusses a speech recognition method that improves recognition accuracy of the recognized commands to control target devices regardless of distances between target devices and a speaker (see Lee, col 5, line 60 - col 6, line 5).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00 pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656