DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 03/05/2020. Claims 1-19 are pending in the application and have been examined.
		
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/05/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

	
Specification



The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 5, 9, 14 and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The term "increased" in claim 5 is a relative term which renders the claim indefinite. The term "harmonic structure clarity" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. (See specification [0173].)
The term "large" in claims 9, 14 and 19 is a relative term which renders the claims indefinite. The term "change of the extracted user speech feature" is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. (See specification [0168].)

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 10 and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite analyzing a user utterance to determine which device the user is controlling and nothing more. 
The limitation of analyzing a style of the user utterance based on a user speech feature extracted from the user utterance, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components.
Similarly, the limitation of analyzing a user utterance to determine which device the user is controlling, as drafted, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the “speech recognition neural network” language, “analyzing the style of the user utterance” in the context of these claims encompasses the user thinking of which device to control and accordingly adjusting the speech style. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claims only recite one additional element – using a speech recognition neural network and a generic processor to perform the determination and controlling steps. The processor in the steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of determining, analyzing, and controlling information based on a determined use) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea.







Claims 7, 12 and 17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information.
The limitation of estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “distance measurement sensor,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “from a distance measurement sensor” language, “estimating” in the context of this claim encompasses the user manually estimating the proximity of each of the electronic devices. Similarly, the limitation of collecting distance information on a distance between the user and each of the plurality of electronic devices, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, but for the “a distance measurement sensor” language, “estimating” in the context of this claim encompasses the user 
This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a distance measurement sensor to perform both the collecting and estimating steps. The processor in both steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of estimating the distance between the user and the electronic devices) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a processor to perform both the collecting and estimating steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Claims 8, 13 and 18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites extracting a user speech feature from the selected utterance data through a speech recognition neural network.

This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a speech recognition neural network to perform both the selecting and the extracting steps. The speech recognition neural network is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of extracting a user speech feature from the selected utterance data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a speech recognition neural network to extract a user speech feature from the selected utterance data amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
Claims 9, 14 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claim recites determining that a device located at a farthest distance from the user is being called in response to a change of the extracted user speech feature being large.
The limitation of determining that a device located at a farthest distance from the user is being called in response to a change of the extracted user speech feature being large, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “a device,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “a device” language, “in response to a change of the extracted user speech feature being large” in the context of this claim encompasses a user altering one's utterance style based on the distance of the device the user wishes to control. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim only recites one additional element – using a device to respond if it is located farthest from the user. The device is recited at a high-level of generality. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a device to perform both the response and determining steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
	

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of pre-AIA 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –











Claims 1, 7, 8, 9, 10, 12, 13, 14, 15, 17, 18 and 19 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by SaganeGowda et al. (US Patent Application Publication 2018/0268814).
Regarding claim 1, SaganeGowda teaches a method for controlling a device according to a user's calling, the method comprising: receiving a user utterance collected by a plurality of electronic devices (SaganeGowda [0071] and FIG. 5 teaches an example operation for controlling a voice activated feature of a voice activated device, operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the plurality of electronic devices (SaganeGowda [0073] teaches that, in operation 506, the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information); analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (SaganeGowda [0072] illustrates operation 504 analyzing, by the voice activated device, the received data, and based on the analyzing, selecting a location with a highest probability among possible locations of the user. SaganeGowda [0043, 0044] teaches the proximity analysis components using neural networks); determining which device among the plurality of electronic devices the user utterance is calling based on the analyzed utterance style and the distance between the user and the plurality of electronic devices (SaganeGowda [0074] teaches operation 508, which determines whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences); and controlling an operation of the device that is a target of the user's calling, based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508).
Regarding claim 7, SaganeGowda teaches the method of claim 1, wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of the plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 8, SaganeGowda teaches the method of claim 1, wherein the analyzing of the style of the user utterance comprises: selecting utterance data of the user received from an electronic device located at a closest distance from the user, among utterance data of the user collected by the plurality of electronic devices (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through a speech recognition neural network (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 9, SaganeGowda teaches the method of claim 8, wherein the determining of which device among the plurality of electronic devices the user is calling comprises determining that a device located at a farthest distance from the user is being called in response to a change of the extracted user speech feature being large (SaganeGowda [0014] teaches the user's presence may be inferred based on secondary or non-direct information. For example, the presence of a user device such as a key fob or a smartphone can be used to infer that the user is within an estimated proximity to the voice activated or voice command device. The presence of the user may be inferred using information such as the Bluetooth connection for the user becoming disconnected, suggesting that the user has left the room and is of a sufficient distance so as to disconnect from the Bluetooth transceiver of the voice activated device. Other secondary information may include, for example, sensors that detect door opening and closing, heat sensors, and self-reported location information from a user's mobile device).
Regarding claim 10, SaganeGowda teaches a method for determining a response of a first device to a user's calling, the method comprising: receiving a user utterance (SaganeGowda [0071] and FIG. 5 teaches an example operation for controlling a voice activated feature of a voice activated device, operation 502 illustrates receiving, by a voice activated device from one or more data sources); measuring a distance between the user and the first device (SaganeGowda [0073] teaches that, in operation 506, the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information); analyzing a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (SaganeGowda [0072] illustrates operation 504 analyzing, by the voice activated device, the received data, and based on the analyzing, selecting a location with a highest probability among possible locations of the user. SaganeGowda [0043, 0044] teaches the proximity analysis components use neural networks); determining whether the user utterance is calling the first device based on the analyzed user utterance style and the measured distance (SaganeGowda [0074] teaches operation 508, which determines whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences); and determining whether to respond to the user's calling based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508).
Regarding claim 12, SaganeGowda teaches the method of claim 10, wherein the measuring of the distance comprises: collecting distance information on a distance between the user and each of a plurality of electronic devices, which is obtained from a distance measurement sensor of each of the plurality of electronic devices (SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like); and estimating the distance between the user and each of the plurality of electronic devices based on the collected distance information (SaganeGowda [0010] teaches based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 13, SaganeGowda teaches the method of claim 10, wherein the analyzing of the style of the user utterance comprises: selecting utterance data of the user collected by the first device (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information); and extracting a user speech feature from the selected utterance data through a speech recognition neural network (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
Regarding claim 14, SaganeGowda teaches the method of claim 13, wherein the determining of whether to respond to the user's calling comprises determining that the first device is a device located at a farthest distance from the user in response to a change of the extracted user speech feature being large (SaganeGowda [0014] teaches the user's presence may be inferred based on secondary or non-direct information. For example, the presence of a user device such as a key fob or a smartphone can be used to infer that the user is within an estimated proximity to the voice activated or voice command device. The presence of the user may be inferred using information such as the Bluetooth connection for the user becoming disconnected, suggesting that the user has left the room and is of a sufficient distance so as to disconnect from the Bluetooth transceiver of the voice activated device. Other secondary information may include, for example, sensors that detect door opening and closing, heat sensors, and self-reported location information from a user's mobile device).
SaganeGowda teaches such a device as indicated in [0025] and Fig. 1), reception of a user utterance (SaganeGowda [0071] and FIG. 5 teaches an example operation for controlling a voice activated feature of a voice activated device, operation 502 illustrates receiving, by a voice activated device from one or more data sources), analysis of a style of the user utterance based on a user speech feature extracted from the user utterance through a speech recognition neural network (SaganeGowda [0072] illustrates operation 504 analyzing, by the voice activated device, the received data, and based on the analyzing, selecting a location with a highest probability among possible locations of the user. SaganeGowda [0043, 0044] teaches the proximity analysis components use neural networks), measurement of a distance between the user and the first device (SaganeGowda [0073] teaches that, in operation 506, the likely proximity of the user relative to a location of the voice activated device is determined based in part on availability of sensor and computing status information), determination of whether the user utterance is calling the first device based on the analyzed user utterance style and the measured distance (SaganeGowda [0074] teaches operation 508, which determines whether one or more voice activated features of the voice activated device should be enabled, based at least in part on the determined proximity, one or more rules, and one or more user preferences); and determining whether to respond to the user's calling based on the determination (SaganeGowda [0075] illustrates operation 510, which activates at least one of the voice activated features based on the determination in step 508).
SaganeGowda [0010] teaches an interconnected network may be coupled to one or more user computing devices as well as data sources such as sensors and computing devices located throughout a home or office environment. At least one of the devices is a voice activated device. The sensors can include luminosity sensors, passive infrared (IR) sensors, cameras with image recognition, depth sensors, and the like), and estimation of a distance between the user and each of the plurality of electronic devices based on the collected distance information, when the distance is measured (SaganeGowda [0010] teaches based on the data from the data sources, the voice activated device may make a determination as to a user's position and status relative to the voice activated device and the distance may correlate to an expected proximity of a user when the user is intending to activate the voice activated device).
Regarding claim 18, SaganeGowda teaches the device of claim 15, wherein the memory stores commands configured to cause selection of utterance data of the user collected by the first device (SaganeGowda [0011] teaches the voice activated device may only respond to voice commands when a user is inferred to be within a specified location based on available presence information), and extraction of a user speech feature from the selected utterance data through a speech recognition neural network (SaganeGowda [0012] teaches how by linking voice activated functions to the user's presence, a device such as a voice activated thermostat may be configured to operate based on high fidelity voice inputs that are more readily available based on close physical proximity).
SaganeGowda [0014] teaches the user's presence may be inferred based on secondary or non-direct information. For example, the presence of a user device such as a key fob or a smartphone can be used to infer that the user is within an estimated proximity to the voice activated or voice command device. The presence of the user may be inferred using information such as the Bluetooth connection for the user becoming disconnected, suggesting that the user has left the room and is of a sufficient distance so as to disconnect from the Bluetooth transceiver of the voice activated device. Other secondary information may include, for example, sensors that detect door opening and closing, heat sensors, and self-reported location information from a user's mobile device).

Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.











Claims 2, 11 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Shriberg et al. (U.S. Patent 10,529,321).
Regarding claim 2, SaganeGowda teaches the method of claim 1, but fails to teach wherein the analyzing of the style of the user utterance comprises: extracting, through a weight Shriberg col. 5, lines 24-43 teaches prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted speech feature with a speech feature of a pre-stored average utterance style for the user (Shriberg col. 11, line 61 to col. 12, line 4 teaches in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
SaganeGowda and Shriberg are both considered to be analogous to the claimed invention because both relate to interpreting user intent and engaging in natural dialog to accomplish complex tasks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda on analyzing the user utterance to determine the control of the device with the teachings of Shriberg on extracting prosodic information from the utterance, classifying it accordingly, and then using the utterance based on the classification, in order to improve addressee detection, which is used in spoken dialog systems to detect whether or not user speech is directed toward the system (see Shriberg, col. 1, lines 16-18).
Regarding claim 11, SaganeGowda teaches the method of claim 10, but fails to teach wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparing the extracted speech feature with a speech feature of a pre-stored average utterance style for the user. However, Shriberg teaches wherein the analyzing of the style of the user utterance comprises: extracting at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg col. 5, lines 24-43 teaches prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparing the extracted speech feature with a speech feature of a pre-stored average utterance style for the user (Shriberg col. 11, line 61 to col. 12, line 4 teaches in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
Regarding claim 16, SaganeGowda teaches the device of claim 15, but fails to teach wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation; and comparison of the extracted speech feature with a speech feature of a pre-stored average utterance style for the user. However, Shriberg teaches wherein the memory stores commands configured to cause: extraction of at least one user speech feature among an utterance speed, a pronunciation stress, a pause section, a pitch, a base frequency, an utterance time of a vowel section, a signal to noise ratio (SNR), or an intonation (Shriberg col 5, lines 24-43 teaches that prosodic and acoustic features that capture a speaker's vocal effort may be used, because speakers tend to raise their vocal effort when speaking to a computer as opposed to a human. Vocal effort changes modify the absolute energy, the relative energy in different frequency regions, and relative energy magnitudes between voiceless and voiced speech segments. Other features that capture vocal effort do not require normalization. Such features include measures of spectral tilt and spectral slope, and delta log energy from unvoiced to voiced speech regions. A variety of machine learning approaches may be used to model the features described above, and to obtain classifiers for addressee detection. According to an embodiment, the classifiers output a real value that can serve either as a detection score, or as a new feature to be fed into second-level classifiers); and comparison of the extracted speech feature with a speech feature of a pre-stored average utterance style for the user (Shriberg col 11, line 61 to col 12, line 4 teaches that in operation 650, the speaking style is classified as human directed or computer directed (interpreted as comparing with a pre-stored utterance style), combining available sources of evidence (acoustic-prosodic and/or lexical), using linear logistic regression or some other combination scheme. A score may be calculated (as described above) that is used in determining whether the speech is computer directed or human directed).
Claims 3, 4, and 6 are rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Marxer, Barker, J., Alghamdi, N., & Maddock, S. (2018) “The impact of the Lombard effect on audio and visual speech recognition systems.” Speech Communication, 100, 58–68.
Regarding claim 3, SaganeGowda and Shriberg teach the method of claim 2, but fail to teach wherein the extracting of the user speech feature comprises determining whether the utterance time of the vowel section in the user utterance is greater than or equal to a preset (Marxer pg. 59 col 2 lines 17-20 teaches “In the temporal domain the main effect is an increase in vowel duration leading to an overall reduction in speech rate. This effect has been observed to have a linguistic dependency: the vowel lengthening is greater in content words than in function words”, which describes the Lombard effect on vowel duration).
SaganeGowda, Shriberg and Marxer are all considered to be analogous to the claimed invention because they relate to automatic speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Shriberg on extracting at least one user speech feature with the vowel lengthening teachings of Marxer, in order to examine the potential for the Lombard effect to improve speech recognition performance (see Marxer, pg. 59 col 1, lines 34-37).
Regarding claim 4, SaganeGowda and Shriberg teach the method of claim 2, but fail to teach wherein the extracting of the user speech feature comprises: deriving a base frequency from a speech signal of the user utterance, and extracting a rising section from the derived base frequency. However, Marxer teaches wherein the extracting of the user speech feature comprises: deriving a base frequency from a speech signal of the user utterance, and extracting a rising section from the derived base frequency (Marxer pg. 59 col 2 lines 9-14 teaches “Although the findings of these studies have differed in detail, a consistent description of Lombard speech has emerged: Spectral effects include an increase in fundamental frequency, a tilting of the spectrum that emphasises higher frequencies and a shift in formant center frequencies (particularly an increase of F1)”).
Claim 5 is rejected under 35 U.S.C. §103 as being unpatentable over SaganeGowda et al. (U.S. Patent Application Publication 2018/0268814) in view of Shriberg et al. (U.S. Patent 10,529,321), further in view of Kanno, Sukeyasu, and Tetsuo Funada. “Lombard Speech Recognition Based on Voiced Sound Detection and Application to the Fabric Inspection System in Factories.” Systems and Computers in Japan 34.7 (2003): 10–23.
Regarding claim 5, SaganeGowda and Shriberg teach the method of claim 2, but fail to teach wherein the comparing of the speech feature comprises determining whether a harmonic structure clarity in the speech signal of the user utterance is increased. However, Kanno teaches wherein the comparing of the speech feature comprises determining whether a harmonic structure clarity in the speech signal of the user utterance is increased (Kanno, pg. 12, col. lines 6-15 teaches “The pitch-type low-band LPC analysis method is a narrow-band LPC analysis method which focuses on the low-band where the effects of noise are minimal compared to the high-band so as to be able to efficiently extract noise-contaminated voiced sound in a noisy factory. In this method, analysis is performed with the spectrum peaks for the pitch frequency and harmonics resulting from the glottal source oscillation taken to represent one all-pole model, then voiced sound is detected from the degree of the conformity”).
SaganeGowda, Shriberg and Kanno are all considered to be analogous to the claimed invention because they relate to speech recognition systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of SaganeGowda and Shriberg on extracting at least one user speech feature with the teachings of Kanno on Lombard speech recognition based on voiced sound detection using pitch frequency and harmonics analysis, in order to examine the potential for the Lombard effect to improve speech recognition performance (see Kanno, pg. 11).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hassani (US Patent Application Publication 2017/0125038) discusses generating and associating Lombard effect speech database on neutral speech for speech recognition training (see Hassani, Fig. 5, Fig. 6).
Hansen & Varadarajan, V. (2009) “Analysis and Compensation of Lombard Speech Across Noise Type and Levels With Application to In-Set/Out-of-Set Speaker Recognition.” IEEE Transactions on Audio, Speech, and Language Processing, 17(2), 366–378 discusses how the variations in speech due to the Lombard effect depend on both the noise type and the noise level (see Hansen pg. 366, col 2 lines 30-34 & abstract).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta, can be reached on (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/NANDINI SUBRAMANI/Examiner, Art Unit 2656
/EDGAR X GUERRA-ERAZO/Primary Examiner, Art Unit 2656