DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 26 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 26 recites the limitation “the one or more cues” in line 3. There is insufficient antecedent basis for this limitation in the claim. It is unclear which one or more cues the limitation refers to, since no one or more cues have been previously established in the claim; it is therefore also unclear what the selection of one or more speech modification parameters is based on. Examiner presumes applicant intended the limitation “the one or more cues” to be “one or more cues”, and the limitation is interpreted as such. Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claims 1-7, 9, 11, 12, 15-18, and 20-25 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Ocampo (U.S. Patent Application Publication 2018/0122361).

Regarding Claim 1, Ocampo discloses:
An appliance (para 0047-48: user device of fig. 5) comprising a microphone transducer (para 0047, 0050: microphones 506), a processor, and a memory storing instructions that, when executed by the processor, cause the appliance to (para 0047-49, 0090-94: processor 510/520, and instructions stored in memory executed by processor to configure/control all functions of the user device):
receive an audio signal at the microphone transducer (para 0050, 0059, 0064-65: microphones 506 receive an audio signal);

classify a speech mode based on the utterance (para 0064-65: classifiers determine voice features from the received audio signal having the utterance from which attributes characterizing the voice are determined to provide a classification such as the user is whispering, shouting, excited, etc. [speech mode]);
determine one or more cues, wherein each cue corresponds to a condition of an environment of the appliance (para 0069-70, 0058-59, 0075: multiple environment features [cues] are determined pertaining to the environment around the user device, such as sounds, motion, lighting, distance from user device, each environment feature corresponding to a condition of the environment of the user device, such as being within a crowd, restaurant, automobile, quiet space, proximity, and/or motion);
determine a speech output mode based on the classification and the one or more cues (para 0064-65, 0073-77: parameters for outputting a synthesized speech response are selected and set to match a type of response [speech output mode], such as whispering, shouting, excited, etc., based on the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch); and
output synthesized speech according to the determined speech output mode (para 0073-78: a synthesized speech response audio signal is generated by 520 using the determined parameters settings and is outputted).
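
For illustration only, the sequence of limitations mapped above for claim 1 (classify a speech mode, determine environment cues, determine a speech output mode from both, output synthesized speech accordingly) can be sketched as follows. This sketch is not taken from Ocampo or from the application; every function name, field, threshold, and mode label is a hypothetical assumption chosen to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    """Hypothetical voice features extracted from the received audio signal."""
    amplitude: float  # assumed normalized to the range 0..1
    pitch_hz: float


def classify_speech_mode(features: VoiceFeatures) -> str:
    """Classify the utterance; the thresholds are illustrative assumptions."""
    if features.amplitude < 0.2:
        return "whisper"
    if features.amplitude > 0.8:
        return "shout"
    return "normal"


def determine_cues(ambient_noise_db: float, user_distance_m: float) -> dict:
    """Map assumed sensor readings to environment cues (conditions)."""
    return {"noisy": ambient_noise_db > 65.0, "far": user_distance_m > 3.0}


def select_output_mode(speech_mode: str, cues: dict) -> str:
    """Choose an output mode from both the classification and the cues."""
    if cues["noisy"] or cues["far"]:
        return "loud"  # environment cues override the user's speech mode
    if speech_mode == "whisper":
        return "whisper"
    if speech_mode == "shout":
        return "loud"
    return "normal"


def output_synthesized_speech(text: str, mode: str) -> str:
    """Stand-in for a synthesizer: tags the text with the chosen mode."""
    return f"[{mode}] {text}"


mode = select_output_mode(
    classify_speech_mode(VoiceFeatures(amplitude=0.1, pitch_hz=180.0)),
    determine_cues(ambient_noise_db=40.0, user_distance_m=0.5),
)
print(output_synthesized_speech("It is 3 pm.", mode))
```

In this sketch a whispered command in a quiet room yields a whispered response, while the same command in a noisy room yields a loud response, which is one possible way the classification and the cues could jointly determine the output mode.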

Regarding Claim 2, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein instructions that cause the appliance to classify a speech mode comprise instructions that cause the appliance to classify the utterance according to at least one of [examiner notes the following limitations are claimed in the alternative]: a pitch (para 0064-65: classification is according to pitch of the voice), number of formants, spectral tilt, speech rate, direction of arrival and energy content (para 0064-65: classification is according to amplitude of the voice).
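
For illustration only, two of the alternatives recited above (pitch and energy content) can be estimated from raw audio samples as sketched below. The zero-crossing pitch estimate is a deliberately crude stand-in chosen for brevity and is not the method of Ocampo or of the application; the function names and the synthetic test tone are hypothetical assumptions.

```python
import math


def rms_energy(samples):
    """Root-mean-square level of the samples, used as an energy measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch_hz(samples, sample_rate_hz):
    """Crude pitch estimate: a periodic tone crosses zero twice per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2.0 * duration_s)


def make_tone(freq_hz, amplitude, sample_rate_hz=16000, duration_s=1.0):
    """Synthetic sine tone; the small phase offset avoids exact-zero samples."""
    n = int(sample_rate_hz * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz + 0.1)
        for i in range(n)
    ]


tone = make_tone(freq_hz=200.0, amplitude=0.5)
print(round(zero_crossing_pitch_hz(tone, 16000)), round(rms_energy(tone), 2))
```

A classifier along the lines discussed above could then compare such values against thresholds; real speech would need a more robust pitch tracker than zero-crossing counting.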

Regarding Claim 3, in addition to the elements stated above regarding claim 2, Ocampo further discloses:
wherein the instructions cause the appliance to classify the speech mode of the utterance as [examiner notes the following limitations are claimed in the alternative] a whisper mode, a normal mode, or a Lombard effect mode according to one or more characteristics of the utterance (para 0064-65: the classification is determined from determined features of the user’s voice and can be classified as whispering, happy [normal]).

Regarding Claim 4, in addition to the elements stated above regarding claim 3, Ocampo further discloses:
wherein the one or more characteristics of the utterance comprise at least one of [examiner notes the following limitations are claimed in the alternative]: a pitch, an energy content, a number of formant, a spectral tilt, a speech rate, or a combination 

Regarding Claim 5, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
further comprising instructions that, when executed by the processor, cause the appliance to select a playback volume according to [examiner notes the following limitations related to selection of playback volume are claimed in the alternative] the classified speech mode, the one or more cues, or a combination thereof, and to output the synthesized speech at the selected playback volume (para 0064-65, 0073-77, 0035, 0037-38: the determined parameters selected and set for outputting the synthesized speech response - which is according to the determined attributes characterizing the voice to provide the classification such as the user is whispering, shouting, excited, etc. [classified speech mode], and the environment features [cues] - determines the output volume/amplitude at which the synthesized speech response is output).

Regarding Claim 6, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein the instructions to select a speech output mode further comprise instructions to select one or more speech synthesis parameters (para 0073-74: parameters for outputting a synthesized speech response are selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch), the memory further 

Regarding Claim 7, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein the instructions to select a speech output mode further comprise instructions to select a speech synthesis model from a plurality of speech synthesis models (para 0077: the speech synthesizer can use [select] any suitable audio synthesizer technique [model], such as concatenation synthesis, formant synthesis, articulatory synthesis, and hidden Markov model (HMM)-based synthesis [from a plurality of speech synthesis models], to generate the synthesized speech response for outputting a synthesized speech response using the selected and set to match determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch, to generate by 520 a synthesized speech response audio signal), the memory further comprising instructions that, when executed by the processor, cause the appliance to generate synthesized speech 

Regarding Claim 9, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein selected speech output mode corresponds to the speech mode of the utterance (para 0073-75: parameters for outputting a synthesized speech response are selected and set to match [corresponds to] the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch, for example, the synthesized speech response being output as a whispering response corresponds to the user uttered command in a whispering tone).

Regarding Claim 11, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein the instructions to determine one or more cues comprise instructions to determine one or more acoustic cues from the audio signal, the one or more acoustic cues comprising at least one of [examiner notes the following limitations are claimed in 

Regarding Claim 12, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein the instructions to determine one or more cues comprise instructions to determine one or more non-acoustic cues (para 0051-53, 0058: non-acoustic cues are determined) comprising at least one of [examiner notes the following limitations are claimed in the alternative]: a time of day, a location type (para 0053, 0058: a moving location is determined by sensing movement), an appliance mode (para 0053: determining the application to respond to a command is configured for TTS (which is text-to-speech - see Background) output), a user profile (para 0058: the user is in close proximity to the user device is determined by sensors), a location layout, an acoustic profile of a location, or a combination thereof.

Regarding Claim 15, Ocampo discloses:
An audio appliance (para 0047-48: user device of fig. 5), comprising:

a speech classifier configured to detect an utterance in the sound and to classify a speech mode based on the utterance (para 0050-51, 0059, 0064-65: voice classifier 510/516 processes the audio signal from a person uttering a command and from which voice features are determined, thus an utterance is detected; the classifiers determine voice features from the received audio signal having the utterance from which attributes characterizing the voice are determined to provide a classification such as the user is whispering, shouting, excited, etc. [speech mode]);
a decision component configured to determine one or more cues, each cue corresponding to an observed condition of an environment of the appliance, and to select a speech output mode from a plurality of speech output modes based on the one or more cues (para 0069-70, 0053, 0058-59, 0064-65, 0073-77: In conjunction components 510/514/518/516, 504, 506, 520/526/528 utilized to determine multiple environment features [cues] pertaining to the environment around the user device, such as sounds, motion, lighting, distance from user device, each environment feature corresponding to a condition of the environment of the user device, such as being within a crowd, restaurant, automobile, quiet space, proximity, and/or motion; parameters for outputting a synthesized speech response are selected and set to match a type of response [speech output mode], such as whispering, shouting, excited, etc., by selecting an audio output template corresponding to the type of response, the parameters for the selected audio output template are based on the determined 
an output component configured to output synthesized speech according to the speech output mode (para 0073-78: utilizing 520/526 a synthesized speech response audio signal is generated and is outputted by using the determined parameters settings of the selected audio output template which corresponds to the type of response).

Regarding Claim 16, in addition to the elements stated above regarding claim 15, Ocampo further discloses:
further comprising: a speech synthesizer (fig. 5, para 0047, 0077: speech synthesizer 520) comprising a speech synthesis model (para 0073-74, 0076-77: the speech synthesizer uses a synthesizer technique [model] such as concatenation synthesis, formant synthesis, articulatory synthesis, and hidden Markov model (HMM)-based synthesis, to generate the synthesized speech response), and configured to receive text and to generate synthesized speech from the text with the speech synthesis model according to the speech output mode (Abstract, para 0002-3, 0019, 0047, 0049-53, 0056, 0073-78: the user device of fig. 5 performs a TTS function in which a TTS response is generated from retrieved data in response to the audio signal command of the user’s voice, the retrieved data for the response being received text is implicitly taught since the user device utilizes TTS, which is text-to-speech which uses text as input to generate the speech response output utilizing 520/526 using the synthesizer technique a synthesized speech response audio signal is generated and is outputted by 

Claim 17 is rejected under the same grounds stated above for Claims 15 and 16.

Regarding Claim 18, in addition to the elements stated above regarding claim 16, Ocampo further discloses:
wherein the decision component is configured to select a speech synthesis model from a plurality of speech synthesis models corresponding to the speech output mode, and wherein the speech synthesizer is configured to generate the synthesized speech according to the selected speech synthesis model (para 0077: the speech synthesizer can use [select] any suitable audio synthesizer technique [model], such as concatenation synthesis, formant synthesis, articulatory synthesis, and hidden Markov model (HMM)-based synthesis [from a plurality of speech synthesis models], to generate the synthesized speech response for outputting the synthesized speech response using the selected and set determined parameters to match the corresponding type of response [corresponding to speech output mode] based on parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch).

Regarding Claim 20, in addition to the elements stated above regarding claim 15, Ocampo further discloses:


Regarding Claim 21, Ocampo discloses:
A method for improving intelligibility of synthesized speech (para 0003-4, 0028: TTS operation executed on a user device to output TTS to a user by considering features of the user inputted voice and environmental attributes of the user device environment) comprising:
detecting an utterance in an audio signal (para 0050-51, 0059, 0064-65: the received audio signal is from a person uttering a command and from which voice features are determined, thus an utterance is detected);
classifying a speech mode of the utterance (para 0064-65: classifiers determine voice features from the received audio signal having the utterance from which attributes characterizing the voice are determined to provide a classification such as the user is whispering, shouting, excited, etc. [speech mode]);
identifying conditions associated with a listening environment (para 0028, 0069-70, 0058-59, 0075: multiple environment features are determined pertaining to the 
selecting a speech output mode from a plurality of speech output modes according to the classified speech mode and the identified conditions associated with the listening environment (para 0069-70, 0053, 0058-59, 0064-65, 0073-77: parameters for outputting a synthesized speech response are selected and set to match a type of response [speech output mode], such as whispering, shouting, excited, etc., by selecting an audio output template corresponding to the type of response, the parameters for the selected audio output template are based on the determined parameters of the user attributes/voice features and environment attributes [according to the classified speech mode and the identified conditions], which include amplitude, frequency, tone, pitch, the audio output template selected from a plurality of audio output templates); and
outputting synthesized speech according to the speech output mode (para 0073-78: utilizing 520/526 a synthesized speech response audio signal is generated and is outputted by using the determined parameters settings of the selected audio output template which corresponds to the type of response).

Claim 22 is rejected under the same grounds stated above for Claim 5.

Claim 23 is rejected under the same grounds stated above for Claim 20.

Claim 24 is rejected under the same grounds stated above for Claim 6.

Claim 25 is rejected under the same grounds stated above for Claim 7.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 8, 10, 13, 14, 19, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Ocampo in view of Raitio et al. (U.S. Patent Application Publication 2017/0358301), hereinafter Raitio.

Regarding Claim 8, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
wherein the instructions to select a speech output mode further comprise instructions to select one or more speech modification parameters based on the classified speech mode, and the one or more cues (para 0073-74: parameters used for causing a generated audio signal to make it sound like [modification], for example, a whispered speech response output, are selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch).
Ocampo does not explicitly teach the user device to modify synthesized speech according to the one or more speech modification parameters and to output the modified synthesized speech.
However, in a related field of endeavor (i.e. modify synthesized speech) Raitio teaches (para 0250-253, 0280-285) the speech mode being a whispered input determined from spectral characteristics of the input speech and from cues corresponding to the environment such as location and time, indicating a whispered speech response is to be output, and further teaches (para 0256-274) based on a whispered speech response is to be output, selecting filtering [modification] parameters that are applied to the intermediate speech signal output response [synthesized speech] from a speech synthesis module which modifies the intermediate speech signal output response to become the whispered speech response which is then outputted to the user (fig. 8A, 8D). It would have been obvious to one of ordinary skill in the art before the 

Regarding Claim 10, in addition to the elements stated above regarding claim 1, Ocampo does not explicitly disclose:
wherein selected speech output mode corresponds to a different speech mode than the speech mode of the utterance.
However, in a related field of endeavor (i.e. speech output mode corresponds to a different speech mode) Raitio teaches (para 0253) the whispered speech output response [selected speech output mode] corresponds to a different speech mode input of a non-whispered speech regular loud voice. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Raitio to Ocampo to allow the selected speech output mode to correspond to a different speech mode than the speech mode of the utterance, thus providing an enhanced user interface and user convenience by allowing the user to 

Regarding Claim 13, in addition to the elements stated above regarding claim 1:
wherein the instructions that cause the appliance to output the synthesized speech comprise instructions to process synthesized speech for output by a loudspeaker.
Ocampo teaches (para 0077-78) the synthesized speech response audio signal is generated by 520 using the determined parameters for output by speaker 530.
Ocampo does not explicitly disclose processing the synthesized speech for output.
However, in a related field of endeavor (i.e. modify synthesized speech) Raitio teaches (para 0250-253, 0280-285) indicating a whispered speech response is to be output, and further teaches (para 0256-274) based on a whispered speech response is to be output, selecting filtering parameters that are applied to the intermediate speech signal output response from a speech synthesis module which processes the intermediate speech signal output response to become the whispered speech response which is then outputted to the user (fig. 8A, 8D). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Raitio to Ocampo to allow the synthesized speech output to be processed after generating synthesized speech, thus providing an enhanced listening experience, user convenience by allowing a wider variety of synthesized speech output types to be generated to accommodate user preferences while using the same speech 

Regarding Claim 14, in addition to the elements stated above regarding claim 1, Ocampo further discloses:
further comprising instructions that, when executed by the processor, cause the appliance to receive the audio signal from the microphone transducer (para 0050, 0059: the user device receives the audio signal from microphones 506 as a user command), process the audio signal (para 0051: the audio signal command is classified [processed] as a knowledge query requiring a response) and receive text, wherein the received text corresponds to a response (Abstract, para 0002-3, 0019, 0047, 0049-53, 0056: the user device of fig. 5 performs a TTS function in which a TTS response is generated from retrieved data in response to the audio signal command of the user’s voice, the retrieved data for the response being received text is implicitly taught since the user device utilizes TTS, which is text-to-speech which uses text as input to generate the speech response output).
Ocampo does not explicitly disclose to request speech recognition on the audio signal command classified as a knowledge query as the basis to generate the text response.
However, in a related field of endeavor (i.e. using speech recognition) Raitio teaches (para 0209-210, 0235-236) using speech recognition on speech input of the user request in order to generate a text response based on the recognized speech. In addition, Ocampo at least suggests (figs. 2A, 2B) using speech recognition in order to 

Regarding Claim 19, in addition to the elements stated above regarding claim 15, Ocampo further discloses:
wherein the decision component is configured to select [examiner notes the following limitations are claimed in the alternative] a playback volume, one or more speech modification parameters, or both, corresponding to the speech output mode (para 0073-74: parameters used for causing a generated audio signal to make it sound like [modification], for example, a whispered speech response output, are selected and set to match the corresponding type of response [corresponding to speech output mode] based on parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch). 
Ocampo does not explicitly teach the user device to modify synthesized speech according to the one or more speech modification parameters prior to outputting the modified synthesized speech.
However, in a related field of endeavor (i.e. modify synthesized speech prior to outputting) Raitio teaches (para 0250-253, 0280-285) the speech output mode being a whispered speech response determined from spectral characteristics of the input 

Claim 26 is rejected under the same grounds stated above for Claim 8.

Conclusion


Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at 571-272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DAVID SIEGEL/Examiner, Art Unit 2653
/FAN S TSANG/Supervisory Patent Examiner, Art Unit 2653