DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Regarding the request for interview (see page 9 of applicant's remarks filed Apr 13, 2022), an interview was held May 4, 2022.
Regarding the rejection of claim 26 under 35 U.S.C. 112(b), applicant argues (see page 9 of applicant's remarks filed Apr 13, 2022) the claim has been amended to address the issues. Applicant’s remarks have been fully considered. Accordingly, the rejection is withdrawn.
Regarding claims 1 and 15 rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Ocampo (U.S. Patent Application Publication 2018/0122361), applicant argues (see pages 9-10 of applicant's remarks) that Ocampo does not disclose the additional features of amended claims 1 and 15, namely weighting of environmental information and classification of speech mode, the relative weights, determining a speech output mode for the synthesized speech based on the claimed weights, and weighting the weight associated with the cues of the environment more heavily than the weight associated with the classification of speech mode of the utterance when determining the speech output mode for the synthesized speech. Applicant’s remarks have been fully considered and are persuasive. Accordingly, the prior art rejections of amended claims 1 and 15 are withdrawn.
Regarding claim 21 rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Ocampo (U.S. Patent Application Publication 2018/0122361), applicant argues (see pages 10-11 of applicant's remarks) that Ocampo does not disclose or suggest making output speech intelligible, let alone that making the output speech intelligible in the listening environment can be of higher priority than matching the speech mode of the user’s utterance, and that Ocampo does not disclose the additional features of amended claim 21 of “wherein the selected speech output mode (for outputting synthesized speech) includes at least one of a pitch, a rate, an energy, or a spectral tilt selected for intelligibility in the identified conditions associated with the listening environment”. Applicant’s remarks have been fully considered and are not persuasive. Examiner points out that the distinction presented in applicant’s remarks, namely that making the output speech intelligible in the listening environment can be of higher priority than matching the speech mode of the user’s utterance, is not required by the claim. In addition, Ocampo (para 0025-28, Fig. 1B) at least suggests selecting a speech output mode for synthesized speech for intelligibility by selecting a speech output mode that increases the loudness [energy] of the audio output [synthesized speech] based on identifying that the user is farther away from the audio device [identified conditions of the listening environment] in order to ensure the user is able to hear [intelligibility] the audio output when being farther away from the audio device.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Ocampo to select an audio output template [speech output mode], to match a type of response based on the determined parameters of the user attributes/voice features (Ocampo para 0073) [according to the classified speech mode], such as pitch, amplitude [energy] (Ocampo para 0073), and to increase the loudness [energy] of the audio output based on identifying that the user is farther away from the audio device [intelligibility in the identified conditions of the listening environment], thus providing an enhanced listening experience by ensuring the user is able to hear [intelligibility] the audio output when being farther away from the audio device. In addition, Ocampo (para 0076) teaches the audio output template selected for outputting synthesized speech may be selected based on any combination of the user attributes and environment attributes and thus is enabled to select the template for outputting synthesized speech that considers both a speech mode of the user’s utterance and intelligibility. See the rejection below.
Regarding the dependent claims (22 – 26), applicant presents arguments similar to those for claim 21 (see page 11 of applicant's remarks). Examiner directs applicant to the above response for claim 21. See the rejections below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 21 - 25 are rejected under 35 U.S.C. 103 as being unpatentable over Ocampo (U.S. Patent Application Publication 2018/0122361).

Regarding Claim 21, Ocampo discloses:
A method for improving intelligibility of synthesized speech (para 0003-4, 0028: TTS operation executed on a user device to output TTS to a user by considering features of the user inputted voice and environmental attributes of the user device environment) comprising:
detecting an utterance in an audio signal (para 0050-51, 0059, 0064-65: the received audio signal is from a person uttering a command and from which voice features are determined, thus an utterance is detected);
classifying a speech mode of the utterance (para 0064-65: classifiers determine voice features from the received audio signal having the utterance from which attributes characterizing the voice, such as pitch and amplitude, are determined to provide a classification [speech mode] such as the user is whispering, shouting, excited, etc. and user mood);
identifying conditions associated with a listening environment (para 0028, 0062, 0069-70, 0058-59, 0075: multiple environment features are determined pertaining to the listening environment around the user device, such as sounds, motion, lighting, distance from user device, each environment feature corresponding to a condition of the listening environment of the user device, such as distance from user device, being within a crowd, restaurant, automobile, quiet space, proximity, and/or motion);
selecting a speech output mode from a plurality of speech output modes according to the classified speech mode and the identified conditions associated with the listening environment, wherein the selected speech output mode includes at least one of a pitch, a rate, an energy, or a spectral tilt (para 0069-70, 0053, 0058-59, 0064-65, 0073-77: an audio output template [speech output mode] is selected to match a type of response, such as, shouting [an energy], excited, etc., based on the determined parameters of the user attributes/voice features [according to the classified speech mode] such as pitch, amplitude [energy] by selecting an audio output template corresponding to the type of response, and also based on the environment attributes in which the user device is being used to sense the voice of the user and in response to output audio [according to the classified speech mode and the identified conditions], which include amplitude [energy], frequency, tone, pitch; the audio output template selected from a plurality of audio output templates);
outputting synthesized speech according to the speech output mode (para 0073-78: utilizing 520/526 a synthesized speech response audio signal is generated and is outputted by using the determined parameters settings of the selected audio output template which corresponds to the type of response).
Ocampo does not explicitly disclose selected for intelligibility in the identified conditions associated with the listening environment.
However, Ocampo (para 0025-28, Fig. 1B) suggests selecting a speech output mode for synthesized speech for intelligibility in the identified conditions by selecting a speech output mode that can increase the loudness [energy] of the audio output [synthesized speech] based on identifying that the user is farther away from the user audio device [identified conditions of the listening environment] in order to ensure the user is able to hear [intelligibility] the audio output when being farther away from the audio device. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Ocampo to select an audio output template [speech output mode], to match a type of response based on the determined parameters of the user attributes/voice features (Ocampo para 0073) [according to the classified speech mode], such as pitch, amplitude [energy] (Ocampo para 0073), and to increase the loudness [energy] of the audio output based on identifying that the user is farther away from the user audio device [for intelligibility in the identified conditions of the listening environment], thus providing an enhanced listening experience by ensuring the user is able to hear [intelligibility] the audio output when being farther away from the user audio device. In addition, Ocampo (para 0076) teaches the audio output template selected for outputting synthesized speech may be selected based on any combination of the user attributes and environment attributes and thus is enabled to select the template for outputting synthesized speech that considers both a speech mode of the user’s utterance and intelligibility in the identified conditions associated with the listening environment.
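For illustration only, the mapping above can be read as a two-input selection: a template is chosen from the classified speech mode, and its energy is then raised as the user moves away from the device. The following is a minimal sketch of that reading and not Ocampo's or applicant's implementation. The template names, the amplitude thresholds, and the 6 dB per doubling of distance rule are all hypothetical values introduced here.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputMode:
    """Hypothetical audio output template (speech output mode)."""
    name: str
    pitch_scale: float
    rate_scale: float
    energy_db: float


# Hypothetical plurality of speech output modes; values are assumptions.
TEMPLATES = {
    "whisper": OutputMode("whisper", 1.0, 0.9, -12.0),
    "normal": OutputMode("normal", 1.0, 1.0, 0.0),
    "shout": OutputMode("shout", 1.1, 1.0, 6.0),
}


def classify_speech_mode(amplitude_db: float) -> str:
    """Classify the utterance from its amplitude (thresholds are assumed)."""
    if amplitude_db < -30.0:
        return "whisper"
    if amplitude_db > -10.0:
        return "shout"
    return "normal"


def select_output_mode(speech_mode: str, distance_m: float) -> OutputMode:
    """Pick a template by speech mode, then adjust energy for distance."""
    base = TEMPLATES[speech_mode]
    # Assumed rule: add 6 dB per doubling of distance beyond 1 m so the
    # output stays audible when the user is farther from the device.
    boost_db = 6.0 * math.log2(distance_m) if distance_m > 1.0 else 0.0
    return replace(base, energy_db=base.energy_db + boost_db)


mode = select_output_mode(classify_speech_mode(-20.0), distance_m=4.0)
print(mode)
```

In this sketch the environment condition only adjusts energy. Pitch, rate, or spectral tilt could be adjusted in the same way, since the claim recites them in the alternative.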

Regarding Claim 22, in addition to the elements stated above regarding claim 21, the combination further discloses:
wherein the act of selecting a speech output mode further comprises selecting a playback volume according to [examiner notes the following limitations related to selection of playback volume are claimed in the alternative] the classified speech mode, the conditions associated with the listening environment, or a combination thereof, and outputting the synthesized speech at the selected playback volume (para 0064-65, 0073-77, 0035, 0037-38: the determined parameters selected and set for outputting the synthesized speech response - which is according to the determined attributes characterizing the voice to provide the classification such as the user is shouting, excited, etc. [classified speech mode], and the environment features such as the distance the user is from the user device [conditions associated with the listening environment] - determines the output volume/amplitude at which the synthesized speech response is output).

Regarding Claim 23, in addition to the elements stated above regarding claim 21, the combination further discloses:
wherein the act of classifying a speech mode of the utterance further comprises classifying the utterance as [examiner notes the following limitations are claimed in the alternative] a whisper mode, a normal mode, or a Lombard-effect mode (para 0064-65: the classification is determined from determined features of the user’s voice and can be classified as whispering, happy [normal]) according to at least one of [examiner notes the following limitations are claimed in the alternative] a pitch, a number of formants, spectral tilt, speech rate, energy content (para 0064-65: determined features of the user’s voice include pitch of the voice, amplitude [energy content] of the voice).

Regarding Claim 24, in addition to the elements stated above regarding claim 21, the combination further discloses:
wherein the act of selecting a speech output mode comprises selecting one or more speech synthesis parameters (para 0073-74: parameters for outputting a synthesized speech response are selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch), the method further comprising generating synthesized speech according to a speech synthesis model and the selected one or more speech synthesis parameters (para 0073-74, 0076-77: the speech synthesizer uses a synthesizer technique [model], such as concatenation synthesis, formant synthesis, articulatory synthesis, and hidden Markov model (HMM)-based synthesis, together with the parameters selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch, to generate by 520 a synthesized speech response audio signal).

Regarding Claim 25, in addition to the elements stated above regarding claim 21, the combination further discloses:
wherein the act of selecting a speech output mode comprises selecting a speech synthesis model from a plurality of speech synthesis models (para 0077: the speech synthesizer can use [select] any suitable audio synthesizer technique [model], such as concatenation synthesis, formant synthesis, articulatory synthesis, and hidden Markov model (HMM)-based synthesis [from a plurality of speech synthesis models], together with the parameters selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch, to generate by 520 a synthesized speech response audio signal), the method further comprising generating synthesized speech according to the selected speech synthesis model (para 0073-74, 0076-77: the synthesized speech response audio signal is generated by 520 using the synthesizer technique [model] selected as described above and the parameters selected and set to match the determined parameters of the user attributes/voice features and environment attributes).

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Ocampo in view of Raitio et al. (U.S. Patent Application Publication 2017/0358301) hereinafter Raitio.

Regarding Claim 26, in addition to the elements stated above regarding claim 21, the combination further discloses:
wherein the act of selecting a speech output mode comprises selecting one or more speech modification parameters based on the classified speech mode, and one or more cues (para 0073-74: parameters used for causing a generated audio signal to make it sound like [modification], for example, a speech response output matching the determined user’s voice attributes, are selected and set to match the determined parameters of the user attributes/voice features and environment attributes, which include amplitude, frequency, tone, pitch).
The combination does not explicitly teach the user device modifying synthesized speech according to the one or more speech modification parameters and to output the modified synthesized speech.
However, in a related field of endeavor (i.e., modifying synthesized speech) Raitio teaches (para 0250-253, 0280-285) the speech mode being an input of regular speech determined from spectral characteristics of the input speech and from cues corresponding to the environment such as location and time, indicating a whispered speech response is to be output, and further teaches (para 0256-274), based on the determination that a whispered speech response is to be output, selecting filtering [modification] parameters that are applied to the intermediate speech signal output response [synthesized speech] from a speech synthesis module, which modifies the intermediate speech signal output response to become the whispered speech response which is then outputted to the user (fig. 8A, 8D). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Raitio to the combination to allow the synthesized speech output to be modified after generating synthesized speech, thus providing an enhanced listening experience and user convenience by allowing a wider variety of synthesized speech output types to be generated while using the same speech content (Raitio para 0257), and further simplifying the implementation by allowing for both the whispered response and the non-whispered response to accommodate user preferences, such as allowing the user to opt not to receive a whispered speech response and instead to receive the same speech response without whispering (Raitio para 0255), as well as to receive a whispered speech response.
Allowable Subject Matter
Claims 1 – 8, 10 – 20 and 27 have no prior art rejection and are allowable, as the independent claims are neither taught by nor obvious over the prior art.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
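As an arithmetic aid only, the timing rule stated above can be sketched as a date calculation. This is a minimal sketch and not a statement of USPTO practice: the function names are hypothetical, the example dates are made up, month-end dates are clamped to the last day of the target month by assumption, and weekends, holidays, and extension fees are ignored.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """End of the shortened statutory period under the rule quoted above."""
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    end = three_months
    # If a first reply is filed within two months and the advisory action
    # is mailed after the three-month date, the period ends on the advisory
    # action's mailing date instead.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > three_months
    ):
        end = advisory_mailed
    # The period never runs past six months from the final action.
    return min(end, six_months)


# Hypothetical dates for illustration; not taken from this Office action.
print(shortened_period_end(date(2022, 5, 12)))
print(shortened_period_end(date(2022, 5, 12), date(2022, 7, 1), date(2022, 8, 20)))
```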
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID F SIEGEL whose telephone number is (571)272-5715. The examiner can normally be reached M-W 6:30am - 3pm, Th-F 7am-3:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang, can be reached at 571-272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DAVID SIEGEL/Examiner, Art Unit 2653
/FAN S TSANG/Supervisory Patent Examiner, Art Unit 2653