DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office action dated 6/21/2021, Applicant submitted an amendment, filed 10/21/2021, amending claims 1 and 11 and arguing to traverse the prior art rejections. Applicant’s arguments have been fully considered but are moot with respect to the teachings of Ishii et al. (US 2003/0055653).
Response to Arguments
Following a broad overview of the latest amendments in the first two ¶’s of page 5, the remainder of page 5, all of page 6, and part of page 7 are devoted to arguing that Ishii fails to teach the latest amendments.
Please see the rejections below for further details.
In the 4th ¶ of page 6, as well as the 3rd ¶ of page 7, it is asserted that the “dependent claims” “are patentable by virtue of their respective direct and ultimate dependencies from allowable independent claims 1 and 11”.
Since applicants have not argued the merits of these dependent claims, but assert patentability solely through their dependence on the allegedly patentable parent 
In the last 2 lines of the last ¶ of page 7, it is “request[ed]” that “the [ODP] rejections be reconsidered in light of the amendments”.
However, the amendments did not alter the scope of the claims sufficiently to overcome the ODP rejections.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-7, 9, 11-17, 19 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ishii et al. (US 2003/0055653).
Regarding claim 1, Ishii et al. do teach a computer-implemented method for providing a synthesized speech response to a voice input (page 9, 2nd column, 10th line from the bottom: “A program for causing a computer to perform a robot control process for controlling a robot” (a computer-implemented method for controlling a robot’s response to a voice input received through the “MICROPHONE” “15”, the robot generating synthesized speech using the “SPEECH SYNTHESIZER” “55” in Fig. 10)), the method comprising:

receiving the voice input (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (a voice input received, e.g. ¶ 0056 last line “user’s verbal contact, and other speeches”));
calculating at least one prosodic metric of the voice input (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (the voice input is analyzed by a prosodic analyzer) “extracts” “pitch” (calculating one prosodic metric));
determining a response to the voice input (¶ 0056 last 2 sentences: “synthesized sound is supplied to the loudspeaker” “Thus, the loudspeaker 18 outputs the robot’s voice” (a determined response) “various requests such as” “I’m hungry” (an example of the response) “to the user, responses such as” “what?” (another response) “in response to user’s verbal contact, and other speeches” (to the user voice input));
generating the synthesized speech response, wherein the synthesized speech response comprises prosodic characteristics based on the determined response to the voice input, and on the at least one prosodic metric (¶ 0075 last 2 sentences: “The text generator 31 extracts information” “for rule-based speech synthesis” (the synthesized speech response) “The information required for rule-based speech synthesis includes” “prosodic information” (comprises prosodic characteristics since “prosodic information” includes “pitch frequency and power” (¶ 0091 line 4)), e.g. according to ¶ 0091 sentence 1 “prosodic analyzer” “extracts” “pitch” (including the prosodic metric); furthermore 
 and
causing to be output the synthesized speech response (¶ 0082 lines 2+: “output controller” “supplies the synthesized sound from the speech synthesizer 55 to the loudspeaker” (outputting the synthesized speech response)).

Regarding claim 2, Ishii et al. do teach the method of claim 1, further comprising determining an emotion metric based on the voice input, wherein the prosodic characteristics are further based on the emotion metric (¶ 0129 lines 3-5: “prosody extracted” (e.g. the “prosodic information” (prosodic characteristics) of “pitch and 

Regarding claim 3, Ishii et al. do teach the method of claim 1, wherein:
the voice input comprises a plurality of words (¶ 0037 lines 4+: “The speech recognizer 50A reports the speech recognition result, which is a command, such as” “chase the ball” (a voice input comprising a plurality of words)),
and each prosodic metric of the at least one prosodic metric corresponds to at least one word of the plurality of words (¶ 0092: “Specifically, the prosodic analyzer 42 assumes that one frame is a period longer than a normal human pitch period, such as 32 ms, and obtains a pitch frequency” (the prosodic metric corresponds to) “and power of speech data” (the “speech data”, i.e. at least one word of the plurality of words)).
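Purely as an illustration of the frame-based analysis relied on above (Ishii ¶ 0092: a frame longer than a normal human pitch period, such as 32 ms, for which a pitch frequency and power are obtained), the following Python sketch computes one power value and one pitch value per 32 ms frame. The 16 kHz sampling rate, the mean-square power measure, the autocorrelation pitch estimator, the 60-400 Hz search range, and the function name frame_prosody are assumptions of this sketch and are not taken from Ishii; the sketch only shows that each metric value is tied to a specific frame, and hence a specific portion, of the input speech data.

```python
import math


def frame_prosody(samples, sample_rate=16000, frame_ms=32, fmin=60.0, fmax=400.0):
    """Return one (power, pitch_hz) pair per non-overlapping frame.

    Hypothetical helper. Assumptions (not from Ishii): mean-square power,
    an unnormalized autocorrelation pitch estimate restricted to
    [fmin, fmax], and pitch 0.0 for silent frames. Trailing samples that
    do not fill a whole frame are ignored.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # 512 samples at 16 kHz
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), frame_len - 1)
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        if power == 0.0:
            results.append((0.0, 0.0))
            continue
        # Choose the lag with the largest autocorrelation inside the pitch range.
        best_lag, best_r = min_lag, float("-inf")
        for lag in range(min_lag, max_lag + 1):
            r = sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            if r > best_r:
                best_lag, best_r = lag, r
        results.append((power, sample_rate / best_lag))
    return results
```

For example, a pure 200 Hz tone sampled at 16 kHz yields a single frame whose pitch estimate is close to 200 Hz and whose power is close to 0.5.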

Regarding claim 4, Ishii et al. do teach the method of claim 1, wherein the at least one prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, rhythm, and any combination thereof (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (receiving voice input) “and extracts prosodic information such as a pitch frequency” (and extracting a prosodic metric of “pitch”)).

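Regarding claim 5, Ishii et al. do teach the method of claim 1, wherein a plurality of reference responses are associated with the voice input (¶ 0087: “the synthesized sound” (one reference response which also corresponds to the synthesized speech response) “competes with the outputting of the echo back” (a second reference response to the voice input)),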
and wherein the prosodic characteristics of the synthesized speech response are further based on a relationship between the at least one prosodic metric and the plurality of reference responses (¶ 0114 lines 3-7: “the sound generator 43 uses the power p(j) and the pitch frequency f(j)” (the prosodic characteristics) “to generate echo back speech y(t)” (are based on at least one of the reference responses) “in accordance with” (according to the following relationship) “y(t) = log(p(j)) sin(2πNf(j)t)” (which is also a relationship between the “pitch frequency f(j)” (the prosodic metric) and “y(t)” (one of the plurality of reference responses)); ¶ 0075 last 2 sentences: “The information required for rule-based speech synthesis includes” “prosodic information” (comprises prosodic characteristics, e.g. “pitch frequency and power” (¶ 0091 line 4)), i.e. the “pitch frequency and power” (the prosodic characteristics) also depend on “speech synthesis” (“synthesized sound” (the other reference response))).
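For illustration only, the echo back relationship of Ishii ¶ 0114, y(t) = log(p(j)) sin(2πNf(j)t), can be sketched in Python as below, showing how the per-frame power p(j) and pitch frequency f(j) of the voice input directly shape the generated response. The value of N, the sampling rate, treating t as elapsed time from the start of the output, rendering frames with p(j) ≤ 0 as silence (the logarithm is undefined there), and the function name echo_back are assumptions of this sketch, not details from Ishii.

```python
import math


def echo_back(powers, pitches, n_mult=2.0, sample_rate=16000, frame_ms=32):
    """Generate samples per y(t) = log(p(j)) * sin(2*pi*N*f(j)*t), frame by frame.

    Hypothetical helper. Assumptions (not from Ishii): the value of N
    (n_mult), the sampling rate, t measured from the start of the output,
    and silence for frames with p(j) <= 0 since log(p(j)) is undefined.
    Note that log(p(j)) is negative when 0 < p(j) < 1; the formula is
    applied as written.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    out = []
    for j, (p, f) in enumerate(zip(powers, pitches)):
        for k in range(frame_len):
            t = (j * frame_len + k) / sample_rate  # elapsed time in seconds
            if p <= 0.0:
                out.append(0.0)
            else:
                out.append(math.log(p) * math.sin(2 * math.pi * n_mult * f * t))
    return out
```

Because p(j) and f(j) change abruptly at frame boundaries in this sketch, the output can have discontinuities there; smoothing such transitions is the subject of the interpolation discussed under claim 10.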

Regarding claim 6, Ishii et al. do teach the method of claim 5, further comprising identifying which of the plurality of reference responses comprise respective reference 

Regarding claim 7, Ishii et al. do teach the method of claim 5, further comprising identifying which of the plurality of reference responses is most closely related to the at least one prosodic metric, wherein the prosodic characteristics of the synthesized speech response are further based on the identified reference response (¶ 0087: “when” “the synthesized sound” “competes with the outputting of the echo back” “the output controller” “gives priority” (identifying) “to the outputting of the synthesized sound” (the “synthesized sound” (one of the reference responses) as the response to the voice input compared to “echo” (the other reference response), where “speech synthesis includes” “prosodic information” (“pitch frequency and power” (¶ 0091 line 

Regarding claim 9, Ishii et al. do teach the method of claim 1, wherein the prosodic characteristics are further based on at least one selected from the group comprising user voice input history, user language, user characteristics, user location, user preferences, and metadata tags associated with the user (¶ 0068: “Specifically, the acoustic model storage unit 24 stores an acoustic model indicating acoustic features of each phoneme or each syllable in the language of speech which is subjected to speech recognition” (the “acoustic model” used to determine “pitch frequency and power” (prosodic characteristics) depends on the “language of speech” (user language and voice input))).

Regarding claim 11, Ishii et al. do teach a system for providing a synthesized speech response to a voice input (page 9, 2nd column, 10th line from the bottom: “A program for causing a computer to perform a robot control process for controlling a robot” (a computer-implemented method for controlling the response of a “robot” (a system) to a voice input received through the “MICROPHONE” “15”, the robot generating synthesized speech using the “SPEECH SYNTHESIZER” “55” in Fig. 10)),
the system comprising:

receiving the voice input (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (a voice input received, e.g. ¶ 0056 last line “user’s verbal contact, and other speeches”));
calculating at least one prosodic metric of the voice input (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (the voice input is analyzed by a prosodic analyzer) “extracts” “pitch” (calculating one prosodic metric));
determining a response to the voice input (¶ 0056 last 2 sentences: “synthesized sound is supplied to the loudspeaker” “Thus, the loudspeaker 18 outputs the robot’s voice” (a determined response) “various requests such as” “I’m hungry” (an example of the response) “to the user, responses such as” “what?” (another response) “in response to user’s verbal contact, and other speeches” (to the user voice input));
generating the synthesized speech response, wherein the synthesized speech response comprises prosodic characteristics based on the response, and on the at least one prosodic metric (¶ 0075 last 2 sentences: “The text generator 31 extracts information” “for rule-based speech synthesis” (the synthesized speech response) “The information required for rule-based speech synthesis includes” “prosodic information” (comprises prosodic characteristics since “prosodic information” includes “pitch frequency and power” (¶ 0091 line 4)), e.g. according to ¶ 0091 sentence 1 “prosodic analyzer” “extracts” “pitch” (including the prosodic metric); furthermore according to ¶ 
 and
an output device for outputting the synthesized speech response (¶ 0082 lines 2+: “output controller” “supplies the synthesized sound from the speech synthesizer 55 to the loudspeaker” (outputting the synthesized speech response through an output device)).

Regarding claim 12, Ishii et al. do teach the system of claim 11, wherein the control circuitry is further configured to determine an emotion metric based on the voice input, wherein the prosodic characteristics are further based on the emotion 

Regarding claim 13, Ishii et al. do teach the system of claim 11, wherein:
the voice input comprises a plurality of words (¶ 0037 lines 4+: “The speech recognizer 50A reports the speech recognition result, which is a command, such as” “chase the ball” (a voice input comprising a plurality of words)),
and each prosodic metric of the at least one prosodic metric corresponds to at least one word of the plurality of words (¶ 0092: “Specifically, the prosodic analyzer 42 assumes that one frame is a period longer than a normal human pitch period, such as 32 ms, and obtains a pitch frequency” (the prosodic metric corresponds to) “and power of speech data” (the “speech data”, i.e. at least one word of the plurality of words)).

Regarding claim 14, Ishii et al. do teach the system of claim 11, wherein the at least one prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, rhythm, and any combination thereof (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (receiving voice input) “and extracts prosodic information such as a pitch frequency” (and extracting a prosodic metric of “pitch”)).

Regarding claim 15, Ishii et al. do teach the system of claim 11, wherein a plurality of reference responses are associated with the voice input (¶ 0087: “the synthesized sound” (one reference response which also corresponds to the synthesized speech response) “competes with the outputting of the echo back” (a second reference response to the voice input)),
and wherein the prosodic characteristics of the synthesized speech response are further based on a relationship between the at least one prosodic metric and the plurality of reference responses (¶ 0114 lines 3-7: “the sound generator 43 uses the power p(j) and the pitch frequency f(j)” (the prosodic characteristics) “to generate echo back speech y(t)” (are based on at least one of the reference responses) “in accordance with” (according to the following relationship) “y(t) = log(p(j)) sin(2πNf(j)t)” (which is also a relationship between the “pitch frequency f(j)” (the prosodic metric) and “y(t)” (one of the plurality of reference responses)); ¶ 0075 last 2 sentences: “The information required for rule-based speech synthesis includes” “prosodic information” (comprises prosodic characteristics, e.g. “pitch frequency and power” (¶ 0091 line 4)), i.e. the “pitch frequency and power” (the prosodic characteristics) also depend on “speech synthesis” (“synthesized sound” (the other reference response))).



Regarding claim 17, Ishii et al. do teach the system of claim 15, wherein the control circuitry is further configured to identify which of the plurality of reference responses is most closely related to the at least one prosodic metric, wherein the prosodic characteristics of the synthesized speech response are further based on the identified reference response (¶ 0087: “when” “the synthesized sound” “competes with the outputting of the echo back” “the output controller” “gives priority” (identifying) “to 

Regarding claim 19, Ishii et al. do teach the system of claim 11, wherein the prosodic characteristics are further based on at least one selected from the group comprising user voice input history, user language, user characteristics, user location, user preferences, and metadata tags associated with the user (¶ 0068: “Specifically, the acoustic model storage unit 24 stores an acoustic model indicating acoustic features of each phoneme or each syllable in the language of speech which is subjected to speech recognition” (the “acoustic model” used to determine “pitch frequency and power” (prosodic characteristics) depends on the “language of speech” (user language and voice input))).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


Claims 8, 10, 18, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ishii et al.
Regarding claim 8, the default embodiment of Ishii et al. does teach the method of claim 1, wherein synthesizing the synthesized speech response comprises:
determining predicted prosodic characteristics of the response using a model (¶ 0091: “The prosodic analyzer 42 performs an acoustic analysis of the speech data, which is input thereto, in units of appropriate frames and extracts prosodic information such as a pitch frequency and power of the speech data” (the “pitch frequency and power” (predicted prosodic characteristics) are obtained by “an acoustic analysis”, where the “acoustic analysis” is conducted according to ¶ 0069 sentence 1 using an “acoustic model” (a model))).
The default embodiment of Ishii et al. does not specifically disclose:
and modifying the predicted prosodic characteristics to generate the prosodic characteristics of the synthesized speech response.
The alternative embodiment of Ishii et al. does teach:
and modifying the predicted prosodic characteristics to generate the prosodic characteristics of the synthesized speech response (¶ 0088 lines 1-3: “Alternatively, the output controller 57 can give priority to the outputting of the echo back speech” 
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the same operations on “p(j)” and “f(j)” for the “synthesized sound” (synthesized speech in the default embodiment) as it is done for “echo” (synthesized speech in the alternative embodiment), so as to help generate “back speech which sounds like the robot’s voice and which is listenable” as disclosed in Ishii et al. ¶ 0117.

Regarding claim 10, the default embodiment of Ishii et al. does not specifically disclose the method of claim 1, wherein the prosodic characteristics are further based on an interpolation operation affecting transitions in the synthesized speech response.
The alternative embodiment of Ishii et al. does teach the method of claim 1, wherein the prosodic characteristics are further based on an interpolation operation affecting transitions in the synthesized speech response (¶ 0119: “when echo back speech is generated” (in the alternative embodiment, where the “echo” rather than the “synthesized sound” is “priorit[ized]” (¶ 0087) as the synthesized speech response) “the power p(j) and the pitch frequency f(j) can be interpolated” (an interpolation operation is applied to the
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “interpolat[ion]” operation applied to “echo” to also be applied to “synthesized sound” so as to generate “back speech sounds natural to the user” as disclosed in Ishii et al. ¶ 0120 last sentence.
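Ishii ¶ 0119 states that the power p(j) and the pitch frequency f(j) “can be interpolated” when echo back speech is generated. As an illustration only, the Python sketch below assumes simple linear interpolation between successive frame values; the interpolation method, the anchoring of each value at the first sample of its frame, and the function name interpolate_contour are assumptions, not details from Ishii. Under these assumptions a contour moves gradually from one frame value to the next instead of jumping at the frame boundary.

```python
def interpolate_contour(frame_values, frame_len):
    """Linearly interpolate per-frame values (e.g. p(j) or f(j)) to per-sample values.

    Hypothetical helper. Assumptions (not from Ishii): linear interpolation,
    each frame value anchored at the first sample of its frame, and the last
    frame value held constant through the final frame.
    """
    out = []
    for j, v in enumerate(frame_values):
        # Target value at the start of the next frame (or hold the last value).
        nxt = frame_values[j + 1] if j + 1 < len(frame_values) else v
        for k in range(frame_len):
            out.append(v + (nxt - v) * k / frame_len)
    return out
```

For example, interpolate_contour([100.0, 200.0], 4) returns [100.0, 125.0, 150.0, 175.0, 200.0, 200.0, 200.0, 200.0], i.e. the pitch rises in steps across the first frame rather than jumping to 200 Hz at the boundary.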

Regarding claim 18, the default embodiment of Ishii et al. does teach the system of claim 11, wherein the control circuitry is further configured to synthesize the synthesized speech response by:
determining predicted prosodic characteristics of the response using a model (¶ 0091: “The prosodic analyzer 42 performs an acoustic analysis of the speech data, which is input thereto, in units of appropriate frames and extracts prosodic information such as a pitch frequency and power of the speech data” (the “pitch frequency and power” (predicted prosodic characteristics) are obtained by “an acoustic analysis”, where the “acoustic analysis” is conducted according to ¶ 0069 sentence 1 using an “acoustic model” (a model))).
The default embodiment of Ishii et al. does not specifically disclose:
and modifying the predicted prosodic characteristics to generate the prosodic characteristics of the synthesized speech response.
The alternative embodiment of Ishii et al. does teach:
and modifying the predicted prosodic characteristics to generate the prosodic characteristics of the synthesized speech response (¶ 0088 lines 1-3: “Alternatively, the output controller 57 can give priority to the outputting of the echo back speech” (synthesized speech response is the “echo”); ¶ 0116: “the echo back speech” (synthesized speech response) “is generated by non-linearizing the power p(j)” “and multiplying the pitch frequency f(j) by N” (is obtained by modifying “p(j)” and “f(j)” (the predicted prosodic characteristics))).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the same operations on “p(j)” and “f(j)” for the “synthesized sound” (synthesized speech in the default embodiment) as it is done for “echo” (synthesized speech in the alternative embodiment), so as to help generate “back speech which sounds like the robot’s voice and which is listenable” as disclosed in Ishii et al. ¶ 0117.
Regarding claim 20, the default embodiment of Ishii et al. does not specifically disclose the system of claim 11, wherein the prosodic characteristics are further based on an interpolation operation affecting transitions in the synthesized speech response.

It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “interpolat[ion]” operation applied to “echo” to also be applied to “synthesized sound” so as to generate “back speech sounds natural to the user” as disclosed in Ishii et al. ¶ 0120 last sentence.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal
Claims 1, 4, 3+5, 9, 10, 5, 11, 14, 13+15, 19, 20, 15 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 4, 7, 8, 9, 11, 12, 14, 17, 18, 19 of copending Application No. 15/931,261 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other because:
Instant Application No. 15/931,074, claim 1:
1. (Original) A computer-implemented method for providing a synthesized speech response to a voice input, the method comprising:
receiving the voice input;
calculating at least one prosodic metric of the voice input;
determining a response to the voice input;
generating the synthesized speech response, wherein the synthesized speech response comprises prosodic characteristics based on the determined response to the voice input, and on the at least one prosodic metric; and causing to be output the synthesized speech response.

Copending Application No. 15/931,261, claim 1:
1. (Original) A computer-implemented method for training a model to provide information used to provide a synthesized speech response to a voice input, the method comprising:
receiving a plurality of voice inputs, each associated with at least one respective voice input prosodic metric;

responses, each associated with at least one respective response prosodic metric; and
training the model based on the plurality of voice inputs, the plurality of responses, the voice input prosodic metrics, and the response prosodic metrics such that the model outputs information used to generate the synthesized speech response to the voice input.

In re Karlson, 136 USPQ 184: “Omission of an element and its function in a combination where the remaining elements perform the same functions as before involves only routine skill in the art”.

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Farzad Kazeminezhad/
Art Unit 2657
November 8, 2021.