DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the office action from 7/6/2021, the applicant has submitted an amendment, filed 10/21/2021, amending claims 2, 8, 12, and 18 and traversing the prior art rejections. Applicant’s arguments have been fully considered, but the previous grounds of rejection are maintained for the reasons explained in the Response to Arguments. 
Response to Arguments
In what follows, applicant’s arguments are addressed in the order presented, with each argument summarized in its own ¶ and followed by one or more ¶’s containing the examiner’s response.
Following a broad overview of the latest amendments on page 7, ¶’s 1-2, applicant discusses the previous claim objections in the 3rd ¶.
In view of the latest amendments, the claim objections are withdrawn.
On page 7, ¶ 4, applicant discusses the previous 112(b) rejections.
In view of the latest amendments, the 112(b) rejections are withdrawn.
In the 2nd ¶ on page 8, applicant asserts that “Ishii is silent as to training or any related concept” “either explicitly or inherently”.
It is unclear which feature of the rejection, if any, applicant believes fails to map to the main features of the claim limitation, namely the “training” and/or the “model”, and why. 
The claim is silent on the scope of both the “model” and its “training”. According to specification ¶ 0005 lines 2+: “model” “e.g., an algorithmic model, a neural network model, or any other suitable model”; ¶ 0081 sentence 2: “The model may, for example, include the results of a training model (e.g., training model 370 of Fig. 3), which may include correlations, probabilities, confidences, and other values indicative of the model”; ¶ 0011 sentence 1: “In an embodiment” “the system” “trains the model further based on the emotion metrics of the voice input”; ¶ 0023 sentence 2: “For example, the list of features used for training may cover basic prosodic features that affect the sound of a voice input or a response” “model may be multi-labelled to 
From these passages, it is unclear which, if any, of the above-listed models and/or types of training the disclosure employs, let alone the claims. Furthermore, none of these passages is limiting, and none of the listed models is specifically claimed.
Nonetheless, Ishii Eq. 3 is used to “generate echo back speech” using “power” and “pitch”, where the “echo”, which is a “synthesized sound”, is generated by the “echo back unit 56 for outputting echo” (Ishii ¶ 0035 last sentence). According to Ishii ¶ 0129, the “sound generator 43 generates prosody-controlled echo back speech which is obtained by controlling the prosody extracted from the user’s speech based on emotions, instincts, and growth states expressed by the emotion models, instinct models, and growth models”. This passage explicitly teaches that the process of “echo” “generat[ion]”, which is also expressed by Eq. 3, is governed by at least the “emotion models” and other “models”. Furthermore, because the “echo” is a “synthesized sound”, it requires a “synthesizer”, which is inherently known to require a speech synthesis model. Still further, the “echo” being “generated” by “controlling the prosody” and/or “pitch” amounts to a training operation, and adjusting “pitch” and “prosody” “cover[s] prosodic features”, as required of any “training” according to applicant’s disclosure ¶ 0023 quoted above.

Regarding the 103 rejections, which pertain mainly to dependent claims, applicant asserts on page 8, in the paragraph before last, that they “are considered allowable for at least the above reasons regarding claims 1 and 11”.
Since applicant has not argued the merits of these dependent claims but asserts patentability solely through their dependence on the allegedly patentable parent claims, these claims stand or fall with said parent claims, and no further response to applicant’s arguments is necessary.
Finally, regarding the double patenting rejections, applicant asserts that “The claims of the ‘074 application do not recite training or a model. Accordingly, reconsideration and withdrawal of the double patenting rejections are in order”.
Respectfully, the double patenting rejection did not rely on the ‘074 application for the model and/or the training. It is a nonstatutory (obviousness-type) double patenting rejection in which Ishii, as a secondary reference, was relied on for that feature. Furthermore, it is incorrect that the ‘074 claims do not “recite”, e.g., a “model”: claim 8 of that application recites “determining” “prosodic characteristics” “using a model” and “modifying the prosodic characteristics”, which amounts to training.
Claim Rejections - 35 USC § 102

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-4, 7-14, 17-20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Ishii et al. (US 2003/0055653).
Regarding claim 1, Ishii et al. do teach a computer-implemented method for training a model to provide information used to provide a synthesized speech response to a voice input (page 9, 2nd column, line 10 above the bottom: “A program for causing a computer to perform a robot control process for controlling a robot” (a computer-implemented method for controlling a robot’s response to a voice input received through the “MICROPHONE” “15”, for generating “echo” and/or synthesized speech by the robot using the “SPEECH SYNTHESIZER” “55” in Fig. 10 and using Eq. 3 in ¶ 0114 (by a training model, since the “echo” “generation” is “based on” “emotions” “expressed by the emotion models” (¶ 0129)))), 
the method comprising:
receiving a plurality of voice inputs, each associated with at least one respective voice input prosodic metric (¶ 0037 lines 4+: “The speech recognizer” “reports the speech” “which is a command” “walk” “down” “chase the ball” (receiving a plurality of voice inputs); ¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of 
receiving a plurality of responses, each associated with at least one respective response prosodic metric (¶ 0075 last 2 sentences: “The text generator 31 extracts information” “for rule-based speech synthesis” (first response received) “The information required for the rule-based speech synthesis includes” “prosodic information” (comprises “pitch” (the prosodic metric) “and power” (another prosodic metric) (¶ 0091 line 4)); ¶ 0087 lines 1+: “when” “synthesized sound” “competes with” “echo back speech” (receiving a second response), where “echo back speech” is generated according to the formula in Eq. 3 ¶ 0114, which depends on “power p(j)” and “pitch” “f(j)” (the response prosodic metrics)); and
training the model based on the plurality of voice inputs, the plurality of responses, the voice input prosodic metrics, and the response prosodic metrics such that the model outputs information used to generate the synthesized speech response to the voice input (¶ 0116: “In accordance with equation (3)” (the training model) “the echo back speech y(t) is generated by non-linearizing the power p(j)” “and multiplying the pitch frequency f(j) by N” (is trained based on “p(j)” and “f(j)” (the prosodic metrics of the voice inputs) as well as “non-lineariz[ed]” “power” and “f(j)” times “N” (the prosodic metrics of the “echo” (response prosodic metrics and responses) to obtain 

Regarding claim 2, Ishii et al. do teach the method of claim 1, wherein:
the at least one voice input prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, and rhythm (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (based on the voice input) “and extracts” “pitch frequency” (to extract pitch as a prosodic metric)); and
the at least one response prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, and rhythm (the “echo” (response) according to ¶ 0116 depends on “non-lineariz[ed]” “power P(J)” and “pitch frequency f(j)” (the pitch)).

Regarding claim 3, Ishii et al. do teach the method of claim 1, wherein each respective voice input of the plurality of voice inputs comprises a first plurality of words, and wherein each respective response of the plurality of responses comprises a second plurality of words (¶ 0037 lines 4+: “The speech recognizer” “reports the speech” “which is a command” “walk” “down” “chase the ball” (each voice input comprises a first plurality of words, their respective “synthesized sound” “compet[ing] with” “echo 
further comprising, for each respective voice input and for each respective response:
receiving one or more first word transition metrics among words of the first plurality of words (¶ 0119: “when” “echo” “generated” “the power p(j) and the pitch frequency f(j) can be interpolated” (a first transition metric obtained) “or decimated”);
receiving one or more second word transition metrics among words of the second plurality of words (¶ 0119: “when” “echo” “generated” “the power p(j) and the pitch frequency f(j) can be interpolated” “or decimated” (a second transition metric obtained)); and
training the model further based on each first word transition and each second word transition (the power “p(j)” and frequency “f(j)” used in Eq. “3” (the model) are according to ¶ 0119 lines 3+ “interpolated or decimated” (are trained based on “interpolat[ion]” (first word transition) and/or “decimat[ion]” (second word transition)) “thereby generating echo back speech having a duration longer than or shorter” (causing a transition) “than that of the speech section of the user’s speech” (in the voice inputs)).
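For illustration only, the duration change described in Ishii ¶ 0119, in which the frame-wise power p(j) and pitch frequency f(j) are “interpolated or decimated” to yield echo back speech longer or shorter than the user’s speech, may be sketched as a resampling of each contour. The sketch assumes plain linear interpolation; Ishii is not relied on for any particular interpolation method, and the function name resample_contour and the pitch values below are hypothetical.

```python
def resample_contour(values, new_length):
    """Linearly resample a frame-wise contour such as p(j) or f(j).

    new_length > len(values) lengthens the contour (interpolation);
    new_length < len(values) shortens it (decimation).
    """
    if not values or new_length < 1:
        raise ValueError("need a non-empty contour and new_length >= 1")
    if len(values) == 1 or new_length == 1:
        return [float(values[0])] * new_length
    step = (len(values) - 1) / (new_length - 1)   # input frames per output frame
    out = []
    for k in range(new_length):
        pos = k * step                  # fractional position in the input contour
        i = int(pos)
        if i >= len(values) - 1:        # clamp the final output frame to the last input frame
            out.append(float(values[-1]))
        else:
            frac = pos - i
            out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out

# Hypothetical four-frame pitch contour f(j), in Hz.
pitch = [200.0, 210.0, 220.0, 230.0]
longer = resample_contour(pitch, 7)    # interpolated: longer echo
shorter = resample_contour(pitch, 2)   # decimated: shorter echo
```

With these values, the 7-frame contour steps by 5 Hz per frame from 200 Hz to 230 Hz, and the 2-frame contour keeps only the end points.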

Regarding claim 4, Ishii et al. do teach the method of claim 1, wherein:

the at least one respective voice input prosodic metric is associated with one or more words of the first plurality of words (¶ 0092: “Specifically, the prosodic analyzer 42 assumes that one frame is a period longer than a normal human pitch period, such as 32 ms, and obtains a pitch frequency” (the prosodic metric corresponds to) “and power of speech data” (“speech data”, e.g. “walk” (at least one word of the first plurality of words)));
each respective response of the plurality of responses comprises a second plurality of words (the respective “synthesized sound” “compet[ing] with” “echo back speech” (the plurality of responses (¶ 0087)) thus also comprise a second plurality of corresponding words); 
and the at least one respective response prosodic metric is associated with one or more words of the second plurality of words (¶ 0092: “Specifically, the prosodic analyzer 42 assumes that one frame is a period longer than a normal human pitch period, such as 32 ms, and obtains a pitch frequency” (the prosodic metric corresponds to) “and power of speech data” (e.g. of the “echo back”, e.g. “echo” of the uttered user command “walk” (at least one word of the second plurality of words))).
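The frame-based analysis quoted from Ishii ¶ 0092 (frames of, e.g., 32 ms, each yielding a pitch frequency and a power value) may be illustrated with the minimal sketch below. It assumes non-overlapping frames, power taken as the mean squared sample value, and pitch estimated by an unnormalized autocorrelation peak search over an assumed 60-400 Hz range; these choices are not attributed to Ishii, and the function name frame_pitch_and_power and the 200 Hz test tone are hypothetical.

```python
import math

def frame_pitch_and_power(samples, sample_rate, frame_ms=32, fmin=60.0, fmax=400.0):
    """Return one (pitch_hz, power) pair per complete, non-overlapping frame.

    Assumptions (not from Ishii): power is the mean squared amplitude, and pitch
    is the lag with the largest unnormalized autocorrelation within [fmin, fmax].
    """
    frame_len = int(sample_rate * frame_ms / 1000)          # 32 ms -> 256 samples at 8 kHz
    min_lag = int(sample_rate / fmax)                       # shortest period searched
    max_lag = min(int(sample_rate / fmin), frame_len - 1)   # longest period searched
    results = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        best_lag, best_corr = 0, 0.0
        for lag in range(min_lag, max_lag + 1):
            corr = sum(frame[n] * frame[n + lag] for n in range(frame_len - lag))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        pitch = sample_rate / best_lag if best_lag else 0.0  # 0.0 means no periodicity found
        results.append((pitch, power))
    return results

# Usage: a hypothetical one-second, 200 Hz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
frames = frame_pitch_and_power(tone, sr)
```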

Regarding claim 7, Ishii et al. do teach the method of claim 1, further comprising:
receiving user profile information selected from the group comprising user voice input history, user language, user characteristics, user location, user preferences, and metadata tags associated with the user (¶ 0068: “Specifically, the acoustic model storage unit 24 stores an acoustic model indicating acoustic features of each phoneme or each syllable in the language of speech which is subjected to speech recognition” (a user’s “language of speech” (a user profile) is received via the “acoustic model”, where the “acoustic model” is further used according to ¶ 0081 to determine “pitch frequency and power” (i.e., a metadata tag))); 
and training the model further based on the user profile information (¶ 0116: “In accordance with equation (3)” (the training model) “the echo back speech y(t) is generated by non-linearizing the power p(j)” (is based on a metadata tag (user profile information)) “and multiplying the pitch frequency” “by N” (and “pitch” which is specific to the “language of speech” (user profile as well))).

Regarding claim 8, Ishii et al. do teach the method of claim 1, further comprising:
receiving respective interpolation metrics among a transition of words of each voice input of the plurality of voice inputs (¶ 0119: “when echo back speech is generated” “the power p(j) and the pitch frequency f(j) can be interpolated” (one 
and
training the model further based on each respective interpolation metric (to “interpolat[e]” or “decimat[e]” “power” and/or “pitch” (using the interpolation metrics) will alter Eq. 3 (cause further training of the model)).

Regarding claim 9, Ishii et al. do teach the method of claim 1, wherein each voice input of the plurality of voice inputs is linked with a respective set of responses of the plurality of responses (¶ 0037 lines 4+: “The speech recognizer” “reports the speech” “which is a command” “walk” “down” “chase the ball” (receiving a plurality of voice inputs); ¶ 0087 lines 1+: “when” “synthesized sound” “competes with” “echo back speech” (receiving, for each of the above “command[s]” (voice inputs), a respective set of responses, i.e., the “synthesized sound” and the “echo back”)).

Regarding claim 10, Ishii et al. do teach the method of claim 1, wherein the information used to generate the synthesized speech response comprises prosodic characteristics (¶ 0075 last 2 sentences: “The text generator 31 extracts information” 

Regarding claim 11, Ishii et al. do teach a system for training a model to provide information used to provide a synthesized speech response to a voice input (page 9, 2nd column, line 10 above the bottom: “A program for causing a computer to perform a robot control process for controlling a robot” (a system for controlling a robot’s response to a voice input received through the “MICROPHONE” “15”, for generating “echo” and/or synthesized speech by the robot using the “SPEECH SYNTHESIZER” “55” in Fig. 10 and using Eq. 3 in ¶ 0114 (by a training model, since the said “SYNTHESIZER” for generating “echo” relies on “emotion models”, “instinct models”, and “growth models” (¶ 0129)))), 
the system comprising:
a control circuitry (the robot) 
configured to:

receive a plurality of responses, each associated with at least one respective response prosodic metric (¶ 0075 last 2 sentences: “The text generator 31 extracts information” “for rule-based speech synthesis” (first response received) “The information required for the rule-based speech synthesis includes” “prosodic information” (comprises “pitch” (the prosodic metric) “and power” (another prosodic metric) (¶ 0091 line 4)); ¶ 0087 lines 1+: “when” “synthesized sound” “competes with” “echo back speech” (receiving a second response), where “echo back speech” is generated according to the formula in Eq. 3 ¶ 0114, which depends on “power p(j)” and “pitch” “f(j)” (the response prosodic metrics)); and
train the model based on the plurality of voice inputs, the plurality of responses, the voice input prosodic metrics, and the response prosodic metrics such that the model outputs information used to generate the synthesized speech response to the voice input (¶ 0116: “In accordance with equation (3)” (the training model) “the echo back speech y(t) is generated by non-linearizing the power p(j)” “and multiplying the pitch 
and a storage device for storing the information (¶ 0099: “The output unit 44 stores” (storing) “the echo back speech” (the information) “data from the sound generator” “in a memory” (in a storage device)).

Regarding claim 12, Ishii et al. do teach the system of claim 11, wherein:
the at least one voice input prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, and rhythm (¶ 0091 sentence 1: “prosodic analyzer” “performs an acoustic analysis of the speech data” (based on the voice input) “and extracts” “pitch frequency” (to extract pitch as a prosodic metric)); and
the at least one response prosodic metric is selected from the group comprising pitch, note, duration, prominence, timbre, rate, and rhythm (the “echo” (response) according to ¶ 0116 depends on “non-lineariz[ed]” “power P(J)” and “pitch frequency f(j)” (the pitch)).


And wherein the control circuitry is further configured to, for each respective voice input and for each respective response:
receive one or more first word transition metrics among words of the first plurality of words (¶ 0119: “when” “echo” “generated” “the power p(j) and the pitch frequency f(j) can be interpolated” (a first transition metric obtained) “or decimated”);
receive one or more second word transition metrics among words of the second plurality of words (¶ 0119: “when” “echo” “generated” “the power p(j) and the pitch frequency f(j) can be interpolated” “or decimated” (a second transition metric obtained)); and
train the model further based on each first word transition and each second word transition (the power “p(j)” and frequency “f(j)” used in Eq. “3” (the model) are according to ¶ 0119 lines 3+ “interpolated or decimated” (are trained based on 

Regarding claim 14, Ishii et al. do teach the system of claim 11, wherein:
each respective voice input of the plurality of voice inputs comprises a first plurality of words (¶ 0037 lines 4+: “The speech recognizer” “reports the speech” “which is a command” “walk” “down” “chase the ball” (each voice input comprises of a first plurality of words));
the at least one respective voice input prosodic metric is associated with one or more words of the first plurality of words (¶ 0092: “Specifically, the prosodic analyzer 42 assumes that one frame is a period longer than a normal human pitch period, such as 32 ms, and obtains a pitch frequency” (the prosodic metric corresponds to) “and power of speech data” (“speech data”, e.g. “walk” (at least one word of the first plurality of words)));
each respective response of the plurality of responses comprises a second plurality of words (the respective “synthesized sound” “compet[ing] with” “echo back speech” (the plurality of responses (¶ 0087)) thus also comprise a second plurality of corresponding words); 


Regarding claim 17, Ishii et al. do teach the system of claim 11, wherein the control circuitry is further configured to:
receive user profile information selected from the group comprising user voice input history, user language, user characteristics, user location, user preferences, and metadata tags associated with the user (¶ 0068: “Specifically, the acoustic model storage unit 24 stores an acoustic model indicating acoustic features of each phoneme or each syllable in the language of speech which is subjected to speech recognition” (a user’s “language of speech” (a user profile) is received via the “acoustic model”, where the “acoustic model” is further used according to ¶ 0081 to determine “pitch frequency and power” (i.e., a metadata tag))); 
and train the model further based on the user profile information (¶ 0116: “In accordance with equation (3)” (the training model) “the echo back speech y(t) is generated by non-linearizing the power p(j)” (is based on a metadata tag (user profile 

Regarding claim 18, Ishii et al. do teach the system of claim 11, wherein the control circuitry is further configured to:
receive respective interpolation metrics among a transition of words of each voice input of the plurality of voice inputs (¶ 0119: “when echo back speech is generated” “the power p(j) and the pitch frequency f(j) can be interpolated” (one interpolation metric received) “or decimated” (a second interpolation metric received) “thereby generating echo back speech” “having a duration longer than or shorter than that of the speech section of the user’s speech” (causing a transition from the voice inputs by altering e.g. their “duration”)); 
and
train the model further based on each respective interpolation metric (to “interpolat[e]” or “decimat[e]” “power” and/or “pitch” (using the interpolation metrics) will alter Eq. 3 (cause further training of the model)).

Regarding claim 19, Ishii et al. do teach the system of claim 11, wherein each voice input of the plurality of voice inputs is linked with a respective set of responses of the plurality of responses (¶ 0037 lines 4+: “The speech recognizer” “reports the 


Regarding claim 20, Ishii et al. do teach the system of claim 11, wherein the information used to generate the synthesized speech response comprises prosodic characteristics (¶ 0075 last 2 sentences: “The text generator 31 extracts information” “for rule-based speech synthesis” (synthesized speech response or the information used to generate the speech response) “The information required for the rule-based speech synthesis includes” “prosodic information” (comprises “pitch” “and power” (prosodic characteristics) (¶ 0091 line 4)); ¶ 0087 lines 1+: “when” “synthesized sound” “competes with” “echo back speech” (receiving a second response), where “echo back speech” is generated according to the formula in Eq. 3 ¶ 0114, which depends on “power p(j)” and “pitch” “f(j)” (“echo” (another synthesized (information used to generate) speech response) also depends on “power” and “pitch” (the prosodic characteristics)).

Claim Rejections - 35 USC § 103

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ishii et al., and further in view of Nakanishi et al. (US 2020/0135197).
Regarding claim 5, Ishii et al. do not specifically disclose the method of claim 1, wherein the plurality of voice inputs and the plurality of responses are associated in a database, the method further comprising retrieving the plurality of voice inputs and the plurality of responses from the database.
Nakanishi et al. do teach the method of claim 1, wherein the plurality of voice inputs and the plurality of responses are associated in a database, the method further comprising retrieving the plurality of voice inputs and the plurality of responses from the database (¶ 0034 lines 1+: “Each response generation module refers to the speech database 210” (a database used) “to generate” (to retrieve stored) “the response speech” (responses) “matching the response type” (associated with stored voice inputs); ¶ 0006 lines 5+: “input speech belongs to a plurality of respective classified classes previously defined as types of speech contents” (“type” corresponds to “input speech”)).


Regarding claim 15, Ishii et al. do not specifically disclose the system of claim 11, wherein the plurality of voice inputs and the plurality of responses are associated in a database, and wherein the control circuitry is further configured to retrieve the plurality of voice inputs and the plurality of responses from the database.
Nakanishi et al. do teach the system of claim 11, wherein the plurality of voice inputs and the plurality of responses are associated in a database, and wherein the control circuitry is further configured to retrieve the plurality of voice inputs and the plurality of responses from the database (¶ 0034 lines 1+: “Each response generation module refers to the speech database 210” (a database used) “to generate” (to retrieve stored) “the response speech” (responses) “matching the response type” (associated with stored voice inputs); ¶ 0006 lines 5+: “input speech belongs to a plurality of 
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the “database” of Nakanishi et al. into the robot “OUTPUT CONTROLLER” of Ishii et al. (Fig. 3), because doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further provide Ishii et al. with the benefit that the “selection variation of the output speech for the input speech is increased, so that dialogs can be diverse and unexpected”, as disclosed in Nakanishi et al. ¶ 0006 last sentence.

Claims 6, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Ishii et al.
Regarding claim 6, the default embodiment of Ishii et al. does teach the method of claim 1, further comprising: 
receiving a first emotion metric for each respective voice input (¶ 0129 lines 3-5: “prosody extracted pitch and power from the user’s speech” (from each voice input) “based on emotions” (receiving a first emotion metric));
receiving a second emotion metric for each respective response corresponding to the respective voice input (¶ 0129: “The sound generator” “generates prosody controlled echo back speech which is obtained by” “prosody extracted from the user’s 
The default embodiment of Ishii et al. does not specifically disclose:
and training the model further based on each first emotion metric and each second emotion metric.
The alternative embodiment of Ishii et al. does teach:
and training the model further based on each first emotion metric and each second emotion metric (¶ 0047 lines 4+: “the model storage unit 51 increases or decreases” (training) “the values of the emotion models” (the emotion models, based on the [emotion] “values” (the first and second emotion metrics))).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the function of the “emotion models” of the alternative embodiment of Ishii et al. into the default embodiment of Ishii et al. associated with Eq. 3, because doing so would enable the combined systems and their associated methods to perform in combination as they do separately, and would further enable the default embodiment of Ishii et al. to use the “emotion model[s]” in the “sound generator” to “generate” “echo back speech” “by controlling the prosody extracted from the user’s speech based on emotions”, as disclosed in Ishii et al. ¶ 0129.


Receive a first emotion metric for each respective voice input (¶ 0129 lines 3-5: “prosody extracted pitch and power from the user’s speech” (from each voice input) “based on emotions” (receiving a first emotion metric));
receive a second emotion metric for each respective response corresponding to the respective voice input (¶ 0129: “The sound generator” “generates prosody controlled echo back speech which is obtained by” “prosody extracted from the user’s speech based on emotions” (“emotions” (second emotion metric) of “echo” (respective responses)) “expressed by the emotion models” (obtained by emotion models)).
The default embodiment of Ishii et al. does not specifically disclose:
and train the model further based on each first emotion metric and each second emotion metric.
The alternative embodiment of Ishii et al. does teach:
and train the model further based on each first emotion metric and each second emotion metric (¶ 0047 lines 4+: “the model storage unit 51 increases or decreases” (training) “the values of the emotion models” (the emotion models, based on the [emotion] “values” (the first and second emotion metrics))).
It would have therefore been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the function of “emotion .

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 2, 4, 7, 8, 9, 11, 12, 14, 17, 18, and 19 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 4, 3+5, 9, of copending Application No. 15/931,074 in view of Ishii et al. 

15/931,074
1.    (Original) A computer-implemented method for providing a synthesized speech response to a voice input, the method comprising:
receiving the voice input;
calculating at least one prosodic metric of the voice input; 
determining a response to the voice input;
generating the synthesized speech response, wherein the synthesized and causing to be output the synthesized speech response.

Instant application
1.    (Original) A computer-implemented method for training a model to provide information used to provide a synthesized speech response to a voice input, the method comprising:
receiving a plurality of voice inputs, each associated with at least one respective voice input prosodic metric;
receiving a plurality of responses, each associated with at least one respective response prosodic metric; and
outputs information used to generate the synthesized speech response to the voice input.

15/931,074 claims do not specifically disclose:
training the model based on the plurality of voice inputs, the plurality of responses, the voice input prosodic metrics.
Ishii et al. do teach:
training the model based on the plurality of voice inputs, the plurality of responses, the voice input prosodic metrics (¶ 0116: “In accordance with equation (3)” (the training model) “the echo back speech y(t) is generated by non-linearizing the power p(j)” “and multiplying the pitch frequency f(j) by N” (is trained based on “p(j)” and “f(j)” (the prosodic metrics of the voice inputs) as well as “non-lineariz[ed]” “power” and “f(j)” times “N” (the prosodic metrics of the “echo” (response prosodic metrics and responses) to obtain “y(t)” (output information used to generate “echo back” (synthesized) speech), where “echo” is generated based on “emotion models” “instinct models” and “growth models” (¶ 0129))).

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Conclusion


Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARZAD KAZEMINEZHAD whose telephone number is (571)270-5860. The examiner can normally be reached 10:30 am to 11:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, DANIEL C WASHBURN can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in 





/Farzad Kazeminezhad/
Art Unit 2657
November 6th 2021.