DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the Amendments and Arguments filed on 16 April 2021. The Applicants’ amendment and remarks have been carefully considered, but they are not persuasive. Hence, this Action has been made FINAL. 
Any rejections of the previous office action not addressed in this action are considered resolved and no longer pertain to the prosecution of this application.

Response to Amendments and Arguments
The 101 rejection for claim 5 is removed.
The Means Plus Function Interpretation for claims 1-4 and 7 has been removed.
The applicant argues that the cited references do not teach or fairly suggest at least "a processor programmed to estimate a topic of the voice interaction and detect a change in the topic that has been estimated; and detect, in response to detecting the change in the topic by the processor, the user's voice as ask-again by the user based on prosodic information on the user's voice," (emphasis added) as recited in amended independent claim 1. Firstly, the applicant argues that Skantze, para [0073], merely discloses estimating, from non-linguistic information, whether or not the user wants to change the topic, not that there has been a change in the topic. The examiner notes, though, that Skantze, para [0062], states “When the topic inducement unit 130 receives a topic change instruction from the topic continuation determination unit 110, it generates a transition question as a response. Specifically, the topic inducement unit 130 generates response voice data indicating a transition question by using the transition question database 132.” This same paragraph of Skantze further states, “For example, the topic inducement unit 130 generates a "transition question" response that is unrelated to the user speech but prompts the user to provide the next topic such as "How is your rehabilitation going?". When the topic continuation determination unit 110 determines that the topic should be changed, the topic inducement unit 130 generates a transition question as described above. Therefore, it is possible to change a topic at an appropriate timing without giving the user a feeling of wrongness.” Thus, Skantze does teach detecting whether or not there has been a change in the topic, as recited in claim 1.
The examiner also observes that further explanation of the second limitation of claim 1 – “in response to detecting the change in the topic by the processor, the user’s voice as ask-again by the user based on prosodic information on the user’s voice” – would be required to provide a more meaningful, concrete claim interpretation. The claim limitation implies that the user repeats (asks again) a previously asked question. However, the claim makes no mention of how such a question is being used within the context of the claim. It is not clear how the “ask-again” is related to the change in the topic.   

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
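A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.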

Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over US 20180122377, hereinafter referred to as Skantze et al., in view of US 9014365, hereinafter referred to as Castiglione.
Regarding claim 1 (currently amended), Skantze et al. discloses a voice interaction system that performs a voice interaction with a user (“FIG. 1 shows a hardware configuration of a voice interaction apparatus according to a first embodiment,” Skantze et al., para [0018].), the system comprising:  
a processor programmed to estimate a topic of the voice interaction and detect a change in the topic that has been estimated (“Meanwhile, when the user wants to change the topic, there is a strong tendency that, for example, an intention that the user wants to change the topic appears in non-linguistic information such as the rhythm of user speech or the duration of the same topic (i.e., the duration of one topic). Therefore, the voice interaction apparatus 1 according to this embodiment determines whether the topic should be continued or should be changed by using the non-linguistic information analysis result for user speech without using the syntactic analysis result,” Skantze et al., para [0073].).

Skantze et al. does not disclose detecting, in response to detecting the change in the topic by the processor, the user's voice as ask-again by the user based on prosodic information on the user's voice.

Castiglione is cited to disclose ask-again detection means for detecting, when the change in the topic has been detected by the topic detection means, the user's voice as ask-again by the user based on prosodic information on the user's voice (“The voice recognition module 115 can also include a voice stress analyzer to sense the amount of stress, frustration or anger in an audio signal associated with user 102….The speech synthesizer can also used to identify whether user 102 is presenting a prior-asked question. Here, CSR 111 can respond to user 102 by noting the continued problem and assuring the user the question will be carefully addressed. In some embodiments, presence detection module 107 is used in combination with voice recognition module 115 to identify a webpage of a current session for user 102,” Castiglione, col. 5, line 59 – col. 6, line 9.). Castiglione benefits Skantze et al. by the integration of a web-enabled system and a phone-based system that can be used to apprise a company representative of the browsing state of someone accessing a company's webpages (Castiglione, col. 2, lines 16-21). Therefore, it would be obvious for one skilled in the art to combine the teachings of Skantze et al. with those of Castiglione to improve the voice interaction experience of Skantze et al.
 
As to claim 5, method claim 5 and system claim 1 are related as system and method of using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 5 is similarly rejected under the same rationale as applied above with respect to system claim 1.

As to claim 6, CRM claim 6 and system claim 1 are related as system and CRM of using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 6 is similarly rejected under the same rationale as applied above with respect to system claim 1. Also, Skantze et al., para [0038], teaches a CPU, memory and CRM.

As to claim 7, system claim 7 and system claim 1 are related as system and system of using the same, with each claimed element’s function corresponding to the system step. Accordingly, claim 7 is similarly rejected under the same rationale as applied above with respect to system claim 1. Also, Skantze et al., para [0038], teaches a CPU, memory and CRM.

Regarding claim 2 (currently amended), Skantze et al., as modified by Castiglione, discloses the voice interaction system according to Claim 1, wherein the processor is further programmed to 

(“Note that the non-linguistic information is information that is different from the linguistic information (the character string) of user speech to be processed and includes at least one of prosodic information (or rhythm information) on the user speech and response history information,” Skantze et al., para [0046].), 
in response to detecting determining that the amount of change in the prosody detected by the processor (“On the other hand, when the topic duration D1 is shorter than the threshold Dth1 (Yes at step S202), the topic continuation determination unit 110 determines whether or not a maximum value max(f0.sub.z500) of a value f0.sub.z500 that is obtained by normalizing the fundamental frequency f0.sub.500 in 500 msec at the phrase end of the user speech is smaller than a predetermined threshold Mth1 (step S206). Specifically, the topic continuation determination unit 110 calculates the maximum value max(f0.sub.z500) from the non-linguistic information analysis result (the feature vector) and compares the calculated maximum value max(f0.sub.z500) with the threshold Mth1. Note that the calculation of the maximum value max(f0.sub.z500) may be performed by the non-linguistic information analysis unit 106,” Skantze et al., para [0077]. The fundamental frequency of a user’s voice is prosodic information.).
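
For illustration only, the two comparisons quoted from Skantze et al., para [0077], may be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the reference: the function names, the 10 msec frame size, and the use of z-score normalization over the whole utterance are hypothetical choices made for the example. The sketch reports only whether both quoted comparisons hold (D1 shorter than Dth1, and max(f0z500) smaller than Mth1); it does not assert what the reference decides on either branch.

```python
from statistics import mean, pstdev


def max_normalized_f0_at_phrase_end(f0_hz, frame_ms=10, window_ms=500):
    """Return the maximum z-scored f0 over the final `window_ms` of an utterance.

    f0_hz: per-frame fundamental frequency in Hz; unvoiced frames are None.
    Z-scoring against the whole utterance is an assumption of this sketch.
    """
    voiced = [f for f in f0_hz if f is not None]
    if len(voiced) < 2:
        return None  # not enough voiced frames to normalize
    mu, sigma = mean(voiced), pstdev(voiced)
    if sigma == 0:
        return 0.0  # perfectly flat pitch: every z-score is zero
    n_tail = max(1, window_ms // frame_ms)
    tail = [f for f in f0_hz[-n_tail:] if f is not None]
    if not tail:
        return None  # phrase end is entirely unvoiced
    return max((f - mu) / sigma for f in tail)


def both_comparisons_hold(d1, dth1, f0_hz, mth1):
    """True when D1 < Dth1 and max(f0z500) < Mth1, the two quoted checks."""
    if d1 >= dth1:
        return False
    peak = max_normalized_f0_at_phrase_end(f0_hz)
    return peak is not None and peak < mth1


# Hypothetical input: flat 100 Hz pitch with a single 200 Hz frame at the end.
contour = [100.0] * 99 + [200.0]
print(both_comparisons_hold(d1=3.0, dth1=5.0, f0_hz=contour, mth1=10.0))
```

The threshold values shown are placeholders; the reference as quoted does not give numeric values for Dth1 or Mth1.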

Regarding claim 3 (currently amended), Skantze et al., as modified by Castiglione, discloses the voice interaction system according to claim 1, wherein the processor is further programmed to

receive the prosodic information and output an ask-again detection, and machine learning a relation between the prosodic information and the ask-again detection (Skantze et al., para [0073]), and

detect, in response to receiving the prosodic information on the user's voice and outputting the ask-again detection, the user's voice as ask-again by the user (Skantze et al., para [0073]).


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over US 20180122377, hereinafter referred to as Skantze et al., in view of US 9014365, hereinafter referred to as Castiglione, and further in view of US 5357596, hereinafter referred to as Takebayashi et al.

Regarding claim 4 (currently amended), Skantze et al., as modified by Castiglione, discloses the voice interaction system according to claim 1, wherein the processor is further programmed to: 

generate, when the ask-again has been detected by the ask-again detection means, a response sentence for the ask-again in response to the ask-again, based on a response sentence responding to the user before the ask-again (“The voice recognition module 115 can also include a voice stress analyzer to sense the amount of stress, frustration or anger in an audio signal associated with user 102….The speech synthesizer can also used to identify whether user 102 is presenting a prior-asked question. Here, CSR 111 can respond to user 102 by noting the continued problem and assuring the user the question will be carefully addressed. In some embodiments, presence detection module 107 is used in combination with voice recognition module 115 to identify a webpage of a current session for user 102,” Castiglione, col. 5, line 59 – col. 6, line 9.).

Skantze et al., though, does not disclose wherein the response sentence generation means generates, when the response sentence includes a word whose frequency of appearance in a history of voice interaction with the user is equal to or smaller than a first predetermined value, the response sentence for the ask-again formed of only this word or the response sentence for the ask-again in which this word is emphasized in the response sentence.

Takebayashi et al. is cited to disclose wherein the response sentence generation means generates, when the response sentence includes a word whose frequency of appearance in a history of voice interaction with the user is equal to or smaller than a first predetermined value, the response sentence for the ask-again formed of only this word or the response sentence for the ask-again in which this word is emphasized in the response sentence (“In this first embodiment, the human character image information is given in a form shown in FIG. 20, which contains the labels of the system state and the user state at a time the semantic response representation supplied to the response generation unit 13 is generated in the dialogue management unit 12, the number of repetition N for a repeated part of the dialogue such as a part requiring the repeated questioning or confirmation, the emphasizing term in the semantic response representation which is determined by the dialogue management unit 12 to be emphasized in order to urge the firm confirmation to the user,” Takebayashi et al., col. 17, lines 52-63. Here, the word of the query is emphasized in the response sentence.). Takebayashi et al. benefits Skantze et al. by helping to ensure that the voice interaction system properly understands a user’s request (Takebayashi et al., col. 17, lines 52-63). Therefore, it would be obvious for one skilled in the art to combine the teachings of Skantze et al. with those of Takebayashi et al. to improve the voice interaction experience of Skantze et al.




Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.  
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNE L THOMAS-HOMESCU whose telephone number is (571)272-0899.  The examiner can normally be reached on Mon-Fri 8-6.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ANNE L THOMAS-HOMESCU/Primary Examiner, Art Unit 2656