DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1 to 7 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor, at the time the application was filed, had possession of the claimed invention.
Independent claims 1 and 6 to 7 set forth new limitations that include several instances of new matter.  Mainly, Applicant’s Specification, as originally filed, fails to support the new limitation of “an output controller configured to output the first output sentence as text data to a first output unit and to output the second output sentence as voice data to a second output unit.”  Generally, Applicant’s Specification only describes 
Additionally, these independent claims set forth a limitation of “to generate, when the voice is not determined to be the predetermined voice, a second output sentence in which at least one word selected among words included in the notification is not replaced with another word”, which limitation is maintained to present issues of new matter.  The Specification, ¶[0062], does describe an embodiment where a word in a second output sentence is not replaced when a voice is classified into a second voice V1B instead of a first voice V1A.  However, this is not necessarily the same as generating “when the voice is not determined to be the predetermined voice”.  That is, a voice is still determined to be a predetermined voice V1B.  Here, voice V1B is still a predetermined voice, e.g., not a default voice, even if it is not voice V1A.  

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have 

Claims 1, 3, and 6 to 7 are rejected under 35 U.S.C. 103 as being unpatentable over Tseretopoulous et al. (U.S. Patent Publication 2019/0103127) in view of Subramanian et al. (U.S. Patent Publication 2011/0184721).
Concerning independent claims 1 and 6 to 7, Tseretopoulous et al. discloses a system, software, and computer implemented method for conversational interface personalization based on input context, comprising:
“a voice classifying unit configured to analyze a voice spoken by a user and acquired by a voice acquiring unit to determine whether the voice is a predetermined voice” – a received input can be analyzed to identify a particular input personality type (“a predetermined voice”); when the received input is associated with a voice input, auditory factors can be considered including a pitch of the voice, a length of the sounds, a loudness of the voice, and a timbre of the voice (“to analyze a voice spoken by the user”) (¶[0036]); received input is analyzed to identify an intent of the input and information on the personality of the input (¶[0038]: Figure 1); one or more operations can be performed to identify a particular personality of the received input; scores can relate to a formality of the input, a politeness of the input, usage of particular regional phrases, an accent or particular phrasing associated with the input, a level of sarcasm within the input, and an emotional state associated with the input; personality input types can be associated with predetermined values or ranges of values for different input types that include ‘scared’, ‘calm, inquisitive’, ‘calm, instructional’, ‘angry’, and ‘angry, unsure’ (“determine whether a voice is a predetermined voice”) (¶[0050] - ¶[0051]: Figure 1); digital assistant 103 may be any interactive or virtual intelligence 
“an intention analyzing unit to detect intention information indicating what kind of information is wished to be acquired by the user” – a conversational input is analyzed to determine an intent (Abstract); a received input at conversational analysis system 102 is analyzed to identify an intent of the input (¶[0038]: Figure 1); conversational analysis system 102 can analyze the received input to determine an intent of the input, e.g., a particular question, query, comment (¶[0040]: Figure 1); natural language processing (NLP) engine 208 includes an intent deciphering module 210 (¶[0067]: Figure 2A); NLP engine 208 determines an intent of the conversational input, where the intent can represent a question, query, or information associated with conversational input 205, and can be determined by natural language understanding (NLU) using intent deciphering module 210 (¶[0068]: Figure 2A); 
“a notification-information acquiring unit configured to acquire notification information to be notified to the user based on the intention information” – a set of responsive content is determined and includes a set of initial tokens representing an initial response (Abstract); based on a determined intent (“based on the intention information”), conversational analysis system 102 can generate a corresponding 
“an output-content generating unit configured to generate, when the voice is determined to be the predetermined voice, a first output sentence in which at least one word selected among words included in the notification information is replaced with another word” – at least one token associated with a similar lexical personality score is replaced with a determined synonym token to generate a modified version of the response content that is then transmitted to a device in response to the input (Abstract); natural language generation engine (NLG) 124 generates a sentence (“a first output sentence”) or detailed response to be provided via conversational interface 108 (¶[0052]: Figure 1); once an initial set of response content is available, lexical personality filter module 128 can be used to determine and apply the appropriate modification to the response content to be used to generate a personalized response; a personalized response can be generated by identifying one or more synonyms from 
“to generate, when the voice is not determined to be the predetermined voice, a second output sentence in which at least one word selected among words included in the notification information is not replaced with another word” – NLG engine 228 can replace at least one token from the initial token set with the at least one suitable synonym; one embodiment provides that only one of the initial tokens may be replaced or substituted with a synonym token while in others multiple tokens or the entire set of tokens may be replaced (¶[0073]: Figure 2); where the conversational input is a spoken 
“an output controller configured to output the first output sentence as text data to a first output unit and to output the second output sentence as voice data to a second output unit” – conversational interfaces may output responses in various formats including textual output and auditory outputs (¶[0003]); conversational interface 108 manages and conducts conversations and interactions via auditory or textual methods (¶[0048]: Figure 1); digital assistant 186 may work and interact via text or voice (¶[0063]: Figure 1).
Concerning independent claims 1 and 6 to 7, Tseretopoulous et al. generally discloses the concept of replacing words of a sentence according to a voice of a persona.  Implicitly, if a persona is determined to be a first persona, then words in a Tseretopoulous et al. actually discloses an embodiment where only one of the initial tokens may be replaced or substituted with a synonym token while in others multiple tokens or the entire set of tokens may be replaced.  (¶[0073]: Figure 2)  Given that a number of words and specific words may not be replaced, or are replaced with different words, for a determined persona, it is logical that there would be at least one word that is not replaced for a sentence of a second persona.  Tseretopoulous et al., then, discloses the limitation of “to generate, when the voice is not determined to be the predetermined voice, a second output sentence in which at least one word selected among words included in the notification information is not replaced with another word.”  Literally, this limitation only requires that one word is not replaced for a second persona as compared to words replaced for a first persona.  Additionally, Tseretopoulous et al. discloses that a conversational interface provides output of text or auditory voice.  The only element not clearly disclosed by Tseretopoulous et al. is “an output controller configured to output the first output sentence as text data to a first output unit and to output the second output sentence as voice data to a second output unit.”  Still, Tseretopoulous et al. discloses both text and auditory output, so that one skilled in the art could understand that text output and auditory output could be applied to different sentences.
Subramanian et al. teaches a similar idea of communicating across voice and text channels with emotion preservation.  (Abstract)  Emotion markup component 210 may specify particular words, phrases, sentences, and passages in a communication for emotion analysis.  (¶[0037]: Figure 2)  A speaker profile specifies a speaker’s dialect and geographic region, and also personality attributes that define the uniqueness of the speaker’s communications.  These attributes are used for modifying the dictionary definitions for words and speech patterns that the speaker uses to convey emotion.  (¶[0048]: Figure 2)  Voice analyzer 232 attempts to identify a speaker by comparing voice patterns in a conversation with voice patterns from identified speakers.  If voice analyzer 232 recognizes a speaker’s voice from its voice patterns, context analyzer 230 is notified which then selects a context profile for the speaker from profile database 212.  (¶[0050]: Figure 2)  Text and emotion translation architecture 272 translates text into a different dialect than the original communication, and can convert emotion data expressed in one culture to another culture using a set of emotion definitions in emotion to emotion dictionary 255.  The culture adjusted emotion metadata is then used to modify the translated text with emotion words and text patterns that is common to the culture of the language.  The translated emotion metadata might be used directly in textual communication of emails and instant messages.  If voice is desired, the translated emotion metadata is fed into speech and emotion synthesis architecture 270 which modulates the text into audible word sounds and adjusts the delivery with emotion using the translated emotion metadata.  (¶[0077]: Figure 5)  Text translator 252 can text mine emotion-text/phrase dictionary 220 for words and phrases that convey the emotion, but for the culture of the Subramanian et al., then, is similar to Tseretopoulous et al. 
as directed to analyzing a voice to classify it into an emotion, and then modifying words in sentences according to definitions in an emotion to emotion dictionary that reflect a culture of an identified voice.  Additionally, Subramanian et al. teaches that a decision can be made to output a communication with modified words as text or as synthesized voice in Step 826 of Figure 8B.  Subramanian et al., then, teaches “an output controller configured to output the first output sentence as text data to a first output unit and the second output sentence as voice data to a second output unit” because any given sentence can be output as text or synthesized voice.  An objective is to communicate across channels while preserving emotional content of a communication.  (¶[0005])  It would have been obvious to one having ordinary skill in the art to output a first sentence as text data and to output a second sentence as voice data as taught by Subramanian et al. to perform synonym replacement of words based on a detected accent or mood of Tseretopoulous et al.
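For illustration only, the claimed output controller as construed above may be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation drawn from Tseretopoulous et al., Subramanian et al., or Applicant's Specification. The class name OutputController, its fields, and the boolean flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputController:
    """Hypothetical controller; all names are illustrative assumptions."""
    text_unit: List[str] = field(default_factory=list)   # stands in for a display ("first output unit")
    voice_unit: List[str] = field(default_factory=list)  # stands in for a speaker ("second output unit")

    def output(self, sentence: str, is_predetermined_voice: bool) -> str:
        # The first output sentence (voice determined to be the predetermined
        # voice) is routed to the text unit as text data.
        if is_predetermined_voice:
            self.text_unit.append(sentence)
            return "text"
        # The second output sentence (voice not so determined) is routed to
        # the voice unit as voice data.
        self.voice_unit.append(sentence)
        return "voice"
```

The sketch fixes the channel per sentence type, which is the reading the claim language requires. A per-sentence check of whether to synthesize text into audio, as in Step 826 of Figure 8B of Subramanian et al., would replace the boolean flag with a decision made at output time.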

Concerning claim 3, Tseretopoulous et al. discloses “the output-content generation unit is further configured to read relationship information including information of a first word that is a predetermined word and a second word that is associated with the first word, and to replace a word included in the notification information with the second word when the word included in the notification information matches with the first word” because one or more synonyms of tokens included in the initial response are replaced with synonyms having a matching or similar set of characteristics as those determined to be associated with received input using personality-based response module 130.  Each of the base words 138 may be associated with a plurality of synonyms 140, where each synonym 140 is associated with a set of one or more predefined lexical personality scores 142.  These scores 142 are compared to the scores associated with the received input, and one or more of synonyms 140 is identified as appropriate for substitution based on their match.  (¶[0055]: Figure 1)  That is, “relationship information” is an association between words that are synonyms, so that a base word 138 and a synonym 140 for a base word are “information of a first word that is a predetermined word and a second word that is associated with the first word”.  Then synonym replacement based on score to determine a match is “to replace a word included in the notification information with the second word to generate the first output sentence when the word included in the notification information matches with the first word.”
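For illustration only, the score-matched synonym replacement described above may be sketched as follows. This is a minimal sketch, assuming a single numeric score per synonym and a nearest-score match. The table contents, the name RELATIONSHIP, the function personalize, and the scoring scale are hypothetical and are not taken from Tseretopoulous et al.

```python
from typing import Dict, List, Tuple

# Hypothetical relationship information: base word ("first word") mapped to
# candidate synonyms ("second word"), each with an assumed formality score.
RELATIONSHIP: Dict[str, List[Tuple[str, float]]] = {
    "hello": [("hi", 0.2), ("greetings", 0.9)],
    "help": [("assist", 0.8), ("give a hand", 0.1)],
}


def personalize(tokens: List[str], input_score: float) -> List[str]:
    """Replace each token that matches a base word with the synonym whose
    score is closest to the score derived from the received input."""
    out: List[str] = []
    for tok in tokens:
        candidates = RELATIONSHIP.get(tok.lower())
        if candidates is None:
            # The token matches no first word, so it is not replaced.
            out.append(tok)
            continue
        best = min(candidates, key=lambda c: abs(c[1] - input_score))
        out.append(best[0])
    return out
```

Under these assumptions, personalize(["hello", "I", "can", "help"], 0.9) selects the higher-scored synonyms, while the tokens "I" and "can" pass through without replacement.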

Claims 2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Tseretopoulous et al. (U.S. Patent Publication 2019/0103127) in view of Subramanian et al. (U.S. Patent Publication 2011/0184721) as applied to claim 1 above, and further in view of Aoyama et al. (U.S. Patent Publication 2017/0337921).
Concerning claim 2, Tseretopoulous et al. discloses “the notification-information acquiring unit is further configured to acquire, as the notification information, content information to be notified to the user” and “the output-content generating unit is further configured to select the at least one word among the words included into the notification information based on a word included in the content information”.  That is, Tseretopoulous et al. broadly discloses content information that may include a response to a query and replacing at least one word in this response to the query that includes content information.  However, Tseretopoulous et al. does not disclose “type information indicating a type of the content information” and at least one word to be replaced with the other word based on “the type information that is associated with the content information”.  Here, Tseretopoulous et al. does not provide “type information”.  
Concerning claim 2, Aoyama et al. teaches controlling output of a response to speech of a user in accordance with acquired information regarding a speech state of the user.  (Abstract)  User state estimation unit 15 is configured to estimate various states of the user on the basis of various types of acquired information including the acquired sound input.  User state estimation unit 15 calculates a ‘degree of composure’ which is a parameter for determining a psychological state of the user, e.g., whether the user is calm on the basis of the analysis result of the acquired sound input.  (¶[0102]: Figure 2)  Response parameter generation unit 16 is configured to generate a response parameter on the basis of the information regarding the detected user state and speech style of the user.  (¶[0115]: Figure 2)  Then response generation unit 17 replaces the Aoyama et al. teaches “type information” at least for a name of a person or a time expression, where a name of a person “Taro Yamada” can be replaced with “Mr. Yamada” or “Taro” and time “13:00” can be replaced with “1 pm” based on a ‘speech style of the user’.  (¶[0138] - ¶[0140]: Figure 7)  Figure 7 illustrates a data structure of ‘personal name data’ and ‘date-time expression pattern’, which are “type information indicating a type of the content information” for “content information” of “You have a meeting with Mr. Yamada tomorrow at 1 pm in room A” or “You have a meeting with Taro at 13:00 10/1/2014.”  (Compare Specification, ¶[0023] and ¶[0030]: Figures 4 to 5, which almost identically describes requesting schedule information for a meeting at a certain date and time with Mr. Yamada.)  An objective is to control a response to sound input in a preferred mode corresponding to a change in a situation of a user to make a user feel more comfortable.  
(¶[0004] - ¶[0007])  It would have been obvious to one having ordinary skill in the art to replace words in an output sentence based on type information as taught by Aoyama et al. in a conversational interface that is personalized based on input context of Tseretopoulous et al. for a purpose of making a user feel more comfortable with a response.

Concerning claim 4, Tseretopoulous et al. discloses “the output-content generating unit is further configured to select the at least one word among the words included in the notification information” when words are replaced by their synonyms.  However, Tseretopoulous et al. omits “the relationship further includes information of type information indicating a type of the first word” and selecting a word as a word to be replaced “when the type information of the word included in the notification information matches with the type information of the first word and the word included in the notification information matches with the first word”.  Still, Aoyama et al. teaches “type information” at least for a name of a person or a time expression, where a name of a person “Taro Yamada” can be replaced with “Mr. Yamada” or “Taro” and time “13:00” can be replaced with “1 pm” based on a ‘speech style of the user’.  (¶[0138] - ¶[0140]: Figure 7)  Figure 7 illustrates a data structure of ‘personal name data’ and ‘date-time expression pattern’, which are “type information” for information content of “You have a meeting with Mr. Yamada tomorrow at 1 pm in room A” or “You have a meeting with Taro at 13:00 10/1/2014.”  Here, Aoyama et al. teaches “relationship information” that is not necessarily an association between synonyms, but is a data structure for replacing names and times based on “type information” of ‘personal name data’ and ‘date-time expression pattern’.  (¶[0121] - ¶[0126]: Figure 7)  That is, Aoyama et al. teaches that notification information of a person “Taro Yamada” matches with a notification of a name by response generation unit 17, and replaces “Taro Yamada” with “Mr. Yamada” or “Taro” based on “type information” of ‘personal name data’.
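For illustration only, replacement that requires both a type match and a word match may be sketched as follows. This is a minimal sketch under stated assumptions, not the data structure of Figure 7 of Aoyama et al. The type labels, the style keys style_a and style_b, the pairing of each expression with a style, and the function replace_by_type are hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (surface text, type label)

# Hypothetical relationship information keyed by (type information, first word).
# Which speech style yields which expression is an assumption for the example.
TABLE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("personal_name", "Taro Yamada"): {"style_a": "Mr. Yamada", "style_b": "Taro"},
    ("date_time", "13:00"): {"style_a": "1 pm", "style_b": "13:00"},
}


def replace_by_type(tokens: List[Token], style: str) -> str:
    """Replace a token only when both its type and its text match an entry."""
    out: List[str] = []
    for text, type_label in tokens:
        entry = TABLE.get((type_label, text))
        # A text match with a different type label finds no entry, so the
        # token is kept unchanged.
        out.append(entry[style] if entry else text)
    return " ".join(out)
```

Keying the table on the pair of type and word means a string such as "13:00" that is not labeled as a date-time expression is left alone, which is the two-part condition recited in claim 4.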


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Tseretopoulous et al. (U.S. Patent Publication 2019/0103127) in view of Subramanian et al. (U.S. Patent Publication 2011/0184721) as applied to claim 1 above, and further in view of Schuster et al. (U.S. Patent Publication 2017/0366662).
Concerning claim 5, Tseretopoulous et al. does not expressly disclose “the voice classifying unit is further configured to determine the voice to be the predetermined voice when the voice spoken by the user is a whisper”, but discloses something close to it.  Specifically, Tseretopoulous et al. discloses received input can be a voice input, and auditory factors can be considered including a loudness of the voice.  (¶[0036])  Even if this limitation of “to determine the voice to be the predetermined voice when the voice spoken by the user is a whisper” is omitted by Tseretopoulous et al., it is taught by Schuster et al.  Generally, Schuster et al. teaches privacy mode detection and response over a voice activated interface to determine based on user input whether the user desires the device to enter a private mode.  While in the private mode, the device may alter the manner in which it communicates to the user.  (Abstract)  An utterance is received from a user, and it is determined whether the utterance is a whispered utterance.  Communications from the device are altered to the user if the user has whispered the utterance.  Normally audible communications from the device are instead played at reduced volume or are displayed and not audibly played at all in the altered condition.  (¶[0008])  A signal received at a microphone is analyzed to detect a whispered microphone input, where ‘whisper’ or ‘whispered’ refer to unvoiced or substantially unvoiced speech and hushed speech.  Cepstral analysis may be used to detect Schuster et al. to provide conversational personalization of Tseretopoulous et al. for a purpose of avoiding disturbing other people or to keep an interaction confidential.
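For illustration only, treating a whispered utterance as the predetermined voice and altering the output accordingly may be sketched as follows. This is a minimal sketch that assumes a simple loudness (RMS) threshold as a stand-in for whisper detection; it does not implement the cepstral analysis mentioned in Schuster et al. The threshold value and the function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical threshold on normalized amplitude; an assumption for the example.
WHISPER_RMS_THRESHOLD = 0.05


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of the samples, or 0.0 for an empty input."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_whisper(samples: Sequence[float]) -> bool:
    """Classify a non-empty utterance as a whisper when its level is low."""
    return bool(samples) and rms(samples) < WHISPER_RMS_THRESHOLD


def choose_output_mode(samples: Sequence[float]) -> str:
    # A whispered utterance switches the response to a displayed, non-audible
    # form; any other utterance keeps the normal audible response.
    return "display_only" if is_whisper(samples) else "audible"
```

A loudness threshold alone would also flag quiet voiced speech, so a practical detector would need a feature that distinguishes unvoiced speech; the sketch only shows how the classification result drives the change in output.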

Response to Arguments
Applicant’s arguments filed 01 March 2021 have been considered but are moot in view of new grounds of rejection as necessitated by amendment.
Applicant’s amendments overcome the objections to the title and to the Specification.  
Applicant amends the claims and presents arguments directed against the prior rejection of the independent claims as being obvious under 35 U.S.C. §103 over Tseretopoulous et al. (U.S. Patent Publication 2019/0103127) in view of Aoyama et al. (U.S. Patent Publication 2017/0337921).  Generally, Applicant amends the independent claims to set forth new limitations directed to “when the voice is determined to be the predetermined voice . . . to generate, when the voice is not determined to be the Tseretopoulous et al. and Aoyama et al. do not disclose or teach these new limitations.
Applicant’s amendments necessitate new grounds of rejection.  Firstly, these amendments raise issues of new matter under 35 U.S.C. §112(a).  Applicant alleges that support for these amendments is provided at ¶[0049], ¶[0067], and ¶[0077] of the Specification.  Here, these passages do not actually correspond to the paragraphs of the Specification, which actually concludes with ¶[0075].  So, Applicant could be referring to the corresponding paragraphs of the published application, U.S. Patent Publication 2019/0279631 (Naganuma).  Still, even this published application does not appear to support the claimed subject matter.
Specifically, independent claims 1 and 6 to 7 are amended to set forth “an output controller configured to output the first output sentence as text data to a first output unit and to output the second output sentence as voice data to a second output unit.”  It is clear that this embodiment is not contemplated by the originally-filed Specification.  At best, ¶[0069] of the Specification (or, ¶[0077] of the corresponding published application) describes an embodiment where a first output sentence can be output as text and not as voice.  However, Applicant’s Specification, as originally filed, does not contemplate that a first sentence is output as text data and a second sentence is output as voice data.  Instead, Applicant’s Specification, ¶[0069], only provides that a first 
Additionally, independent claims 1 and 6 to 7 are amended to set forth a new limitation of “to generate, when the voice is not determined to be the predetermined voice, a second output sentence in which at least one word selected among words included in the notification information is not replaced with another word”, which may similarly raise issues of new matter due to the limitation of “when the voice is not determined to be the predetermined voice”.   The Specification, ¶[0062] (or, ¶[0070] of the corresponding published application), does describe an embodiment where a word is not replaced in a second output sentence.  Still, it is not actually disclosed that outputting a second sentence without replacing a word is because “the voice is not determined to be the predetermined voice”.  Instead, this embodiment is actually classifying voice V1 into a second voice V1B, which is still a “predetermined voice”.  Even if voice V1 is classified into a second voice V1B instead of a first voice V1A, second voice V1B is a predetermined voice, i.e., the predetermined voice is second voice V1B.  This may appear to be a minor semantic difference, but the claim language implies some default scenario when a voice is not classified into any voice, so that no 
New grounds of rejection are set forth as directed to independent claims 1 and 6 to 7 as being obvious under 35 U.S.C. §103 over Tseretopoulous et al. (U.S. Patent Publication 2019/0103127) in view of Subramanian et al. (U.S. Patent Publication 2011/0184721).  Here, Subramanian et al. is being substituted in the rejection of the independent claims for Aoyama et al.  The rejection of some of the dependent claims continues to rely upon Aoyama et al. and Schuster et al.  Generally, the new limitations necessitate the new grounds of rejection.
Tseretopoulous et al. can be construed to disclose the new limitations of “an output-content generating unit configured to generate, when the voice is determined to be the predetermined voice, a first output sentence in which at least one word selected among words included in the notification information is replaced with another word, to generate, when the voice is not determined to be the predetermined voice, a second output sentence in which at least one word selected among words included in the notification information is not replaced with another word”.  Generally, Tseretopoulous et al. is directed to replacing words to generate sentences based on analysis of voice input.  If a certain personality input type is detected based on analysis of voice input, then words are replaced in an output sentence to reflect that personality type.  Tseretopoulous et al. discloses that there are a plurality of personality input types, e.g., scared, calm, angry, and words of a sentence are replaced accordingly.  Implicitly, a voice of a first personality input type can be construed to correspond to “the predetermined voice” and a voice of a second personality input type can be construed Tseretopoulous et al.  Specifically, Tseretopoulous et al., ¶[0055], ¶[0057], ¶[0073], and ¶[0081], discloses ‘replacing one or more of those base portions with synonyms’, ‘replacing one or more based words or tokens’, ‘replace at least one token from the initial token set’, and ‘at least one token from a set of initial token can be replaced’.  Broadly, “at least one word selected among words” in a sentence is not replaced by Tseretopoulous et al.  That is, it is generally the case that different word replacements apply to different personality input types, and not every word is replaced the same way for a given personality input type.  Tseretopoulous et al., then, reasonably discloses these limitations of the independent claims in accordance with the embodiments of the Specification.
Moreover, Tseretopoulous et al. and Subramanian et al. render obvious the limitation of “an output controller configured to output the first output sentence as text data to a first output unit and to output the second output sentence as voice data to a second output unit.”  Tseretopoulous et al., ¶[0003], ¶[0048], and ¶[0063], actually states that a conversational interface can output either text or auditory voice.  So, Tseretopoulous et al. discloses “an output controller configured to output . . . text data to a first output unit and to output . . . as voice data to a second output unit.”  Moreover, Subramanian et al., ¶[0102]: Figure 8B: Step 826, teaches that a check is made whether to synthesize the text into audio.  Subramanian et al., then, provides an output controller that can differentially output some sentences as text and some sentences as Subramanian et al. is directed to a similar problem as Tseretopoulous et al., to analyze a voice of a user, and to modify words in text to reflect emotional characteristics determined from the analysis of the voice.  Subramanian et al. does this with an emotion-text/phrase dictionary 220 that conveys emotion in words of the culture corresponding to the speaker’s voice.  Tseretopoulous et al. and Subramanian et al., then, render obvious all of the limitations of the independent claims.
Applicant’s amendments necessitate these new grounds of rejection.  Accordingly, this rejection is properly FINAL.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure.
Naganuma is a corresponding patent publication.
Dowlatkhah, Barton et al., Aoyagi et al., Teague et al., Ishii et al., Christian et al., and Endo et al. disclose related prior art.

Applicant's amendment necessitated the new grounds of rejection presented in this Office Action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP §706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached on Monday-Thursday 8:30 AM-6:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private 




/MARTIN LERNER/Primary Examiner
Art Unit 2657
March 18, 2021