DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 8, 11, 20, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Annan et al. (U.S. Patent No. 9,375,845) in view of Di Fabbrizio et al. (U.S. Patent Publication 2015/0340033).
Concerning independent claims 1, 8, 11, 20, and 23, Annan et al. discloses a method, system, apparatus, and computer program for synchronizing robot motion with social interaction, comprising:
“an utterance receiving step in which an input part receives a user utterance performed by the user” – a social robot 120 may be a companion to an elderly man living on his own; the man may speak to social robot 120, and robot 120 may ‘hear’ the voice of the man using a microphone that is one of its sensors 126 (column 6, lines 21 to 24: Figure 1); robot dialogue 123 and robot motion script 124 may be triggered for play 
“a first presentation step in which a presentation part presents a dialogue-establishing utterance, wherein the dialogue-establishing utterance is presented to the user utterance, the dialogue-establishing utterance is a nodding, and no utterances are presented between the user utterance and the dialogue-establishing utterance” – a robot motion script is based on a robot dialogue, where a keyword may be mapped to a plurality of robot gestures that include a nod of a head towards the addressed person; a robot may synchronize a keyword ‘you’ with a head nod; a robot may parse a human dialogue directed to it and analyze the syntax of the human dialogue to select an appropriate gesture in response to being spoken to by the person; the robot motion script may include occasional head nods to indicate listening; if a human statement is a complaint, a robot motion script may include a roll of the robot eyes combined with a head nod (column 3, lines 19 to 65); robot motion scripts 156 are synchronized with dialogue spoken by a person and sensed by sensors 126, e.g., a microphone; motion enactments may comprise nodding the head of a second social robot 140 to show continued interest or at least attention (column 8, lines 3 to 11: Figure 2); here, “a dialogue-establishing utterance” is defined as simply “a nodding”; broadly, “a dialogue-establishing utterance” does not necessarily even have to be spoken by a robot as audio for this claim limitation, as an ‘utterance’ may be construed as per se a ‘nodding’ within the scope of the claim language; however, synchronizing a robot motion of ‘nodding’ with playing aloud a dialogue by a robot necessarily includes that “the dialogue-establishing utterance is a nodding”, and that “no utterances are presented 
“a second presentation step in which the presentation part presents, after the dialogue-establishing utterance, a second utterance [which is an utterance determined with a predetermined rule] that considers words included in a generation target utterance, wherein the generation target utterance is one of (1) the user utterance [and (ii) the user utterance and one or more utterances performed before the user utterance,] and no utterances are presented between the dialogue-establishing utterance and the second utterance” – a robot dialogue comprises words and intonations for speaking those words; synchronizing robot motion with social interaction comprises playing aloud the dialogue by the robot and performing the robot motion script by the robot in synchronization with the playing aloud of the dialogue (Abstract); a robot may parse a human dialogue directed to it, and analyze the syntax of the human dialogue (“that considers words included in a generation target utterance, wherein the generation target utterance is one of (1) the user utterance”) (column 3, lines 19 to 65); based on robot dialogue 111, motion script application 108 executing on computer 102 may generate a robot motion script 112 to be synchronized with robot dialogue 111 when it is spoken by a social robot 120 (column 4, lines 55 to 59: Figure 1); processor 122 may execute an application that analyzes a voice signal received by a microphone, and may select one of robot dialogues 123 to play back through speaker 130; at the same time that robot dialogue 123 is played back through speaker 130, one of the motion scripts 124 may be enacted by social robot 120 by controlling one or more of actuators 128 to move (column 6, lines 24 to 33: Figure 1); broadly, if “the dialogue-establishing utterance is a 
Concerning independent claims 1, 8, 11, 20, and 23, Annan et al. can be broadly construed to disclose all of the limitations of these independent claims with the exception of “a second utterance which is an utterance determined with a predetermined rule”.  That is, Annan et al. does not expressly disclose using “rules” to generate a second utterance from a user utterance.  Here, there are a variety of ways in the prior art of describing how a response is generated to a user utterance in an automated dialogue, and one way of describing this is in terms of “rules”.  Moreover, Annan et al. may be construed as disclosing both “a first presentation step” of “a dialogue-establishing utterance is a nodding” when a robot nods its head and “a second presentation step” of “a second utterance” that is generated by analyzing “a user utterance” because there is a synchronization of a robot nodding its head at the same time as playing aloud a robot dialogue.  That is, “a first presentation step” of “a dialogue-establishing utterance is a nodding” when a robot nods its head and “a second presentation step” of a robot playing a dialogue as “a second utterance” can both be construed as met by Annan et al.
Annan et al., this is taught by Di Fabbrizio et al.  Generally, Di Fabbrizio et al. teaches context interpretation in natural language processing using previous dialog actions, where subsequent user utterances can be interpreted using context information.  Interpretations of subsequent user utterances can be merged with interpretations of prior user utterances using a rule-based framework (“an utterance is determined with a predetermined rule that considered words included in . . . (ii) the user utterance and one or more utterances performed before the user utterance”).  Rules may be defined to determine which interpretations may be merged and under what circumstances they may be merged.  (Abstract)  A multi-turn dialogue may include a request to search for flights from a particular departure location to a particular destination location, e.g., “Search for flights from Los Angeles to Chicago” or “Search for flights from Los Angeles to Chicago in the morning on Friday”.  (¶[0012])  A rule-based approach to interpreting the context (semantic representations of prior user and system turns) in multi-turn interactions can improve natural language understanding accuracy by providing a framework in which to interpret a current user utterance in view of saved prior interpretations and dialog actions.  The rules define or facilitate determination of which dialog acts to trigger in response to user utterances, depending on which previous utterances the user has made and/or which dialog acts are previously triggered in a current multi-turn interaction.  (¶[0015])  A user may say, “Chicago” or “Go to Chicago” or “Let’s try Chicago” or “Destination Chicago”, and a natural language generation (NLG) module 210 may produce a response that is sent to client device 300 as, e.g., “Searching for flights to Chicago”.  
(¶[0035] - ¶[0036]: Figure 2)  Di Fabbrizio et al., then, teaches “a second presentation step” of presenting “a second utterance” of “Searching for flights to Chicago” that is “determined with a predetermined rule” and that “considers words included in a generation target utterance”, i.e., “Chicago”.  Di Fabbrizio et al.’s “generation target utterance” can include “(ii) the user utterance and one or more utterances performed before the user utterance” because it considers previous user utterances in a multi-turn dialog.  Applicants’ “a second presentation step” of presenting “a second utterance” can be construed as a synchronized dialogue played aloud by Annan et al., or as a subsequent response to a user utterance by a speech processing system of Di Fabbrizio et al.  An objective is to enhance an ability of speech processing systems to naturally engage in and accurately manage multi-turn dialogue interactions with users.  (¶[0010])  It would have been obvious to one having ordinary skill in the art to determine an utterance with a predetermined rule that considers utterances before a current user utterance as taught by Di Fabbrizio et al. for synchronizing a robot motion of nodding with a dialogue that is played aloud in Annan et al. for a purpose of enhancing an ability of speech processing systems to naturally engage in and manage multi-turn dialogue interactions with users.  

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Annan et al. (U.S. Patent No. 9,375,845) in view of Di Fabbrizio et al. (U.S. Patent Publication 2015/0340033) as applied to claim 1 above, and further in view of Alkolkar et al. (U.S. Patent Publication 2015/0019228).
Annan et al. discloses “the first presentation step and the second presentation step are executed”, but omits “where an utterance candidate is generated in association with the generation target utterance does not meet a prescribed standard, wherein the prescribed standard is whether or not the utterance candidate has inappropriate contents as a reply to the user utterance.”  Here, ¶[0047] of Applicants’ Specification contains the only occurrence of the term “inappropriate”, and it is not entirely clear what it means for an utterance candidate to have content that is “inappropriate”.  Mainly, it appears that this is intended to mean only that any potential reply that is going to be generated by the system might be incorrect because a user utterance does not meet the prescribed standard.  That is, ¶[0047] of the Specification only equates “inappropriate content” with an utterance candidate that does not meet the prescribed standard so that an appropriate reply cannot be generated.  
Concerning claim 3, however, Alkolkar et al. teaches automated confirmation and disambiguation in voice applications, where a disambiguation/confirmation requirement may be generated when a degree of certainty (“a prescribed standard”) in a top candidate of a set of candidates (“an utterance candidate generated in association with the generation target utterance”) is below a first predetermined threshold (“does not meet a prescribed standard”) or a difference between the degree of certainty in the top candidate and another candidate of the set of candidates is below a second predetermined threshold.  (¶[0013])  The disambiguation/confirmation requirement may be generated when the measure of certainty in the candidate is below a predetermined threshold.  (¶[0022])  The disambiguation/confirmation requirement may be generated when a degree of certainty in a top candidate of the set of candidates is below a first Alkolkar et al., then, teaches that “an utterance candidate” is generated as a disambiguation question if “the generation target utterance”, i.e., a user utterance, “does not meet a prescribed standard”.  Given what is described for “inappropriate contents” in Applicants’ Specification, this is simply equivalent to “the generation target utterance” of “the user utterance” not meeting the prescribed standard.  An objective is to address a problem when an automatic speech recognition system has difficulty understanding what a user said.  (¶[0006] - ¶[0007])  It would have been obvious to one having ordinary skill in the art to provide a disambiguation question when an utterance does not meet a prescribed standard as taught by Alkolkar et al. for synchronizing a robot motion with playing aloud of a dialogue in Annan et al. for a purpose of addressing a problem when an automatic speech recognition system has difficulty understanding what a user said.
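By way of illustration only, the two-threshold test summarized above from ¶[0013] of Alkolkar et al. can be sketched as follows.  The function name, the default threshold values, and the treatment of an empty candidate set are hypothetical and are assumed solely for this sketch; none of them is taken from Alkolkar et al. or from Applicants’ Specification.

```python
from typing import Sequence


def needs_disambiguation(
    certainties: Sequence[float],
    first_threshold: float = 0.6,    # hypothetical value, for illustration only
    second_threshold: float = 0.15,  # hypothetical value, for illustration only
) -> bool:
    """Return True when a disambiguation/confirmation requirement would arise.

    `certainties` holds one degree of certainty per candidate interpretation.
    """
    if not certainties:
        # Assumption of this sketch: with no candidate at all, ask the user.
        return True

    ranked = sorted(certainties, reverse=True)
    top = ranked[0]

    # Condition 1: certainty in the top candidate is below the first threshold.
    if top < first_threshold:
        return True

    # Condition 2: the top candidate is not clearly separated from another
    # candidate. The runner-up yields the smallest difference, so checking it
    # covers every other candidate.
    if len(ranked) > 1 and (top - ranked[1]) < second_threshold:
        return True

    return False
```

Under these assumed values, a candidate set with certainties 0.9 and 0.3 would not trigger a disambiguation question, whereas 0.9 and 0.85 would, because the two leading candidates are too close to distinguish.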

Claim 19/1 is rejected under 35 U.S.C. 103 as being unpatentable over Annan et al. (U.S. Patent No. 9,375,845) in view of Di Fabbrizio et al. (U.S. Patent Publication 2015/0340033) as applied to claim 1 above, and further in view of Yamada et al. (U.S. Patent Publication 2002/0049805).
Annan et al. discloses a dialogue by a robot that includes “the first presentation step” for a robot (“an agent”) that “presents the dialogue-establishing utterance” of nodding the head of the robot with a robot motion script 124 and “the second presentation step” for a robot that “presents the second utterance” of playing aloud a dialogue through speakers from a robot dialogue 123.  However, Annan et al. omits “the dialogue system comprises a plurality of agents”, i.e., a plurality of robots, and “any one agent of the plurality of agents presents the dialogue-establishing utterance” and “an agent different from the agent presenting the dialogue-establishing utterance of the plurality of agents presents the second utterance.”  That is, Annan et al. omits a first agent presenting a first utterance and a second agent presenting a second utterance.  However, this is taught by Yamada et al.  Specifically, Yamada et al. teaches a user support system having a specialized server that responds to a user utterance, where “the dialogue system comprises a plurality of agents” including local agent 152, chat agent 156, and recipe agent 160.  (¶[0101] - ¶[0104]: Figures 8 to 11)  Here, local agent 152 and chat agent 156 present utterances of “Welcome! Let’s chat” and “Hello. I am a chat agent.  Call me Peako”, which are “a dialogue-establishing utterance” to provide “in the first presentation step, any one agent of the plurality of agents presents the dialogue-establishing utterance”.  Then, recipe agent 160 is “an agent different from the Yamada et al. in a robot that plays a dialogue of Annan et al. for a purpose of obtaining a quick and proper response to a wide range of user utterances and requests.

Claim 19/3 is rejected under 35 U.S.C. 103 as being unpatentable over Annan et al. (U.S. Patent No. 9,375,845) in view of Di Fabbrizio et al. (U.S. Patent Publication 2015/0340033) and Alkolkar et al. (U.S. Patent Publication 2015/0019228) as applied to claims 1 and 3 above, and further in view of Yamada et al. (U.S. Patent Publication 2002/0049805).
Similar considerations apply to claim 19/3 as apply to claim 19/1 in combination with Alkolkar et al.

Response to Arguments
Applicants’ arguments filed 27 January 2021 have been considered but are moot in view of new grounds of rejection, as necessitated by amendment.

Applicants amend the independent claims, and present arguments addressing the prior rejection for new matter under 35 U.S.C. §112(a).  Specifically, Applicants address two limitations that were cited as setting forth new matter in the prior Office Action.  Applicants state that they have amended the independent claims to change “wherein the generation target utterance includes one of (1) at least the user utterance” to “wherein the generation target utterance is one of (1) the user utterance . . . .”  Additionally, Applicants state that they have amended the independent claims to delete the limitation of “immediately before” and “an utterance that establish a dialogue even though the user utterance immediately before is any type of utterance”.  However, Applicants instead introduce new limitations directed to “no utterances are presented between the user utterance and the dialogue-establishing utterance” and “no utterances are presented between the dialogue-establishing utterance and the second utterance”.  Applicants allege that these amendments are supported by ¶[0042] and ¶[0045]: Figure 2 of the Specification.
Applicants’ amendments overcome the prior rejection for new matter under 35 U.S.C. §112(a), and this rejection is being withdrawn.  
Applicants amend the independent claims and provide arguments addressing the prior rejection under 35 U.S.C. §102(a)(1) over Yamada et al. (U.S. Patent Publication 2002/0049805).  Generally, Applicants delete some limitations from and add some limitations to the independent claims.  Specifically, Applicants delete limitations directed to a user utterance “immediately before” and “an utterance that establishes a dialogue Yamada et al.
Applicants’ amendments overcome the prior rejection of the independent claims as being anticipated under 35 U.S.C. §102(a)(1) by Yamada et al., but new grounds of rejection are set forth as directed to the independent claims as being obvious under 35 U.S.C. §103 over Annan et al. (U.S. Patent No. 9,375,845) in view of Di Fabbrizio et al. (U.S. Patent Publication 2015/0340033).  Mainly, Applicants’ arguments are moot given the new grounds of rejection as necessitated by the significant amendments to their independent claims.  Generally, Annan et al. discloses a dialogue-establishing utterance that includes a nodding by a robot as a response to a user utterance, and Di Fabbrizio et al. teaches a second utterance that is determined with a predetermined rule that considers words in the user utterance and/or one or more utterances performed before the user utterance.  All of the user utterance, a dialogue-establishing utterance, and a second utterance would immediately follow one another without any intervening utterances in Annan et al. and Di Fabbrizio et al.  That is, Annan et al. generates a nodding and playing aloud of a robot dialogue immediately after the user utterance, and Di Fabbrizio et al. provides a system utterance that is generated with a rule immediately after the user utterance.  Annan et al. and Di Fabbrizio et al. can be combined by construing Annan et al. as providing both a first presentation step and a second Di Fabbrizio et al., or Annan et al. can be construed as providing a first presentation step of nodding with a robot dialogue and Di Fabbrizio et al. can be construed as providing a second presentation with a rule.  
New grounds of rejection are set forth as directed to dependent claim 3 as being obvious under 35 U.S.C. §103 over Alkolkar et al.  Here, Applicants present arguments directed against the prior rejection that are relevant to an interpretation of these new grounds of rejection.  Specifically, Applicants appear to interpret the limitation of “where an utterance candidate generated in association with the generation target utterance does not meet the prescribed standard” as being different from a user utterance not meeting a prescribed standard.  However, this argument does not appear to be persuasive given the language of dependent claim 3.  It is understood that there may be a difference between determining whether a user utterance meets a prescribed standard and whether a robot/system utterance meets a prescribed standard.  But the claim language recites this in terms of “the generation target utterance does not meet the prescribed standard”.  Specifically, the claim language defines that “the generation target utterance is one of (i) the user utterance”.  If the claim language defines the generation target utterance as being the user utterance, then the most reasonable way of making sense of this is that the user utterance is the generation target utterance because this user utterance is an utterance that is used for generating a target utterance, i.e., a second utterance.  Still, the generation target utterance is not itself an utterance that is going to be output by the robot/system, but is defined instead as the user utterance.  It is maintained that what is actually intended by this claim language is not clearly described by Applicants’ i.e., if a system does not clearly understand what a user said, then “a reply to the user utterance” is likely to be “inappropriate”.  Applicants’ Specification only includes one occurrence of the term “inappropriate” at ¶[0047]. 
Allowable subject matter is indicated for dependent claims 4 to 5.
Applicants’ amendments necessitate these new grounds of rejection.  This Office Action is NON-FINAL.

Allowable Subject Matter
Claims 4 to 5 and 19/4 to 19/5 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
  
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Nagisa et al., Wang et al., and Owada disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached on (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.  Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MARTIN LERNER/Primary Examiner
Art Unit 2657
March 2, 2021