Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement  
Acknowledgement is made of applicant’s amendment filed on 12/07/2021. The submission has been entered and made of record.
Status of the Claims
Claims 1-16 are pending. 
Response to Applicant’s Argument
The rejection under 35 USC 101 has been withdrawn in view of the amendment to claim 14.
In response to applicant’s argument that “On the other hand, the present claims recite a particular sequence that is not disclosed by Happ. As agreed on in the interview, Happ does not disclose, inter alia, "capturing, with said machine interface, a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor; detecting, with said machine interface, a termination of said utterance from said human interlocutor; capturing, with said machine interface only after detecting the termination, a second intention indicator based on a body movement of said human interlocutor; determining, with said machine interface, whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding"”: in view of the amendments to Claims 1 and 10, the previous grounds of rejection are withdrawn. Upon further search and consideration, new grounds of rejection are set forth below.
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, 8-11, and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Happ (US 6199043 B1) in view of Sugiyama et al. (“Estimating Response Obligation in Multi-Party Human-Robot Dialogues”).
Regarding Claim 1, Happ discloses a method of detecting a cession of speaking turn by a human interlocutor in a dialog with a machine interface (Figs. 1-2 and Col 3, Rows 51-55, a complex computer conversational interface), said method comprising: 
capturing, with said machine interface, a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor (Col 3, Rows 57-64, implementing an audio input interface comprising microphone 24 and sound card 8); 
detecting, with said machine interface, a termination of said utterance from said human interlocutor (Col 3, Rows 57-61 and Col 4, Rows 8-12 and Rows 51-53, processor 30 implements a synthesized actor that controls the conversation and gathers information; Col 4, Rows 60-66, gathering verbal information such as the end user / client’s speech and changing the voice to signal completion), 
capturing, with said machine interface at the time of detecting the termination, a second intention indicator based on a body movement of said human interlocutor (Col 4, Rows 58-66, synthesized actor / processor 30 gathers information consisting of verbal behavior and non-verbal behavior such as head, eye, facial, hand and body position movements to cue the start, continuation, or end of a statement / question, where non-verbal behavior can be combined with verbal behavior; in view of Col 4, Rows 35-45 and Fig 2, implementing a conversation flow chart showing the client’s or end user’s turn 46 passing control to the synthesized actor’s turn 44); 
determining, with said machine interface, whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog (Col 4, Rows 62-66, non-verbal behavior can be combined with verbal behavior such as looking towards the target of speech and changing the voice to signal completion), and 
responding, with said machine interface based on determining that said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog to said human interlocutor (Fig. 2, end user’s turn 46 passing control to synthesized actor’s turn 44 when signaling completion; in view of Col 4, Rows 46-50, synthesized actor listens until the client or end user speaks or otherwise undertakes an initiating action; i.e., when end user signals completion, synthesized actor initiates an action / response according to conversation flow of Fig. 2). 
Happ does not disclose capturing the second intention indicator only after detecting the termination. 
Sugiyama discloses a machine interface / robot interacting with a human interlocutor (Abstract), capturing a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor (p. 167, III, Estimating Response Obligation, “when a user asks a robot…it detects an input sound segment and estimates the response obligation to it, i.e., classifies the sound as ought-to-respond or ought-not-to-respond”), detecting a termination of said utterance from said human interlocutor (p. 167, “ought-to-respond” means the user has finished asking the robot and the robot should respond), and capturing a second intention indicator based on a body movement of said human interlocutor only after detecting the termination (p. 167, “(c) The user’s motion and face direction after the sound segment: we use the user’s whole body motion after the sound segment to exploit typical user behaviors in human robot interaction…if a user stops moving after his/her utterance, the case will likely be ought-to-respond”), in order to determine whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog and to respond to said human interlocutor accordingly (p. 167, “A. Overview of our proposed method…If it is classified as ought-to-respond, the robot responds to the speaker”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to capture the second intention indicator only after detecting the termination of the utterance from the human interlocutor corresponding to the first intention indicator, in order to use the human interlocutor’s acoustic information and motions / postures during the sound segment as features to estimate a response obligation, i.e., whether an input sound should be responded to by the robot or not (Sugiyama, Abstract).
Regarding Claim 3, Happ discloses wherein said second intention indicator comprises one or more of a determination of an orientation of a gaze of said human interlocutor (Col 4, Row 65, collecting information corresponding to looking towards the target of the speech), a detection of a degree of physical proximity of said human interlocutor with respect to a focal point of said dialog, a detection of an orientation of the body of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head, facial, and body position movement), and/or a detection of an orientation of a specified body part of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head position and body position movements). 
Regarding Claim 4, Happ discloses wherein said determination of the orientation of the gaze of said human interlocutor comprises a determination that the gaze of said human interlocutor has reverted to a focal point of said dialog (Col 4, Row 65, collecting information corresponding to looking towards the target of the speech). 
Regarding Claim 5, Happ discloses wherein said first intention indicator comprises one or more of an analysis of filler sound from said human interlocutor (Col 4, Row 66, gathering verbal behavior comprising changing the voice to signal completion), a detection of a pitch of sound from said human interlocutor, or a semantic component of said utterance. 
Regarding Claim 6, Happ discloses wherein said first intention indicator is based predominantly on said speech characteristic towards the termination of said utterance (Col 4, Row 66, verbal behavior comprises changing the voice to signal completion). 
Regarding Claim 8, Sugiyama discloses wherein capturing said second intention indicator of said human interlocutor is performed for a predetermined duration (p. 168, “For (c), we obtained the feature set from interval (ek, ek + α). Here, a constant α denotes the duration to collect the user’s motion and face direction after the sound segment”).
Regarding Claim 9, Happ discloses, upon determining that said first intention indicator and said second intention indicator are not together consistent with said human interlocutor ceding control of said dialog, reverting to said step of detecting the termination of an utterance from said human interlocutor (Col 4, Row 66 – Col 5, Row 4, the synthesized actor gathers information consisting of verbal and non-verbal behaviors corresponding to interrupting the video actor (e.g., raising of eyebrows to signal speech onset), whereupon control is returned to the client or end user).  
Regarding Claim 10, Happ discloses a system for detecting a cession of a speaking turn by a human interlocutor during a dialog with the human interlocutor (Fig. 1, computer apparatus 2), said system comprising: 
an input receiving a representation of a communication channel bearing an utterance from said human interlocutor (Fig. 1 and see Col 3, Rows 63-64, audio input interface is generated by microphone 24 and sound card 8); 
an output for conveying a representation of a communication channel (Fig 1 and see Col 3, Rows 57-64, output interface comprising speaker 22 and graphical user interface / video display screen 10);
Fig. 1 and see Col 3, Rows 57-61 and Col 4, Rows 1-10, processor 30 for implementing a synthesized environment showing a synthesized actor with text to speech capability; Col 4, Rows 35-45 and Fig 2, implementing conversation flow chart showing client’s or end user’s turn 46 passing control to synthesized actor’s turn 44) by:
 capturing a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor (Col 3, Rows 57-64, implementing an audio input interface comprising microphone 24 and sound card 8); 
detecting a termination of said utterance from said human interlocutor (Col 3, Rows 57-61 and Col 4, Rows 8-12 and Rows 51-53, processor 30 implements a synthesized actor that controls the conversation and gathers information; Col 4, Rows 60-66, gathering verbal information such as the end user / client’s speech and changing the voice to signal completion), 
capturing, at the time of detecting the termination, a second intention indicator based on a body movement of said human interlocutor (Col 4, Rows 58-66, synthesized actor / processor 30 gathers information consisting of verbal behavior and non-verbal behavior such as head, eye, facial, hand and body position movements to cue the start, continuation, or end of a statement / question, where non-verbal behavior can be combined with verbal behavior; in view of Col 4, Rows 35-45 and Fig 2, implementing a conversation flow chart showing the client’s or end user’s turn 46 passing control to the synthesized actor’s turn 44); 
determining whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog (Col 4, Rows 62-66, non-verbal behavior can be combined with verbal behavior such as looking towards the target of speech and changing the voice to signal completion), and 
responding, based on determining that said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog to said human interlocutor (Fig. 2, end user’s turn 46 passing control to synthesized actor’s turn 44 when signaling completion; in view of Col 4, Rows 46-50, synthesized actor listens until the client or end user speaks or otherwise undertakes an initiating action; i.e., when end user signals completion, synthesized actor initiates an action / response according to conversation flow of Fig. 2). 
Happ does not disclose capturing the second intention indicator only after detecting the termination. 
Sugiyama discloses a machine interface / robot interacting with a human interlocutor (Abstract), capturing a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor (p. 167, III, Estimating Response Obligation, “when a user asks a robot…it detects an input sound segment and estimates the response obligation to it, i.e., classifies the sound as ought-to-respond or ought-not-to-respond”), detecting a termination of said utterance from said human interlocutor (p. 167, “ought-to-respond” means the user has finished asking the robot and the robot should respond), and capturing a second intention indicator based on a body movement of said human interlocutor only after detecting the termination (p. 167, “(c) The user’s motion and face direction after the sound segment: we use the user’s whole body motion after the sound segment to exploit typical user behaviors in human robot interaction…if a user stops moving after his/her utterance, the case will likely be ought-to-respond”), in order to determine whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog and to respond to said human interlocutor accordingly (p. 167, “A. Overview of our proposed method…If it is classified as ought-to-respond, the robot responds to the speaker”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to capture the second intention indicator only after detecting the termination of the utterance from the human interlocutor corresponding to the first intention indicator, in order to use the human interlocutor’s acoustic information and motions / postures during the sound segment as features to estimate a response obligation, i.e., whether an input sound should be responded to by the robot or not (Sugiyama, Abstract).
Regarding Claim 11, Happ discloses wherein said system comprises a focal point perceivable by said interlocutor, and a detector capable of determining an aspect of said body movement relative said focal point as said second intention indicator (Fig. 1, Col 4, Rows 3-6, video equipment / environment 16 is for pictures of real people in a real setting; Col 4, Rows 51-53, synthesized actor / processor 30 gathers information, e.g., using video equipment 16 to gather non-verbal behavior such as the end user / client looking towards the target of the speech and the corresponding head, eye, facial, hand, and body position movements per Col 4, Rows 60-66). 
Regarding Claim 13, Happ discloses wherein said first intention indicator comprises one or more of an analysis of filler sound from said human interlocutor (Col 4, Row 66, gathering verbal behavior comprising changing the voice to signal completion), a detection of a pitch of sound from said human interlocutor, or a semantic component of said utterance. 
Regarding Claim 14, Happ discloses a computer program comprising a non-transitory computer-readable medium having instructions adapted to implement the method of claim 1 (Col 4, Rows 20-27, implementing a computer apparatus and speech recognition software for programming such computer apparatus; e.g., Col 1, Rows 59-62, IBM Human Center).
Regarding Claims 15-16, Sugiyama discloses wherein said capturing the second intention indicator is performed in a predetermined time window after detecting the termination (p. 168, “C. Formulation”, “Response obligation is estimated for each input sound segment k, whose start and end times are denoted as sk and ek…For (c), we obtained the feature set from interval (ek, ek + α). Here, a constant α denotes the duration to collect the user’s motion and face direction after the sound segment”).  
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Happ (US 6199043 B1) and Sugiyama et al. (“Estimating Response Obligation in Multi-Party Human-Robot Dialogues”) as applied to claim 1, and further in view of Strubbe et al. (US 6721706 B1).
Regarding Claim 2, Happ does not disclose capturing, with said machine interface based on detecting the termination, a third intention indicator based on a second speech characteristic of said human interlocutor.
Strubbe teaches a conversation simulator, as a suitable companion to a user (Col 10, Rows 1-10), that determines whether a first intention indicator based on a first speech characteristic of the user (Col 10, Rows 40-42 and Rows 50-55, looking for cues of the particular user indicating that the end of his/her response has been reached, by feeding a machine learning process inputs such as a first one of a loudness pattern, a pitch pattern, or specific words like “well…?” indicating that a particular user is growing impatient waiting for the conversation simulator to respond) and a second intention indicator based on a body movement of the user taken together are consistent with the user ceding control of a dialog (Col 10, Rows 59-67, feeding visual cues like the user’s head orientation and eye orientation), by determining whether the first intention indicator, the second intention indicator, and a third intention indicator based on a second speech characteristic of the user taken together are consistent with the user ceding control of the dialog (Col 10, Rows 40-42 and Rows 50-55, looking for cues of the particular user indicating that the end of his/her response has been reached, by feeding a machine learning process inputs such as a second one of a loudness pattern, a pitch pattern, or specific words like “well…?” indicating that a particular user is growing impatient waiting for the conversation simulator to respond).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use multiple speech characteristics as intention indicators, in addition to an intention indicator based on a body movement of said human interlocutor, in order to give the conversation simulator / synthesized actor a more reliable indicator of when it should speak (Strubbe, Col 10, Rows 45-47).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Happ (US 6199043 B1) and Sugiyama et al. (“Estimating Response Obligation in Multi-Party Human-Robot Dialogues”) as applied to claim 1, and further in view of Sharifi (US 8843369 B1).
Regarding Claim 7, Happ does not disclose wherein determining the termination of the utterance is based on a duration of a pause in the utterance being detected to have exceeded a predetermined threshold duration. 
Sharifi teaches a computing device that determines that an utterance is terminated based on a duration of a pause in the utterance being detected to have exceeded a predetermined threshold duration (Col 4, Rows 19-36, computing device 124 identifies ending points in utterances based on durations of pauses between utterances, where if a duration of a pause satisfies a threshold, the computing device identifies the beginning of the pause as an ending point; e.g., the threshold may be two seconds and any pause greater than two seconds will signal the end).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine that an utterance has terminated when the duration of a pause in the utterance exceeds a predetermined threshold duration, in order to identify ending points in the utterance (Sharifi, Col 4, Rows 19-20; Happ, Col 4, Rows 65-66, to signal completion).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Happ (US 6199043 B1) and Sugiyama et al. (“Estimating Response Obligation in Multi-Party Human-Robot Dialogues”) as applied to claim 1, and further in view of Luo (US 2020/0061822 A1).
Regarding Claim 12, Happ discloses wherein said second intention indicator comprises one or more of a determination of the orientation of the gaze of said human interlocutor (Col 4, Rows 65-66, gathering information / non-verbal behavior such as looking towards the target of speech), a detection of a degree of physical proximity of said human interlocutor with respect to a focal point of said dialog, a detection of an orientation of the body of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head, facial, and body position movement within the context of determining whether non-verbal behavior includes looking towards the target of speech (i.e., determining that the head, face, or body is oriented toward the target)), and/or a detection of an orientation of a specified body part of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head position and body position movements within the context of determining whether non-verbal behavior includes looking towards the target of speech).
Happ does not teach that said system further comprises a video input transducer and a gaze tracker adapted to determine the orientation of the gaze of said human interlocutor. 
Luo teaches a robotic system comprising a video input transducer (¶20, a plurality of cameras functioning as eyes of the robot) and a gaze tracker adapted to determine an orientation of a gaze of a human interlocutor having a conversation with the robot (¶80-82, a capturing and calculating module 402 configured to capture user’s gaze direction; e.g., ¶79, when robot has a voice or video communication with the user, robot judges that the user is chatting with the robot when the robot captures that the sight line of the user falls on the head of the robot).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement at least one camera as a video input transducer and a gaze tracker to determine an orientation of a gaze of a human interlocutor, in order to determine whether the end user / client interlocutor is looking towards a target of speech when signaling completion (Happ, Col 4, Rows 65-66; Luo, ¶79).
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu, whose telephone number is 571-270-1587, or to the examiner’s supervisor, King Y. Poon, whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/Primary Examiner, Art Unit 2675
03/07/2022