Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Acknowledgement of Priority
Acknowledgement is made of applicant’s claim for domestic priority based on PCT Application PCT/EP2018/081442 filed on 11/15/2018 and foreign priority based on EPO 17306593.9 filed on 11/16/2017. A certified copy of the foreign priority document has been received.
Claim Rejections - 35 USC § 101
35 U.S.C. §101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 14 is rejected under 35 USC 101 for reciting subject matter not within the statutory category of process, machine, manufacture, or composition of matter. 
Products that do not have a physical or tangible form, such as information (often referred to as "data per se") or a computer program per se (often referred to as "software per se") when claimed as a product without any structural recitations are not within the statutory category of 35 USC 101. See MPEP 2106.03. 

Claim Rejections - 35 USC § 102	
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
(a) NOVELTY; PRIOR ART.—A person shall be entitled to a patent unless— 
(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention; or 
(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention. 

(b) EXCEPTIONS.— 
(1) DISCLOSURES MADE 1 YEAR OR LESS BEFORE THE EFFECTIVE FILING DATE OF THE CLAIMED INVENTION.—A disclosure made 1 year or less before the effective filing date of a claimed invention shall not be prior art to the claimed invention under subsection (a)(1) if— 
(A) the disclosure was made by the inventor or joint inventor or by another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or 
(B) the subject matter disclosed had, before such disclosure, been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor. 
(2) DISCLOSURES APPEARING IN APPLICATIONS AND PATENTS.—A disclosure shall not be prior art to a claimed invention under subsection (a)(2) if— 
(A) the subject matter disclosed was obtained directly or indirectly from the inventor or a joint inventor;
(B) the subject matter disclosed had, before such subject matter was effectively filed under subsection (a)(2), been publicly disclosed by the inventor or a joint inventor or another who obtained the subject matter disclosed directly or indirectly from the inventor or a joint inventor; or
(C) the subject matter disclosed and the claimed invention, not later than the effective filing date of the claimed invention, were owned by the same person or subject to an obligation of assignment to the same person.

Claims 1, 3-6, 9-11, and 13-14 are rejected under 35 USC 102(a)(1)-(a)(2) as being anticipated by Happ (US 6199043 B1).
Regarding Claim 1, Happ discloses a method of detecting the cession of speaking turn by a human interlocutor in a dialog with a machine interface (Figs. 1-2), said method comprising: 
said machine interface capturing a first intention indicator based on a first speech characteristic of said human interlocutor during an utterance from said human interlocutor (Col 3, Rows 57-64, implementing an audio input interface comprising microphone 24 and sound card 8), said machine interface detecting the termination of said utterance from said human interlocutor (Col 3, Rows 57-61 and Col 4, Rows 8-12 and Rows 51-53, processor 30 implements a synthesized actor that controls the conversation and gathers information; Col 4, Rows 60-66, gathering verbal information such as end user / client’s speech and changing the voice to signal completion), 
when the termination of an utterance from said human interlocutor is determined, said machine interface capturing a second intention indicator based on a body movement of said human interlocutor (Col 4, Rows 50-62, synthesized actor / processor 30 gathers information consisting of verbal behavior and non-verbal behavior such as head, eye, facial, hand and body position movement; in view of Col 4, Rows 35-45 and Fig 2, implementing conversation flow chart showing client’s or end user’s turn 46 passing control to synthesized actor’s turn 44), 
and said machine interface determining whether said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog (Col 4, Rows 62-66, non-verbal behavior can be combined with verbal behavior such as looking towards the target of speech and changing the voice to signal completion), and 
when it is determined that said first intention indicator and said second intention indicator taken together are consistent with said human interlocutor ceding control of said dialog, said machine interface responding to said human interlocutor (Fig. 2, end user’s turn 46 passing control to synthesized actor’s turn 44 when signaling completion; in view of Col 4, Rows 46-50, synthesized actor listens until the client or end user speaks or otherwise undertakes an initiating action; i.e., when end user signals completion, synthesized actor initiates an action / response according to conversation flow of Fig. 2). 
Regarding Claim 3, Happ discloses wherein said second intention indicator comprises one or more of a determination of the orientation of the gaze of said human interlocutor (Col 4, Row 65, collect information corresponding to looking towards the target of the speech), a detection of a degree of physical proximity of said human interlocutor with respect to a focal point of said dialog, a detection of an orientation of the body of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head, facial, and body position movement), a detection of an orientation of a specified body part of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head position and body position movements). 
Regarding Claim 4, Happ discloses wherein said determination of the orientation of the gaze of said human interlocutor comprises a determination that the gaze of said human interlocutor has reverted to a focal point of said dialog (Col 4, Row 65, collect information corresponding to looking towards the target of the speech). 
Regarding Claim 5, Happ discloses wherein said first intention indicator or said third intention indicator comprises one or more of, an analysis of filler sound from said human interlocutor (Col 4, Row 66, gathering verbal behavior comprising changing the voice to signal completion), a detection of the pitch of sound from said human interlocutor, or a semantic component of said utterance. 
Regarding Claim 6, Happ discloses wherein said first intention indicator is based predominantly on said speech characteristic towards the termination of said utterance (Col 4, Row 66, verbal behavior comprises changing the voice to signal completion). 
Regarding Claim 9, Happ discloses wherein when at said step of determining whether said first intention indicator and said second intention indicator are consistent with said human interlocutor ceding control of said dialog, it is determined that said first intention indicator and said second intention indicator are not together consistent with said human interlocutor ceding control of said dialog, said method reverts to said step of detecting the termination of an utterance from said human interlocutor (Col 4, Row 66 – Col 5, Row 4, synthesized actor gathers information consisting of verbal and non-verbal behaviors corresponding to interrupting the video actor (e.g., raising of eyebrows to signal speech onset), and control is returned to the client or end user).
Regarding Claim 10, Happ discloses a system for detecting the cession of speaking turn by a human interlocutor in a dialog with a human interlocutor (Fig. 1), said system comprising: 
an input receiving a representation of a communication channel bearing an utterance from said human interlocutor (Fig. 1 and see Col 3, Rows 63-64, audio input interface is generated by microphone 24 and sound card 8), an output for conveying a representation (Fig. 1 and see Col 3, Rows 57-64, output interface comprising speaker 22 and graphical user interface / video display screen 10),
a processor adapted to process said representation to detect the termination of said utterance (Fig. 1 and see Col 3, Rows 57-61 and Col 4, Rows 1-10, processor 30 for implementing a synthesized environment showing a synthesized actor with text to speech capability; Col 4, Rows 35-45 and Fig 2, implementing conversation flow chart showing client’s or end user’s turn 46 passing control to synthesized actor’s turn 44), 
said processor being further adapted in a case where the termination of an utterance from said human interlocutor is determined (Col 4, Rows 50-53, the synthesized actor / processor controls the conversation and gathers information; i.e., Fig. 2, control the passing of control between end user / client’s turn 46 and synthesized actor’s turn 44), to capture a first intention indicator based on a first speech characteristic of said human interlocutor and a second intention indicator based on a body movement of said interlocutor (Col 4, Rows 55-67, gathering information consists of verbal (e.g., speech) and non-verbal (e.g., head, eye, facial, hand and body position movement) behaviors used to pass control to the next party in the conversation), and 
determine whether said one or more intention indicators are consistent with said human interlocutor ceding control of said dialog (Col 4, Rows 62-67, non-verbal behavior can be combined with verbal behavior (e.g., looking towards the target of speech and changing the voice) to signal completion), and in a case where it is determined that said one or more intention indicators are consistent with said human interlocutor ceding control of said dialog, initiating a response to said human interlocutor (Fig. 2, end user’s turn 46 passing control to synthesized actor’s turn 44 when signaling completion; in view of Col 4, Rows 46-50, synthesized actor listens until the client or end user speaks or otherwise undertakes an initiating action; i.e., when end user signals completion, synthesized actor initiates an action / response according to conversation flow of Fig. 2).
Regarding Claim 11, Happ discloses wherein said system comprises a focal point perceivable by said interlocutor, and a detector capable of determining an aspect of said interlocutor's body movement relative said focal point as said second intention indicator (Col 4, Rows 51-53, synthesized actor / processor 30 gathers information; e.g., an inherent detector to gather non-verbal behavior such as end user / client looking towards the target of the speech). 
Regarding Claim 13, Happ discloses wherein said first intention indicator or said third intention indicator comprises one or more of, an analysis of filler sound from said human interlocutor (Col 4, Row 66, gathering verbal behavior comprising changing the voice to signal completion), a detection of the pitch of sound from said human interlocutor, or a semantic component of said utterance. 
Regarding Claim 14, Happ discloses a computer program comprising instructions adapted to implement the steps of claim 1 (Col 4, Rows 20-27, implementing a computer apparatus and speech recognition software for programming such computer apparatus; e.g., Col 1, Rows 59-62, IBM Human Center). 
Claim Rejections - 35 USC § 103
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 103 that form the basis for the rejections under this section made in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 USC 103(a) as being unpatentable over Happ (US 6199043 B1) in view of Strubbe et al. (US 6721706 B1).
Regarding Claim 2, Happ does not disclose an additional step of, when the termination of an utterance from said human interlocutor is determined, capturing a third intention indicator based on a second speech characteristic of said human interlocutor.
Strubbe teaches a conversation simulator as a suitable companion to a user (Col 10, Rows 1-10) to determine whether a first intention indicator based on a first speech characteristic of the user (Col 10, Rows 40-42 and Rows 50-55, looking for cues of the particular user indicating the end of his/her response has been reached by feeding machine learning process inputs such as a first one of loudness pattern, pitch pattern, or specific words like “well…?” indicating that a particular user is growing impatient waiting for the conversation simulator to respond) and a second intention indicator based on a body movement of the user taken together are consistent with the user ceding control of a dialog (Col 10, Rows 59-67, feeding visual cues like user’s head orientation and eyes orientation) by determining whether the first intention indicator, the second intention indicator, and a third intention indicator based on a second speech characteristic of the user taken together are consistent with the user ceding control of the dialog (Col 10, Rows 40-42 and Rows 50-55, looking for cues of the particular user indicating the end of his/her response has been reached by feeding machine learning process inputs such as a second one of loudness pattern, pitch pattern, or specific words like “well…?” indicating that a particular user is growing impatient waiting for the conversation simulator to respond).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to use multiple speech characteristics as intention indicators in addition (Strubbe, Col 10, Rows 45-47).
Claim 7 is rejected under 35 USC 103(a) as being unpatentable over Happ (US 6199043 B1) in view of Sharifi (US 8843369 B1).
Regarding Claim 7, Happ does not disclose wherein an utterance is determined to terminate only in a case where the duration of a pause in the utterance is detected to have exceeded a predetermined threshold duration. 
Sharifi teaches a computing device to determine that an utterance is terminated only in a case where a duration of a pause in the utterance is detected to have exceeded a predetermined threshold duration (Col 4, Rows 19-36, computing device 124 identifies ending points in utterances based on durations of pauses between utterances where if a duration of a pause satisfies a threshold, then the computing device identifies the beginning of the pause as an ending point; e.g., threshold may be two seconds and any pause greater than two seconds will signal the end).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to determine that an utterance terminates in a case where the duration of a pause in the utterance has exceeded a predetermined threshold duration in order to identify ending points in the utterance (Sharifi, Col 4, Rows 19-20; Happ, Col 4, Rows 65-66, to signal completion).
Claim 8 is rejected under 35 USC 103(a) as being unpatentable over Happ (US 6199043 B1) in view of Beaumont et al. (US 2015/0088515 A1).
Regarding Claim 8, Happ does not disclose wherein said step of capturing said second intention indicator of said human interlocutor, is performed for a predetermined duration.
Beaumont teaches capturing a visual feature of a human interlocutor for a predetermined duration (¶32, video data 330 containing a pattern of visual features associated with a primary speaker’s speech may contain a time stamp which may be matched with a time stamp of corresponding audio data 320; ¶38, visual feature in video data should reveal that their mouth is moving, lips are moving, etc.; ¶39, matching audio data having a human speaker recognized along with video data containing visual features associated with speech to identify a speaker in a situation where two or more human speakers take turns talking).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to capture the second intention indicator of said human interlocutor (e.g., visual features corresponding to the user’s head, eye, facial (i.e., mouth and lips moving), hand, and body position movement) for a predetermined duration defined by timestamps matching corresponding audio feature timestamps as taught by Beaumont in order to identify a speaker who is currently speaking in a situation where two or more human speakers take turns talking (Beaumont, ¶39; Happ, Fig. 2, block 42 video actor’s turn and block 46 for client’s / end user’s turn).
Claim 12 is rejected under 35 USC 103(a) as being unpatentable over Happ (US 6199043 B1) in view of Luo (US 2020/0061822 A1).
Regarding Claim 12, Happ discloses wherein said second intention indicator comprises one or more of a determination of the orientation of the gaze of said human interlocutor (Col 4, Rows 65-66, gather information / non-verbal behavior such as looking towards the target of speech), a detection of a degree of physical proximity of said human interlocutor with respect to a focal point of said dialog, a detection of an orientation of the body of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head, facial, and body position movement within the context of determining if non-verbal behavior includes looking towards target of speech (i.e., determining that the head, facial, or body position orientates toward the target)), a detection of an orientation of a specified body part of said human interlocutor with respect to a focal point of said dialog (Col 4, Rows 60-61, collecting information about head position and body position movements within the context of determining if non-verbal behavior includes looking towards target of speech).
Happ does not teach said system further comprises a video input transducer and a gaze tracker adapted to determine the orientation of the gaze of said human interlocutor. 
Luo teaches a robotic system comprising a video input transducer (¶20, a plurality of cameras functioning as eyes of the robot) and a gaze tracker adapted to determine an orientation of a gaze of a human interlocutor having a conversation with the robot (¶80-82, a capturing and calculating module 402 configured to capture user’s gaze direction; e.g., ¶79, when robot has a voice or video communication with the user, robot judges that the user is chatting with the robot when the robot captures that the sight line of the user falls on the head of the robot).
It would have been obvious to one ordinarily skilled in the art before the effective filing date of the invention to implement at least one camera as video input transducer and a gaze tracker to determine an orientation of a gaze of a human interlocutor in order to determine or (Happ, Col 4, Rows 65-66; Luo, ¶79).
Conclusion
Prior art made of record and not relied upon is considered pertinent to applicant's disclosure: 
US 2017/0310928 A1 discloses a conversation communication system determining a facial image of the communicator, a distance between the communicator and a display, and a direction of the face of the communicator in order to determine whether to continue conversation communication.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner Richard Z. Zhu whose telephone number is 571-270-1587 or examiner’s supervisor King Poon whose telephone number is 571-272-7440. Examiner Richard Zhu can normally be reached on M-Th, 0730-1700.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHARD Z ZHU/
Primary Examiner, Art Unit 2675
09/10/2021