Detailed Action
Notice of Pre-AIA or AIA Status

	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

	A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/27/2021 has been entered.


Response to Arguments

	Applicant’s amendments to the claims have overcome some of the objections and rejections previously set forth in the Non-Final Office Action mailed June 22nd, 2020. Applicant’s amendments to claims 1, 8, 10, 13, 14, 15, and 17, as described on pages 12-18, have been deemed sufficient to overcome the previous objections, the 35 USC § 112(b) rejections, and the 35 USC § 103 art rejections through the addition of the limitation “the first action including the apparatus requesting the target person to execute a predetermined task, when the processor decides, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, again causing the apparatus to execute the first action;...” as supported by paragraph [0084] of the specification. However, because the amendments have changed the scope of the previously rejected claims, new art rejections for claims 1, 15, and 17 have been added below. All other rejections under 35 USC § 103 have been maintained and are included below with minor changes to reflect minor amendments.

Claim Objections

	Claim 17 is objected to because of the following informalities: Claim 17 recites “the robot” in line 3 and should instead recite “a robot” at the first instance. Appropriate correction is required.

Claim Rejections - 35 USC § 103

	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
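A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.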

	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

	Claims 1, 8-9, 12-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Sumida et al. (US Pre-Grant Publication No. US 2009/0149991 A1, hereinafter “Sumida”) in view of Howard et al. (US Pre-Grant Publication No. US 2015/0120043 A1, hereinafter “Howard”), further in view of Sanchez et al. (US Pre-Grant Publication No. US 2012/0185090 A1, hereinafter “Sanchez”), and further in view of Roman et al. (US Pre-Grant Publication No. US 2018/0302285 A1, hereinafter “Roman”).

	Regarding claim 1 Sumida discloses:

	An apparatus that communicates with a target person by executing a predetermined action, the apparatus comprising: (“A robot has been conventionally known which determines its behavior based on an external situation and its internal status in order to enhance the amusement property for a pet robot (e.g. refer to Unexamined Japanese Patent Application Publication No. 2004-283958, paragraphs 0136-0148, FIG. 18). A robot disclosed in Unexamined Japanese Patent Application Publication No. 2004-283958 has a plurality of behavior describing modules (schemes) in which actions having objectives, such as "exploring", "eating", and "playing" are described. Each scheme calculates, by using a predetermined function, an activation level AL of an action, which is the degree of execution priority of the scheme, in response to a change in the internal status of the robot or an external stimulus. Generally, the robot selects a scheme that has the highest AL, and expresses an action that corresponds to the scheme.” Sumida [0005] lines 1-16) a camera that captures an image around the apparatus; a microphone that acquires sounds around the apparatus; a processor; a speaker; (“The cameras (also referred to as a "vision sensor") C, C, capture digital data on images in the proceeding direction ahead of the robot R in digital data, and a color CCD (Charge-coupled Device) may be used as the cameras C, C. The cameras C, C are disposed on the right and left sides pair at the same height level, and output captured images to the image processor 10. The cameras C, C, the speaker S and the microphones MC, MC (audio input unit) are provided in the head R1. 
The speaker S (also referred to as an "audio output means") utters predetermined voices synthesized in the audio processor 20.” Sumida [0059] lines 1-12) and a driver that moves the apparatus; (“In the embodiment, the robot R is the autonomously-movable robot capable of two-leg walk, however, the present invention is not limited to this, and may be applied to an autonomously-movable robot that can move by its wheels. The autonomously-movable robot that can move by its wheels according to the present invention must have the same advantages as that of the robot R in the embodiment, except that its movable parts that corresponds to "legs" of the robot R are "wheels".” Sumida [0152] lines 1-9) wherein the processor: causes the apparatus to execute a third action (“In FIG. 1, the robot control system A includes plural robots Ra, Rb, Rc (hereinafter referred to simply as the "robot R" unless otherwise stated), and each robot R executes a task in accordance with an execution plan of the task (task schedule) that is predefined for each robot R through the management computer 3.” Sumida [0034] lines 1-6) as an initial action, the initial action being executed for communication with the target person (“When the priority comparing means 131 determines that the degree of priority of the detected rule is higher than that of the rule being executed, the parameter comparing means 132 compares an initial value of the degree of interest (action inducing parameter) contained in the detected rule and the present value of the degree of interest (action inducing parameter) set in the rule being executed. In the embodiment, the parameter comparing means 132 determines whether or not the degree of interest (initial value) of the selected rule is greater than the degree of interest (present value) of the rule being executed.” Sumida [0127] lines 1-11) according to at least one of the captured image or an acquired sound, (Sumida Fig. 9 rule contents column wherein the robot hears a sound and faces the source or a nearby person) a second action being one-level higher than the third action, a first action being one-level higher than the second action; (“In the aforementioned communication robot, the rule database includes a priority level in each of the plurality of rules, and the action inducing parameter setting unit includes a priority level comparing unit for comparing a priority level contained in the detected rule and a priority level contained in a rule being executed, a parameter comparing unit for comparing the initial value of the action inducing parameter contained in the detected rule and a present value of the action inducing parameter contained in the rule being executed when the priority level contained in the detected rule is greater than the priority level of the rule being executed, a rule changing unit for setting the initial value of the action inducing parameter contained in the detected rule in the situation database when the initial value of the action inducing parameter contained in the detected rule is equal to or greater than the present value of the action inducing parameter of the rule being executed.” Sumida [0013] lines 1-17 wherein the Office interprets the ability to set a priority of tasks to be equivalent to setting a first task a step above a second, a second above a third, and so on.) … the first action includes the apparatus executing a predetermined task, (“A robot has been conventionally known which determines its behavior based on an external situation and its internal status in order to enhance the amusement property for a pet robot (e.g. refer to Unexamined Japanese Patent Application Publication No. 2004-283958, paragraphs 0136-0148, FIG. 18). A robot disclosed in Unexamined Japanese Patent Application Publication No. 2004-283958 has a plurality of behavior describing modules (schemes) in which actions having objectives, such as "exploring", "eating", and "playing" are described. Each scheme calculates, by using a predetermined function, an activation level AL of an action, which is the degree of execution priority of the scheme, in response to a change in the internal status of the robot or an external stimulus. Generally, the robot selects a scheme that has the highest AL, and expresses an action that corresponds to the scheme.” Sumida [0005] lines 1-16) the second action includes the speaker outputting a voice that talks to the target person, (“The human handling module can greet or talk about the weather in accordance with the situation or a person the robot R is talking to, regardless of whether or not a task is executed.” Sumida [0096] lines 1-4) …

	Sumida does not appear to disclose:
	
	when a first sound is acquired by the microphone after an execution of the initial action, causes the apparatus to execute the second action one-level higher than the third action; when a second sound is acquired by the microphone after an execution of the second action, causes the apparatus to execute the first action one-level higher than the second action, the first action including the apparatus requesting the target person to execute a predetermined task; when the processor decides, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, again causing the apparatus to execute the first action; when no sound is acquired by the microphone after the execution of the second action, determines whether a time elapsed from the execution of the second action is shorter than a threshold; when the time is shorter than the threshold, causes the apparatus to continue the second action; and when the time is equal to or longer than the threshold, causes the apparatus to execute the third action one-level lower than the second action, and the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.

	However, in the same field of endeavor of robotic controls Sanchez discloses:

when a first sound is acquired by the microphone after an execution of the initial action, (“If the robot is engaged by a user when in the user-directed state 243, the robot outputs information (e.g., a dialog) indicating that it is busy on a task. The dialog forces the user to either dismiss the dialog (thus canceling the engagement request) and allow the robot to proceed with its task, or cancel (or pause) the current user-directed task or application so the user can continue engagement.” Sanchez [0040] lines 1-7, fig. 3, wherein the robot starts in an autonomous condition that involves doing its own task until engaging with a user) causes the apparatus to execute the second action one-level higher than the third action; (“As described below, the robot has a clear priority order of states as shown in the above table. A higher priority state can interrupt a lower priority state, and engagement (transitioning to the engaged state) can interrupt any self-directed or user-directed task.” Sanchez [0029] lines 1-5) when a second sound is acquired by the microphone after an execution of the second action, causes the apparatus to execute the first action one-level higher than the second action; (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes its task based on the voice command i.e. escalates to a new engagement level, see also [0040-0043]) when no sound is acquired by the microphone after the execution of the second action, determines whether a time elapsed from the execution of the second action is shorter than a threshold; when the time is shorter than the threshold, causes the apparatus to continue the second action; (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes its task based on the voice command i.e. escalates to a new engagement level, see also [0040-0043]) and when the time is equal to or longer than the threshold, causes the apparatus to execute the third action one-level lower than the second action, (“Turning to another aspect, when the user's face is lost to the robot, the assumption is that the user has exited the robot's viewing area (frame) and will return shortly if the user wishes to continue engagement. However, the user may have inadvertently exited the frame but still want to interact. Note that the robot has tracking behaviors when engaged, and thus this may only happen when the user goes outside the interaction range, or moves too fast for the robot to track. Thus, a time duration (e.g., three seconds) is used before considering the user as having exited, e.g., leaving the camera in the same position to help the user return, after which the robot returns to its standard pre-engagement state. A similar situation is when the user has walked away from the robot and does not want to interact anymore; in this case the robot resumes its tasks, e.g., after a few seconds” Sanchez [0054] lines 1-15 wherein after a time after engagement with no further engagement the robot returns to the previous task).

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the sound operated tasks of Sanchez because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user and complete tasks as commanded based on a hierarchy in order to correctly complete multiple tasks (Sanchez [0035] and [0044]).

	Additionally, Sumida in view of Sanchez do not appear to disclose:

	the first action including the apparatus requesting the target person to execute a predetermined task; when the processor decides, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, again causing the apparatus to execute the first action; and the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.

	However, in the same field of endeavor of robotic controls Howard discloses:

	 “and the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.” (“Referring again to FIG. 3, the robot 120 may determine if a task is presented to the robot 120 at block 320. In some embodiments, the task may correspond to an action of the user 140 relating to the electronic device 160. For example, in some embodiments, the task may be for the robot 120 to perform the same and/or a similar action to the action performed by the user 140 on the electronic device 160 (e.g., playing the game). If no task is presented to the robot 120, then the robot 120 may continue to receive task-solution pairs at block 300 and/or store task-solution pairs at block 310.” Howard [0086] lines 1-10). 

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the time determination and mimicking of Howard because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user in a social setting with greater control, such as by displaying emotions in a normal conversation cadence, and through following a user during tasks (Howard [0003], [0045], and [0086]).

	Additionally, Sumida in view of Sanchez and Howard do not appear to disclose:

	the first action including the apparatus requesting the target person to execute a predetermined task; when the processor decides, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, again causing the apparatus to execute the first action;

	However, in the same field of endeavor of robotic controls Roman discloses:

	“the first action including the apparatus requesting the target person to execute a predetermined task; when the processor decides, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, again causing the apparatus to execute the first action;” (“The assistant device can detect movements via one or more cameras. The movements can include body motion, face, and/or hand movements. The assistant device can further interpret the movements using a gesture recognition algorithm. The gesture recognition algorithm can include one or more of a skeletal-based algorithm, an appearance-based algorithm, and/or a 3D model-based algorithm. This feature allows for the user to interact with the assistant device by gestures. In at least one embodiment, the assistant device uses the combination of audio and video input to derive the user's communication, response and/or mood. For example, if the assistant device instructs the user to perform an action as part of the setup process, but the user throws her hands up in frustration, then this can be recognized as a gesture using video input. This gesture can be used to determine that the user is in a flustered mood. As another example, if the audio input includes words corresponding to frustration (e.g., "swear" words), then this can be recognized. If these situations are recognized, the assistant device can then repeat the instructions for the user to perform, slow down the verbal speech it provides using its speaker, etc. In at least one embodiment, user profiles can be built. The user profiles can include information about a user behavior. The user profile can include the user's typical movements. For example, a user profile can store that a user uses her hands to talk. The assistant device can use this profile information to eliminate false positives of frustration.” Roman [0053] lines 1-27).



	Regarding claim 8 Sumida in view of Sanchez and Howard and Roman discloses all of the limitations of claim 1 and further discloses:

	The apparatus according to Claim 1, wherein, the processor causes the apparatus to execute the third action as the initial action when an image of the target person is recognized from the captured image and a voice of the target person is not recognized from the acquired sounds, (“When executing a guide task, for example, the robot R guides a person H in a predetermined guide area (e.g. movement area such as an office or a hallway). In this example, the robot R irradiates light (e.g. infrared ray, ultraviolet ray, laser beam) and radio waves toward a circumference of the robot R, thereby to detect the person H wearing a tag T in the circumferential region, identify a position of the detected person H and approach to him or her so that the robot R executes a personal identification to find who the person H is, based on the tag H. This tag T receives infrared ray and radio waves transmitted from the robot R for the sake of identifying the position (distance and orientation) of the person H. Based on signals indicating a light-receiving orientation included in the infrared ray and the robot ID included in the received radio waves, the tag T generates a receiving report signal that includes the tag ID number, and sends this receiving report signal back to the robot R. When receiving the receiving report signal, the robot R recognizes the distance and orientation to the person H wearing the tag T, so that the robot R can approach this person H.” Sumida [0039] lines 1-20) and when the processor recognizes the image of the target person from the captured image and also recognizes the voice of the target person from the acquired sounds, the processor causes the apparatus to execute the first action. (“The situation DB storage means 34 stores information on the present situation (situation DB). 
In the embodiment, the situation DB stores data indicating a situation which includes, as a surrounding situation, a process result of the image processor 10 which processes images acquired by the cameras C, C, a process result of a voice recognition unit 21b which recognizes a voice input through the microphones MC, MC, and a recognition result of the tag T executed by the target detection unit 80. The information stored in the situation DB storage means 34 is used when selecting a rule stored in the rule DB storage means 33. Information on the selected rule is also written in the situation DB storage means 34. Specific example of the situation DB stored in the situation DB storage means 34 will be described later. The speech information storing means 35 stores information used for a speech of the robot R. The speech information storing means 35 stores communication information which is determined by scenarios corresponding to various behavior patterns. The communication information includes. for example, a fixed phrase for greeting "Hello, Mr. . . .” and a fixed phrase for confirmation "This is to be sent to Mr. . . . , right?". The speech information storing means 35 stores information of communication contents to be spoken during execution of a rule stored in the rule DB storage means 33. The information of communication contents to be spoken during execution of a rule includes, for example, a reply "Yes" and a fixed phrase indicating a time ". . . o'clock, . . . minutes". The information (communication data) is sent, for example, from the management computer 3.” Sumida [0088-0089] lines 1-29). 

	Regarding claim 9 Sumida in view of Sanchez and Howard and Roman discloses all of the limitations of claim 8 and further discloses:

	The apparatus according to Claim 8, wherein the first action includes the speaker outputting a second voice that indicates a start of communication with the target person.  (“The speech information storing means 35 stores information used for a speech of the robot R. The speech information storing means 35 stores communication information which is determined by scenarios corresponding to various behavior patterns. The communication information includes. for example, a fixed phrase for greeting "Hello, Mr. . . .” and a fixed phrase for confirmation "This is to be sent to Mr. . . . , right ?". The speech information storing means 35 stores information of communication contents to be spoken during execution of a rule stored in the rule DB storage means 33. The information of communication contents to be spoken during execution of a rule includes, for example, a reply "Yes" and a fixed phrase indicating a time ". . . o'clock, . . . minutes". The information (communication data) is sent, for example, from the management computer 3.” Sumida [0089] lines 1-15).

	Regarding claim 12 Sumida in view of Sanchez and Howard and Roman discloses all of the limitations of claim 1 and further discloses:

	The apparatus according to Claim 1, wherein, when the processor recognizes the target person from the captured image and does not recognize a voice of the target person from the acquired sounds, the processor causes the apparatus to execute the third action as the initial action.  (“When executing a guide task, for example, the robot R guides a person H in a predetermined guide area (e.g. movement area such as an office or a hallway). In this example, the robot R irradiates light (e.g. infrared ray, ultraviolet ray, laser beam) and radio waves toward a circumference of the robot R, thereby to detect the person H wearing a tag T in the circumferential region, identify a position of the detected person H and approach to him or her so that the robot R executes a personal identification to find who the person H is, based on the tag H. This tag T receives infrared ray and radio waves transmitted from the robot R for the sake of identifying the position (distance and orientation) of the person H. Based on signals indicating a light-receiving orientation included in the infrared ray and the robot ID included in the received radio waves, the tag T generates a receiving report signal that includes the tag ID number, and sends this receiving report signal back to the robot R. When receiving the receiving report signal, the robot R recognizes the distance and orientation to the person H wearing the tag T, so that the robot R can approach this person H.” Sumida [0039] lines 1-20).

	Regarding claim 13 Sumida in view of Sanchez and Howard and Roman discloses all of the limitations of claim 12 but Sumida does not appear to disclose:
	wherein, when the processor recognizes, from the first captured image, that a head of the target person is inclined, the processor controls the driver to cause the apparatus to incline a top of the apparatus in a same direction and at a same angle as an inclination of the head as the third action.
	However, in the same field of endeavor of robotic controls Howard discloses:
	“wherein, when the processor recognizes, from the first captured image, that a head of the target person is inclined, the processor controls the driver to cause the apparatus to incline a top of the apparatus in a same direction and at a same angle as an inclination of the head as the third action.” (“Referring again to FIG. 3, the robot 120 may determine if a task is presented to the robot 120 at block 320. In some embodiments, the task may correspond to an action of the user 140 relating to the electronic device 160. For example, in some embodiments, the task may be for the robot 120 to perform the same and/or a similar action to the action performed by the user 140 on the electronic device 160 (e.g., playing the game). If no task is presented to the robot 120, then the robot 120 may continue to receive task-solution pairs at block 300 and/or store task-solution pairs at block 310.” Howard [0086] lines 1-10 wherein the robot is able to control its head in a head nod or head shake manner which could be mimicking a user [0072]).

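	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the mimicking of Howard because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user in a social setting with greater control, such as by displaying emotions in a normal conversation cadence, and through following a user during tasks (Howard [0003], [0045], and [0086]).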

	Regarding claim 14 Sumida in view of Sanchez and Howard and Roman discloses all of the limitations of claim 12 but Sumida does not appear to disclose:

	wherein, when the processor recognizes, from the first captured image, an operation matching a rhythm of the target person, the processor controls the driver to cause the apparatus to move according to the rhythm as the third action. 

	However, in the same field of endeavor of robotic controls Howard discloses:
	
	“wherein, when the processor recognizes, from the first captured image, an operation matching a rhythm of the target person, the processor controls the driver to cause the apparatus to move according to the rhythm as the third action.” (“Referring again to FIG. 3, the robot 120 may determine if a task is presented to the robot 120 at block 320. In some embodiments, the task may correspond to an action of the user 140 relating to the electronic device 160. For example, in some embodiments, the task may be for the robot 120 to perform the same and/or a similar action to the action performed by the user 140 on the electronic device 160 (e.g., playing the game). If no task is presented to the robot 120, then the robot 120 may continue to receive task-solution pairs at block 300 and/or store task-solution pairs at block 310.” Howard [0086] lines 1-10 wherein a rhythm is interpreted to be equivalent to a pattern or movement). 

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the mimicking of Howard because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user in a social setting with greater control, such as by displaying emotions in a normal conversation cadence, and through following a user during tasks (Howard [0003], [0045], and [0086]).
	
	Regarding claim 15 Sumida discloses:

	A method performed by an apparatus that communicates with a target person by executing a predetermined action, the method comprising: (“A robot has been conventionally known which determines its behavior based on an external situation and its internal status in order to enhance the amusement property for a pet robot (e.g. refer to Unexamined Japanese Patent Application Publication No. 2004-283958, paragraphs 0136-0148, FIG. 18). A robot disclosed in Unexamined Japanese Patent Application Publication No. 2004-283958 has a plurality of behavior describing modules (schemes) in which actions having objectives, such as "exploring", "eating", and "playing" are described. Each scheme calculates, by using a predetermined function, an activation level AL of an action, which is the degree of execution priority of the scheme, in response to a change in the internal status of the robot or an external stimulus. Generally, the robot selects a scheme that has the highest AL, and expresses an action that corresponds to the scheme.” Sumida [0005] lines 1-16) capturing images around the apparatus by a camera; acquiring sounds around the apparatus by a microphone; (“The cameras (also referred to as a "vision sensor") C, C, capture digital data on images in the proceeding direction ahead of the robot R in digital data, and a color CCD (Charge-coupled Device) may be used as the cameras C, C. The cameras C, C are disposed on the right and left sides pair at the same height level, and output captured images to the image processor 10. The cameras C, C, the speaker S and the microphones MC, MC (audio input unit) are provided in the head R1. The speaker S (also referred to as an "audio output means") utters predetermined voices synthesized in the audio processor 20.” Sumida [0059] lines 1-12) causing the apparatus to execute a third action as an initial action, (“In FIG. 1, the robot control system A includes plural robots Ra, Rb, Rc (hereinafter referred to simply as the "robot R" unless otherwise stated), and each robot R executes a task in accordance with an execution plan of the task (task schedule) that is predefined for each robot R through the management computer 3.” Sumida [0034] lines 1-6) the initial action being executed for communication with the target person (“When the priority comparing means 131 determines that the degree of priority of the detected rule is higher than that of the rule being executed, the parameter comparing means 132 compares an initial value of the degree of interest (action inducing parameter) contained in the detected rule and the present value of the degree of interest (action inducing parameter) set in the rule being executed. In the embodiment, the parameter comparing means 132 determines whether or not the degree of interest (initial value) of the selected rule is greater than the degree of interest (present value) of the rule being executed.” Sumida [0127] lines 1-11) according to at least one of a first captured image or an acquired sound, (Sumida Fig. 9 rule contents column wherein the robot hears a sound and faces the source or a nearby person) a second action being one-level higher than the third action, a first action being one-level higher than the second action; (“In the aforementioned communication robot, the rule database includes a priority level in each of the plurality of rules, and the action inducing parameter setting unit includes a priority level comparing unit for comparing a priority level contained in the detected rule and a priority level contained in a rule being executed, a parameter comparing unit for comparing the initial value of the action inducing parameter contained in the detected rule and a present value of the action inducing parameter contained in the rule being executed when the priority level contained in the detected rule is greater than the priority level of the rule being executed, a rule changing unit for setting the initial value of the action inducing parameter contained in the detected rule in the situation database when the initial value of the action inducing parameter contained in the detected rule is equal to or greater than the present value of the action inducing parameter of the rule being executed.” Sumida [0013] lines 1-17 wherein the Office interprets the ability to set a priority of tasks to be equivalent to setting a first task a step above a second, a second above a third, and so on) … the second action includes the speaker outputting a voice that talks to the target person, (“The human handling module can greet or talk about the weather in accordance with the situation or a person the robot R is talking to, regardless of whether or not a task is executed.” Sumida [0096] lines 1-4) …

	Sumida does not appear to disclose:

	causing, when a first sound is acquired by the microphone after an execution of the initial action, the apparatus to execute the second action one-level higher than the third action; causing, when a second sound is acquired by the microphone after an execution of the second action, the apparatus to execute the first action one-level higher than the second action; the first action including the apparatus requesting the target person to execute a predetermined task; causing based on a second captured image and when the target person is not executing the predetermined task after the execution of the first action, the apparatus to again execute the first action; determining, when no sound is acquired by the microphone after the execution of the second action, whether a time elapsed from the execution of the second action is shorter than a threshold; causing, when the time is shorter than the threshold, the apparatus to continue the second action; and causing, when the time is equal to or longer than the threshold, the apparatus to execute the third action one-level lower than the second action; or the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.
	
	However, in the same field of endeavor of robotic controls Sanchez discloses:

	“causing, when a first sound is acquired by the microphone after an execution of a the initial action, (“If the robot is engaged by a user when in the user-directed state 243, the robot outputs information (e.g., a dialog) indicating that it is busy on a task. The dialog forces the user to either dismiss the dialog (thus canceling the engagement request) and allow the robot to proceed with its task, or cancel (or pause) the current user-directed task or application so the user can continue engagement.” Sanchez [0040] lines 1-7, fig. 3, wherein the robot starts in an autonomous condition that involves doing its own task until engaging with a user)  the apparatus to execute the second action one-level higher than the third action; (“As described below, the robot has a clear priority order of states as shown in the above table. A higher priority state can interrupt a lower priority state, and engagement (transitioning to the engaged state) can interrupt any self-directed or user-directed task.” Sanchez [0029] lines 1-5) causing, when a second sound is acquired by the microphone after an execution of the second action, the apparatus to execute the first action one-level higher than the second action; (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. 
If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes its task based on the voice command i.e. escalates to a new engagement level, see also [0040-0043]) determining, when no sound is acquired by the microphone after the execution of the second action, whether a time elapsed from the execution of the second action is shorter than a threshold; (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes its task based on the voice command) causing, when the time is shorter than the threshold, the apparatus to continue the second action; and causing, when the time is equal to or longer than the threshold, the apparatus to execute the third action one-level lower than the second action;” (“Turning to another aspect, when the user's face is lost to the robot, the assumption is that the user has exited the robot's viewing area (frame) and will return shortly if the user wishes to continue engagement. However, the user may have inadvertently exited the frame but still want to interact. 
Note that the robot has tracking behaviors when engaged, and thus this may only happen when the user goes outside the interaction range, or moves too fast for the robot to track. Thus, a time duration (e.g., three seconds) is used before considering the user as having exited, e.g., leaving the camera in the same position to help the user return, after which the robot returns to its standard pre-engagement state. A similar situation is when the user has walked away from the robot and does not want to interact anymore; in this case the robot resumes its tasks, e.g., after a few seconds” Sanchez [0054] lines 1-15 wherein after a time after engagement with no further engagement the robot returns to the previous task).

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the sound operated tasks of Sanchez because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user and complete tasks as commanded based on a hierarchy in order to correctly complete multiple tasks (Sanchez [0035] and [0044]).

	Additionally, Sumida in view of Sanchez does not appear to disclose:

	the first action including the apparatus requesting the target person to execute a predetermined task; causing based on a second captured image and when the target person is not executing the predetermined task after the execution of the first action, the apparatus to again execute the first action; or the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.

	However, in the same field of endeavor of robotic controls Howard discloses:

	 “and the third action includes the driver causing the apparatus to move in synchronization with a motion of the target person.” (“Referring again to FIG. 3, the robot 120 may determine if a task is presented to the robot 120 at block 320. In some embodiments, the task may correspond to an action of the user 140 relating to the electronic device 160. For example, in some embodiments, the task may be for the robot 120 to perform the same and/or a similar action to the action performed by the user 140 on the electronic device 160 (e.g., playing the game). If no task is presented to the robot 120, then the robot 120 may continue to receive task-solution pairs at block 300 and/or store task-solution pairs at block 310.” Howard [0086] lines 1-10). 

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the time determination and mimicking of Howard because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user in a social setting with greater control, such as by displaying emotions in a normal conversation cadence, and through following a user during tasks (Howard [0003], [0045], and [0086]).

	Additionally, Sumida in view of Sanchez and Howard does not appear to disclose:

	the first action including the apparatus requesting the target person to execute a predetermined task; causing based on a second captured image and when the target person is not executing the predetermined task after the execution of the first action, the apparatus to again execute the first action; 

	However, in the same field of endeavor of robotic controls Roman discloses:

	“the first action including the apparatus requesting the target person to execute a predetermined task; causing based on a second captured image and when the target person is not executing the predetermined task after the execution of the first action, the apparatus to again execute the first action;” (“The assistant device can detect movements via one or more cameras. The movements can include body motion, face, and/or hand movements. The assistant device can further interpret the movements using a gesture recognition algorithm. The gesture recognition algorithm can include one or more of a skeletal-based algorithm, an appearance-based algorithm, and/or a 3D model-based algorithm. This feature allows for the user to interact with the assistant device by gestures. In at least one embodiment, the assistant device uses the combination of audio and video input to derive the user's communication, response and/or mood. For example, if the assistant device instructs the user to perform an action as part of the setup process, but the user throws her hands up in frustration, then this can be recognized as a gesture using video input. This gesture can be used to determine that the user is in a flustered mood. As another example, if the audio input includes words corresponding to frustration (e.g., "swear" words), then this can be recognized. If these situations are recognized, the assistant device can then repeat the instructions for the user to perform, slow down the verbal speech it provides using its speaker, etc. In at least one embodiment, user profiles can be built. The user profiles can include information about a user behavior. The user profile can include the user's typical movements. For example, a user profile can store that a user uses her hands to talk. The assistant device can use this profile information to eliminate false positives of frustration.” Roman [0053] lines 1-27).

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the teaching and repeat request of Roman with the robotic system of Sumida and Howard and Sanchez because one of ordinary skill would have been motivated to make this modification in order to provide a means for the robot to accurately interpret what the user is doing, and repeat a request for a necessary function in a clear manner to ensure the user understands what is requested of them without being frustrated (Roman [0047] and [0053]).


	Regarding claim 17, Sumida discloses:

	A system that communicates with a target person by executing a predetermined action, the system comprising: (“A robot has been conventionally known which determines its behavior based on an external situation and its internal status in order to enhance the amusement property for a pet robot (e.g. refer to Unexamined Japanese Patent Application Publication No. 2004-283958, paragraphs 0136-0148, FIG. 18). A robot disclosed in Unexamined Japanese Patent Application Publication No. 2004-283958 has a plurality of behavior describing modules (schemes) in which actions having objectives, such as "exploring", "eating", and "playing" are described. Each scheme calculates, by using a predetermined function, an activation level AL of an action, which is the degree of execution priority of the scheme, in response to a change in the internal status of the robot or an external stimulus. Generally, the robot selects a scheme that has the highest AL, and expresses an action that corresponds to the scheme.” Sumida [0005] lines 1-16) a camera that captures images around the robot; a microphone that acquires sounds around the robot; a processor; a speaker; (“The cameras (also referred to as a "vision sensor") C, C, capture digital data on images in the proceeding direction ahead of the robot R in digital data, and a color CCD (Charge-coupled Device) may be used as the cameras C, C. The cameras C, C are disposed on the right and left sides pair at the same height level, and output captured images to the image processor 10. The cameras C, C, the speaker S and the microphones MC, MC (audio input unit) are provided in the head R1. 
The speaker S (also referred to as an "audio output means") utters predetermined voices synthesized in the audio processor 20.” Sumida [0059] lines 1-12) and a driver that moves the robot; (“In the embodiment, the robot R is the autonomously-movable robot capable of two-leg walk, however, the present invention is not limited to this, and may be applied to an autonomously-movable robot that can move by its wheels. The autonomously-movable robot that can move by its wheels according to the present invention must have the same advantages as that of the robot R in the embodiment, except that its movable parts that corresponds to "legs" of the robot R are "wheels".” Sumida [0152] lines 1-9) wherein the processor: causes the robot to execute a third action (“In FIG. 1, the robot control system A includes plural robots Ra, Rb, Rc (hereinafter referred to simply as the "robot R" unless otherwise stated), and each robot R executes a task in accordance with an execution plan of the task (task schedule) that is predefined for each robot R through the management computer 3.” Sumida [0034] lines 1-6) as an initial action, the initial action being executed for communication with the target person (“When the priority comparing means 131 determines that the degree of priority of the detected rule is higher than that of the rule being executed, the parameter comparing means 132 compares an initial value of the degree of interest (action inducing parameter) contained in the detected rule and the present value of the degree of interest (action inducing parameter) set in the rule being executed. In the embodiment, the parameter comparing means 132 determines whether or not the degree of interest (initial value) of the selected rule is greater than the degree of interest (present value) of the rule being executed.” Sumida [0127] lines 1-11) according to at least one of the captured image or an acquired sound, (Sumida Fig. 
9 rule contents column wherein the robot hears a sound and faces the source or a nearby person) a second action being one-level higher than the third action, a first action being one-level higher than the second action; (“In the aforementioned communication robot, the rule database includes a priority level in each of the plurality of rules, and the action inducing parameter setting unit includes a priority level comparing unit for comparing a priority level contained in the detected rule and a priority level contained in a rule being executed, a parameter comparing unit for comparing the initial value of the action inducing parameter contained in the detected rule and a present value of the action inducing parameter contained in the rule being executed when the priority level contained in the detected rule is greater than the priority level of the rule being executed, a rule changing unit for setting the initial value of the action inducing parameter contained in the detected rule in the situation database when the initial value of the action inducing parameter contained in the detected rule is equal to or greater than the present value of the action inducing parameter of the rule being executed.” Sumida [0013] lines 1-17 wherein the office interprets the ability to set a priority of tasks to be equivalent to setting a first task a step above a second, a second above a third, and so on) … the first action includes the robot executing a predetermined task, (“A robot has been conventionally known which determines its behavior based on an external situation and its internal status in order to enhance the amusement property for a pet robot (e.g. refer to Unexamined Japanese Patent Application Publication No. 2004-283958, paragraphs 0136-0148, FIG. 18). A robot disclosed in Unexamined Japanese Patent Application Publication No. 2004-283958 has a plurality of behavior describing modules (schemes) in which actions having objectives, such as "exploring", "eating", and "playing" are described. 
Each scheme calculates, by using a predetermined function, an activation level AL of an action, which is the degree of execution priority of the scheme, in response to a change in the internal status of the robot or an external stimulus. Generally, the robot selects a scheme that has the highest AL, and expresses an action that corresponds to the scheme.” Sumida [0005] lines 1-16) the second action includes the speaker outputting a voice that talks to the target person, (“The human handling module can greet or talk about the weather in accordance with the situation or a person the robot R is talking to, regardless of whether or not a task is executed.” Sumida [0096] lines 1-4) …

	Sumida does not appear to disclose:
	
	when a first sound is acquired by the microphone after an execution of the initial action, causes the robot to execute the second action one-level higher than the third action; when a second sound is acquired by the microphone after an execution of the second action, causes the robot to execute the first action one-level higher than the second action, the first action including the robot requesting the target person to execute a predetermined task; when deciding, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, causes the apparatus to again execute the first action; when no sound is acquired by the microphone after the execution of the second action, determines whether a time elapsed from the execution of the second action is shorter than a threshold; when the time is shorter than the threshold, causes the robot to continue the second action; and when the time is equal to or longer than the threshold, causes the robot to execute the third action one-level lower than the second action; or the third action includes the driver causing the robot to move in synchronization with a motion of the target person.

	However, in the same field of endeavor of robotic controls Sanchez discloses:

	“when a first sound is acquired by the microphone after an execution of the initial action, (“If the robot is engaged by a user when in the user-directed state 243, the robot outputs information (e.g., a dialog) indicating that it is busy on a task. The dialog forces the user to either dismiss the dialog (thus canceling the engagement request) and allow the robot to proceed with its task, or cancel (or pause) the current user-directed task or application so the user can continue engagement.” Sanchez [0040] lines 1-7, fig. 3, wherein the robot starts in an autonomous condition that involves doing its own task until engaging with a user)  causes the robot to execute the second action one-level higher than the third action; (“As described below, the robot has a clear priority order of states as shown in the above table. A higher priority state can interrupt a lower priority state, and engagement (transitioning to the engaged state) can interrupt any self-directed or user-directed task.” Sanchez [0029] lines 1-5) when a second sound is acquired by the microphone after an execution of the second action, causes the robot to execute the first action one-level higher than the second action; (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. 
If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes it’s task based on the voice command i.e. escalates to a new engagement level, see also [0040-0043]) when no sound is acquired by the microphone after the execution of the second action, determines whether a time elapsed from the execution of the second action is shorter than a threshold; when the time is shorter than the threshold, causes the robot to continue the second action; and when the time is equal to or longer than the threshold, causes the robot to execute the third action one-level lower than the second action,” (“With respect to robot name recognition, the robot may have different reactions to recognizing its name that depend on the current context, e.g., while available for engagement, while offering engagement, and while engaged. For example, while available, the robot may pause, turn its head toward the sound source, and scan for a localized face. If no face is detected, the robot enters speech engagement and displays a keyword shortcut dialog, and the robot treats the recognition event as a "subsequent voice heard" step and proceeds with attraction operation, e.g., move toward the sound source, pause and scan for faces, returning to the previous task if no face is detected before a timeout time. If a face is detected, or if the robot is offering engagement, the robot proceeds to "nearby" engagement offer and acceptance operations as described below.” Sanchez [0045] wherein the robot changes its task based on the voice command i.e. escalates to a new engagement level, see also [0040-0043])

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the sound operated tasks of Sanchez because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user and complete tasks as commanded based on a hierarchy in order to correctly complete multiple tasks (Sanchez [0035] and [0044]).

	Additionally, Sumida in view of Sanchez does not appear to disclose:

	the first action including the robot requesting the target person to execute a predetermined task; when deciding, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, causes the apparatus to again execute the first action; or the third action includes the driver causing the robot to move in synchronization with a motion of the target person.

	However, in the same field of endeavor of robotic controls Howard discloses:

	 “and the third action includes the driver causing the robot to move in synchronization with a motion of the target person.” (“Referring again to FIG. 3, the robot 120 may determine if a task is presented to the robot 120 at block 320. In some embodiments, the task may correspond to an action of the user 140 relating to the electronic device 160. For example, in some embodiments, the task may be for the robot 120 to perform the same and/or a similar action to the action performed by the user 140 on the electronic device 160 (e.g., playing the game). If no task is presented to the robot 120, then the robot 120 may continue to receive task-solution pairs at block 300 and/or store task-solution pairs at block 310.” Howard [0086] lines 1-10). 

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida with the time determination and mimicking of Howard because one of ordinary skill would have been motivated to make this modification in order to provide a means for a robot to interact with a user in a social setting with greater control, such as by displaying emotions in a normal conversation cadence, and through following a user during tasks (Howard [0003], [0045], and [0086]).

	Additionally, Sumida in view of Sanchez and Howard does not appear to disclose:

	the first action including the robot requesting the target person to execute a predetermined task; when deciding, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, causes the apparatus to again execute the first action;   

	However, in the same field of endeavor of robotic controls Roman discloses:

	“the first action including the robot requesting the target person to execute a predetermined task; when deciding, based on a second captured image, that the target person is not executing the predetermined task after the execution of the first action, causes the apparatus to again execute the first action;” (“The assistant device can detect movements via one or more cameras. The movements can include body motion, face, and/or hand movements. The assistant device can further interpret the movements using a gesture recognition algorithm. The gesture recognition algorithm can include one or more of a skeletal-based algorithm, an appearance-based algorithm, and/or a 3D model-based algorithm. This feature allows for the user to interact with the assistant device by gestures. In at least one embodiment, the assistant device uses the combination of audio and video input to derive the user's communication, response and/or mood. For example, if the assistant device instructs the user to perform an action as part of the setup process, but the user throws her hands up in frustration, then this can be recognized as a gesture using video input. This gesture can be used to determine that the user is in a flustered mood. As another example, if the audio input includes words corresponding to frustration (e.g., "swear" words), then this can be recognized. If these situations are recognized, the assistant device can then repeat the instructions for the user to perform, slow down the verbal speech it provides using its speaker, etc. In at least one embodiment, user profiles can be built. The user profiles can include information about a user behavior. The user profile can include the user's typical movements. For example, a user profile can store that a user uses her hands to talk. The assistant device can use this profile information to eliminate false positives of frustration.” Roman [0053] lines 1-27).

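	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the teaching and repeat request of Roman with the robotic system of Sumida and Howard and Sanchez because one of ordinary skill would have been motivated to make this modification in order to provide a means for the robot to accurately interpret what the user is doing, and repeat a request for a necessary function in a clear manner to ensure the user understands what is requested of them without being frustrated (Roman [0047] and [0053]).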


	Claims 2-7 are rejected under 35 U.S.C. 103 as being unpatentable over Sumida and Sanchez and Howard and Roman as applied to claim 1 above, and further in view of Lee et al. (US Pre-Granted Application No. US 2017/0206900 A1 hereinafter “Lee”).

	Regarding claim 2, Sumida in view of Sanchez and Howard discloses all of the limitations of claim 1 and further discloses:

	The apparatus according to Claim 1, wherein an action one-level lower than the third action is a fourth action, an action one-level lower than the fourth action is a fifth action, the processor causes the apparatus to execute the fifth action (“In the aforementioned communication robot, the rule database includes a priority level in each of the plurality of rules, and the action inducing parameter setting unit includes a priority level comparing unit for comparing a priority level contained in the detected rule and a priority level contained in a rule being executed, a parameter comparing unit for comparing the initial value of the action inducing parameter contained in the detected rule and a present value of the action inducing parameter contained in the rule being executed when the priority level contained in the detected rule is greater than the priority level of the rule being executed, a rule changing unit for setting the initial value of the action inducing parameter contained in the detected rule in the situation database when the initial value of the action inducing parameter contained in the detected rule is equal to or greater than the present value of the action inducing parameter of the rule being executed.” Sumida [0013] lines 1-17 wherein the office interprets the ability to set a priority of tasks to be equivalent to setting a first task a step above a second, a second above a third, and so on.) … the fourth action includes the driver causing the apparatus to perform a predetermined motion at a current position of the apparatus, (“The rule DB storage means 33 stores scenarios (acting scripts) corresponding to various behavior patterns, rules corresponding to various situations (rule DB), and specific action or speech contents for executing the rules (action DB). The rules define generation of actions expressed by the robot R. 
The scenarios include those regarding actions of, for example, stopping 1 meter ahead of a person or an obstacle (i.e. target object) when encountering this target object while walking, or lifting the arm R2 up to a predetermined position 10 seconds after stopping, as well as those regarding speech. The rule database storage means 33 stores scenarios predefined for specifying a gesture as a physical behavior of moving at least one of the head R1, the arms R2, the legs R3 and the body R4 when the robot R performs a predetermined speech. The action DB and rule DB stored in the rule DB storing means 33.” Sumida [0087] lines 1-16) …

	Neither Sumida, Howard, nor Sanchez appears to disclose:

	when a third sound is acquired by the microphone after the execution of at least one of the third action or the second action, the third sound including a voice of the target person and the voice including a phrase included in a dictionary provided in the apparatus, … and the fifth action includes the apparatus stopping communication with the target person.  

	However, in the same field of endeavor of robotic controls Lee discloses:
	“when a third sound is acquired by the microphone after the execution of at least one of the third action or the second action, the third sound including a voice of the target person and the voice including a phrase included in a dictionary provided in the apparatus,” (“The processor 360 may detect a sound wave (for example, a voice generated by the first user) from a periphery by using a plurality of microphones 311, 312, 313, and 314. The processor 360 may determine whether the detected sound wave corresponds to a wakeup signal. The wakeup signal, for example, may include various voice signals, such as a voice signal including a specific word, a voice signal including a combination of words including a specific word, a voice signal of a specific type (for example, a specific sentence type (or pattern)), a voice signal related to a specific domain, or a voice signal of a specific user. For example, the processor 360 may determine that a wakeup signal has been received when the detected sound wave is similar to a specific waveform. According to an embodiment of the present disclosure, the processor 360 may perform a voice recognition for the detected sound wave, and may determine that the wakeup signal has been received when a specific word is included in the voice recognition result.” Lee [0056-0057] lines 1-19) and “and the fifth action includes the apparatus stopping communication with the target person.” (“While a command is received from the first user 10 or a function corresponding to the received command is executed, the electronic device 200 receives a wakeup signal from the second user 20 by using the array of microphones 230 in operation 5. The electronic device 200 may store information on the direction from which the wakeup signal is received, from the second user 20. 
If a conversation with the first user 10 has ended in operation 6, the electronic device 200 rotates the head 220 in the direction from which the wakeup signal is received from the second user 20, based on the stored information in operation 7. The electronic device 200 receives a command from the second user 20 in operation 8. The electronic device 200 may receive a command by using the microphone 232, which faces the second user 20, of a plurality of microphones provided in the array of microphones 232. The electronic device 200 may select the microphone 232 based on the stored information. The electronic device 200 executes a function corresponding to the command received from the second user 20 in operation 9.” Lee [0048] lines 1-20).

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida and Howard and Sanchez with the voice controls and conversation termination of Lee because one of ordinary skill would have been motivated to make this modification in order to provide a means for interacting with a user following that user's vocal input and for interacting with additional users when a first user task is satisfied (Lee [0007-0008]).

	Regarding claim 3, Sumida in view of Lee and Sanchez and Howard and Roman discloses all of the limitations of claim 2 and further discloses: 

	The apparatus according to Claim 2, wherein the fourth action includes the apparatus swinging rightward and leftward.  (“Different action IDs are also assigned to the action ID="9" and "10" since sound volumes of sounds synthesized by the sound synthesis unit 21a are different in the actions of the action IDs="9" and "10". Actions other than those shown in FIG. 10 may be included, which are, for example, "twisting the body at the waist", "swinging the arm", "closing and opening fingers", and "waiving goods such as a flag that the robot R grippes".” Sumida [0118] lines 1-8).

	Regarding claim 4 Sumida in view of Lee and Sanchez and Howard and Roman discloses all of the limitations of claim 2 and further discloses:

	The apparatus according to Claim 2, wherein the fourth action includes the apparatus spinning with a direction of a force of gravity taken as an axis.  (“Different action IDs are also assigned to the action ID="9" and "10" since sound volumes of sounds synthesized by the sound synthesis unit 21a are different in the actions of the action IDs="9" and "10". Actions other than those shown in FIG. 10 may be included, which are, for example, "twisting the body at the waist", "swinging the arm", "closing and opening fingers", and "waiving goods such as a flag that the robot R grippes".” Sumida [0118] lines 1-8 wherein the office interprets twisting at the waist as spinning with a direction of force of gravity as the axis).

	Regarding claim 5, Sumida in view of Lee and Sanchez and Howard and Roman discloses all of the limitations of claim 2, but Sumida does not appear to further disclose:
	
	wherein the fifth action includes the apparatus moving away from the target person.  

	However, in the same field of endeavor of robotic controls Lee discloses:

	“wherein the fifth action includes the apparatus moving away from the target person.”  (“While a command is received from the first user 10 or a function corresponding to the received command is executed, the electronic device 200 receives a wakeup signal from the second user 20 by using the array of microphones 230 in operation 5. The electronic device 200 may store information on the direction from which the wakeup signal is received, from the second user 20. If a conversation with the first user 10 has ended in operation 6, the electronic device 200 rotates the head 220 in the direction from which the wakeup signal is received from the second user 20, based on the stored information in operation 7. The electronic device 200 receives a command from the second user 20 in operation 8. The electronic device 200 may receive a command by using the microphone 232, which faces the second user 20, of a plurality of microphones provided in the array of microphones 232. The electronic device 200 may select the microphone 232 based on the stored information. The electronic device 200 executes a function corresponding to the command received from the second user 20 in operation 9.” Lee [0048] lines 1-20 wherein the robot rotating a head away from the conversation is equivalent to moving away from the target person).

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida and Howard with the conversation termination of Lee because one of ordinary skill would have been motivated to make this modification in order to provide a means for interacting with additional users when a first user task is satisfied (Lee [0008]).

	Regarding claim 6, Sumida in view of Lee and Howard and Sanchez and Roman discloses all of the limitations of claim 2 and further discloses:

	The apparatus according to Claim 2, wherein the fifth action includes the apparatus turning 180 degrees with a direction of a force of gravity taken as an axis.  (“As shown in FIG. 12B, the robot R rotates to the requested position so that the robot R faces in the direction of the audio source 1201. Upon recognizing the completion of the rotational movement to the requested position, the action inducing parameter setting means 130 sets a flag indicating that the degree of interest is maintained without being changed to "Off" (e.g. flag=0). Then, the parameter change means 140 starts to decrease the degree of interest of the selected rule (rule ID="2"). Initiation or completion of the movement to the requested position may be directly notified from any one of the gesture integration unit 44, the behavior patterning unit 43 and the object data integration unit 42 to the action inducing parameter setting means 130, or may be written in the situation DB by any one of the units 42 to 44. This configuration has the following advantages. If the degree of interest starts to be decreased immediately after starting the movement, the time period in which a posture is kept after completion of the movement is changed depending on a start position of the movement. For example, the robot is rotated by 90 degrees to the requested position in the example shown in FIG. 12B, however, if the robot is to be rotated by 180 degrees to the requested position, a time period in which a posture is kept after completion of the movement becomes shorter. In the embodiment, since the value of the degree of interest can be maintained for a while after the command for the movement is output, and then the degree of interest is decreased, a predetermined time in which a posture is kept after reaching the requested position can be made relatively constant.” Sumida [0133] lines 1-28).

	Regarding claim 7, Sumida in view of Lee and Howard and Sanchez and Roman discloses all of the limitations of claim 2 and further discloses: 

	The apparatus according to Claim 2, … and a predetermined interrupt-disable condition is set in the apparatus, (“In accordance with the embodiment, the robot R can perform diverse actions during charging without interrupting execution of a predetermined task since the robot R is triggered to start a process for detecting a change in the situation database when the robot R is connected to a battery charger.” Sumida [0144] lines 1-6) the processor causes the apparatus to execute the fifth action, and the predetermined interrupt-disable condition includes a condition about a predetermined time zone and a condition about a place of the target person.  (“The rule DB storage means 33 stores scenarios (acting scripts) corresponding to various behavior patterns, rules corresponding to various situations (rule DB), and specific action or speech contents for executing the rules (action DB). The rules define generation of actions expressed by the robot R. The scenarios include those regarding actions of, for example, stopping 1 meter ahead of a person or an obstacle (i.e. target object) when encountering this target object while walking, or lifting the arm R2 up to a predetermined position 10 seconds after stopping, as well as those regarding speech. The rule database storage means 33 stores scenarios predefined for specifying a gesture as a physical behavior of moving at least one of the head R1, the arms R2, the legs R3 and the body R4 when the robot R performs a predetermined speech. The action DB and rule DB stored in the rule DB storing means 33.” Sumida [0087] lines 1-16).

	Sumida does not appear to disclose:

	wherein when no sound is acquired by the microphone after the execution of at least one of the third action and the second action

	However, in the same field of endeavor of robotic controls Lee discloses:

	“wherein when no sound is acquired by the microphone after the execution of at least one of the third action and the second action” (“In operation 730, the electronic device determines whether the first command received from the first direction has been completely processed. For example, if receiving a stop command from the first direction or not receiving a command from the first direction for a specific time period, the electronic device may determine that the first command has been completely processed. The electronic device may perform operation 740 after waiting until the first command is completely processed.” Lee [0113] lines 1-9)
	
	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida and Howard with the conversation termination when there is no detected sound of Lee because one of ordinary skill would have been motivated to make this modification in order to provide a means for interacting with additional users when a first user task is satisfied (Lee [0008]).

	Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Sumida and Howard and Sanchez and Roman as applied to claim 1 above, and further in view of Miro et al. (US Pre-Granted Application No. US 2005/0154265 A1 hereinafter “Miro”).

	Regarding claim 10, Sumida in view of Howard and Sanchez and Roman discloses all of the limitations of claim 1 and further discloses: 
	
	The apparatus according to Claim 1 wherein the processor causes the apparatus to execute the third action as the initial action when an image of the target person is recognized from the captured image and a voice of the target person is not recognized from the acquired sounds, (“When executing a guide task, for example, the robot R guides a person H in a predetermined guide area (e.g. movement area such as an office or a hallway). In this example, the robot R irradiates light (e.g. infrared ray, ultraviolet ray, laser beam) and radio waves toward a circumference of the robot R, thereby to detect the person H wearing a tag T in the circumferential region, identify a position of the detected person H and approach to him or her so that the robot R executes a personal identification to find who the person H is, based on the tag H. This tag T receives infrared ray and radio waves transmitted from the robot R for the sake of identifying the position (distance and orientation) of the person H. Based on signals indicating a light-receiving orientation included in the infrared ray and the robot ID included in the received radio waves, the tag T generates a receiving report signal that includes the tag ID number, and sends this receiving report signal back to the robot R. When receiving the receiving report signal, the robot R recognizes the distance and orientation to the person H wearing the tag T, so that the robot R can approach this person H.” Sumida [0039] lines 1-20) … the processor causes the apparatus to execute the second action.  (“The human handling module can greet or talk about the weather in accordance with the situation or a person the robot R is talking to, regardless of whether or not a task is executed.” Sumida [0096] lines 1-4 wherein various processors are utilized to control the robot [0059]).

	Sumida does not appear to disclose:

	and when the processor does not recognize the image of the target person from the captured image and recognizes a voice of the target person from the acquired sounds,

	However, in the same field of endeavor Miro discloses:

	“and when the processor does not recognize the image of the target person from the captured image and recognizes a voice of the target person from the acquired sounds,” (“The patient voice identification 30 is used to identify the patient 26 using voice identification. For example, this may be accomplished with a password trained in advance that also identifies the voice of the speaker. This assures that patient confidentiality (as required by HIPAA standards) is assured. In an alternate embodiment, the patient condition sensors 28 are also used to identify various biometric factors to be used in an authentication technique (e.g. fingerprints, blood DNA analyses, etc.) either along with or instead of voice identification. Specifically, the patient condition sensors 28 include a biometric identification module used to sense a physiological condition or characteristic of the patient 26 (e.g., such as a camera for facial recognition or an electronic scanning pad for fingerprint identification). The sensed physiological characteristic is then used to identify or recognize a given patient 26. If the patient 26 is new to the intelligent nurse robotic system 10, voice and physiological characteristics may be stored in a patient database 40. The patient database 40 is a data store stored on a server within the hospital or retirement home, though the patient database 40 may also be located within the robot 12 itself. The CPU 24 is in direct communication with the transmitter/receiver 38 and is able to access the patient database 40 to recognize the patient 26 after initial voice and physiological characteristics specific to the patient 26 have been stored therein. The patient database 40 may also include various information specific to a patient 26. For example, such information can include the patient's 26 medical history, the patient's 26 dialogue related preferences (e.g., language and style of interaction), and any other relevant medical information. 
As will be discussed below, access to the patient database 40 allows the intelligent nurse robotic system 10 to have a great degree of specialization when interacting with a given patient 26.” Miro [0014] lines 1-34)

	It would have been obvious for one having ordinary skill in the art prior to the effective filing date to combine the robotic system of Sumida and Howard with the recognition means of Miro because one of ordinary skill would have been motivated to make this modification in order to provide a means to recognize and interact with regular users to allow for customized access to stored information or password protected information (Miro [0014]).

	Regarding claim 11, Sumida in view of Howard and Sanchez and Roman and Miro discloses all of the limitations of claim 10 and further discloses:

The apparatus according to Claim 10, wherein the second action includes the speaker outputting the voice including a name corresponding to the target person.  (“The speech information storing means 35 stores information used for a speech of the robot R. The speech information storing means 35 stores communication information which is determined by scenarios corresponding to various behavior patterns. The communication information includes. for example, a fixed phrase for greeting "Hello, Mr. . . .” and a fixed phrase for confirmation "This is to be sent to Mr . . . . , right?". The speech information storing means 35 stores information of communication contents to be spoken during execution of a rule stored in the rule DB storage means 33. The information of communication contents to be spoken during execution of a rule includes, for example, a reply "Yes" and a fixed phrase indicating a time ". . . o'clock, . . . minutes". The information (communication data) is sent, for example, from the management computer 3.” Sumida [0089] lines 1-15).

	Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Sumida and Howard and Sanchez and Roman as applied to claim 15 above, and further in view of Gildert (US Pre-Granted Application No. US 2017/0305014 A1 hereinafter “Gildert”).

	Regarding claim 16 Sumida in view of Howard and Sanchez and Roman discloses all of the limitations of claim 15 but Sumida does not appear to further disclose:

	A non-transitory computer-readable recording medium storing a program that causes the apparatus to execute the method… 

	However, in the same field of endeavor Gildert discloses:

	“A non-transitory computer-readable recording medium storing a program that causes the apparatus to execute the method” (“In another illustrative embodiment, the disclosure describes a robotic system. The system includes at least one processor, an operator controllable device in communication with the at least one processor, an operator interface in communication with the at least one processor, and the operator controllable device via a communication channel, and at least one non -transitory computer-readable storage medium coupled to the at least one processor, and which stores processor-executable instructions thereon. When executed, the processor-executable instructions cause the at least one processor to receive a training set including a first plurality of positions in a first configuration space associated with the operator interface, a second plurality of positions in a second configuration space associated with the operator controllable device, and information that represents a plurality of pairs of positions. A representative pair in the plurality of pairs of positions includes a first representative position in the first configuration space and a second representative position in the second configuration space. When executed, the processor-executable instructions also cause the at least one processor to create information based on the training set that represents a map between a first run-time position in the first configuration space, and a second run-time position in the second configuration space, receive a first run-time position in the first configuration space, and derive a second run-time position in the second configuration space from the information that represents the map and the first run-time position in the first configuration space.” Gildert [0010] lines 1-28). 

Conclusion

	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 4,809,333 A discloses a means for requesting a user to repeat a command or step.
US 2020/0241824 A1 discloses a means for identifying when a user is confused and repeating a task instruction.
US 9,724,824 B1 discloses a robotic device that interacts and changes interactions based on the user response.


	Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kyle T Johnson whose telephone number is (303)297-4339.  The examiner can normally be reached on Monday-Thursday 7:00-5:00 MT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey Burke, can be reached on (571) 270-3844.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.T.J./Examiner, Art Unit 3664

/JEFF A BURKE/Supervisory Patent Examiner, Art Unit 3664