DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Response to Arguments

Applicant's arguments filed 11/23/2020 have been fully considered but are not persuasive in view of the newly cited reference.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-5, 7, 15, 18 and 29-34 are rejected under 35 U.S.C. 103 as being unpatentable over Leblang U.S. Patent No. 10,374,816 B1 in view of Khan U.S. PAP 2016/0155443 A1, further in view of Sumiyoshi U.S. PAP 2018/0108352 A1.
Regarding claim 1 Leblang teaches a method performed by one or more computers, the method comprising:
receiving, by the one or more computers, messages from a plurality of devices, each of the messages indicating a respective voice input detected by the device that sent the message (At block 310, an audio signal can be received and/or processed. One or more voice-enabled devices, 
obtaining, by the one or more computers, an audio signature for each of the voice inputs detected by the plurality of devices (At block 310, an audio signal can be received and/or processed. One or more voice-enabled devices, microphones, or conference devices may transmit an audio signal to the voice based system 200, see col. 15 lines 17-20); 
evaluating, by the one or more computers, the audio signatures for the voice inputs and times that the voice inputs were detected (The devices can also transmit metadata associated with the audio signal. The metadata can include an identifier for the voice-enabled device, a time that the audio input was received, or a time that the audio signal was generated, see col. 15 lines 27-30); 
based on evaluating the audio signatures for the voice inputs and the times that the voice inputs were detected, grouping, by the one or more computers, at least some of the plurality of devices to form a group of multiple devices that are determined, based on the evaluation, to have detected a same user utterance and that are each configured to respond to the user utterance (the arbitration service 270 may dynamically determine a group of voice-enabled devices on the fly for arbitration purposes. While block 320 is shown after the previous blocks 305, 310, 315, in some embodiments, block 320 may occur before any of those blocks. The arbitration service 270 may dynamically determine the group based on metadata that can include conference call session data, event data, and/or voice identification data, see col. 15 lines 51-60; the voice based system 200 can receive many audio signals from disparate, unrelated voice-enabled devices within a period of time. Following identification of a voice command by the voice based system 200 from a voice-enabled device from a group, the system 200 can check for other audio signals that are also received from different voice-enabled devices from the same group. The other audio signals that are received from the same group can potentially correspond to the same voice command, see col. 16 lines 7-24); 
and managing, by the one or more computers, the multiple devices in the group so that only the selected device outputs a response to the user utterance (block 340, the command is executed. For example, the execution service 252 may execute the command that was associated with a particular device; as described herein, the voice based system 200 can cause the determined voice-enabled device to play media, such as media associated with a conference call or meeting, see col. 17 lines 31-45).  
However, Leblang does not teach the plurality of devices including a robot; receiving, by the one or more computers, context data from the robot that describes conditions in an environment of the robot, the context data being based on sensor data captured by one or more sensors of the robot; the group including the robot; selecting, by the one or more computers, a device to respond to the user utterance from among the multiple devices in the group based on (i) device types of the devices in the group and (ii) the context data provided by the robot; and managing, by the one or more computers, the multiple devices in the group so that only the robot outputs a response to the user utterance.
In the same field of endeavor Sumiyoshi teaches a robot interactive communication system includes a robot being configured to interact with a user, an environment sensor being configured to detect an environmental condition of the space. The robot includes a speech information-based interaction unit, a text information-based interaction unit, see abstract. A service robot is expected to act as intended by the service developer and further, to respond appropriately to the situation. The situation includes the condition of the robot itself, the 
It would have been obvious to one of ordinary skill in the art to combine the Leblang invention with the teachings of Sumiyoshi for the benefit of using environmental conditions to provide an appropriate service response from a service robot, see par. [0004].


Regarding claim 2, Leblang teaches the method of claim 1, further comprising identifying a respective account associated with each of the plurality of devices (master account, see col. 4 lines 1-5);
wherein grouping at least some of the plurality of devices comprises defining the group to include devices that detected the same user utterance and that are associated with a same account (In contrast to the home setting, the voice based system in a large-scale setting can include tools to set up large numbers of devices at once, which can create accounts for the devices and/or link the accounts to a master account, see col. 4 lines 1-5).  
Attorney Docket No. 43374-0128001
Regarding claim 4, Leblang teaches the method of claim 1, wherein evaluating the audio signatures for the voice inputs and times that the voice inputs were detected comprises:
comparing the audio signatures to identify audio signatures that differ by less than a threshold amount (same command determined, see col. 16 lines 7-24); 
and comparing times that the voice inputs were detected to identify devices that detected voice inputs that began within threshold amount of time from each other (same command within a threshold of time, see col. 16 lines 7-24).  
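For illustration only, the two threshold comparisons recited in claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Leblang or from the application: the `Detection` record, the mean-absolute-difference metric in `signature_distance`, and the default thresholds `max_distance` and `max_gap` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    device_id: str
    signature: Sequence[float]  # hypothetical fixed-length audio signature
    start_time: float           # seconds; when the voice input began


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric: mean absolute difference between two signatures.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def group_detections(
    detections: List[Detection],
    max_distance: float = 0.1,  # hypothetical signature threshold
    max_gap: float = 0.5,       # hypothetical start-time threshold, seconds
) -> List[List[Detection]]:
    """Group detections whose signatures and start times both fall
    within the thresholds of the first detection in a group."""
    groups: List[List[Detection]] = []
    for det in sorted(detections, key=lambda d: d.start_time):
        for group in groups:
            ref = group[0]
            close_signature = signature_distance(det.signature, ref.signature) < max_distance
            close_time = abs(det.start_time - ref.start_time) < max_gap
            if close_signature and close_time:
                group.append(det)
                break
        else:
            # No existing group matched, so this detection starts a new one.
            groups.append([det])
    return groups
```

In this sketch a device joins a group only when both comparisons pass, so two devices that hear different utterances at the same moment fall into separate groups.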
Regarding claim 5, Leblang teaches the method of claim 1, further comprising obtaining a transcription of the respective voice inputs using an automatic speech recognition system (the voice-enabled device 102 may operate in a low-functionality mode and analyze sound using automatic speech recognition processing, see col. 5 lines 26-32);
wherein grouping at least some of the plurality of devices comprises defining the group to include devices that detected the same user utterance and that detected voice inputs determined to have transcriptions that are within a threshold level of similarity (The ASR system 258 may transcribe received audio data into text data representing the words of the speech contained in the audio data using STT system 266. ASR system 258 may then interpret an utterance based on the similarity between the utterance and pre-established language models stored in an ASR model knowledge base of the storage/memory 254, see col. 13 line 64; In some embodiments, multiple voice-enabled devices may be assigned to a group or session (e.g., a group or session for a conference call) as described herein. The same command can be determined to have been received from the same session within a threshold period of time, see col. 16 lines 7-14).  
Regarding claim 7 Leblang teaches the method of claim 1, wherein the one or more computers comprise a server system;
wherein the messages from the devices are received by the server system over a communication network (the voice based system 200 is configured in a server cluster, server farm, data center, mainframe, cloud computing environment, or a combination thereof, see col. 9 lines 36-40); 
and wherein the method further comprises: 

and generating, by the server system, the audio signatures based on the received audio data of the voice inputs detected by the devices (ASR may recognize human speech in detected audio and transmitted to voice based system, see col. 13, lines 64-67).  
Regarding claim 15 Leblang teaches the method of claim 1, wherein the user utterance is a first user utterance, and wherein the method comprises:
based on evaluating the audio signatures for the voice inputs and the times that the voice inputs were detected, grouping, by the one or more computers, at least some of the plurality of devices to form a second group of multiple devices that detected a same second user utterance, wherein the second user utterance overlaps in time with the first user utterance (The same command can be determined to have been received from the same session within a threshold period of time. For example, the voice based system 200 can receive many audio signals from disparate, unrelated voice-enabled devices within a period of time, see col. 16 lines 7-11); 
selecting, by the one or more computers, a device to respond to the second user utterance from among the multiple devices in the second group (The determination that another device received the same command can further be based on receipt by the second voice-enabled device being within a threshold period of time of receipt of the voice command by the first voice-enabled device, see col. 16 lines 36-40; a particular device can be determined to be associated with the command. For example, the arbitration service 270 can use the data from the previous blocks to determine that a particular device is associated with the command, see col. 16 lines 64-67); 

Regarding claim 18 Leblang teaches a system comprising: 
one or more computers (one or more processors, see col. 6 line 47); 
and one or more non-transitory computer-readable media storing instructions that, when executed by the one or more computers (the voice based system 200 may include one or more processors and/or non-transitory computer-readable media, see col. 6 lines 46-48), cause the one or more computers to perform operations comprising: 
receiving, by the one or more computers, messages from a plurality of devices, each of the messages indicating a respective voice input detected by the device that sent the message (At block 310, an audio signal can be received and/or processed. One or more voice-enabled devices, microphones, or conference devices may transmit an audio signal to the voice based system 200, see col. 15, lines 4-16); 
obtaining, by the one or more computers, an audio signature for each of the voice inputs detected by the plurality of devices (At block 310, an audio signal can be received and/or processed. One or more voice-enabled devices, microphones, or conference devices may transmit an audio signal to the voice based system 200, see col. 15 lines 17-20); 
evaluating, by the one or more computers, the audio signatures for the voice inputs and times that the voice inputs were detected (The devices can also transmit metadata associated with the audio signal. The metadata can include an identifier for the voice-enabled device, a time that 
based on evaluating the audio signatures for the voice inputs and the times that the voice inputs were detected, grouping, by the one or more computers, at least some of the plurality of devices to form a group of multiple devices that are determined, based on the evaluation, to have detected a same user utterance of a user and that are each configured to respond to the user utterance (the arbitration service 270 may dynamically determine a group of voice-enabled devices on the fly for arbitration purposes. While block 320 is shown after the previous blocks 305, 310, 315, in some embodiments, block 320 may occur before any of those blocks. The arbitration service 270 may dynamically determine the group based on metadata that can include conference call session data, event data, and/or voice identification data, see col. 15 lines 51-60; the voice based system 200 can receive many audio signals from disparate, unrelated voice-enabled devices within a period of time. Following identification of a voice command by the voice based system 200 from a voice-enabled device from a group, the system 200 can check for other audio signals that are also received from different voice-enabled devices from the same group. The other audio signals that are received from the same group can potentially correspond to the same voice command, see col. 16 lines 7-24); 
and managing, by the one or more computers, the multiple devices in the group so that only the selected device outputs a response to the user utterance (block 340, the command is executed. For example, the execution service 252 may execute the command that was associated with a particular device; As described herein, the voice based system 200 can cause the determined voice-enabled device to play media, such as media associated with a conference call or meeting, see col. 17 lines 31-45).  

In the same field of endeavor, Khan teaches a method of controlling which electronic device out of a topology of interconnected electronic devices responds to a wake phrase, see par. [0005]. The technologies can support a rich mix of device types that can be present in a device topology of a user. For example, phones, tablets, game consoles, wearable computers, desktop, laptop, and the like can be supported, see par. [0103]. A supported approach to controlling which device responds is to choose the device that has been designated as the primary device for the interconnected devices. Responsive to determining that such a device is not available, a fall back list of devices can be used to determine which device is acting primary. The fall back list can be a list of devices, a list of device types, or a list of device designations, see par. [0120]. An example fallback list is as follows: preferred device; the device that is currently active; the device that was most recently used; service provider default device. The list can further continue with wearable device; phone; tablet; laptop; game console; and desktop, see par. [0121]; (a direction of travel of the user with respect to the selected device) a device can record physical activity. Such recorded activity can then be used for device arbitration to select a single device to respond to the user, perform a task, or the like. Such activity can be derived from hardware sensors. For example, physical movement of a device, activity at a touchscreen, keyboard, pointing device, movement visually detected, user visual (e.g., face, skeletal, etc.) recognition, or the like, see par. [0080].
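For illustration only, the primary-device and fall back selection described in Khan pars. [0120]-[0121] can be sketched as follows. This is a minimal sketch under stated assumptions, not Khan's implementation: the `Device` record, the `select_responder` function, and the device-type strings are hypothetical, and only the primary-device check and the device-type portion of the example list are modeled (the "currently active" and "most recently used" entries are omitted).

```python
from dataclasses import dataclass
from typing import List, Optional

# Device-type ordering taken from the example list in par. [0121];
# the string labels themselves are assumed for this sketch.
FALLBACK_DEVICE_TYPES = [
    "wearable", "phone", "tablet", "laptop", "game console", "desktop",
]


@dataclass
class Device:
    name: str
    device_type: str
    available: bool = True    # e.g., the device recognized the wake phrase
    is_primary: bool = False  # user-designated primary device


def select_responder(devices: List[Device]) -> Optional[Device]:
    """Return the single device that should respond, or None."""
    available = [d for d in devices if d.available]
    # First choice: the designated primary device, if it is available.
    for device in available:
        if device.is_primary:
            return device
    # Otherwise walk the fall back list of device types in order.
    for device_type in FALLBACK_DEVICE_TYPES:
        for device in available:
            if device.device_type == device_type:
                return device
    return None
```

Under these assumptions, a topology whose primary tablet and wearable are both unavailable would fall through to the phone before the desktop.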

Sumiyoshi teaches the service robot 20 determines the condition of the periphery of the service robot 20 from not only the image information acquired from its own camera 225 but also the image information acquired from the environment cameras 30 to accurately determine the environmental condition of the space where the service robot 20 is provided. The service robot 20 can output image information to the display device 40 provided in the space when the service robot 20 uses the image information for a response to the user, see par. [0038]; user walking towards the robot (Embodiment 2 provides an example where the processing performed by the service robot 20 in Embodiment 1 to recognize the sensor information by image recognition, speech recognition, and/or moving object recognition is performed by the server 70; see par. [0101]).
It would have been obvious to one of ordinary skill in the art to combine the Leblang in view of Khan invention with the teachings of Sumiyoshi for the benefit of using environmental conditions to provide an appropriate service response from a service robot, see par. [0004].

Regarding claim 20 Leblang teaches one or more non-transitory computer-readable media storing instructions that, when executed by one or more computers (the voice based system 200 may include one or more processors and/or non-transitory computer-readable media, see col. 6 lines 46-48), cause the one or more computers to perform operations comprising:
receiving, by the one or more computers, messages from a plurality of devices, each of the messages indicating a respective voice input detected by the device that sent the message (At 
obtaining, by the one or more computers, an audio signature for each of the voice inputs detected by the plurality of devices (At block 310, an audio signal can be received and/or processed. One or more voice-enabled devices, microphones, or conference devices may transmit an audio signal to the voice based system 200, see col. 15 lines 17-20); 
evaluating, by the one or more computers, the audio signatures for the voice inputs and times that the voice inputs were detected (The devices can also transmit metadata associated with the audio signal. The metadata can include an identifier for the voice-enabled device, a time that the audio input was received, or a time that the audio signal was generated, see col. 15 lines 27-30); 
based on evaluating the audio signatures for the voice inputs and the times that the voice inputs were detected, grouping, by the one or more computers, at least some of the plurality of devices to form a group of multiple devices that are determined based on the evaluation to have detected a same user utterance of a user and that are each configured to respond to the user utterance, (the arbitration service 270 may dynamically determine a group of voice-enabled devices on the fly for arbitration purposes. While block 320 is shown after the previous blocks 305, 310, 315, in some embodiments, block 320 may occur before any of those blocks. The arbitration service 270 may dynamically determine the group based on metadata that can include conference call session data, event data, and/or voice identification data, see col. 15 lines 51-60; the voice based system 200 can receive many audio signals from disparate, unrelated voice-enabled devices within a period of time. Following identification of a voice command by the 
selecting, by the one or more computers, a device to respond to the user utterance from among the multiple devices in the group based on at least one of a location of a user, a pose of the user, a gaze direction of the user, a direction of movement of the user, or an interaction of the user with one or more of the devices in the group (a particular device can be selected from the group of devices, see col. 17 lines 1-3; A beacon can independently and/or additionally identify a user at a particular location and/or proximately located near a voice-enabled device, which can be used for arbitration purposes, see col. 26 lines 41-56);
and managing, by the one or more computers, the multiple devices in the group so that only the selected device outputs a response to the user utterance (block 340, the command is executed. For example, the execution service 252 may execute the command that was associated with a particular device; As described herein, the voice based system 200 can cause the determined voice-enabled device to play media, such as media associated with a conference call or meeting, see col. 17 lines 31-45).  
However Leblang does not teach, wherein the multiple devices are determined to be located in the same location; receiving, by the one or more computers, data indicating a request from a particular device in the group, sent by the particular device after the user utterance is spoken, for the particular device to respond to the user utterance; selecting by the one or more 
In the same field of endeavor, Khan teaches a method of controlling which electronic device out of a topology of interconnected electronic devices responds to a wake phrase, see par. [0005]. The technologies can support a rich mix of device types that can be present in a device topology of a user. For example, phones, tablets, game consoles, wearable computers, desktop, laptop, and the like can be supported, see par. [0103]. A supported approach to controlling which device responds is to choose the device that has been designated as the primary device for the interconnected devices. Responsive to determining that such a device is not available, a fall back list of devices can be used to determine which device is acting primary. The fall back list can be a list of devices, a list of device types, or a list of device designations, see par. [0120]. 
Khan teaches wherein the multiple devices are determined to be located in the same location (there are groups of devices being used by groups of people in the same location, see par. [0049]); receiving, by the one or more computers, data indicating a request from a particular device in the group to respond to the user utterance (devices that heard the user 930 can perform the initial processing to determine whether they should respond. At 940, if a preferred device is available (e.g., recognized the wake phrase), it can respond 950, see par. [0329]); selecting, by the one or more computers, the particular device to respond to the user utterance based on the request (The spoken command can then be completed 960 and recognized, see par. [0330]); managing, by the one or more computers, the multiple devices in the group so that only the ... (Such a handoff can be to the explicitly specified device, to the preferred device for the scenario, or to the default device for the scenario, see par. [0331]), and (ii) sending, to each of the other devices in the group, a message command suppressing response to the user utterance by the other devices (A device can be configured to listen for and accept handoff commands, see par. [0311]; if the device receives information that it is the right device at 840, it can proceed to a full wake up, play an audio prompt and await a voice command at 860. If not, it can standby for an incoming handoff at 850, see par. [0325]; The device can handoff the task to another electronic device as described herein. In such a case, the device can then eventually transition back to a standby, low-power state, see par. [0097]).
It would have been obvious to one of ordinary skill in the art to combine the Leblang invention with the teachings of Khan for the benefit of supporting a rich mix of device types that can be present in a device topology of a user, see par. [0103].
Regarding claim 29, Khan teaches the one or more non-transitory computer-readable media of claim 20, wherein the one or more computers are configured to perform a selection process to select, from among the group, a device to respond based on device types of the devices in the group (A supported approach to controlling which device responds is to choose the device that has been designated as the primary device for the interconnected devices. Responsive to determining that such a device is not available, a fall back list of devices can be used to determine which device is acting primary. The fall back list can be a list of devices, a list of device types, or a list of device designations, see par. [0120]); 
user preference indicating a primary device designation for the interconnected electronic devices or recorded activity detected by one or more hardware sensors of the electronic device, see par. [0005]).  
Regarding claim 30, Leblang in view of Khan does not teach the one or more non-transitory computer-readable media of claim 20, wherein the group of multiple devices comprises a robot; wherein the particular device sending the request to respond to the user utterance is the robot; wherein the robot begins to output a response to the user utterance before the one or more computers select the particular device to respond to the user utterance.  
In the same field of endeavor Sumiyoshi teaches a robot interactive communication system includes a robot being configured to interact with a user, an environment sensor being configured to detect an environmental condition of the space. The robot includes a speech information-based interaction unit, a text information-based interaction unit, see abstract. A service robot is expected to act as intended by the service developer and further, to respond appropriately to the situation. The situation includes the condition of the robot itself, the condition of the user, and other environmental conditions, see par. [0004]. Sumiyoshi teaches the plurality of devices including a robot (the mobile robot interactive communication system 10 includes service robots 20-a and 20-b, see par. [0029]); receiving, by the one or more computers, context data from the robot that describes conditions in an environment of the robot, the context data being based on sensor data captured by one or more sensors of the robot (The environment cameras 30, the display device 40, and the wireless access point 50 are installed in the working 
It would have been obvious to one of ordinary skill in the art to combine the Leblang invention with the teachings of Sumiyoshi for the benefit of using environmental conditions to provide an appropriate service response from a service robot, see par. [0004].
Regarding claim 31, Sumiyoshi teaches the one or more non-transitory computer-readable media of claim 30, wherein the response of the robot to the user utterance includes a movement of the robot (it is preferable that environment sensors be installed in the space so that some of the environment sensors can detect the congestion degree at a specific place (A) to which the service robots 20 are directed to move, see par. [0039]).  
Regarding claim 32 Sumiyoshi teaches the one or more non-transitory computer-readable media of claim 30, wherein the robot sends the request based on one or more indications of engagement of the user with the robot that are detected using sensor data captured by the robot, and wherein the request claims an exclusive response to the user utterance (The condition on the attribute is the condition to be satisfied by the attribute assigned to a specific action (interaction) in the later-described scenarios 143. The condition on the environment is the condition to be satisfied by the environmental condition that can be acquired by the service robot 20. The condition is the condition on the type of interaction given in a content list 1432 in the later-described scenarios 143, see par. [0052]).  
Regarding claim 33 Sumiyoshi teaches the system of claim 18, wherein the group of devices comprises a robot; and wherein selecting the device to respond to the user utterance comprises selecting the robot to respond to the user utterance based on a determination, based on sensor data captured by the robot, that the user is walking toward the robot.  
In the same field of endeavor Sumiyoshi teaches a robot interactive communication system includes a robot being configured to interact with a user, an environment sensor being configured to detect an environmental condition of the space. The robot includes a speech information-based interaction unit, a text information-based interaction unit, see abstract. A service robot is expected to act as intended by the service developer and further, to respond 
It would have been obvious to one of ordinary skill in the art to combine the Leblang in view of Khan invention with the teachings of Sumiyoshi for the benefit of using environmental conditions to provide an appropriate service response from a service robot, see par. [0004].

Regarding claim 34 Sumiyoshi teaches the system of claim 18, wherein the selected device is selected further based on context data from a robot that includes indications of engagement of the user with the robot (robot includes a speech information-based interaction unit, see abstract).  



Claims 3 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Leblang U.S. Patent No. 10,374,816 B1, in view of Khan U.S. PAP 2016/0155443 A1, in view of Sumiyoshi U.S. PAP 2018/0108352 A1, further in view of Carey U.S. PAP 2018/0286391 A1.

Regarding claim 3 Leblang in view of Khan in view of Sumiyoshi does not teach the method of claim 1, further comprising determining a location of each of the plurality of devices 
Although Leblang teaches a home network, it does not particularly teach wherein grouping at least some of the plurality of devices comprises defining the group to include devices that detected the same user utterance and that are located at a same location.  
In the same field of endeavor, Carey teaches that the voice command communications link may be established in order for multiple user devices 210 to coordinate the execution of a voice command when the voice command is "heard" (e.g., when audio is received via an audio input device/microphone) by the multiple user devices 210 (e.g., in a situation in which the multiple user devices 210 are located in relatively close proximity to each other, such as when the multiple user devices 210 are located in the same room). The communications link may be established in order to improve the recognition of a voice command, see par. [0061].
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan invention with the teachings of Carey for the benefit of establishing a communication link between devices in order to improve the recognition of a voice command, see par. [0061].
Regarding claim 9, Leblang in view of Khan does not teach the method of claim 1, wherein selecting the device to respond to the user utterance from among the multiple devices in the group comprises selecting the device based on noise levels of the detected voice inputs for the devices in the group.
In the same field of endeavor Carey teaches a computer-implemented method includes exchanging device data, associated with a first participating user device, with the one or more identify illegible portions by comparing voiceprints associated with the audio data to voiceprints of static, background noise, and/or other obstructions that are consistent with illegible audio. Additionally, or alternatively, the combined audio data object generation module 630 may identify audibly illegible portions based on volume levels (e.g., portions of the audio that are less than a particular volume may be considered audibly illegible). In embodiments, the combined audio data object generation module 630 may score portions of the audio data (e.g., score each second or half-second of audio data) based on a level of legibility. The combined audio data object generation module 630 may retain the highest scored portions of audio data received or "heard" across all user devices 210 to form a combined audio data object that includes only the most audibly legible portions of audio, see par. [0074].
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan in view of Sumiyoshi  invention with the 
Regarding claim 10 Leblang in view of Khan does not teach the method of claim 1, wherein selecting the device to respond to the user utterance from among the multiple devices in the group comprises selecting the device based on levels of speech power in the detected voice inputs for the devices in the group.  
In the same field of endeavor Carey teaches a computer-implemented method that includes exchanging device data, associated with a first participating user device, with the one or more second participating user devices; receiving audio data associated with a voice command; exchanging the audio data with the one or more second participating user devices; identifying, by the first participating user device, a voice command based on exchanging the audio data; determining which one of the first participating user device or the one or more second participating user devices should respond to the voice command based on details of the voice command and the exchanging the device data, see abstract. The combined audio data object generation module 630 may combine multiple audio data objects or streams into a single combined audio data object by retaining the most audibly legible (e.g., audibly decipherable) portions of audio "heard" across all the user devices 210. In embodiments, the combined audio data object generation module 630 may identify the most audibly legible portions based on audio analysis techniques. More specifically, the combined audio data object generation module 630 may identify illegible portions by comparing voiceprints associated with the audio data to voiceprints of static, background noise, and/or other obstructions that are consistent with illegible audio. Additionally, or alternatively, the combined audio data object generation module 630 may identify audibly illegible portions based on volume levels (e.g., portions of the audio that are less than a particular volume may be considered audibly illegible). In embodiments, the combined audio data object generation module 630 may score portions of the audio data (e.g., score each second or half-second of audio data) based on a level of legibility. The combined audio data object generation module 630 may retain the highest scored portions of audio data received or "heard" across all user devices 210 to form a combined audio data object that includes only the most audibly legible portions of audio, see par. [0074].
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan in view of Sumiyoshi invention with the teachings of Carey for the benefit of selecting the correct device using the highest scoring portions of audio which provide higher accuracy, see par. [0074].

Claim 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leblang U.S. Patent No. 10,374,816 B1, in view of Khan U.S. PAP 2016/0155443 A1, in view of Sumiyoshi U.S. PAP 2018/0108352 A1, further in view of Suzuki U.S. PAP 2019/0258369 A1.

Regarding claim 21 Leblang in view of Khan in view of Sumiyoshi does not teach the method of claim 1, wherein selecting the device to respond to the user utterance from among the multiple devices in the group comprises selecting the device based on a pose of the user.  
In the same field of endeavor Suzuki teaches an information processing apparatus including: a display control unit that controls display of an operation object for a device to be operated; and a reference control unit that controls a reference of a location at which the operation object is displayed such that the operation object is able to be visually recognized, on the information processing system 1 may perform the first device selection on the basis of a user’s posture. More specifically, the device selection unit 102 selects a device or devices to be operated that is determined to fall within a range decided from the user's posture in the first device selection as an operation target device or candidate devices, see par. [0155].
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan in view of Sumiyoshi invention with the teachings of Suzuki for the benefit of operating devices with a sensation of displacing an actual object, see par. [0005].
Claims 22-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leblang U.S. Patent No. 10,374,816 B1, in view of Khan U.S. PAP 2016/0155443 A1, in view of Sumiyoshi U.S. PAP 2018/0108352 A1, further in view of Li U.S. Patent No. 9,811,315 B1.

Regarding claim 22 Leblang in view of Khan in view of Sumiyoshi does not teach the method of claim 1, wherein selecting the device to respond to the user utterance from among the multiple devices in the group comprises selecting the device based on a gaze direction of the user.  
In the same field of endeavor Li teaches a device may also have a locating detector to identify a user and measure position of the user who has just generated verbal contents. The locating detector may also be used to collect voice inputs from a target user only, where the target user may have gazed at a device or may be gazing at the device. Locating a target user becomes critical when multiple users are on site. For instance, a device may be configured to receive and interpret a voice input, identify and locate a user who just gives the voice input using a locating detector, measure the user's gazing direction, and then perform a task extracted from the voice input when the user gazes at the device simultaneously or within a given period of time after the voice input is received. Alternatively, a device may also be configured to monitor a user's gaze direction, measure and obtain position data of the user after the user gazes at the device, calculate a target position of sound source of the user, e.g., a position of the user's head or mouth, receive a voice input, ascertain whether the input comes from the target position, analyze the input if it is from the target position, ascertain whether the input contains a command, and then perform a task derived from the command when the input is received while the user is still gazing at the device or within a given time period after end of the gazing act, see col. 9 lines 29-54.
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan in view of Sumiyoshi invention with the teachings of Li in order to collect voice inputs from a target user only, using the user's gaze, see col. 9 lines 29-54.

Regarding claim 23 Leblang in view of Khan in view of Sumiyoshi does not teach the method of claim 1, wherein selecting the device to respond to the user utterance from among the 

In the same field of endeavor Li teaches that device 12 may also include a sensor 20 which functions as a proximity detector, which is well known in the art and well developed too. Sensor 20 may be used to detect an object outside of the device and may have multiple sensing units. It may include a camera-like system to obtain visible images or infrared images and then recognize any movement through image analysis over a period of time. It may also have capability to sense whether device 12 is close to a user's body or whether it is held by a hand. Detection result may be used to determine an environment where a user is in, or the intention of a user, see col. 4 lines 17-28.
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to combine the Leblang in view of Khan in view of Sumiyoshi invention with the teachings of Li in order to determine an environment where a user is in, or the intention of a user, see col. 4 lines 17-28.
Claim 35 is/are rejected under 35 U.S.C. 103 as being unpatentable over Leblang U.S. Patent No. 10,374,816 B1, in view of Khan U.S. PAP 2016/0155443 A1, in view of Sumiyoshi U.S. PAP 2018/0108352 A1, further in view of Amores U.S. PAP 2020/0020333 A1.
Regarding claim 35 Leblang, Khan and Sumiyoshi do not teach the method of claim 1, wherein the response to the user utterance by the robot includes moving a robotic arm of the robot.
In a similar field of endeavor Amores teaches a communication system that allows a robot to interact with users via a single party dialogue strategy, see abstract. As a result of the increased abilities of machines to communicate with humans, machines, such as robots, are the robot 105 includes an example servo-motor 246 to control an appendage of the robot 105 (e.g., an "arm" that can be used to point/gesture), a motor used to propel the robot in a forward, backward, or side direction, etc. In some such examples, the example servo-motor controller 248 controls the servo-motor 246 (or other motor) in cooperation with the multi-party dialogue handler 210 when executing the multi-party dialogue processing algorithm 214.
It would have been obvious to one of ordinary skill in the art to combine the Leblang, Khan and Sumiyoshi invention with the teachings of Amores for the benefit of assisting humans in public places, see par. [0002].


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art available on form 892.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571) 270-3711. The examiner can normally be reached on Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/MICHAEL ORTIZ-SANCHEZ/Primary Examiner, Art Unit 2656