DETAILED ACTION
Claims 1-20 are pending.
This communication is in response to the communication filed 1/14/2020.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on 4/13/2020, which was before the mailing of a first Office action on the merits. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
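A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.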

Claims 1-5, 9-11, 13-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim (US 20180336905 A1) in view of Edwards (GB 22574803 A) and Mahajan (US 20200082816 A1).
	As per independent claim 1, Kim teaches a computer system, comprising: 
one or more processors (see Kim [0071], which notes peripherals interface 218 is used to couple input and output peripherals of the device to CPU 220 and memory 202. The one or more processors 220 run or execute various software programs and/or sets of instructions stored in memory 202 to perform various functions for device 200 and to process data. In some embodiments, peripherals interface 218, CPU 220, and memory controller 222 are implemented on a single chip, such as chip 204. In some other embodiments, they are implemented on separate chips); and 
one or more computer-readable hardware storage devices having stored thereon computer-executable instructions that are executable by the one or more processors (see Kim [0071], which notes in some examples, a non-transitory computer-readable storage medium of memory 202 is used to store instructions (e.g., for performing aspects of processes described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other examples, the instructions (e.g., for performing aspects of the processes described below) are stored on a non-transitory computer-readable storage medium (not shown) of the server system 108 or are divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108) to cause the computer system to: 
detect user input comprising a command that the computer system is to perform, the command being a type of command that the computer system is not able to perform natively (see Kim [0019], which notes techniques for far-field extension of digital assistant services by one or more service-extension devices can improve the user-interaction interface. For example, using one or more service-extension devices, a user is no longer required to be in close proximity (e.g., in the same room) with an electronic device for receiving digital assistant services provided by the digital assistant operating on the electronic device. Further, the service-extension devices can flexibly obtain responses to user requests from a device disposed in the vicinity of the user and/or a device disposed remotely, depending on the content of the user request. For example, if the user requests personal information (e.g., calendar events), a service-extension device may obtain a response from a device disposed in the vicinity of the user (e.g., the user's smartphone), rather than a remote device, thereby reducing the time required for providing services to the user. Under some circumstances, obtaining a response from a local device may also alleviate privacy concerns because sensitive or confidential information may be contained in a communicated between local devices. Further, the ability to obtain responses from different devices enhances the capability of a service-extension device to provide responses to a user. For example, if user-requested information cannot be obtained from one device (e.g., the user's smartphone), the service-extension device may obtain the response from another device (e.g., a server). As a result, a service-extension device can dynamically obtain responses from one or more devices, and efficiently extend digital assistant services from multiple devices);
in response to the user input, parse the user input into a plurality of keywords representative of the command (see Kim [0250], which notes user data 748 includes user-specific information, such as user-specific vocabulary, user preferences, user address, user's default and secondary languages, user's contact list, and other short-term or long-term information for each user. In some examples, natural language processing module 732 uses the user-specific information to supplement the information contained in the user input to further define the user intent/keywords. For example, for a user request “invite my friends to my birthday party,” natural language processing module 732 is able to access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held, rather than requiring the user to provide such information explicitly in his/her request);
identify an external device that is able to perform the command (see Kim [0051], which notes the present disclosure further provides techniques for providing digital assistant services using multiple devices. As described above, providing digital assistant services using multiple devices can mitigate the device capability limitation. In some examples, a first electronic device receives a speech input representing a user request and obtains capability data associated with one or more electronic devices capable of being communicatively coupled to the first electronic device. The capability data can include device capabilities and informational capabilities. In accordance with the capability data, the first electronic device can identify a second electronic device for providing at least a portion of a response to the user request; and cause the second electronic device to provide at least a portion of the response); and
concatenate the sound-based activating keywords to generate a command phrase (see Kim [0295], which notes with reference to FIG. 11C, in some embodiments, user 804 may provide a speech input 1126 representing a user request/command for performing a task. Speech input 1126 may include, for example, “Play the movie Star Wars.” The digital assistant operating on device 810A receives speech input 1126. In some examples, based on speech input 1126, device 810A can provide a representation of a user request 1128 to a device that is disposed in the vicinity of device 810A (e.g., device 840) and not to a remote device (e.g., device 820 such as a server)).

Kim fails to specifically teach a system, comprising: determine a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device; generate a first soundwave comprising the command phrase; subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receive, from the external device, a second soundwave comprising the response, and after parsing the response from the second soundwave, play a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.
However Edwards does teach a system comprising:
determine a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech command comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
generate a first soundwave comprising the command phrase (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech command comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receive, from the external device, a second soundwave comprising the response (see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system; and see Edwards page 10, line 30 to page 11, line 4, which notes the respective controller on the second/external device is configured to control its respective loudspeaker to emit said pattern with a predetermined timing relative to a portion of the audible content, played out from the respective loudspeaker; and the controller on the first device is configured to detect the predetermined pattern in an acoustic signal received by the microphone based on a comparison with and a reference instance of the pattern, and to thereby determine an average value of said network delay).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31 to page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.
The combination of Kim and Edwards fails to specifically teach after parsing the response from the second soundwave, play a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.
However Mahajan does teach a system comprising:
after parsing the response from the second soundwave (see Mahajan FIG. 2, which shows a second soundwave 222/226), play a third soundwave (see Mahajan FIG. 2, which shows a third soundwave 248) over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input (see Mahajan [0017], which notes it should be understood that the ultrasonic wake word 112 and/or the ultrasonic identifier 114 can be within a human's normal hearing range. For example, the wake word can be audible and can serve to advise the user 130 that the television 110 is communicating to the digital assistant 120. The wake word and/or identifier can be played in the range of human hearing but can be hidden as a “fingerprint” within the normal audio stream. For example, the movie audio 116/second soundwave of “Hey, cool bike John!” can be modulated or transformed in such a way that it encodes the identifier without degrading the audio experience for the user 130. The term “ultrasonic” can generally be taken to mean “of a frequency and/or loudness that is humanly imperceptible” as appropriate. Similarly, the term “imperceptible” can mean a sound that is a frequency and/or volume that outside of human hearing or is otherwise unintelligible by humans (such as a fingerprint comprising slight modifications to an audio signal); and see Mahajan [0019], which notes, with reference to FIG. 1, in example environment 100, the digital assistant now has context (e.g., it “knows”/parses [by parsing the encoded movie audio 116] that the bicycle 118 was just presented on screen) for future prompts and/or commands. When the user 130 asks “Hey Assistant, how much is a new bike?” 132, the digital assistant 120 can determine that the user 130 is likely referring to the bicycle 118 on the television 110 or one similar. Without this context, the digital assistant 120 might instead determine incorrectly that the user is inquiring about a motorcycle or a children's bicycle; and see Mahajan [0020], which notes the digital assistant 120 can then provide a contextually/particularly relevant response/third soundwave 122 such as “the bike shown in the movie you're watching is the SprocketRocket 2000 and you can buy it for $250”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim and Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user (see Mahajan [0061], which notes FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device.

As per the corresponding independent method claim, Kim teaches a method comprising:
detecting user input comprising a command that the computer system is to perform, the command being a type of command that the computer system is not able to perform natively (see Kim [0019], which notes techniques for far-field extension of digital assistant services by one or more service-extension devices can improve the user-interaction interface. For example, using one or more service-extension devices, a user is no longer required to be in close proximity (e.g., in the same room) with an electronic device for receiving digital assistant services provided by the digital assistant operating on the electronic device. Further, the service-extension devices can flexibly obtain responses to user requests from a device disposed in the vicinity of the user and/or a device disposed remotely, depending on the content of the user request. For example, if the user requests personal information (e.g., calendar events), a service-extension device may obtain a response from a device disposed in the vicinity of the user (e.g., the user's smartphone), rather than a remote device, thereby reducing the time required for providing services to the user. Under some circumstances, obtaining a response from a local device may also alleviate privacy concerns because sensitive or confidential information may be contained in a communicated between local devices. Further, the ability to obtain responses from different devices enhances the capability of a service-extension device to provide responses to a user. For example, if user-requested information cannot be obtained from one device (e.g., the user's smartphone), the service-extension device may obtain the response from another device (e.g., a server). As a result, a service-extension device can dynamically obtain responses from one or more devices, and efficiently extend digital assistant services from multiple devices);
 in response to the user input, parsing the user input into a plurality of keywords representative of the command (see Kim [0250], which notes user data 748 includes user-specific information, such as user-specific vocabulary, user preferences, user address, user's default and secondary languages, user's contact list, and other short-term or long-term information for each user. In some examples, natural language processing module 732 uses the user-specific information to supplement the information contained in the user input to further define the user intent/keywords. For example, for a user request “invite my friends to my birthday party,” natural language processing module 732 is able to access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held, rather than requiring the user to provide such information explicitly in his/her request); 
identifying an external device that is able to perform the command (see Kim [0051], which notes the present disclosure further provides techniques for providing digital assistant services using multiple devices. As described above, providing digital assistant services using multiple devices can mitigate the device capability limitation. In some examples, a first electronic device receives a speech input representing a user request and obtains capability data associated with one or more electronic devices capable of being communicatively coupled to the first electronic device. The capability data can include device capabilities and informational capabilities. In accordance with the capability data, the first electronic device can identify a second electronic device for providing at least a portion of a response to the user request; and cause the second electronic device to provide at least a portion of the response); and
concatenating the sound-based activating keywords to generate a command phrase (see Kim [0295], which notes with reference to FIG. 11C, in some embodiments, user 804 may provide a speech input 1126 representing a user request/command for performing a task. Speech input 1126 may include, for example, “Play the movie Star Wars.” The digital assistant operating on device 810A receives speech input 1126. In some examples, based on speech input 1126, device 810A can provide a representation of a user request 1128 to a device that is disposed in the vicinity of device 810A (e.g., device 840) and not to a remote device (e.g., device 820 such as a server)).
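Kim fails to specifically teach a method, comprising: determining a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device; generating a first soundwave comprising the command phrase; subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receiving, from the external device, a second soundwave comprising the response; and after parsing the response from the second soundwave, playing a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.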
However Edwards does teach a method comprising:
determining a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech command comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
generating a first soundwave comprising the command phrase (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech command comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receiving, from the external device, a second soundwave comprising the response (see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system; and see Edwards page 10, line 30 to page 11, line 4, which notes the respective controller on the second/external device is configured to control its respective loudspeaker to emit said pattern with a predetermined timing relative to a portion of the audible content, played out from the respective loudspeaker; and the controller on the first device is configured to detect the predetermined pattern in an acoustic signal received by the microphone based on a comparison with and a reference instance of the pattern, and to thereby determine an average value of said network delay).
- Page 41 -Docket No. 22112.1.1
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31—page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.
The combination of Kim and Edwards fails to specifically teach after parsing the response from the second soundwave, play a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.
However Mahajan does teach a method comprising:
after parsing the response from the second soundwave (see Mahajan FIG. 2, which shows a second soundwave 222/226), playing a third soundwave (see Mahajan FIG. 2, which shows a third soundwave 248) over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input (see Mahajan [0017], which notes it should be understood that the ultrasonic wake word 112 and/or the ultrasonic identifier 114 can be within a human's normal hearing range. For example, the wake word can be audible and can serve to advise the user 130 that the television 110 is communicating to the digital assistant 120. The wake word and/or identifier can be played in the range of human hearing but can be hidden as a “fingerprint” within the normal audio stream. For example, the movie audio 116/second soundwave of “Hey, cool bike John!” can be modulated or transformed in such a way that it encodes the identifier without degrading the audio experience for the user 130. The term “ultrasonic” can generally be taken to mean “of a frequency and/or loudness that is humanly imperceptible” as appropriate. Similarly, the term “imperceptible” can mean a sound that is a frequency and/or volume that outside of human hearing or is otherwise unintelligible by humans (such as a fingerprint comprising slight modifications to an audio signal); and see Mahajan [0019], which notes, with reference to FIG. 1, in example environment 100, the digital assistant now has context (e.g., it “knows”/parses [by parsing the encoded movie audio 116] that the bicycle 118 was just presented on screen) for future prompts and/or commands. When the user 130 asks “Hey Assistant, how much is a new bike?” 132, the digital assistant 120 can determine that the user 130 is likely referring to the bicycle 118 on the television 110 or one similar. Without this context, the digital assistant 120 might instead determine incorrectly that the user is inquiring about a motorcycle or a children's bicycle; and see Mahajan [0020], which notes the digital assistant 120 can then provide a contextually/particularly relevant response/third soundwave 122 such as “the bike shown in the movie you're watching is the SprocketRocket 2000 and you can buy it for $250”).
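Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim and Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user (see Mahajan [0061], which notes FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).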
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device. 

As per independent claim 20, Kim teaches one or more hardware storage devices having stored thereon computer-executable instructions that are executable by one or more processors of a computer system (see Kim [0071], which notes in some examples, a non-transitory computer-readable storage medium of memory 202 is used to store instructions (e.g., for performing aspects of processes described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other examples, the instructions (e.g., for performing aspects of the processes described below) are stored on a non-transitory computer-readable storage medium (not shown) of the server system 108 or are divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108) to cause the computer system to: 
detect user input comprising a command that the computer system is to perform, the command being a type of command that the computer system is not able to perform natively (see Kim [0019], which notes techniques for far-field extension of digital assistant services by one or more service-extension devices can improve the user-interaction interface. For example, using one or more service-extension devices, a user is no longer required to be in close proximity (e.g., in the same room) with an electronic device for receiving digital assistant services provided by the digital assistant operating on the electronic device. Further, the service-extension devices can flexibly obtain responses to user requests from a device disposed in the vicinity of the user and/or a device disposed remotely, depending on the content of the user request. For example, if the user requests personal information (e.g., calendar events), a service-extension device may obtain a response from a device disposed in the vicinity of the user (e.g., the user's smartphone), rather than a remote device, thereby reducing the time required for providing services to the user. Under some circumstances, obtaining a response from a local device may also alleviate privacy concerns because sensitive or confidential information may be contained in a communicated between local devices. Further, the ability to obtain responses from different devices enhances the capability of a service-extension device to provide responses to a user. For example, if user-requested information cannot be obtained from one device (e.g., the user's smartphone), the service-extension device may obtain the response from another device (e.g., a server). As a result, a service-extension device can dynamically obtain responses from one or more devices, and efficiently extend digital assistant services from multiple devices);
 in response to the user input, parse the user input into a plurality of keywords representative of the command (see Kim [0250], which notes user data 748 includes user-specific information, such as user-specific vocabulary, user preferences, user address, user's default and secondary languages, user's contact list, and other short-term or long-term information for each user. In some examples, natural language processing module 732 uses the user-specific information to supplement the information contained in the user input to further define the user intent/keywords. For example, for a user request “invite my friends to my birthday party,” natural language processing module 732 is able to access user data 748 to determine who the “friends” are and when and where the “birthday party” would be held, rather than requiring the user to provide such information explicitly in his/her request); 
identify an external device that is able to perform the command (see Kim [0051], which notes the present disclosure further provides techniques for providing digital assistant services using multiple devices. As described above, providing digital assistant services using multiple devices can mitigate the device capability limitation. In some examples, a first electronic device receives a speech input representing a user request and obtains capability data associated with one or more electronic devices capable of being communicatively coupled to the first electronic device. The capability data can include device capabilities and informational capabilities. In accordance with the capability data, the first electronic device can identify a second electronic device for providing at least a portion of a response to the user request; and cause the second electronic device to provide at least a portion of the response); and
concatenate the sound-based activating keywords to generate a command phrase (see Kim [0295], which notes with reference to FIG. 11C, in some embodiments, user 804 may provide a speech input 1126 representing a user request/command for performing a task. Speech input 1126 may include, for example, “Play the movie Star Wars.” The digital assistant operating on device 810A receives speech input 1126. In some examples, based on speech input 1126, device 810A can provide a representation of a user request 1128 to a device that is disposed in the vicinity of device 810A (e.g., device 840) and not to a remote device (e.g., device 820 such as a server)).
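Kim fails to specifically teach a system, comprising: determine a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device; generate a first soundwave comprising the command phrase; subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receive, from the external device, a second soundwave comprising the response.
However, Edwards does teach a system comprising: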
determine a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech commands comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it is also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
generate a first soundwave comprising the command phrase (see Edwards page 17, lines 13-20, which notes the incoming audio processing module 208 is configured to perform audio processing on the separated-out audio components, under the control of the controller 210. This may include a speech recognition algorithm to detect speech commands in the audio components. Speech commands comprise words spoken by a user in a natural language. The controller 210 is configured to execute the detected speech commands detected by the speech recognition algorithm, i.e. to trigger one or more functions specified by the speech command. This may comprise triggering one or more actions to be performed via the network interface 212 and the corresponding data network; see Edwards page 17, lines 23-24; see Edwards page 29, lines 1-4, which notes in any of the above uses of the inaudible acoustic channel or other use cases, in embodiments it is also possible to create a mesh network whereby remote speakers can monitor other slave speaker audio, for example in other rooms, and communicate back to the central master, using a network over the hidden audio; and see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system);
subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receive, from the external device, a second soundwave comprising the response (see Edwards page 17, lines 29-32, which notes as another example of executing a speech command, the controller 210 may (again depending on the speech command) control an appliance or system around the home or office, for example to turn on/activate or off, mute or control the volume of a television or home entertainment system; and see Edwards page 10, line 30 to page 11, line 4, which notes the respective controller on the second/external device is configured to control its respective loudspeaker to emit said pattern with a predetermined timing relative to a portion of the audible content, played out from the respective loudspeaker; and the controller on the first device is configured to detect the predetermined pattern in an acoustic signal received by the microphone based on a comparison with and a reference instance of the pattern, and to thereby determine an average value of said network delay).
Kim fails to specifically teach a system, comprising: determine a sound-based activating phrase that, when detected by a microphone of the external device, activates the external device; generate a first soundwave comprising the command phrase; subsequent to playing the first soundwave over a speaker of the computer system, which first soundwave triggers the external device to be activated and to provide a response to the plurality of keywords included in the first soundwave, receive, from the external device, a second soundwave comprising the response, and after parsing the response from the second soundwave, play a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31—page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.
The combination of Kim and Edwards fails to specifically teach after parsing the response from the second soundwave, play a third soundwave over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input.
However, Mahajan does teach a system comprising:
after parsing the response from the second soundwave (see Mahajan FIG. 2, which shows a second soundwave 222/226), play a third soundwave (see Mahajan FIG. 2, which shows a third soundwave 248) over the speaker, the third soundwave comprising one or more portions of the response such that the third soundwave operates as a particular response to the user input (see Mahajan [0017], which notes it should be understood that the ultrasonic wake word 112 and/or the ultrasonic identifier 114 can be within a human's normal hearing range. For example, the wake word can be audible and can serve to advise the user 130 that the television 110 is communicating to the digital assistant 120. The wake word and/or identifier can be played in the range of human hearing but can be hidden as a “fingerprint” within the normal audio stream. For example, the movie audio 116/second soundwave of “Hey, cool bike John!” can be modulated or transformed in such a way that it encodes the identifier without degrading the audio experience for the user 130. The term “ultrasonic” can generally be taken to mean “of a frequency and/or loudness that is humanly imperceptible” as appropriate. Similarly, the term “imperceptible” can mean a sound that is a frequency and/or volume that outside of human hearing or is otherwise unintelligible by humans (such as a fingerprint comprising slight modifications to an audio signal); and see Mahajan [0019], which notes, with reference to FIG. 1, in example environment 100, the digital assistant now has context (e.g., it “knows”/parses [by parsing the encoded movie audio 116] that the bicycle 118 was just presented on screen) for future prompts and/or commands. When the user 130 asks “Hey Assistant, how much is a new bike?” 132, the digital assistant 120 can determine that the user 130 is likely referring to the bicycle 118 on the television 110 or one similar. 
Without this context, the digital assistant 120 might instead determine incorrectly that the user is inquiring about a motorcycle or a children's bicycle; and see Mahajan [0020], which notes the digital assistant 120 can then provide a contextually/particularly relevant response/third soundwave 122 such as “the bike shown in the movie you're watching is the SprocketRocket 2000 and you can buy it for $250”). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim and Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user (see Mahajan [0061], which notes FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device. 

	As per claim 2, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
(see Edwards page 4, lines 26-28, which notes the inaudible signal may be an audible-frequency signal emitted at an inaudible power level relative to said portion of audio content played out by the loudspeaker of the smart speaker unit).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31—page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).


As per claim 3, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Edwards further teaches wherein a frequency of the first soundwave is in an inaudible sound range (see Edwards page 4, line 24, which notes in embodiments the inaudible signal may be an ultrasound signal).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31—page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim, Edwards, and Mahajan includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.

As per claim 4, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Edwards further teaches wherein a frequency of the first soundwave is in an inaudible sound range (see Edwards page 7, line 27—page 8, line 9, which notes in embodiments the controller on the first device may be configured to include one or more control settings for the second device in the further inaudible signal), a frequency of the second soundwave is in the inaudible sound range (see Edwards, page 11, lines 15-17, which notes the respective controller on the second device is further configured to communicate with the controller on the first device by controlling the loudspeaker of the second device to emit an inaudible signal), and a frequency of the third soundwave is in an audible sound range (see Edwards, page 10, lines 27-29, which notes the controllers on the first and second devices are configured to control their respective loudspeakers to play out said portion of audible content in parallel with one another).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31 to page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim, Edwards, and Mahajan includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.

As per claim 5, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Edwards further teaches wherein the user input is audible input spoken by a user (see Edwards page 9, lines 15-21, which notes in embodiments there may be provided a set of two or more smart-speaker units, each comprising a respective: loudspeaker; microphone for receiving voice inputs from the user; controller configured to submit the voice inputs received by the respective microphone to a speech recognition algorithm to recognize and execute speech commands from the voice inputs).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim and Mahajan with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31—page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects the results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine it received frequency profile (spectrum), e.g. power spectral density. By comprising with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment, (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply in inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim, Edwards, and Mahajan includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.

As per claim 9, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Kim further teaches wherein the one or more portions of the response are particular keywords, and wherein the computer system concatenates the particular keywords with additional keywords that are selected by the computer system for responding to the user input (see Kim [0055], which notes, with reference to FIG. 1, a block diagram of system 100 according to various examples. In some examples, system 100 implements a digital assistant. The terms “digital assistant,” “virtual assistant,” “intelligent automated assistant,” or “automatic digital assistant” refer to any information processing system that interprets natural language input in spoken and/or textual form/particular keywords to infer user intent/additional keywords, and performs actions based on the inferred user intent. For example, to act on an inferred user intent, the system performs one or more of the following: identifying a task flow with steps and parameters designed to accomplish the inferred user intent, inputting specific requirements from the inferred user intent into the task flow; executing the task flow by invoking programs, methods, services, APIs, or the like; and generating output responses to the user in an audible (e.g., speech) and/or visual form).

As per claim 10, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Kim further teaches wherein the command phrase is an interrogatory statement to which the external device is to provide an answer (see Kim [0056], which notes more specifically, a digital assistant is capable of accepting a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry. Typically, the user request seeks either an informational answer or performance of a task by the digital assistant. A satisfactory response to the user request includes a provision of the requested informational answer, a performance of the requested task, or a combination of the two).

As per claim 11, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
	Kim further teaches wherein the command phrase is a declarative statement to which the external device is to respond (see Kim [0056], which notes more specifically, a digital assistant is capable of accepting a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry. Typically, the user request seeks either an informational answer or performance of a task by the digital assistant. A satisfactory response to the user request includes a provision of the requested informational answer, a performance of the requested task, or a combination of the two).


As per claim 13, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above.
Kim in view of Edwards fails to specifically teach wherein the user input is an interrogatory statement asking the computer system a question, and wherein the third soundwave provides an answer to the question.
However, Mahajan does teach wherein the user input is an interrogatory statement asking the computer system a question, and wherein the third soundwave provides an answer to the question (see Mahajan [0019], which notes, with reference to FIG. 1, in example environment 100, the digital assistant now has context (e.g., it “knows” that the bicycle 118 was just presented on screen) for future prompts and/or commands. When the user 130 asks “Hey Assistant, how much is a new bike?” 132, the digital assistant 120 can determine that the user 130 is likely referring to the bicycle 118 on the television 110 or one similar. Without this context, the digital assistant 120 might instead determine incorrectly that the user is inquiring about a motorcycle or a children's bicycle; and see Mahajan [0020], which notes the digital assistant 120 can then provide a contextually relevant response/third soundwave 122 such as “the bike shown in the movie you're watching is the SprocketRocket 2000 and you can buy it for $250”).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user (see Mahajan [0061], which notes FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).


As per claim 14, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
Kim in view of Edwards fails to specifically teach wherein the computer system communicates with external devices only by sound.
However, Mahajan does teach wherein the computer system communicates with external devices only by sound (see Mahajan Abstract, which notes various embodiments of systems and methods allow a system to embed an item identifier into a content item. A first/external device can then play an audio trigger that is imperceptible to humans before playing the item identifier. A second device can go into an active listening mode after detecting the audio trigger and record an audio segment contain the embedded item identifier. A system can then decode the item identifier to determine an appropriate context for the second device. The second device can then receive a vocal command or query and respond according to the determined context. In one example, the first device can be a television, and the second device can be a digital assistant (e.g., Amazon Alexa) that detects advertisements played on the television via audio signals embedded in accompanying audio streams. Subsequent user interactions/voice I/O with the digital assistant can then be informed by the context of the recently-heard advertisements).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user (see Mahajan [0139], which notes [0061] FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device.

As per claim 16, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above.
Edwards further teaches wherein a frequency of the first soundwave and a frequency of the second soundwave are both in an inaudible sound range (see Edwards page 20, lines 15-16, which notes with respect to FIG. 2, illustrated embodiments wherein the inaudible acoustic channel is implemented by means of ultrasound, i.e. in a different acoustic frequency band than the audible play-out, so that the inaudible acoustic channel in a specific example is ultrasound, and more generally so that the inaudible acoustic channel is in a different acoustic band than the audible play-out; and see Edwards page 23, lines 18-24, which notes in the case of multiple slave units 102s, the controller 210 on each of the slaves 102s emits its respective inaudible signal in a different time slot (time division multiplexing) such that the master unit 102m can distinguish between the instances of the pattern received from the different slave units 102s. Alternatively any other form of multiplexing could be used, such as by having each slave 102s emit its signal with a different respective ultrasound frequency (frequency division multiplexing), so that a first soundwave has a first frequency that is different from a second frequency of the second soundwave, where the frequency of the first soundwave is in a different acoustic band than the audible play-out, and the frequency of the second soundwave is in a different acoustic band than the audible play-out, and where the acoustic band of the first soundwave is not necessarily ultrasonic, and the acoustic band of the second sound wave can be ultrasonic).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31 to page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects. The results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine its received frequency profile (spectrum), e.g. power spectral density. By comparing with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply an inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.
Kim in view of Edwards and Mahajan fails to specifically teach and wherein the frequency of the first soundwave is below about 20 Hz and the frequency of the second soundwave is above about 20 KHz.
However, Mahajan does teach and wherein the frequency of the first soundwave is below about 20 Hz and the frequency of the second soundwave is above about 20 KHz (see Mahajan [0064], which notes that FIG. 4 shows an example graph 400 of where identifiers can be encoded into an audio segment. For example, region 402 shows the envelope of human hearing. This region represents the general limit to what a young person can hear. Below the region is too quiet to hear (i.e., the pressure is too low) whereas above the region represents where pain is felt. To the right (e.g., above around 20 kHz) is ultrasonic whereas on the left (e.g., below around 20 Hz, per FIG. 4) is infrasonic. The identifier can be coded in the ultrasonic range such as in region 406. This range can be above 10 kHz, 12.5 kHz, 15 kHz, 20 kHz, etc. and anywhere in between. Because many audio systems have a sampling limit of 41 kHz, the maximum frequency can be 20.5 kHz. The identifier can additionally or alternatively be encoded in the infrasonic range. For example, region 404 shows where the pressure is too low for the frequency and people cannot hear those sounds. The frequencies and pressure used for encoding the identifier can conform to the outside of the envelope of human hearing so that they do not detract from the listening experience).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user  (see Mahajan [0139], which notes [0061] FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device.

As per claim 17, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above.
Edwards further teaches wherein a frequency of the first soundwave and a frequency of the second soundwave are both in an inaudible sound range (see Edwards page 20, lines 15-16, which notes with respect to FIG. 2, illustrated embodiments wherein the inaudible acoustic channel is implemented by means of ultrasound, i.e. in a different acoustic frequency band than the audible play-out, so that the inaudible acoustic channel in a specific example is ultrasound, and more generally so that the inaudible acoustic channel is in a different acoustic band than the audible play-out; and see Edwards page 23, lines 18-24, which notes in the case of multiple slave units 102s, the controller 210 on each of the slaves 102s emits its respective inaudible signal in a different time slot (time division multiplexing) such that the master unit 102m can distinguish between the instances of the pattern received from the different slave units 102s. Alternatively any other form of multiplexing could be used, such as by having each slave 102s emit its signal with a different respective ultrasound frequency (frequency division multiplexing), so that a first soundwave has a first frequency that is different from a second frequency of the second soundwave, where the frequency of the first soundwave is in a different acoustic band than the audible play-out, and the frequency of the second soundwave is in a different acoustic band than the audible play-out, and where the acoustic band of the first soundwave can be ultrasonic, and the acoustic band of the second sound wave is not necessarily ultrasonic).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31 to page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects. The results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine its received frequency profile (spectrum), e.g. power spectral density. By comparing with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply an inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.
Kim in view of Edwards and Mahajan fails to specifically teach and wherein the frequency of the first soundwave is above about 20 KHz and the frequency of the second soundwave is below about 20 Hz.
However, Mahajan does teach and wherein the frequency of the first soundwave is above about 20 KHz and the frequency of the second soundwave is below about 20 Hz (see Mahajan [0064], which notes that FIG. 4 shows an example graph 400 of where identifiers can be encoded into an audio segment. For example, region 402 shows the envelope of human hearing. This region represents the general limit to what a young person can hear. Below the region is too quiet to hear (i.e., the pressure is too low) whereas above the region represents where pain is felt. To the right (e.g., above around 20 kHz) is ultrasonic whereas on the left (e.g., below around 20 Hz, per FIG. 4) is infrasonic. The identifier can be coded in the ultrasonic range such as in region 406. This range can be above 10 kHz, 12.5 kHz, 15 kHz, 20 kHz, etc. and anywhere in between. Because many audio systems have a sampling limit of 41 kHz, the maximum frequency can be 20.5 kHz. The identifier can additionally or alternatively be encoded in the infrasonic range. For example, region 404 shows where the pressure is too low for the frequency and people cannot hear those sounds. The frequencies and pressure used for encoding the identifier can conform to the outside of the envelope of human hearing so that they do not detract from the listening experience).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards with the smart-speaker’s audio signal directional isolation of Mahajan in order to preserve privacy by blocking or discarding audio from the user  (see Mahajan [0139], which notes [0061] FIG. 3 illustrates an example environment 300 demonstrating privacy features according to various embodiments. When the virtual assistant 120 detects an ultrasonic wake word from the television 110 it can direct its microphone(s) to the television 110 (e.g., isolate an audio signal from the television) so that only audio from the television is recorded. For example, audio within direction 310 pointing towards the televisions 110 can be recorded while audio from the user 130 in direction 312 can be blocked or discarded).
The combination of Kim and Edwards with Mahajan includes predictable results, such as directionally isolating audio from a smart-speaker device.

As per claim 18, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above.
Edwards further teaches the limitations of claim 18 (see Edwards page 20, lines 15-16, which notes with respect to FIG. 2, illustrated embodiments wherein the inaudible acoustic channel is implemented by means of ultrasound, i.e. in a different acoustic frequency band than the audible play-out, so that the inaudible acoustic channel in a specific example is ultrasound, and more generally so that the inaudible acoustic channel is in a different acoustic band than the audible play-out; and see Edwards page 23, lines 18-24, which notes in the case of multiple slave units 102s, the controller 210 on each of the slaves 102s emits its respective inaudible signal in a different time slot (time division multiplexing) such that the master unit 102m can distinguish between the instances of the pattern received from the different slave units 102s, so that a first soundwave having a first frequency that is the same as a second frequency of the second soundwave can be distinguished from the second soundwave on the basis of the time slot (as compared against frequency division multiplexing, where the first soundwave can be distinguished from the second soundwave on the basis of frequency band)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim with the inaudible channel of Edwards in order to measure a filter effect of the surrounding environment and pre-compensate a sound signal using the inverse function of the filtering effect of the environment (see Edwards, page 24, line 31 to page 25, line 11, which notes the master device 102m can also analyse the received inaudible signal for equalization, echo and reverberation effects. The results can also be sent to the slave speakers so that pre-compensation can be applied to the audible signal prior to playout from the speaker. In this case the controller 210 on the master 102m is configured to perform a frequency domain transform (e.g. Fourier transform) on the received pattern in the inaudible signal in order to determine its received frequency profile (spectrum), e.g. power spectral density. By comparing with a reference spectrum (i.e. the known transmitted spectrum of the predetermined pattern), the controller 210 on the master 102m can thus determine a filtering effect of the environment (e.g. room). In the architecture of Figure 3 this may be a function of the correlator 302 (on the master 102m). The controller 210 on the master 102m can then control the controllers 210 on the slaves 102 to apply an inverse of this filtering effect (wherein this control may be via the inaudible channel or may be via the data network)).
The combination of Kim and Edwards with Mahajan includes predictable results, such as using inaudible acoustic signals to control smart speaker devices.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Edwards and Mahajan and in further view of Smith (US 20200349935 A1).
As per claim 6, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above. 
Kim in view of Edwards and Mahajan fails to specifically teach wherein concatenating the sound-based activating phrase and the plurality of keywords including appending a time delay between the sound-based activating phrase and the plurality of keywords.
However, Smith does teach wherein concatenating the sound-based activating phrase and the plurality of keywords including appending a time delay between the sound-based activating phrase and the plurality of keywords (see Smith [0158], which notes in some embodiments, the predetermined time can increase or decrease depending on the number of NMDs that detected the wake-word event in voice input from the user. For example, if only one NMD detects the wake word, the predetermined time may be 1 seconds, whereas if two or more NMDs detect the wake word, the predetermined time may be 5 seconds. Such a determination may occur, for example, in conjunction with the selection of the particular NMD for outputting a response, as described above with respect to block 711 of FIGS. 7-9).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards and Mahajan with the ultrasound-based ranging of Smith in order to determine a user’s location (see Smith [0139], which notes in various embodiments, user location information can include or be based on any number of measured values, for example changing signal levels in captured voice input (e.g., increasing volume indicates a user is moving toward the NMD, while decreasing volume over time indicates a user is moving away from the NMD), changing acoustic signatures, detection of signal strength from a wireless proximity beacon (e.g., a Bluetooth low energy (BTLE) transmitter, near-field communication (NFC) transmitter, etc.), or any other suitable technique. For example, a user's smartphone, smartwatch, or other device may be outfitted with one or more wireless proximity beacons, allowing each NMD to independently sense a user's proximity as the user moves about the environment. In some embodiments, an NMD can be configured to emit an ultrasound signal and, based on the detected reflected ultrasound received at the NMD, determine a user's location, as described in U.S. patent application Ser. No. 16/149,992, entitled “Systems and Methods of User Localization,” which is hereby incorporated by reference in its entirety; see Smith Abstract, which notes the selection of one NMD over another for outputting a response can be based at least in part on user location information; see Smith [0031], which notes multiple NMDs may coordinate responsibility for voice control interactions to deliver an improved user experience; and see Smith [0130], which notes multiple NMDs may coordinate to provide the user experience of a persistent VAS interaction across multiple NMDs).
The combination of Kim, Edwards, and Mahajan with Smith includes predictable results, such as using ultrasound signals emitted by an NMD to determine a user’s location.

As per claim 7, Kim in view of Edwards and Mahajan and in further view of Smith teaches all of the limitations of claim 6 above. 
Smith further teaches wherein the time delay is between 1 second and 5 seconds (see Smith [0158], which notes in some embodiments, the predetermined time can increase or decrease depending on the number of NMDs that detected the wake-word event in voice input from the user. For example, if only one NMD detects the wake word, the predetermined time may be 1 seconds, whereas if two or more NMDs detect the wake word, the predetermined time may be 5 seconds. Such a determination may occur, for example, in conjunction with the selection of the particular NMD for outputting a response, as described above with respect to block 711 of FIGS. 7-9).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards and Mahajan with the ultrasound-based ranging of Smith in order to determine a user’s location (see Smith [0139], which notes in various embodiments, user location information can include or be based on any number of measured values, for example changing signal levels in captured voice input (e.g., increasing volume indicates a user is moving toward the NMD, while decreasing volume over time indicates a user is moving away from the NMD), changing acoustic signatures, detection of signal strength from a wireless proximity beacon (e.g., a Bluetooth low energy (BTLE) transmitter, near-field communication (NFC) transmitter, etc.), or any other suitable technique. For example, a user's smartphone, smartwatch, or other device may be outfitted with one or more wireless proximity beacons, allowing each NMD to independently sense a user's proximity as the user moves about the environment. In some embodiments, an NMD can be configured to emit an ultrasound signal and, based on the detected reflected ultrasound received at the NMD, determine a user's location, as described in U.S. patent application Ser. No. 16/149,992, entitled “Systems and Methods of User Localization,” which is hereby incorporated by reference in its entirety; see Smith Abstract, which notes the selection of one NMD over another for outputting a response can be based at least in part on user location information; see Smith [0031], which notes multiple NMDs may coordinate responsibility for voice control interactions to deliver an improved user experience; and see Smith [0130], which notes multiple NMDs may coordinate to provide the user experience of a persistent VAS interaction across multiple NMDs).
The combination of Kim, Edwards, and Mahajan with Smith includes predictable results, such as using ultrasound signals emitted by an NMD to determine a user’s location.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Edwards and Mahajan and in further view of Johnson (US 20090004633 A1).

Kim teaches user input comprising a command that the computer system is to perform, the command being a type of command that the computer system is not able to perform natively (see Kim [0057], which notes as shown in FIG. 1, in some examples, a digital assistant is implemented according to a client-server model. The digital assistant includes client-side portion 102 (hereafter “DA client 102”) executed on user device 104 and server-side portion 106 (hereafter “DA server 106”) executed on server system 108. DA client 102 communicates with DA server 106 through one or more networks 110. DA client 102 provides client-side functionalities such as user-facing input and output processing and communication with DA server 106. DA server 106 provides server-side functionalities for any number of DA clients 102 each residing on a respective user device 104).
	Kim in view of Edwards and Mahajan fails to specifically teach wherein detecting the user input includes first detecting activation of a push button on the computer system and second receiving the user input after the push button has been activated.
	However, Johnson does teach wherein detecting the user input includes first detecting activation of a push button on the computer system and second receiving the user input after the push button has been activated (see Johnson [0062], which notes While certain embodiments have been described herein, it will be understood by one skilled in the art that the methods, systems, and apparatus of the present disclosure may be embodied in other specific forms without departing from the spirit thereof. For example, while the user input (e.g., to the methods of FIGS. 1-2 and system 300 of FIG. 3) has been described in the context of the sound of the person's/user's voice, other signals, such as mouse clicks, can be used to start and stop the speech recognizer. In exemplary embodiments, methods can utilize mouse clicks to signal when sound processing should start and stop. In alternative embodiments, there are alternative valid methods that do not involve mouse clicks, e.g., the speech recognizer starts automatically when a sound input is detected. Other devices could be used such as a push-to-talk microphone, although in general the exemplary embodiment is one where the user clicks or presses a button to indicate that he or she is about to start speaking, since it reduces the possibility that the ASR might be triggered by some extraneous sound).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the push button of the systems and methods as taught by Kim, Edwards, and Mahajan with the push button of Johnson in order to reduce the possibility that the ASR might be triggered by some extraneous sound (see Johnson [0062], which notes While certain embodiments have been described herein, it will be understood by one skilled in the art that the methods, systems, and apparatus of the present disclosure may be embodied in other specific forms without departing from the spirit thereof. For example, while the user input (e.g., to the methods of FIGS. 1-2 and system 300 of FIG. 3) has been described in the context of the sound of the person's/user's voice, other signals, such as mouse clicks, can be used to start and stop the speech recognizer. In exemplary embodiments, methods can utilize mouse clicks to signal when sound processing should start and stop. In alternative embodiments, there are alternative valid methods that do not involve mouse clicks, e.g., the speech recognizer starts automatically when a sound input is detected. Other devices could be used such as a push-to-talk microphone, although in general the exemplary embodiment is one where the user clicks or presses a button to indicate that he or she is about to start speaking, since it reduces the possibility that the ASR might be triggered by some extraneous sound).
The combination of Kim, Edwards, and Mahajan with Johnson includes predictable results, such as using a button press to indicate that the user is about to start speaking.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Edwards and Mahajan and in further view of Wang (US 20200126566 A1).
As per claim 12, Kim in view of Edwards and Mahajan teaches all of the limitations of claim 1 above.
Kim in view of Edwards and Mahajan fails to specifically teach wherein the third soundwave is played using a preselected type of voice-like sound that is selected for responding to a user who provided the user input.
	However, Wang does teach wherein the third soundwave is played using a preselected type of voice-like sound that is selected for responding to a user who provided the user input (see Wang Abstract, which notes embodiments of the present disclosure provide a method and apparatus for voice interaction. A method may include: acquiring voice information input by a user; determining a response character matching the acquired voice information based on the acquired voice information; and responding to the acquired voice information using a voice recorded in advance for the response character or a voice synthesized based on a voice feature parameter of the response character).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards and Mahajan with the response-character voice of Wang in order to make the response voice more targeted (see Wang [0056], which notes in this implementation, a response text is determined through the voice response logic preset for the response character, so that the response voice is more targeted. In this implementation, the executing body may perform voice recognition on the voice information to obtain text information corresponding to the voice information. Then, various semantic analysis methods (for example, word segmentation, part-of-speech tagging, and named entity recognition) may be used to analyze the text information, thereby obtaining semantics corresponding to the text information, and finally determining the response text matching the semantics. The voice response logic may include a corresponding relationship between the text converted from the acquired voice information and the response text. For example, the response character is a girl having a playful character, and the text converted from the acquired voice information is “Please play a song”, then the response text may be “I guess you want to listen to this song”. The response character is a middle-aged person with a calm personality, and the text converted from the acquired voice information is “Please play a song”, then the response text may be “OK, please listen to this song”).
The combination of Kim, Edwards, and Mahajan with Wang includes predictable results, such as generating output speech for a particular user using a character voice having content targeted based on a semantic analysis of the input.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Kim in view of Edwards and Mahajan and in further view of Zheng (US 20130288563 A1).
As per claim 15, Kim in view of Edwards and Mahajan teaches all the elements of claim 1.
Kim in view of Edwards and Mahajan fails to specifically teach wherein the computer system is included as a part of a plush toy.
However, Zheng does teach wherein the computer system is included as a part of a plush toy (see Zheng (US 20130288563 A1) [0081], which notes as used herein, the term "doll" is not limited solely to a fashion doll or play doll, but encompasses figurines, action figures, toy animals, plush toys, miniature animals, or any miniaturized or toy version of any living creature; see Zheng [0120] which notes the teddy bear 22d in FIGS. 11 and 13 can be modified to function as a base unit or station itself, so that the station 24d can be omitted and the elements of the station 24d can be provided as part of teddy bear 22d; and see Zheng [0175], which notes base station 2180 may comprise a number of devices, exemplary devices include: a computer, a personal computer, an electronic device, a portable electronic device, a laptop computer, a desktop computer, a personal digital assistant, and a hand held electronic device).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed inventions to modify the systems and methods as taught by Kim in view of Edwards and Mahajan with the electronic enhancement of a plush toy of Zheng in order to provide an interactive toy system that provides for interactive play between the system and the user (see Zheng [0082], which notes embodiments of the present invention provide an interactive toy system which allows the user to enact real-life activities of a doll, animal, action-figure or similar creature. More specifically, the present invention provides a toy system 20 which provides for interactive play between the system 20 and the user. The user can select different play programs which will program the doll or toy with certain emotions, responses or characters, and which will allow or direct the user to enact selected real-life activities for the doll or toy).
The combination of Kim, Edwards, and Mahajan with Zheng includes predictable results, such as programming a doll or toy with certain emotions, responses, or characters.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R HENNINGS whose telephone number is (571) 272-9676. The examiner can normally be reached on Monday-Friday 8:00 am-5:00 pm. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications 

/MARK HENNINGS/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659