DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on August 16, 2022. Claims 1-6, 9-13, 16, and 17 are pending and have been examined. Claims 7, 8, 14, and 15 have been cancelled.
All previous objections/rejections not mentioned in this Office action have been withdrawn by the examiner.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on June 11, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments and Amendments
The amendment filed on August 16, 2022 has been entered. Claims 1-6, 9-13, 16, and 17 remain pending in the application. Applicant’s amendments to the claims have overcome the 35 U.S.C. 103 rejection set forth in the previous office action.
The applicant amends claims 1 and 9 by adding the limitations “received from another artificial intelligence robot and a second voice command”, “measure a first strength of the first voice command and a second strength of the second voice command, determine whether the first strength and the second strength are equal to or greater than a reference strength, recognize the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength, recognize a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service”, and “operate the voice recognition function in an activation state when the extracted voice identification information matches the voice identification information stored in the memory”. The applicant’s amendments to the claims have overcome the 35 U.S.C. 103 rejection set forth in the previous office action.
Applicant’s arguments with respect to the 35 U.S.C. 103 rejections for claims 1-6, 9-13, 16, and 17 have been considered but are moot because the arguments are directed to the amended claim language, which is addressed under the new grounds of rejection below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665).
Regarding claim 1, Mont-Reynaud teaches an artificial intelligence robot for providing a voice recognition service, the artificial intelligence robot comprising: 
a memory configured to store voice identification information ([0016] - The applications range from intelligent assistants, through speech-enabled devices of all types, including isolated Internet of things (IoT) devices, to intelligent spaces with potentially many sensors and effectors, such as automobiles, intelligent homes, offices, stores and shopping malls, and humanoid robots. [0253] - FIG. 17B shows a non-transitory computer readable Flash random access memory (RAM) chip medium 1702 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein); 
a microphone configured to receive a first voice command received from another artificial intelligence robot and a second voice command ([0058] - In this disclosure, devices may have numerous sensors and effectors, but there must be (at the bare minimum) one microphone to receive speech… [0064] - In FIG. 4A, the human user 402 interacts with intelligent space 400. The user says the wakeword “Hey Room” to get the attention of the Room agent. Signals captured by sensors 432 (one of several cameras) or sensors 430 (one of several microphones) are input to the local devices 410. [0218] - This can, for example, act as a wake-up indicator when interacting with a robot); 
and a processor configured to ([0252] - FIG. 17A shows a non-transitory computer readable rotating disk medium 1701 that stores computer code that, if executed by a computer processor, would cause the computer processor to perform methods or partial method steps described herein): 
extract voice identification information from the wake-up command included in the first voice command ([0005] - In an Interactive Voice Response (IVR) system, a human user and a virtual assistant communicate over a phone line. They are engaged in conversation: the assistant listens to, and processes (or attempts to process) everything the user says. [0009] - That is, the user must “wake up” the agent before another request can be processed. A common way to wake up an agent is to say a wakeword such as “Hey Siri”, “OK, Google,” or “Alexa” (which can be a single word or a multi-word phrase). [0010] - Note that the wakeword itself is not part of the request. It only serves to gain the agent's attention. [0071] - Engagement - During a man-machine dialog, engagement refers to the willingness (or apparent ability) of an agent to receive a user request and process it. Processing a request, after receiving it as an input, normally includes understanding it, acting on it, and producing an answer as output. An engaged agent processes requests. A disengaged agent does not process requests; whether it actually “hears” them or not is immaterial. In common parlance, an engaged agent is said to be “listening,” and “has its microphone turned on,” and a disengaged agent “has its microphone turned off,” but this wording must not be taken literally, since a disengaged agent's microphone captures signals continuously when it waits for a wakeword. It is the ability to process requests that alone defines engagement. [0186] - In some embodiments, the only input is a microphone. To test user continuity, a voice match is used to see if a new user's voice matches that of the reference user. This is done with voiceprints. Speech audio from the reference user has been stored or used in step 1102 to create the needed reference voiceprint. The beginning and end of a new utterance by a new user are detected by a VAD is used in step 1104, and a new user voiceprint is computed. To test user continuity, evaluation 1106 compares the voiceprint of the new user with that of the reference user),
operate a voice recognition function in a deactivation state when the extracted voice identification information does not match the voice identification information stored in the memory ([0080] - Locked states - Locked states are recurrent, and thus engaged. A locked state is entered following an explicit Lock request. An agent leaves a locked state following an explicit Unlock request or a timeout. When verbal, a Lock request may specify an optional locking condition. If no locking condition is specified, the locking condition has the value True, and the state is locked unconditionally; if a locking condition is specified, it is evaluated at request processing time. After processing the first request, a locked agent remains in the same state and repeatedly processes additional requests, while the locking condition is satisfied, until it detects an explicit Unlock request or a timeout. [0144] - In some embodiments, Unlock indicator 981 is a natural language user request. Examples of possible Unlock requests include, e.g., “We're done” or “Dismiss” or “Break” or “Thanks”. [0215] - In some embodiments, the locking policy of FIG. 9C is applied in a literal way, and an utterance from a new user is entirely ignored if the new user does not match the reference user);
and operate the voice recognition function in an activation state when the extracted voice identification information matches the voice identification information stored in the memory ([0080] - Locked states - Locked states are recurrent, and thus engaged. A locked state is entered following an explicit Lock request. An agent leaves a locked state following an explicit Unlock request or a timeout. When verbal, a Lock request may specify an optional locking condition. If no locking condition is specified, the locking condition has the value True, and the state is locked unconditionally; if a locking condition is specified, it is evaluated at request processing time. After processing the first request, a locked agent remains in the same state and repeatedly processes additional requests, while the locking condition is satisfied, until it detects an explicit Unlock request or a timeout. [0144] - In some embodiments, Unlock indicator 981 is a natural language user request. Examples of possible Unlock requests include, e.g., “We're done” or “Dismiss” or “Break” or “Thanks”. [0215] - In some embodiments, the locking policy of FIG. 9C is applied in a literal way, and an utterance from a new user is entirely ignored if the new user does not match the reference user).
However, Mont-Reynaud does not teach the processor further configured to:
measure a first strength of the first voice command and a second strength of the second voice command,
determine whether the first strength and the second strength are equal to or greater than a reference strength, 
recognize the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength, 
and recognize a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service.
Park does teach the processor further configured to:
measure a first strength of the first voice command and a second strength of the second voice command ([0061] - the received intensity is a first level and if received intensity of the voice input received by another external electronic device is at a second level lower than the first level),
determine whether the first strength and the second strength are equal to or greater than a reference strength ([0061] - the processor 110 may set the threshold of received intensity for the microphone 120 and the received intensity for the microphone included in the external electronic device to values between the first level and the second level), 
recognize the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength ([0063] - the degree of noise measured through the microphone 120), 
recognize a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service ([0062] - the processor 110 may wake up the electronic device 100 only if voice input from the first user is received).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud to incorporate the teachings of Park in order to implement the processor further configured to: measure a first strength of the first voice command and a second strength of the second voice command, determine whether the first strength and the second strength are equal to or greater than a reference strength, recognize the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength, and recognize a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service. Doing so allows the device to provide an indication depending on the threshold of received intensity in order to guide the user (Park [0061]).
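For illustration only, the strength comparison recited in the limitations above may be sketched as follows. The sketch is not drawn from the application, Mont-Reynaud, or Park: the use of root-mean-square amplitude as the "strength", the function names, and the sample values are all assumptions made for the example.

```python
import math
from typing import Dict, Optional, Sequence


def rms_strength(samples: Sequence[float]) -> float:
    """Assumed strength measure: root-mean-square amplitude of the samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_commands(
    first: Sequence[float],
    second: Sequence[float],
    reference: float,
) -> Dict[str, Optional[str]]:
    """Compare both strengths with the reference strength.

    When the first strength is at or above the reference and the second is
    below it, the second command is treated as ambient noise and the first
    is passed on for wake-up command recognition. Other combinations are
    left undecided because the quoted limitation does not address them.
    """
    first_ok = rms_strength(first) >= reference
    second_ok = rms_strength(second) >= reference
    if first_ok and not second_ok:
        return {"wake_source": "first", "ambient_noise": "second"}
    return {"wake_source": None, "ambient_noise": None}


# Hypothetical sample data: a strong first signal and a weak second signal.
print(classify_commands([0.5, -0.5, 0.5, -0.5], [0.01, -0.01], reference=0.1))
```

The undecided branch is deliberate: the limitation as quoted specifies an outcome only for the case in which exactly the first strength meets the reference strength.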
Claims 9, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) and further in view of Katagiri (U.S. Patent No. 9123351).
Regarding claim 9, Mont-Reynaud teaches a method of operating an artificial intelligence robot for providing a voice recognition service, the method comprising: 
receiving a voice command ([0016] - The applications range from intelligent assistants, through speech-enabled devices of all types, including isolated Internet of things (IoT) devices, to intelligent spaces with potentially many sensors and effectors, such as automobiles, intelligent homes, offices, stores and shopping malls, and humanoid robots. [0058] - In this disclosure, devices may have numerous sensors and effectors, but there must be (at the bare minimum) one microphone to receive speech… [0064] - In FIG. 4A, the human user 402 interacts with intelligent space 400. The user says the wakeword “Hey Room” to get the attention of the Room agent. Signals captured by sensors 432 (one of several cameras) or sensors 430 (one of several microphones) are input to the local devices 410. [0218] - This can, for example, act as a wake-up indicator when interacting with a robot);
extracting voice identification information from a wake-up command included in the voice command and used to activate the voice recognition service ([0005] - In an Interactive Voice Response (IVR) system, a human user and a virtual assistant communicate over a phone line. They are engaged in conversation: the assistant listens to, and processes (or attempts to process) everything the user says. [0009] - That is, the user must “wake up” the agent before another request can be processed. A common way to wake up an agent is to say a wakeword such as “Hey Siri”, “OK, Google”, or “Alexa” (which can be a single word or a multi-word phrase)); 
determining whether the extracted voice identification information matches voice identification information stored in the memory ([0009] - That is, the user must “wake up” the agent before another request can be processed. A common way to wake up an agent is to say a wakeword such as “Hey Siri”, “OK, Google”, or “Alexa” (which can be a single word or a multi-word phrase). [0010] - Note that the wakeword itself is not part of the request. It only serves to gain the agent's attention. [0071] - Engagement - During a man-machine dialog, engagement refers to the willingness (or apparent ability) of an agent to receive a user request and process it. Processing a request, after receiving it as an input, normally includes understanding it, acting on it, and producing an answer as output. An engaged agent processes requests. A disengaged agent does not process requests; whether it actually “hears” them or not is immaterial. In common parlance, an engaged agent is said to be “listening,” and “has its microphone turned on,” and a disengaged agent “has its microphone turned off,” but this wording must not be taken literally, since a disengaged agent's microphone captures signals continuously when it waits for a wakeword. It is the ability to process requests that alone defines engagement. [0186] - In some embodiments, the only input is a microphone. To test user continuity, a voice match is used to see if a new user's voice matches that of the reference user. This is done with voiceprints. Speech audio from the reference user has been stored or used in step 1102 to create the needed reference voiceprint. The beginning and end of a new utterance by a new user are detected by a VAD is used in step 1104, and a new user voiceprint is computed. To test user continuity, evaluation 1106 compares the voiceprint of the new user with that of the reference user); 
and operating a voice recognition function in a deactivation state when the extracted voice identification information does not match the voice identification information stored in the memory ([0080] - Locked states - Locked states are recurrent, and thus engaged. A locked state is entered following an explicit Lock request. An agent leaves a locked state following an explicit Unlock request or a timeout. When verbal, a Lock request may specify an optional locking condition. If no locking condition is specified, the locking condition has the value True, and the state is locked unconditionally; if a locking condition is specified, it is evaluated at request processing time. After processing the first request, a locked agent remains in the same state and repeatedly processes additional requests, while the locking condition is satisfied, until it detects an explicit Unlock request or a timeout. [0144] - In some embodiments, Unlock indicator 981 is a natural language user request. Examples of possible Unlock requests include, e.g., “We're done” or “Dismiss” or “Break” or “Thanks”. [0215] - In some embodiments, the locking policy of FIG. 9C is applied in a literal way, and an utterance from a new user is entirely ignored if the new user does not match the reference user).
However, Mont-Reynaud does not teach the method comprising:
measuring a first strength of the first voice command and a second strength of the second voice command,
determining whether the first strength and the second strength are equal to or greater than a reference strength, 
recognizing the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength, 
and recognizing a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service.
Park does teach the method comprising:
measuring a first strength of the first voice command and a second strength of the second voice command ([0061] - the received intensity is a first level and if received intensity of the voice input received by another external electronic device is at a second level lower than the first level),
determining whether the first strength and the second strength are equal to or greater than a reference strength ([0061] - the processor 110 may set the threshold of received intensity for the microphone 120 and the received intensity for the microphone included in the external electronic device to values between the first level and the second level), 
recognizing the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength ([0063] - the degree of noise measured through the microphone 120), 
and recognizing a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service ([0062] - the processor 110 may wake up the electronic device 100 only if voice input from the first user is received).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud to incorporate the teachings of Park in order to implement the method comprising: measuring a first strength of the first voice command and a second strength of the second voice command, determining whether the first strength and the second strength are equal to or greater than a reference strength, recognizing the second voice command as ambient noise, when the first strength is equal to or greater than the reference strength and the second strength is less than the reference strength, and recognizing a wake-up command from the first voice command, wherein the wake-up command is used to activate the voice recognition service. Doing so allows the device to provide an indication depending on the threshold of received intensity in order to guide the user (Park [0061]).
However, Mont-Reynaud in view of Park does not teach the method further comprising:
extracting a power spectrum of a specific frequency band from a voice data of the wake-up command,
determining whether the extracted power spectrum matches a predetermined power spectrum, 
and maintaining the voice recognition function of the artificial intelligence robot in the deactivation state when the extracted power spectrum matches the predetermined power spectrum.
Katagiri does teach the method further comprising:
extracting a power spectrum of a specific frequency band from a voice data of the wake-up command (Col 3, Rows 33-43 - When the estimated power spectrum of the stationary noise matches an actual noise power spectrum, the power spectrum values are all “1” as a result of the aforementioned division. By performing the above processing, the value of the spectral entropy in a segment including the stationary colored noise becomes higher as compared to the spectral entropy value in the speech segment. As a result, a difference between the spectral entropy value in the speech segment and the spectral entropy value in the segment including the stationary colored noise becomes larger, and the accuracy of the speech segment determination is thus improved),
determining whether the extracted power spectrum matches a predetermined power spectrum (Col 3, Rows 33-43 - When the estimated power spectrum of the stationary noise matches an actual noise power spectrum, the power spectrum values are all “1” as a result of the aforementioned division. By performing the above processing, the value of the spectral entropy in a segment including the stationary colored noise becomes higher as compared to the spectral entropy value in the speech segment. As a result, a difference between the spectral entropy value in the speech segment and the spectral entropy value in the segment including the stationary colored noise becomes larger, and the accuracy of the speech segment determination is thus improved),
and maintaining the voice recognition function of the artificial intelligence robot in the deactivation state when the extracted power spectrum matches the predetermined power spectrum (Col 3, Rows 33-43 - When the estimated power spectrum of the stationary noise matches an actual noise power spectrum, the power spectrum values are all “1” as a result of the aforementioned division. By performing the above processing, the value of the spectral entropy in a segment including the stationary colored noise becomes higher as compared to the spectral entropy value in the speech segment. As a result, a difference between the spectral entropy value in the speech segment and the spectral entropy value in the segment including the stationary colored noise becomes larger, and the accuracy of the speech segment determination is thus improved).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Katagiri in order to implement the method further comprising: extracting a power spectrum of a specific frequency band from a voice data of the wake-up command, determining whether the extracted power spectrum matches a predetermined power spectrum, and maintaining the voice recognition function of the artificial intelligence robot in the deactivation state when the extracted power spectrum matches the predetermined power spectrum. Doing so allows the smoothing of the power spectrum in very noisy conditions (Katagiri [Col 3, Rows 25-30]).
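For illustration only, the power-spectrum limitations discussed above may be sketched as follows. The sketch is not taken from the application or from Katagiri: the direct discrete Fourier transform, the squared-magnitude definition of the power spectrum, the band edges, the tolerance, and all function names are assumptions made for the example.

```python
import cmath
import math
from typing import List, Sequence


def band_power_spectrum(
    samples: Sequence[float], sample_rate: float, f_lo: float, f_hi: float
) -> List[float]:
    """Squared DFT magnitudes for the bins whose frequency lies in [f_lo, f_hi]."""
    n = len(samples)
    if n == 0:
        return []
    spectrum = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq <= f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)
            )
            spectrum.append(abs(coeff) ** 2)  # amplitude squared
    return spectrum


def spectra_match(
    extracted: Sequence[float], reference: Sequence[float], rel_tol: float = 0.05
) -> bool:
    """Assumed match test: every bin agrees within rel_tol of the reference peak."""
    if len(extracted) != len(reference):
        return False
    scale = max(max(reference, default=0.0), 1e-12)
    return all(abs(a - b) <= rel_tol * scale for a, b in zip(extracted, reference))


def next_state(samples, sample_rate, reference, f_lo, f_hi) -> str:
    """Stay deactivated when the extracted band spectrum matches the reference."""
    extracted = band_power_spectrum(samples, sample_rate, f_lo, f_hi)
    return "deactivated" if spectra_match(extracted, reference) else "activation_permitted"


# Hypothetical data: 64 samples at 8000 Hz, so a 1000 Hz tone falls on bin 8.
RATE, N = 8000.0, 64
tone_1k = [math.sin(2 * math.pi * 1000 * t / RATE) for t in range(N)]
tone_2k = [math.sin(2 * math.pi * 2000 * t / RATE) for t in range(N)]
reference = band_power_spectrum(tone_1k, RATE, 900.0, 1100.0)
print(next_state(tone_1k, RATE, reference, 900.0, 1100.0))
print(next_state(tone_2k, RATE, reference, 900.0, 1100.0))
```

The band of 900 Hz to 1100 Hz is used here only because it keeps the example small; it is not the band recited in any claim.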
Regarding claim 16, Mont-Reynaud in view of Park teaches all of the limitations as in claim 1, above.
However, Mont-Reynaud in view of Park does not teach the artificial intelligence robot, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band.
Katagiri does teach the artificial intelligence robot, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band (Col 1, Rows 12-17 - The power of the signal is the time average of the square of the amplitude of the signal. However, when the level of the signal itself varies, it is difficult to accurately determine the speech segment based on the power of the signal. The level of the signal indicates the scale of the signal. Col 6, Rows 61-65 - At this time, it is desirable that the frequency range used to calculate the spectral entropy be a frequency range in which a speech spectrum is included. The frequency range in which the speech spectrum is included is 250 Hz to 4000 Hz).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Katagiri in order to implement the artificial intelligence robot, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band. Doing so allows the smoothing of the power spectrum in very noisy conditions (Katagiri [Col 3, Rows 25-30]).
Regarding claim 17, Mont-Reynaud in view of Park in view of Katagiri teaches all of the limitations as in claim 9, above.
However, Mont-Reynaud in view of Park does not teach the method, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band.
Katagiri does teach the method, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band (Col 1, Rows 12-17 - The power of the signal is the time average of the square of the amplitude of the signal. However, when the level of the signal itself varies, it is difficult to accurately determine the speech segment based on the power of the signal. The level of the signal indicates the scale of the signal. Col 6, Rows 61-65 - At this time, it is desirable that the frequency range used to calculate the spectral entropy be a frequency range in which a speech spectrum is included. The frequency range in which the speech spectrum is included is 250 Hz to 4000 Hz).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Katagiri in order to implement the method, wherein the extracted power spectrum is based on an amplitude squared value of a non-audible frequency band. Doing so allows the smoothing of the power spectrum in very noisy conditions (Katagiri [Col 3, Rows 25-30]).
Claims 2, 3, and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) and further in view of Tai (U.S. Publication No. 20200098380).
Regarding claim 2, Mont-Reynaud in view of Park teaches all of the limitations as in claim 1, above.
However, Mont-Reynaud in view of Park does not teach the artificial intelligence robot according to claim 1, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command.
Tai does teach the artificial intelligence robot according to claim 1, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices). [0027] - FIG. 1 illustrates a system for encoding and decoding audio watermarks according to embodiments of the present disclosure. As illustrated in FIG. 1, a system 100 may include one or more devices 110, such as a first speech controlled device 110a and a second speech controlled device 110b (e.g., voice-enabled devices 110). While FIG. 1 illustrates each of the devices 110 being a speech controlled device, the disclosure is not limited thereto and the system 100 may include any smart device capable of connecting to a wireless network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Tai in order to implement the artificial intelligence robot according to claim 1, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
Regarding claim 3, Mont-Reynaud in view of Park in view of Tai teaches all of the limitations as in claim 2, above.
However, Mont-Reynaud in view of Park does not teach the artificial intelligence robot according to claim 2, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state.
Tai does teach the artificial intelligence robot according to claim 2, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state ([0031] - To prevent nearby devices from sending audio data to the server(s) 120, in some examples the system 100 may embed output audio data with an audio watermark to perform wakeword suppression. For example, if a representation of the wakeword is included in output audio data being sent to the first device 110a, the system 100 may embed the audio watermark in the output audio data. Thus, the second device 110b may detect the representation of the wakeword but may also detect the audio watermark instructing the second device 110b to ignore the wakeword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Tai in order to implement the artificial intelligence robot according to claim 2, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
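For illustration only, the wakeword-suppression behavior described in the cited passage of Tai may be sketched as a simple decision rule. The sketch assumes that separate detectors have already reported whether a wakeword and a suppression watermark are present; the class, field, and function names are hypothetical and do not come from Tai or from the application.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    """Hypothetical outputs of upstream wakeword and watermark detectors."""
    wakeword_detected: bool
    watermark_detected: bool


def voice_recognition_state(result: DetectionResult) -> str:
    """Keep recognition deactivated when a suppression watermark accompanies the wakeword."""
    if result.wakeword_detected and not result.watermark_detected:
        return "activated"
    # Either no wakeword was heard, or the watermark instructs the device to ignore it.
    return "deactivated"


print(voice_recognition_state(DetectionResult(True, True)))
print(voice_recognition_state(DetectionResult(True, False)))
```

The watermark encoding and decoding themselves are outside the scope of this sketch.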
Regarding claim 6, Mont-Reynaud in view of Park teaches all of the limitations as in claim 1, above.
However, Mont-Reynaud in view of Park does not teach the artificial intelligence robot according to claim 1, wherein the processor receives the voice command from another artificial intelligence robot, 
and wherein the voice command is a guidance message for inducing activation of the voice recognition service.
Tai does teach the artificial intelligence robot according to claim 1, wherein the processor receives the voice command from another artificial intelligence robot ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices)),
and wherein the voice command is a guidance message for inducing activation of the voice recognition service ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Tai in order to implement the artificial intelligence robot according to claim 1, wherein the processor receives the voice command from another artificial intelligence robot, and wherein the voice command is a guidance message for inducing activation of the voice recognition service. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission, which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
Claims 10 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) in view of Katagiri (U.S. Patent No. 9123351) and further in view of Tai (U.S. Publication No. 20200098380).
Regarding claim 10, Mont-Reynaud in view of Park in view of Katagiri teaches all of the limitations as in claim 9, above.
However, Mont-Reynaud in view of Park in view of Katagiri does not teach the method according to claim 9, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command.
Tai does teach the method according to claim 9, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices). [0027] - FIG. 1 illustrates a system for encoding and decoding audio watermarks according to embodiments of the present disclosure. As illustrated in FIG. 1, a system 100 may include one or more devices 110, such as a first speech controlled device 110a and a second speech controlled device 110b (e.g., voice-enabled devices 110). While FIG. 1 illustrates each of the devices 110 being a speech controlled device, the disclosure is not limited thereto and the system 100 may include any smart device capable of connecting to a wireless network).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Katagiri to incorporate the teachings of Tai in order to implement the method according to claim 9, wherein the voice identification information is information for identifying voice of another artificial intelligence robot and is a watermark inserted into the voice data corresponding to the wake-up command. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission, which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
Regarding claim 13, Mont-Reynaud in view of Park in view of Katagiri teaches all of the limitations as in claim 9, above.
However, Mont-Reynaud in view of Park in view of Katagiri does not teach the method according to claim 9, wherein the voice command is received from another artificial intelligence robot, 
and wherein the voice command is a guidance message for inducing activation of the voice recognition service.
Tai does teach the method according to claim 9, wherein the voice command is received from another artificial intelligence robot ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices)),
and wherein the voice command is a guidance message for inducing activation of the voice recognition service ([0026] - To enable unique functionality between nearby devices, devices, systems and methods are disclosed that embed audio watermark(s) in output audio data and detect a presence of audio watermark(s) in input audio data. While an encoding algorithm and a decoding algorithm may be used to embed audio watermarks within any content, they enable the audio watermark(s) to be detected despite the presence of reverberation caused by sound wave transmission (e.g., when watermarked audio data is output by a loudspeaker and recaptured by a microphone). Thus, neighboring devices may embed audio watermarks to instruct other devices to perform an action, enabling local signal transmission and/or wakeword suppression (e.g., avoid cross-talk between devices)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Katagiri to incorporate the teachings of Tai in order to implement the method according to claim 9, wherein the voice command is received from another artificial intelligence robot, and wherein the voice command is a guidance message for inducing activation of the voice recognition service. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission, which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) in view of Tai (U.S. Publication No. 20200098380) and further in view of Yoshioka (U.S. Publication No. 20200349230).
Regarding claim 4, Mont-Reynaud in view of Park in view of Tai teaches all of the limitations as in claim 3, above.
However, Mont-Reynaud in view of Park in view of Tai does not teach the artificial intelligence robot according to claim 3, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data.
Yoshioka does teach the artificial intelligence robot according to claim 3, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data ([0031] - In further embodiments, an audio watermark is generated by one or more of the user devices. The audio watermark comprises the audio signature or the audio signature may be separately detected. The audio watermark may be a sound pattern having a frequency above the normal hearing range of a user, such as 20 kHz or higher, or may just be a sound that is inconspicuous to users so as not to interfere with the conversation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Tai to incorporate the teachings of Yoshioka in order to implement the artificial intelligence robot according to claim 3, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data. Doing so allows the sound to be inconspicuous to users so that it does not interfere with conversations (Yoshioka [0031]).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) in view of Katagiri (U.S. Patent No. 9123351) in view of Tai (U.S. Publication No. 20200098380) and further in view of Yoshioka (U.S. Publication No. 20200349230).
Regarding claim 11, Mont-Reynaud in view of Park in view of Katagiri in view of Tai teaches all of the limitations as in claim 10, above.
However, Mont-Reynaud in view of Park in view of Katagiri does not teach the method according to claim 10, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state, 
and the watermark is inserted into a non-audible frequency band of a frequency band of the voice data.
Tai does teach the method according to claim 10, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state ([0031] - To prevent nearby devices from sending audio data to the server(s) 120, in some examples the system 100 may embed output audio data with an audio watermark to perform wakeword suppression. For example, if a representation of the wakeword is included in output audio data being sent to the first device 110a, the system 100 may embed the audio watermark in the output audio data. Thus, the second device 110b may detect the representation of the wakeword but may also detect the audio watermark instructing the second device 110b to ignore the wakeword).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Katagiri to incorporate the teachings of Tai in order to implement the method according to claim 10, wherein the watermark includes a signal indicating that the voice recognition function needs to be maintained in the deactivation state. Doing so allows audio watermarks to be detected despite reverberation caused by sound wave transmission, which allows for local signal transmission and/or wakeword suppression (Tai [0026]).
However, Mont-Reynaud in view of Park in view of Katagiri in view of Tai does not teach the method according to claim 10, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data.
Yoshioka does teach the method according to claim 10, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data ([0031] - In further embodiments, an audio watermark is generated by one or more of the user devices. The audio watermark comprises the audio signature or the audio signature may be separately detected. The audio watermark may be a sound pattern having a frequency above the normal hearing range of a user, such as 20 kHz or higher, or may just be a sound that is inconspicuous to users so as not to interfere with the conversation).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Katagiri in view of Tai to incorporate the teachings of Yoshioka in order to implement the method according to claim 10, wherein the watermark is inserted into a non-audible frequency band of a frequency band of the voice data. Doing so allows the sound to be inconspicuous to users so that it does not interfere with conversations (Yoshioka [0031]).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) and further in view of Kang (U.S. Publication No. 20150026580).
Regarding claim 5, Mont-Reynaud in view of Park teaches all of the limitations as in claim 1, above.
However, Mont-Reynaud in view of Park does not teach the artificial intelligence robot according to claim 1, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band.
Kang does teach the artificial intelligence robot according to claim 1, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band ([0015] - In a further feature, the control unit comprises a sensor hub connected with the microphone of the first device, and an application processor, wherein the sensor hub compares the voice representative information with pre-stored reference voice representative information, and switches a sleep mode of the application processor to a wake-up mode in response to the comparison, and wherein the application processor in the wake-up mode controls the communication unit to establish a communication link with the second device by using the connection information of the second device. The communication unit comprises a short range communication unit for receiving the voice representative information and the connection information of the second device via short range communication. When a similarity between the voice representative information and the reference voice representative information is equal to or greater than a predetermined value, the control unit establishes the communication link with the second device by using the connection information of the second device. When the similarity between the voice representative information and the reference voice representative information is less than the predetermined value, the control unit controls the communication unit to broadcast the voice representative information received from the second device. The communication unit receives, from the second device, control information that is extracted from the voice representative information, and wherein the control unit performs a function that corresponds to the control information. [0061] - The sound communication method means a communication method of transmitting and receiving data by using a sound signal. For example, the second device 200 may broadcast data to the outside by inserting the data into an inaudible range or audible range (e.g., into music or announcement broadcasting) of the sound signal. Also, the second device 200 may down-convert a voice signal having a high frequency band into a voice signal having a relatively low frequency band (e.g., a band equal to or less than 16 kHz), and may broadcast the down-converted voice signal).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park to incorporate the teachings of Kang in order to implement the artificial intelligence robot according to claim 1, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band. Doing so allows audio signals to be processed by devices that can only process signals within a specific frequency range (Kang [0070]).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Mont-Reynaud (U.S. Publication No. 20180301151) in view of Park (U.S. Publication No. 20180211665) in view of Katagiri (U.S. Patent No. 9123351) and further in view of Kang (U.S. Publication No. 20150026580).
Regarding claim 12, Mont-Reynaud in view of Park in view of Katagiri teaches all of the limitations as in claim 9, above.
However, Mont-Reynaud in view of Park in view of Katagiri does not teach the method according to claim 9, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band.
Kang does teach the method according to claim 9, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band ([0015] - In a further feature, the control unit comprises a sensor hub connected with the microphone of the first device, and an application processor, wherein the sensor hub compares the voice representative information with pre-stored reference voice representative information, and switches a sleep mode of the application processor to a wake-up mode in response to the comparison, and wherein the application processor in the wake-up mode controls the communication unit to establish a communication link with the second device by using the connection information of the second device. The communication unit comprises a short range communication unit for receiving the voice representative information and the connection information of the second device via short range communication. When a similarity between the voice representative information and the reference voice representative information is equal to or greater than a predetermined value, the control unit establishes the communication link with the second device by using the connection information of the second device. When the similarity between the voice representative information and the reference voice representative information is less than the predetermined value, the control unit controls the communication unit to broadcast the voice representative information received from the second device. The communication unit receives, from the second device, control information that is extracted from the voice representative information, and wherein the control unit performs a function that corresponds to the control information. [0061] - The sound communication method means a communication method of transmitting and receiving data by using a sound signal. For example, the second device 200 may broadcast data to the outside by inserting the data into an inaudible range or audible range (e.g., into music or announcement broadcasting) of the sound signal. Also, the second device 200 may down-convert a voice signal having a high frequency band into a voice signal having a relatively low frequency band (e.g., a band equal to or less than 16 kHz), and may broadcast the down-converted voice signal).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mont-Reynaud in view of Park in view of Katagiri to incorporate the teachings of Kang in order to implement the method according to claim 9, wherein the voice identification information includes data obtained by converting a frequency band of the voice data corresponding to the wake-up command into a specific frequency band. Doing so allows audio signals to be processed by devices that can only process signals within a specific frequency range (Kang [0070]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Maisonnier (U.S. Publication No. 20130218339) teaches a humanoid robot equipped with a natural dialogue interface and a method for controlling the robot and corresponding program. Mitchell (U.S. Patent No. 10997971) teaches wakeword detection using a secondary microphone. Nakadai (U.S. Publication No. 20170053662) teaches acoustic processing apparatus and acoustic processing method. Rifkin (U.S. Publication No. 20040225498) teaches speaker recognition using local models. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658