DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments and amendments in the Amendment filed January 27, 2022 (herein “Amendment”), with respect to the objection to claim 3 have been fully considered and are persuasive.  The objection to claim 3 has been withdrawn. 
Applicant’s arguments and amendments in the Amendment filed January 27, 2022 (herein “Amendment”), with respect to the objections to the drawings and the Specification (title specifically) have been fully considered and are persuasive.  The objections to the drawings and the Specification have been withdrawn. 
Applicant's arguments and amendments in the Amendment regarding the rejection of claims 1-3 and 5-13 under 35 U.S.C. §§ 102-103 have been fully considered but are not persuasive. It is noted, though, that claims 1, 2 and 10, previously rejected under 35 U.S.C. 102, are now rejected under 35 U.S.C. 103 in view of the amendment to claim 1, which incorporates the limitations of former claim 4, which was previously rejected under 35 U.S.C. 103. In this way, the grounds of rejection have changed, but only because of Applicant’s amendment.
First, Applicant argues on page 9 of the Amendment in characterizing col. 5, lines 45-47 of Mok that “Mok only outputs the second output data in response to the second command represented in the second input audio data,” which does not teach or suggest the claimed “based on the second wakeup word corresponding to a conversation mode being recognized after the first wakeup word is recognized, output a response corresponding to a third command received after the second wakeup word is recognized.” However, the Non-Final Office Action dated 10/27/2021 (herein “office received subsequent to the input audio data including the speech corresponding to the keyword. That is, the “first input audio data” of Mok is mapped to the claimed “a second wakeup word” as it is recognized after a first wakeup word (Mok’s input audio data including the speech corresponding to the keyword) is recognized. Then, Mok in col. 4, line 37 – col. 5, line 6 teaches that from the first output data corresponding to the first command which is determined from the first input audio data, the device 110 “is caused to” (mapped to “based on” as claimed – thus based on the first input audio which as given above, is mapped to the claimed “second wakeup word”) send second input audio data corresponding to captured audio to the server without first detecting the presence of a keyword in the input audio data (operate in an interaction mode since when the device sends input audio data without first detecting the presence of a keyword, it is operating in an interaction mode). Then, Mok’s “second input audio data” corresponds to the mapped “one or more commands” as it is processed by a server to determine whether there is a command (second command), and if so, second output data is determined responsive to a second command. See
Addressing then the limitations of claim 4, now amended into claim 1, Mok col. 5, lines 51-56 teaches that the device is again (a second time) instructed to send the server third input audio data without first detecting presence of a keyword (so the conversation mode continues, which corresponds to the claimed “based on the second wakeup word corresponding to a conversation mode being recognized after the first wakeup word is recognized”). Mok further states in col. 5, lines 50-51 that the process described in fig. 1 is performed more than once in a row; therefore, in response to the server being sent the third input audio data, steps 138-146 of fig. 1 would be repeated, thus processing the third input audio data to output a response to third command data. Given that the third input audio data occurs after the second input audio data (corresponding to the “second wakeup word” since it causes the device to be/remain in the “process input audio without needing a wakeup word” mode, corresponding to the claimed “interactive mode”), the third input audio data, when processed for the command contained therein (step 143) to find the output (step 144), will provide that output after the second input data (second wakeup word) is recognized.
Regarding the claimed “fourth command,” Mok in col. 5, lines 50-56 teaches that the instruction that causes the device to continue to process input audio without first detecting the keyword can be performed more than once in a row, and that this “mode” can continue forwarding subsequent input without requiring keyword detection and providing output steps a maximum number of times. Accordingly, a fourth input would be another performance of the forward subsequent input without requiring keyword detection step, as Mok teaches that this process of fig. 1 can be performed more than once. Given that a fourth command from a fourth input audio would come after a third 
Therefore, when all of the citations and rationale provided regarding the teachings of Mok are considered, Mok does not teach “only output[ting] second output data in response to the second command represented in the second input audio data,” as Applicant contends.
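For illustration only, the reading of Mok relied on above can be sketched as a small state machine. The sketch is not taken from Mok and is not part of the record: the class name `Device`, the keyword string "alexa", the `expects_follow_up` callback, and the `max_follow_ups` value are hypothetical stand-ins for Mok's keyword detection, the server's determination that a command is likely to be followed by another command, and the maximum number of keyword-free repetitions.

```python
class Device:
    """Hypothetical model of a device that may bypass keyword detection."""

    def __init__(self, keyword="alexa", max_follow_ups=3,
                 expects_follow_up=lambda command: True):
        self.keyword = keyword                      # assumed wake keyword
        self.max_follow_ups = max_follow_ups        # assumed cap on keyword-free turns
        self.expects_follow_up = expects_follow_up  # assumed server-side decision
        self.follow_ups_left = 0                    # > 0: forward audio without a keyword
        self.responses = []

    def hear(self, utterance):
        """Return True if the utterance was forwarded and a response was output."""
        words = utterance.split()
        if self.follow_ups_left > 0:
            # Keyword-free turn: the audio is forwarded as-is.
            self.follow_ups_left -= 1
            command, keyword_free = utterance, True
        elif words and words[0].lower() == self.keyword:
            # Keyword detected: the remainder is treated as the command.
            command, keyword_free = " ".join(words[1:]), False
        else:
            return False  # no keyword and no standing instruction: ignored

        self.responses.append(f"response to: {command}")
        # After a keyword-initiated command, the (assumed) server may instruct
        # the device to forward later audio without keyword detection.
        if not keyword_free and self.expects_follow_up(command):
            self.follow_ups_left = self.max_follow_ups
        return True
```

Under these assumptions, a device constructed with `max_follow_ups=2` answers the keyword-initiated command and the next two utterances without a keyword, then again requires the keyword.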
Applicant next argues on page 10 of the Amendment that the rationale provided in the office action on pages 8-9 regarding the correspondence of the output from the fourth command being related to the response of the third command is “not the same as ‘based on a new fourth command being received after the response corresponding to the third command is output, output a response corresponding to the new fourth command and related to at least one of the third command of the response corresponding to the third command,’” in that Mok does not provide teachings of outputting a response to the latest command in consideration of the context of the previous conversation. However, claim 4 did not previously, and claim 1 does not presently, recite limitations directed towards the fourth command being “new” or having its response be generated in consideration of the context of the previous conversation, as Applicant remarks. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns
Applicant finally argues on page 10 that “performing a number of times of processing input audio without detecting a keyword to output data corresponding to the current input is also different than based on the second wakeup word corresponding to a conversation mode being recognized after the first wakeup word is recognized, output a response corresponding to a third command received after the second wakeup word is recognized.” However, as discussed above, each time Mok’s device continues in processing input audio without detecting a keyword, it is because of the instruction that is sent by the server along with the previous command’s/input audio’s response/output. Accordingly, each subsequent input audio corresponds to the second audio input (mapped to second wakeup word) since that second audio input initially prompted output that included from the server, the instruction for the device to continue processing/forwarding input audio to the server without detection of a keyword. It is noted here that to the extent Mok applies to the claims as they are currently written, it is because of the claim language reciting “based on” and “corresponding” which permits a broadest reasonable interpretation of any kind of basis (based on) or any kind of correspondence. For example, the broadest reasonable interpretation does not require the “corresponding” to mean “considering context of the previous conversation” which Applicant argues on the top of page 10, and addressed earlier herein.
Therefore, while all of Applicant’s arguments and amendments have been fully considered, they are not persuasive, and the rejection in view of Mok against claims 1-3 and 5-13 is constructively maintained, now under 35 U.S.C. 103.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 and 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over Mok et al., (US 10,847,149 B1, herein “Mok”).
Regarding claim 1, Mok teaches an artificial intelligence device comprising (Mok fig. 14, device 110, col. 28, ll. 11-21, col. 29, ll. 1-40, and col. 30, ll. 17-21, machine learning techniques including neural networks used to operate the components and steps disclosed, where disclosed components include user device 110 and server 120, and can be included as components of a larger device or system): 
an input interface configured to receive a speech input (Mok fig. 14, col. 29, ll. 31-53, I/O device interface with input for audio capture from a microphone); and 
a processor configured to (Mok fig. 14, col. 30, ll. 1-7, and col. 29, ll. 21-30, controller/processor which executes instructions for operating each device): 
operate in an interaction mode based on a second wakeup word for setting an operation mode being recognized after a first wakeup word for calling the artificial intelligence device is recognized (Mok fig. 1, col. 4, l. 19 – col. 5, l. 6, once the device performs steps 130 and 132, and thus determines input audio data includes speech corresponding to the keyword (first wakeup word), the device sends first input audio data to be processed to determine a first command (second wakeup word), and sends an instruction causing the device to (in an interaction mode), send second input audio data to the server without requiring the user to speak a keyword to input a further command, which may also deactivate the wakeword detection component); 
process one or more commands received after the second wakeup word according to the operation mode indicated by the second wakeup word (Mok col. 5, ll. 7-49, it is determined that the second input audio data (after the keyword/second wakeup word) is directed to the system and has a second command, from which then second output data is determined (so the second command is processed) and the second output data is sent to the device for output to the user);
Mok fig. 1, col. 5, ll. 22-59, while in an operational flow as shown in fig. 1 (conversation mode), the steps of sending subsequent input audio to the server for processing without detecting the presence of a keyword is repeated up to a maximum number of times, which would then obviously include a third input audio with a third command that is processed and for which output audio is returned, and a fourth input audio with a fourth command that is processed and for which output audio is returned, and where the fourth input audio would not be forwarded until after or during the output of the previous processed input audio’s output (see col. 5, ll. 17-21), therefore the output (response) corresponding to the fourth input audio corresponds (at least temporally – it follows in time) to the response of the processed third input audio’s output).
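Although Mok teaches that the steps of forwarding received input audio and processing it for commands without detecting a wakeword can be performed more than once in a row and up to a maximum number of times, Mok does not necessarily teach that this number of times can be at least 2 (thus providing a third and fourth audio input processing). However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have a number of times to perform the processing of input audio without detecting a keyword to be at least 2, because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6), and as well would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).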
Regarding claim 2, Mok teaches wherein the processor is further configured to operate in the interactive mode based on the second wakeup word being continuously recognized after the first wakeup word is recognized (Mok col. 4, ll. 7-10, and col. 4, l. 65 – col. 5, l. 56, the device sends the second input data which is captured after or during (continuously) the output of the first content, and then processed without first detecting a wakeword, and where the process in fig. 1 is performed more than once in a row – thus continuous recognition of further input audio data without detecting presence of a keyword).
Regarding claim 3, Mok teaches wherein the processor is further configured to: operate in a normal mode to process a first command based on the first command being received after the first wakeup word is recognized; and process a second command based on the first wakeup word being recognized after the first command is processed and the second command is received after the first wakeup word is recognized (Mok col. 26, ll. 30-50, col. 4, ll. 29-37, after processing input audio to determine the audio includes a keyword (after the first wakeup word is recognized), speech processing is performed on the input audio data to determine a command and whether the command is one that is likely to be followed by a subsequent command, if it is not (then in a normal mode), the output audio corresponding to the command is determined and sent to the device which outputs it, and the server refrains from sending an instruction that would cause the device to send further audio data without first determining whether the audio data includes speech corresponding to a wakeword, and therefore, upon further audio being input, the processing would include determining if the audio has a keyword/wakeword before processing the next further audio).
Although Mok teaches both determining whether the input audio contains a keyword, and determining input audio processing to not send an instruction that causes further audio data without requiring the speech to have a wakeword, Mok discloses these two functionalities in two different figures. Therefore, Mok does not necessarily teach that the determination to refrain from sending the instruction to send further data without determining that there is a keyword in it, is performed with the initial detection of a keyword in input audio. However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the two teachings of Mok to arrive at a method that determines first input audio for a keyword and then later refrains from sending an instruction that would cause the device to send further audio data without first determining whether the audio data includes speech corresponding to a wakeword, at least because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6).
Regarding claim 5, Mok teaches wherein, in the conversation mode, based on the fourth command being received within a speech waiting period after the third command is processed, the processor is further configured to process the fourth command without input of the first wakeup word (Mok col. 5, ll. 50-59, col. 6, ll. 23-40, and col. 21, ll. 56-61 any subsequent commands while the system operates for a number of times without detecting a keyword/wakeword, are subject to being first determined to have voice activity, within a threshold amount of time (speech waiting period)), wherein the speech waiting period in the conversation mode is longer than a speech waiting period in the normal mode (Mok col. 6, ll. 31-35, col. 23, ll. 5-7, the threshold amount of time is configurable (such as 30 seconds) and can be a user preference stating how long to stay connected to the server – thus some length of time, whereas col. 4, ll. 19-33, teaches that in a mode where the keyword/wakeword has to be detected first before forwarding on input audio for processing to the server, the server is not sent the input audio automatically, so there is no speech waiting period, and hence the threshold amount of time is always longer).
Although Mok teaches that the steps of forwarding received input audio and processing it for commands without detecting a wakeword can be performed more than once in a row and up to a maximum number of times, Mok does not necessarily teach that this number of times can be at least 2 (thus providing a third and fourth audio input processing). However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have a number of times to perform the processing of input audio without detecting a keyword to be at least 2, because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6), and as well would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).
Regarding claim 6, Mok teaches wherein, based on the second wakeup word corresponding to a multiple request mode being recognized after the first wakeup word is recognized (Mok col. 4, l. 19- col. 5, l. 21, after processing input audio to determine the audio includes a keyword (after the first wakeup word is recognized), speech processing is performed on the input audio data to determine a command and whether the command is one that is likely to be followed by a subsequent command, if it is, then the process continues in a mode (multiple request mode) where additional input audio is forwarded without detecting a wakeword/keyword) and a fifth command and a sixth command are received after the second wakeup word is recognized, the processor is further configured to output a response corresponding to the fifth command and a response corresponding to the sixth command after the fifth command and the sixth command are received (Mok fig. 1, col. 5, ll. 22-59, while in an operational flow as shown in fig. 1 (multiple request mode), the steps of sending subsequent input audio to the server for processing without detecting the presence of a keyword is repeated up to a maximum number of times, which would then obviously include a fifth input audio with a fifth command that is processed and for which output audio is returned, and a sixth input audio with a sixth command that is processed and for which output audio is returned, and where the sixth input audio would not be forwarded until after or during the output of the previous processed input audio’s output (see col. 5, ll. 17-21), therefore the output (response) corresponding to the sixth input audio is output after the fifth and sixth input audio with the respective commands are sent to the server and processed (received)).
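Although Mok teaches that the steps of forwarding received input audio and processing it for commands without detecting a wakeword can be performed more than once in a row and up to a maximum number of times, Mok does not necessarily teach that this number of times can be at least 2 (thus providing a fifth and sixth audio input processing). However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have a number of times to perform the processing of input audio without detecting a keyword to be at least 2, because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6), and as well would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).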
Regarding claim 7, Mok teaches wherein the processor is further configured to distinguish the fifth command from the sixth command by using a first identification command included in the fifth command and a second identification command included in the sixth command (Mok fig. 6, col. 21, ll. 47-67, a user profile containing executable commands enabled with respect to a user preference, the commands being provided in the user profile, and thus distinguished from each other by each command having its own row, and where the user preference specifies whether a device should remain connected to the server and continue sending audio to the server after content responsive to the command is output by the device).
Regarding claim 8, Mok teaches wherein, based on the processor further receiving an additional response request including the first identification command (Mok fig. 6, col. 21, ll. 47-67, col. 4, ll. 34-45, a user profile containing executable commands enabled with respect to a user preference, the commands being provided in the user profile and when received by the system, are associated with how long the device should be connected to the server prior to the device re-entering sleep mode and requiring the user to speak a keyword to enter a command), the processor is further configured to output an additional response corresponding to the additional response request and related to the fifth command by using the stored fifth command (Mok col. 5, ll. 45-59, additional requests are processed and have output generated for them, where the additional response/request will be related to another command (fifth command) in the sequential set of commands if it is requested within the maximum number of times that the system will process further input audio and commands, and where the fifth command can be a command stored in the user profile).
Although Mok teaches that the steps of forwarding received input audio and processing it for commands without detecting a wakeword can be performed more than once in a row and up to a maximum number of times, Mok does not necessarily teach that this number of times can be at least 2 (thus providing a fifth and sixth audio input processing). However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have a number of times to perform the processing of input audio without detecting a keyword to be at least 2, because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6), and as well would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).
Regarding claim 9, Mok teaches wherein, based on a seventh command being received in the multiple request mode within a speech waiting period after the fifth command and the sixth command are processed, the processor is further configured to process the seventh command without input of the first wakeup word (Mok fig. 1, col. 5, ll. 22-59, while in an operational flow as shown in fig. 1 (multiple request mode), the steps of sending subsequent input audio to the server for processing without detecting the presence of a keyword is repeated up to a maximum number of times, which would then obviously include a seventh input audio with a seventh command that is processed and for which output audio is returned, and where the seventh input audio would not be forwarded until after or during the output of the previous processed input audio’s output – the sixth and fifth command’s processed output (see col. 5, ll. 17-21)), 
wherein the speech waiting period in the multiple request mode is longer than a speech waiting period in the normal mode (Mok col. 6, ll. 31-35, col. 23, ll. 5-7, the threshold amount of time is configurable (such as 30 seconds) and can be a user preference stating how long to stay connected to the server – thus some length of time, whereas col. 4, ll. 19-33, teaches that in a mode where the keyword/wakeword has to be detected first before forwarding on input audio for processing to the server, the server is not sent the input audio automatically, so there is no speech waiting period, and hence the threshold amount of time is always longer).
Although Mok teaches that the steps of forwarding received input audio and processing it for commands without detecting a wakeword can be performed more than once in a row and up to a maximum number of times, Mok does not necessarily teach that this number of times can be at least 3 (thus providing a seventh audio input processing). However, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have a number of times to perform the processing of input audio without detecting a keyword to be at least 3, because doing so would reduce system/user friction, thereby providing a better user experience (see Mok col. 4, ll. 4-6), and as well would be combining prior art elements according to known methods to yield predictable results. See MPEP 2143(I)(A).
Regarding claim 10, Mok teaches wherein, based on an end command being received in the interaction mode or a speech waiting period is ended, the processor is further configured to Mok col. 6, lines 23-33, if the server does not determine voice activity in the input audio data within a threshold amount of time (a speech waiting period is ended), the device is instructed to re-enter a sleep mode which results in the device thereafter requiring third input audio data including a keyword prior to the device sending the third input audio data to the server).
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Mok, as set forth above regarding claim 1 from which claim 11 depends, further in view of Olson et al., (US 2020/0312318 A1, herein “Olson”).
Regarding claim 11, Mok teaches wherein, based on the second wakeup word corresponding to a specific device call mode being recognized after the first wakeup word is recognized, the processor is further configured to (Mok col. 8, ll. 8-58, once the wakeword is detected (after the first wakeup word is recognized), audio data is sent to the server for processing and a command such as “call mom” is identified to determine that the user intends to activate a telephone (specific device call mode) in his/her device and to initiate a call to the entity “mom”).
Mok does not explicitly teach enable a speech agent of a home appliance indicated by the second wakeup word; and disable a speech agent of a home appliance other than the home appliance indicated by the second wakeup word.
Olson teaches enable a speech agent of a home appliance indicated by the second wakeup word; and disable a speech agent of a home appliance other than the home appliance indicated by the second wakeup word (Olson paras. [0084]-[0086], after a wakeword wakes the voice agent of both a smart washer W and a smart dryer D, with the voice command that follows (second wakeup word) being how much time is left on the smart dryer D, the voice agent of the smart washer W determines the inquiry is not intended for the smart washer W and transitions its voice agent back to the sleep mode (disables) but the smart dryer voice agent determines the inquiry was intended for it, and so goes into a thinking mode (enable) to determine an output to provide to the user).
Therefore, taking the teachings of Mok and Olson together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the processing of wakewords and commands as disclosed in Mok with enabling and disabling of speech agents for particular applications as disclosed in Olson at least because doing so would reduce a likelihood of user speech activating an unintentional or undesired action on the electronic device (see Olson para. [0042]).
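For illustration only, the enable/disable behavior attributed to Olson above can be sketched as follows. The sketch is not taken from Olson: the class `ApplianceAgent`, the function `route`, the state names, and the simple substring match against a per-appliance vocabulary are all hypothetical simplifications of how an agent might decide whether an inquiry was intended for it.

```python
class ApplianceAgent:
    """Hypothetical voice agent of one home appliance."""

    def __init__(self, name, vocabulary):
        self.name = name
        self.vocabulary = {term.lower() for term in vocabulary}
        self.state = "sleep"

    def on_wakeword(self):
        # The shared wakeword wakes every agent that hears it.
        self.state = "listening"

    def on_command(self, command):
        if self.state != "listening":
            return
        # Assumed test: the command names something in this agent's vocabulary.
        addressed = any(term in command.lower() for term in self.vocabulary)
        # Addressed agent stays enabled ("thinking"); others go back to sleep.
        self.state = "thinking" if addressed else "sleep"


def route(agents, command):
    """Wake all agents, then let each decide whether the command is for it."""
    for agent in agents:
        agent.on_wakeword()
    for agent in agents:
        agent.on_command(command)
    return [agent.name for agent in agents if agent.state == "thinking"]
```

Under these assumptions, routing "how much time is left on the dryer" to a washer agent and a dryer agent leaves only the dryer agent enabled.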
Regarding claim 12, Mok does not explicitly teach the limitations of claim 12. Olson teaches wherein the second wakeup word corresponding to the specific device call mode includes a plurality of device call words set by a user and respectively corresponding to a plurality of home appliances (Olson paras. [0032], [0041]-[0043], user can use an interface to configure parameters of the system (set by a user), where such parameters include the vocabulary for commands to interpret a user utterance as being directed towards actions of a particular electronic device, where para. [0087] teaches the electronic devices can be a plurality of devices such as a washer and dryer (home appliances)).
Therefore, taking the teachings of Mok and Olson together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the .
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Mok, as set forth above regarding claim 1 from which claim 13 depends, further in view of Leblang et al., (US 2019/0073998 A1, herein “Leblang”).
Regarding claim 13, Mok does not explicitly teach the limitations of claim 13. Leblang teaches wherein, based on the second wakeup word corresponding to a secret mode being recognized after the first wakeup word is recognized, the processor is further configured to delete a conversation record of a user (Leblang fig. 6, paras. [0042] and [0046], in a voice activated system operated by wakeword, a voice input is analyzed to determine that it represents a command to enter a private mode, the analysis applying to a phrase (second wakeup word) following a wake word (first wakeup word), and when determined, a private session is started and ends after a period of time, and the service provider environment automatically invokes a deletion function to forget any data generated and stored while in the private session (thus including a conversation record)).
Therefore, taking the teachings of Mok and Leblang together as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the processing of wakewords and commands as disclosed in Mok with a private mode and the forgetting of data generated during the .


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908. The examiner can normally be reached Monday-Friday, 09:30-18:30 EDT/EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MICHELLE M. KOETH
Primary Examiner
Art Unit 2656



/MICHELLE M KOETH/Primary Examiner, Art Unit 2656