DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 12/20/2021 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-24 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-6, 8-12, 14-16, 19, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0077794 A1 in view of Wang U.S. PAP 2019/0027138 A1, further in view of Ma U.S. PAP 2017/0242478 A1.
Regarding claim 1 Kim teaches a method for passive wakeup of a user interaction device by configuring a wakeup time for the user interaction device, the method comprising: 
identifying, by the configuring device, a probability of initiating at least one voice interaction by a user with the user interaction device based on detecting the occurrence of at least one of the at least one non-voice event (virtual assistant client module 264 can utilize the various sensors, subsystems, and peripheral devices to gather additional information from the surrounding environment of user device 102 to establish a context associated with a user, the current user interaction, and/or the current user input. In some examples, virtual assistant client module 264 can provide the contextual information or a subset thereof with the user input to the virtual assistant server to help infer the user's intent, see par. [0036]; confidence level or score for the likelihood that sampled audio includes a portion of a spoken trigger can be determined at block 306 in a variety of ways. In one example, sampled audio can be compared to audio files or other digital representations of accepted speech triggers, and a score or level can be determined based on how well the sampled audio matches the speech trigger representation, see par. [0045]).
However Kim does not teach detecting, by a configuring device, an occurrence of at least one non-voice event associated with at least one device present in an Internet of Things (IoT) environment; and configuring, by the configuring device, the wakeup time to switch the user interaction device to a wakeup state in which the user interaction device is awake for the configured wake time, based on determining that the probability is above a pre-defined threshold value.
In the same field of endeavor Wang teaches a voice command module for a residential environment which includes a wake-up detection module which determines if an individual 102 is located in proximity to the zone where a device is located (non-speech event). The wakeup detection module wakes up the hub by determining the individual’s location relative to the hub and compares the location to the physical area to determine if the individual is in a particular zone, see par. [0022]. The intent recognition module recognizes a user’s “intent to take some actions” such as speech recognition; a user’s intent is an action that the user is likely to take within a predetermined time period. For example, if a user intends to customize a wakeup utterance, the intent module recognizes that the user is going to request it before the user interacts with the device, see par. [0023].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Kim with the Wang reference in order to recognize a user request before the user interacts with the device, see par. [0023].
However Kim in view of Wang does not teach in response to detecting the occurrence of the at least one non-voice event, identifying by the configuring device, a probability of at least one voice interaction with the user interaction device being initiated by a user at a future time based on the detected at least one non-voice event.
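In the same field of endeavor Ma teaches a more natural way of initiating machine interaction without wasting processing power or compromising accuracy, see par. [0004]. It includes a method of transitioning an input engine from sleep mode to interactive mode. The method includes identifying a user eye, determining a direction of user's visual attention based on movement of the eye (non-voice event); and activating an input engine to receive input if the visual attention is in a predefined direction for a minimum visual contact period, see par. [0006]. Upon determining that there is a user in proximity, the sensor locates the user's eye(s) and determines the direction of the user's visual attention or gaze. If the user looks in the direction of the target area for a preset minimum visual contact period (e.g., 3 seconds), processor transitions the device into interactive mode. In the interactive mode, the microphone and voice input engine are triggered and the device waits for a voice command from the user (future time interaction), see par. [0031].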
It would have been obvious to one of ordinary skill in the art to combine the Kim in view of Wang invention with the teachings of Ma for the benefit of providing a more natural way of initiating machine interaction without wasting processing power or compromising accuracy, see par. [0004].
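For illustration only, the threshold comparison and wakeup-time configuration recited in claim 1 can be sketched as follows. This is a minimal sketch and not taken from the application or any cited reference: the names `WakeupDecision`, `configure_wakeup`, and `max_wakeup_s`, and the choice to scale the wakeup time linearly with the probability, are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class WakeupDecision:
    wake: bool            # whether to switch the device to the wakeup state
    wakeup_time_s: float  # how long the device stays awake, in seconds


def configure_wakeup(probability: float, threshold: float,
                     max_wakeup_s: float = 10.0) -> WakeupDecision:
    """Hypothetical helper: wake the device only if the probability of a
    future voice interaction is above the pre-defined threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability > threshold:
        # Assumption: wakeup time grows linearly with the probability.
        return WakeupDecision(True, probability * max_wakeup_s)
    return WakeupDecision(False, 0.0)
```

Under these assumptions, a probability above the threshold yields a non-zero wakeup time, while a probability at or below the threshold leaves the device in its current state.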
Regarding claim 3 Kim teaches the method of claim 1, wherein the identifying of the probability includes: 

extracting meaningful patterns of sequence of events from the history of voice command interactions of the user and the history of the sequence of events (virtual assistant client module 264 can selectively provide information stored on user device 102 in response to requests from virtual assistant server 114, see par. [0038]); 
performing a correlation of the meaningful patterns of sequence of events with the voice command interactions of the user derived from the history (various aspects can be personalized for a particular user. The various processes discussed herein can be modified according to user preferences, contacts, text, usage history, profile data, demographics, or the like. In addition, such preferences and settings can be updated over time based on user interactions, see par. [0082]); 

Regarding claim 4 Kim teaches the method of claim 1, wherein the configuring the wakeup time based on the probability includes: 
comparing the probability with the pre-defined threshold value, wherein the pre-defined threshold value is estimated based on a conversation frequency of the user with the user interaction device (the predetermined level can be lowered over time for users who interact frequently using speech triggers and can be raised over time for users who interact infrequently using speech triggers, see par. [0079]); 
and configuring the wakeup time based on determining that the probability is above a predefined threshold value, wherein the wakeup time is configured based on at least one of the at least one successive event or a user context (the amount by which a speech trigger threshold is raised or lowered can vary according to the perceived event causing the adjustment or a variety of other factors, see par. [0080]). 
Regarding claim 5 Kim teaches the method of claim 1, further comprising sending, by the configuring device, a passive wakeup command to the user interaction device for switching to the wakeup state, wherein the wakeup command includes information about the configured wakeup time (the speech trigger confidence level threshold used at decision block 308 can be dynamically adjusted in response to perceived events. In some examples, the threshold can be lowered, which can increase the sensitivity of the trigger, thus increasing the likelihood that audio input will be recognized as a trigger. In other examples, the threshold can be raised, which can decrease the sensitivity of the trigger, thus decreasing the likelihood that audio input will be recognized as a trigger, see par. [0050]). 
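The dynamic threshold adjustment cited from Kim par. [0050] can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not Kim's implementation: the function names, the fixed `step` size, and the clamping bounds `lo` and `hi` are hypothetical.

```python
def adjust_threshold(threshold: float, event_suggests_speech: bool,
                     step: float = 0.05, lo: float = 0.1,
                     hi: float = 0.9) -> float:
    """Lower the threshold (more sensitive) when a perceived event suggests
    the user is likely to speak; raise it (less sensitive) otherwise.
    The step size and the [lo, hi] clamp are assumptions."""
    if event_suggests_speech:
        threshold -= step
    else:
        threshold += step
    return min(hi, max(lo, threshold))


def is_trigger(confidence: float, threshold: float) -> bool:
    """Treat the sampled audio as a speech trigger when its confidence
    level meets the current threshold."""
    return confidence >= threshold
```

With these assumed values, an audio sample scored at 0.47 is rejected against a threshold of 0.5 but accepted once a perceived event has lowered the threshold by one step.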
Regarding claim 6 Kim teaches a configuring device comprising: an event detection unit configured to: 
identify a probability of initiating at least one voice interaction by a user with the user interaction device based on detecting the occurrence of at least one of the at least one first event or the at least one successive event (virtual assistant client module 264 can utilize the various sensors, subsystems, and peripheral devices to gather additional information from the surrounding environment of user device 102 to establish a context associated with a user, the current user interaction, and/or the current user input. In some examples, virtual assistant client module 264 can provide the contextual information or a subset thereof with the user input to the virtual assistant server to help infer the user's intent, see par. [0036]; confidence level or score for the likelihood that sampled audio includes a portion of a spoken trigger can be determined at block 306 in a variety of ways. In one example, sampled audio can be compared to audio files or other digital representations of accepted speech triggers, and a score or level can be determined based on how well the sampled audio matches the speech trigger representation, see par. [0045]).
However Kim does not teach detecting, by a configuring device, an occurrence of at least one non-voice event associated with at least one device present in an Internet of Things (IoT) environment; and configuring, by the configuring device, the wakeup time to switch the user interaction device to a wakeup state in which the user interaction device is awake for the configured wake time, based on determining that the probability is above a pre-defined threshold value. 
In the same field of endeavor Wang teaches a voice command module for a residential environment which includes a wake-up detection module which determines if an individual 102 is located in proximity to the zone where a device is located (non-speech event). The wakeup detection module wakes up the hub by determining the individual’s location relative to the hub and compares the location to the physical area to determine if the individual is in a particular zone, see par. [0022]. The intent recognition module recognizes a user’s “intent to take some actions” such as speech recognition; a user’s intent is an action that the user is likely to take within a predetermined time period. For example, if a user intends to customize a wakeup utterance, the intent module recognizes that the user is going to request it before the user interacts with the device, see par. [0023].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Kim with the Wang reference in order to recognize a user request before the user interacts with the device, see par. [0023].
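As an illustration of the zone-based wake-up attributed to Wang par. [0022], the sketch below checks whether an individual's location falls inside a zone around the hub. Modeling the zone as a circle of radius `radius_m` and the name `in_zone` are assumptions made for this example; Wang as quoted above does not specify the zone geometry.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def in_zone(person_xy: Point, hub_xy: Point, radius_m: float) -> bool:
    """Return True when the individual is within the (assumed circular)
    zone centered on the hub, which would wake the hub."""
    dx = person_xy[0] - hub_xy[0]
    dy = person_xy[1] - hub_xy[1]
    return math.hypot(dx, dy) <= radius_m
```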

In the same field of endeavor Ma teaches a more natural way of initiating machine interaction without wasting processing power or compromising accuracy, see par. [0004]. It includes a method of transitioning an input engine from sleep mode to interactive mode. The method includes identifying a user eye, determining a direction of user's visual attention based on movement of the eye (non-voice event); and activating an input engine to receive input if the visual attention is in a predefined direction for a minimum visual contact period, see par. [0006]. Upon determining that there is a user in proximity, the sensor locates the user's eye(s) and determines the direction of the user's visual attention or gaze. If the user looks in the direction of the target area for a preset minimum visual contact period (e.g., 3 seconds), processor transitions the device into interactive mode. In the interactive mode, the microphone and voice input engine are triggered and the device waits for a voice command from the user (future time interaction), see par. [0031].
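The minimum visual contact period described in Ma par. [0031] can be sketched as a dwell-time check over timestamped gaze samples. This is a minimal sketch under assumptions: the sample format `(timestamp_s, gaze_on_target)` and the name `should_enter_interactive` are hypothetical; only the 3-second example period comes from the cited passage.

```python
from typing import Iterable, Optional, Tuple


def should_enter_interactive(samples: Iterable[Tuple[float, bool]],
                             min_contact_s: float = 3.0) -> bool:
    """Return True once the gaze has stayed on the target area
    continuously for at least min_contact_s seconds.
    samples are (timestamp_s, gaze_on_target) pairs in time order."""
    start: Optional[float] = None
    for t, on_target in samples:
        if on_target:
            if start is None:
                start = t  # gaze just arrived on the target area
            if t - start >= min_contact_s:
                return True
        else:
            start = None   # contact broken, so the dwell timer resets
    return False
```

Resetting the timer on any off-target sample is a design choice of this sketch; a tolerance for brief glances away would be a different, equally hypothetical, variant.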

Regarding claim 8 Kim teaches the configuring device of claim 6, wherein the contextual probability estimation unit is further configured to: 
determine a context using at least one context parameter, wherein the at least one parameter includes at least one of a user context, a user personal language modeling data, a device context, a history of voice command interactions of the user, or a history of a sequence of events associated with the at least one device (the contextual information that accompanies the user input can include sensor information, such as lighting, ambient noise, ambient temperature, images or videos of the surrounding environment, distance to another object, and the like. The contextual information can further include information associated with the physical state of user device 102 (e.g., device orientation, device location, device temperature, power level, speed, acceleration, motion patterns, cellular signal strength, etc.) or the software state of user device 102 (e.g., running processes, installed programs, past and present network activities, background services, error logs, resources usage, etc., see par. [0037]);
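extract meaningful patterns of sequence of events from the history of voice command interactions of the user and the history of the sequence of events (virtual assistant client module 264 can selectively provide information stored on user device 102 in response to requests from virtual assistant server 114, see par. [0038]); 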
perform a correlation of the meaningful patterns of sequence of events with the voice command interactions of the user derived from the history (various aspects can be personalized for a particular user. The various processes discussed herein can be modified according to user preferences, contacts, text, usage history, profile data, demographics, or the like. In addition, such preferences and settings can be updated over time based on user interactions, see par. [0082]); 
and predict a confidence value based on the correlation, the at least one non-speech event and the at least one successive event associated with the at least one device, wherein the confidence value indicates the probability of the at least one voice interaction with the user interaction device being initiated by the user at a future time (the speech trigger confidence level threshold used at decision block 308 can be dynamically adjusted in response to perceived events. In some examples, the threshold can be lowered, which can increase the sensitivity of the trigger, thus increasing the likelihood that audio input will be recognized as a trigger. In other examples, the threshold can be raised, which can decrease the sensitivity of the trigger, thus decreasing the likelihood that audio input will be recognized as a trigger, see par. [0050]). 
Regarding claim 9 Kim teaches the configuring device of claim 6, wherein the wakeup time configuring unit is further configured to: 

and configure the wakeup time based on determining that the probability is above a predefined threshold value, wherein the wakeup time is configured based on at least one of the at least one successive event or a user context (the amount by which a speech trigger threshold is raised or lowered can vary according to the perceived event causing the adjustment or a variety of other factors, see par. [0080]). 
Regarding claim 10 Kim teaches the configuring device of claim 6, wherein the wakeup time configuring unit is further configured to send a wakeup command to the user interaction device for switching to the passive wakeup state, wherein the wakeup command includes information about the configured wakeup time (the speech trigger confidence level threshold used at decision block 308 can be dynamically adjusted in response to perceived events. In some examples, the threshold can be lowered, which can increase the sensitivity of the trigger, thus increasing the likelihood that audio input will be recognized as a trigger. In other examples, the threshold can be raised, which can decrease the sensitivity of the trigger, thus decreasing the likelihood that audio input will be recognized as a trigger, see par. [0050]). 
Regarding claim 11 Kim teaches a voice assistant device, comprising: 

a processor(one or more processors 204, see par. [0028]); 
and a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which, on execution (a memory interface 202 coupled to memory 250, see par. [0032]), cause the processor to: 
identify intent associated with the at least one voice input from the user (determines the user's intent, see par. [0008]); 
determine probability of issuance of a subsequent voice input from the at least one user based on at least one of the intent, historic data and one or more contextual factors (With a sufficiently high confidence level that a trigger phrase was spoken, the virtual assistant can be triggered to receive the subsequent audio input as a command, see par. [0049]);
identify an extended wake-up duration of the voice assistant device, when the probability is greater than a predefined threshold value (The threshold can be dynamically raised and/or lowered in response to a variety of perceived events, conditions, situations, and the like. For example, the threshold can be lowered to increase the likelihood of triggering when perceived events suggest that a user is likely to utter a speech trigger, see par. [0050]); 
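and extend duration of the wake-up mode, for the extended wake-up duration to receive the subsequent voice input from the at least one user (The time to wait for a spoken command can vary based on the notification, such as listening for commands for a longer duration when multiple notifications are received or notifications are complex, see par. [0057]).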
However Kim does not teach detecting, by a configuring device, an occurrence of at least one non-voice event associated with at least one device present in an Internet of Things (IoT) environment; and configuring, by the configuring device, the wakeup time to switch the user interaction device to a wakeup state in which the user interaction device is awake for the configured wake time, based on determining that the probability is above a pre-defined threshold value. 
In the same field of endeavor Wang teaches a voice command module for a residential environment which includes a wake-up detection module which determines if an individual 102 is located in proximity to the zone where a device is located (non-speech event). The wakeup detection module wakes up the hub by determining the individual’s location relative to the hub and compares the location to the physical area to determine if the individual is in a particular zone, see par. [0022]. The intent recognition module recognizes a user’s “intent to take some actions” such as speech recognition; a user’s intent is an action that the user is likely to take within a predetermined time period. For example, if a user intends to customize a wakeup utterance, the intent module recognizes that the user is going to request it before the user interacts with the device, see par. [0023].
It would have been obvious to one of ordinary skill in the art to combine the teachings of Kim with the Wang reference in order to recognize a user request before the user interacts with the device, see par. [0023].
However Kim in view of Wang does not teach, in response to receiving at least one voice input, determining a probability of issuance of a subsequent voice input from the at least one user based on the intent, historic data and one or more contextual factors.
In the same field of endeavor Ma teaches a more natural way of initiating machine interaction without wasting processing power or compromising accuracy, see par. [0004]. It includes a method of transitioning an input engine from sleep mode to interactive mode. The method includes identifying a user eye, determining a direction of user's visual attention based on movement of the eye (non-voice event); and activating an input engine to receive input if the visual attention is in a predefined direction for a minimum visual contact period, see par. [0006]. The device includes a high definition camera, a microphone array, actuators, and speakers to automatically determine and learn the security status of a location (contextual factors) based on past history and trigger words. For example, the device can learn that a desired word (e.g., help, danger) or loud noises (e.g., a sound above a predefined decibel threshold) are indicators for investigation, and switches into a tracking mode, see par. [0077].
It would have been obvious to one of ordinary skill in the art to combine the Kim in view of Wang invention with the teachings of Ma for the benefit of providing a more natural way of initiating machine interaction without wasting processing power or compromising accuracy, see par. [0004].
Regarding claim 12 Kim teaches the voice assistant device as claimed in claim 11, wherein the predefined threshold value is determined by analysis of the historic data, through devices connected to the voice assistance device (The various processes discussed herein can be modified according to user preferences, contacts, text, usage history, see par. [0082]). 
Regarding claim 14 Kim teaches the voice assistant device as claimed in claim 11, wherein the intent associated with the at least one voice input is determined by performing Natural-Language Understanding (NLU) on the at least one voice input (devices or systems using natural language in spoken and/or text forms, see par. [0003]). 
Regarding claim 15 Kim teaches the voice assistant device as claimed in claim 11, wherein the one or more contextual factors comprises at least one of user related factors (user preferences), time related factors data (such preferences and settings can be updated over time based on user interactions) and environment related factors (information data can include demographic data, location-based data, see par. [0082]). 
Regarding claim 16 Kim teaches the voice assistant device as claimed in claim 11, wherein the extended wake-up duration is estimated to be directly proportional to the probability of issuance of the subsequent voice input (The time to wait for a spoken command can vary based on the notification, such as listening for commands for a longer duration when multiple notifications are received or notifications are complex, see par. [0057]). 
Regarding claim 19 Kim teaches the voice assistant device as claimed in claim 11, further comprises the processor configured to: 

Regarding claim 21 Kim teaches the method of claim 1, wherein the wakeup time is configured based on the probability (process 300 can loop to listen for a speech trigger for a certain non-zero time period following an event (e.g., power on, screen wake, alarm sounding, notification receipt, etc.). In yet other examples, process 300 can loop to listen for a speech trigger based on a variety of other factors (e.g., while a device is connected to a power source, while a user is engaged with a device, while a screen is on, during a scheduled calendar meeting, outside of scheduled calendar meetings, etc., see par. [0047]).  
Regarding claim 22 Wang teaches the method of claim 1, wherein the user interaction device and the configuring device are integrated together (command hub 104, see figure 1).  
Regarding claim 23 Kim teaches the configuring device of claim 6, wherein the wakeup time is configured based on the probability (process 300 can loop to listen for a speech trigger for a certain non-zero time period following an event (e.g., power on, screen wake, alarm sounding, notification receipt, etc.). In yet other examples, process 300 can loop to listen for a speech trigger based on a variety of other factors (e.g., while a device is connected to a power source, while a user is engaged with a device, while a screen is on, during a scheduled calendar meeting, outside of scheduled calendar meetings, etc., see par. [0047]).  
Regarding claim 24 Wang teaches the configuring device of claim 6, wherein the user interaction device and the configuring device are integrated together (command hub 104, see figure 1).  


Claims 2, 7, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0077794 A1, in view of Wang U.S. PAP 2019/0027138 A1, in view of Ma U.S. PAP 2017/0242478 A1, further in view of Lee U.S. PAP 2016/0240194 A1.

Regarding claim 2 Kim in view of Wang in view of Ma does not teach the method of claim 1, further comprising switching, by the configuring device, the user interaction device to a sleep state based on determining that the probability is not above the predefined threshold value. 
In a similar field of endeavor Lee teaches a voice recognition device which transitions between power modes, see abstract. In Step 570, the audio processing module transfers the audio signal, e.g., voice signal, to the main processor. The audio processing module performs voice recognition on the audio signal transferred from the audio input module in real time and, when the voice recognition is successful, buffers the audio signal during the time that the main processor is activated. When the main processor is activated, the audio processing module transfers the audio signal to the main processor in real time. Meanwhile, if in Step 550 it is determined that the voice recognition fails, the audio processing module switches the operational mode to the sleep mode in Step 590, see par. [0095-0096]. The electronic device includes a voice recognition module, and the voice recognition module initially operates in the sleep mode in Step 610. The voice recognition module implements a low power chip to reduce current consumption and may restrictively operate, i.e., operate only for the voice recognition function, see par. [0099].
It would have been obvious to one of ordinary skill in the art to combine the Kim in view of Wang in view of Ma invention with the teachings of Lee for the benefit of lowering power consumption, see par. [0099].
Regarding claim 7 Kim in view of Wang in view of Ma does not teach the configuring device of claim 6, wherein the wakeup time configuring unit is further configured to switch the user interaction device to a sleep state based on determining the probability is not above the predefined threshold value. 
In a similar field of endeavor Lee teaches a voice recognition device which transitions between power modes, see abstract. In Step 570, the audio processing module transfers the audio signal, e.g., voice signal, to the main processor. The audio processing module performs voice recognition on the audio signal transferred from the audio input module in real time and, when the voice recognition is successful, buffers the audio signal during the time that the main processor is activated. When the main processor is activated, the audio processing module transfers the audio signal to the main processor in real time. Meanwhile, if in Step 550 it is determined that the voice recognition fails, the audio processing module switches the operational mode to the sleep mode in Step 590, see par. [0095-0096]. The electronic device includes a voice recognition module, and the voice recognition module initially operates in the sleep mode in Step 610. The voice recognition module implements a low power chip to reduce current consumption and may restrictively operate, i.e., operate only for the voice recognition function, see par. [0099].
It would have been obvious to one of ordinary skill in the art to combine the Kim in view of Wang in view of Ma invention with the teachings of Lee for the benefit of lowering power consumption, see par. [0099].
Regarding claim 13 Kim in view of Wang in view of Ma does not teach the voice assistant device as claimed in claim 11, further comprises the processor configured to: determine the probability to be lesser than a predefined threshold value; and configure to be operated in sleep-mode until a trigger to be operated in the wake-up mode is detected. 
In a similar field of endeavor Lee teaches a voice recognition device which transitions between power modes, see abstract. In Step 570, the audio processing module transfers the audio signal, e.g., voice signal, to the main processor. The audio processing module performs voice recognition on the audio signal transferred from the audio input module in real time and, when the voice recognition is successful, buffers the audio signal during the time that the main processor is activated. When the main processor is activated, the audio processing module transfers the audio signal to the main processor in real time. Meanwhile, if in Step 550 it is determined that the voice recognition fails, the audio processing module switches the operational mode to the sleep mode in Step 590, see par. [0095-0096]. The electronic device includes a voice recognition module, and the voice recognition module initially operates in the sleep mode in Step 610. The voice recognition module implements a low power chip to reduce current consumption and may restrictively operate, i.e., operate only for the voice recognition function, see par. [0099].

Regarding claim 20 Lee teaches the voice assistant device as claimed in claim 11, further comprises the processor configured to: configure the voice assistant device to be in sleep-mode if absence of the subsequent voice input is detected during the extended wake-up duration. (the audio processing module 130 may switch the operational mode to the sleep mode when the voice recognition fails, see par. [0037]) 
Claims 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kim U.S. PAP 2016/0077794 A1, in view of Wang U.S. PAP 2019/0027138 A1, in view of Ma U.S. PAP 2017/0242478 A1, further in view of Kannan U.S. PAP 2019/0132451 A1.

Regarding claim 17 Kim in view of Wang in view of Ma does not teach the voice assistant device as claimed in claim 11, wherein the processor is configured to determine the probability of issuance of the subsequent voice input by: 
extracting one or more keywords from plurality of words in the at least one voice input; 

computing domain matching scores for the identified domain with plurality of pre-stored domains associated with the user;
and determining a probability of issuance of the subsequent voice input to the voice assistant device, based on the domain matching scores. 
In the same field of endeavor Kannan teaches a method and apparatus for facilitating agent conversations with customers of an enterprise, see par. [0002].
extract one or more keywords from plurality of words in the at least one voice input (the confidence score is computed based on a degree of correlation between the customer input and a stored prediction from among a plurality of stored predictions. More specifically, a plurality of probable customer intents may be stored in the database 250. Each stored intent may be tagged with several keywords related to the intent, see par. [0062]); 
identify domain associated with the at least one voice input based on the one or more keywords (The words in the customer input and the variations thereof may be correlated with tagged keywords to determine a degree of correlation therebetween, see par. [0062]); 
compute domain matching scores for the identified domain with plurality of pre-stored domains associated with the user (The degree of correlation may configure a degree of confidence in correct prediction of the intent, which in turn, signifies a degree of confidence in the VA's ability to provide an effective response to the input, see par. [0062]);

It would have been obvious to one of ordinary skill in the art to combine the Kim in view of Wang in view of Ma invention with the teachings of Kannan for the benefit of facilitating conversations with customers of an enterprise, see par. [0002].
Regarding claim 18 Kannan teaches the voice assistant device as claimed in claim 17, wherein extraction of the one or more keywords is performed by: 
assigning weightage to each of the plurality of words of the at least one voice input (A confidence score corresponding to each intent is computed, see abstract); 
and identifying one or more keywords from the plurality of words, with weightage greater than a predefined weightage value, to be the one or more keywords (the confidence score corresponding to each intent with a predefined threshold score, see par. [0008]). 
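The keyword correlation and confidence scoring cited from Kannan par. [0062] and par. [0008] can be illustrated with the following minimal sketch. Kannan as quoted above refers only to a "degree of correlation"; measuring it as the fraction of an intent's tagged keywords that appear in the input, along with the names `intent_confidence` and `intents_above`, are assumptions made for this example.

```python
from typing import Dict, List, Set


def intent_confidence(customer_input: str,
                      intent_keywords: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each stored intent by the share of its tagged keywords that
    occur in the input (an assumed stand-in for 'degree of correlation')."""
    words = set(customer_input.lower().split())
    scores: Dict[str, float] = {}
    for intent, keywords in intent_keywords.items():
        scores[intent] = len(words & keywords) / len(keywords) if keywords else 0.0
    return scores


def intents_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep only intents whose confidence score exceeds the predefined
    threshold score."""
    return [intent for intent, score in scores.items() if score > threshold]
```

For example, with hypothetical stored intents "billing" (keywords bill, payment, charge) and "shipping" (keywords delivery, track), the input "I have a question about my bill payment" matches two of the three billing keywords and none of the shipping keywords.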

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pertinent prior art available on form 892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Ortiz-Sanchez whose telephone number is (571)270-3711.  The examiner can normally be reached on Monday-Friday, 9AM-6PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/MICHAEL ORTIZ-SANCHEZ/ Primary Examiner, Art Unit 2656